Hi,
I would like to use a regular expression in Java to read those lines
from a file which are not comments and do not start with whitespace.
Commented lines start with #
Currently with grep, I am using the command:
grep -E "^[^#\ \t]" myfile
to get the lines I want, but I am having problems converting this
regular expression for use in Java. I don't get any lines returned.
If I replace the above regular expression with ".*" in my Java code,
then all lines of myfile are returned, as you might expect, so it would
appear that the problem is only with the regular expression shown in the
above grep example.
Please can you help.
Thanks,
Jonny
Bastiaan - 12 Aug 2005 10:20 GMT
> Currently with grep, I am using the command:
>
[quoted text clipped - 7 lines]
> appear that the problem is only with the regular expression shown in the
> above grep example.
What does your Java code look like? You might have to escape the
backslashes and other special characters. I am very new to Java myself,
but most programming languages require escaping of this kind.
Bastiaan
Jonny - 12 Aug 2005 19:30 GMT
> > Currently with grep, I am using the command:
> >
[quoted text clipped - 11 lines]
> other special characters, I myself am very new to Java but escaping of
> characters is done in most programming languages.
Thanks for your reply Bastiaan.
I know I have to include \\ for each \
See my replies to the other responses.
Regards,
Jonny
Hemal Pandya - 12 Aug 2005 10:43 GMT
> Hi,
>
[quoted text clipped - 8 lines]
> to get the lines I want, but I am having problems converting this
> regular expression for use in Java. I don't get any lines returned.
Java's Matcher.matches() "Attempts to match the entire region against
the pattern." (from the javadocs). The match is anchored at both ends.
So the only lines that match the regular expression "^[^#\ \t]" are
those that contain exactly one character that is not pound, space or tab.
I am resisting the temptation to give you the exact pattern you are
looking for.
Jonny - 12 Aug 2005 19:34 GMT
> > I would like to use a regular expression in Java to read those lines
> > from a file which are not comments and do not start with whitespace.
[quoted text clipped - 11 lines]
> match the regular expression "^[^#\ \t]" are those that contain exactly
> one character that is not pound, space or tab.
Thanks for your reply, Hemal.
I understand what you have said, so I have to use ^ and $ in Java.
See my response to Mario's reply.
Regards,
Jonny
Hemal Pandya - 13 Aug 2005 02:52 GMT
> > > I would like to use a regular expression in Java to read those lines
> > > from a file which are not comments and do not start with whitespace.
[quoted text clipped - 15 lines]
>
> I understand what you have said, so I have to use ^ and $ in Java.
No, you do not need them, if you are comparing each line individually.
By anchored I meant that the pattern is interpreted as if it already
has ^ and $ around it.
Your pattern matches only the first character of a line. It needs to
match the entire line.
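
A minimal sketch of the difference, assuming each line is tested individually. The class name AnchoringDemo and the helper method names are made up for illustration; Pattern, Matcher.matches() and Matcher.find() are the standard java.util.regex calls. It shows that the grep-style pattern fails with matches() because it covers only one character, while either find() or a pattern extended with ".*" accepts the line.

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    // Pattern carried over from the grep command:
    // one character that is not '#', space, or tab.
    static final Pattern FIRST_CHAR = Pattern.compile("^[^# \\t]");

    // Same first-character test, extended with ".*" so the
    // pattern can cover the rest of the line as well.
    static final Pattern WHOLE_LINE = Pattern.compile("[^# \\t].*");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeLineMatches(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    // find() succeeds if the pattern matches anywhere in the input,
    // which is closer to what grep does.
    static boolean foundSomewhere(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "key=value"; // hypothetical sample line
        System.out.println(wholeLineMatches(FIRST_CHAR, line));        // false
        System.out.println(foundSomewhere(FIRST_CHAR, line));          // true
        System.out.println(wholeLineMatches(WHOLE_LINE, line));        // true
        System.out.println(wholeLineMatches(WHOLE_LINE, "# comment")); // false
    }
}
```

Which variant to prefer is a judgment call: find() keeps the original grep pattern unchanged, while the ".*" version works with matches() and with String.matches().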
> See my response to Mario's reply.
>
> Regards,
> Jonny
Mario Winterer - 12 Aug 2005 12:45 GMT
Hi!
Here's the code snippet that prints all lines that are not comments and do not start with whitespace.
The input is a String (more general: CharSequence) containing the entire file content. DO NOT USE FOR LARGE FILES!
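/* BEGIN */
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String testLine = "This\n is\na\n#test";
/* MULTILINE makes ^ and $ match at the start and end of every line */
Pattern pattern = Pattern.compile("^[^\\s#].*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(testLine);
while (matcher.find()) {
    String l = matcher.group();
    System.out.println(l);
}
/* END */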
In your case, it might be better to read lines using the BufferedReader's "readLine" method and just test if it starts with whitespace or "#":
import java.io.BufferedReader;
import java.io.FileReader;

BufferedReader reader = new BufferedReader(new FileReader(yourFile));
try {
    String line = null;
    while ((line = reader.readLine()) != null) {
        if (line.length() == 0) continue; /* skip line in case it is empty (is this correct?) */
        char c = line.charAt(0);
        /* skip comment lines and lines that start with whitespace */
        if ((c == '#') || Character.isWhitespace(c)) continue;
        System.out.println(line);
    }
} finally {
    reader.close(); /* always release the file handle */
}
Best regards,
Tex
> Hi,
>
[quoted text clipped - 18 lines]
> Thanks,
> Jonny
Jonny - 12 Aug 2005 19:40 GMT
> Here's the code snippet that prints all lines that are not comments and
> do not start with whitespace.
[quoted text clipped - 53 lines]
> >
> > Please can you help.
Thanks for a comprehensive response Mario. It is much appreciated.
I can see that I needed to use ^ and $, and also \\s for whitespace.
These were the two problems I was having.
Incidentally, the file I am reading is very small, so I used the
following code to read the file:
String fileAsString = new Scanner(new File(myFile)).useDelimiter("\\A").next();
where myFile is the path to the file to be read.
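
For anyone combining the two pieces, here is a rough sketch that reads a small file with the Scanner idiom above and then applies the MULTILINE pattern to the result. It assumes Java 7 or later (for try-with-resources, which also closes the Scanner); the class name SmallFileFilter and the method names readWholeFile and wantedLines are invented for this example. The hasNext() check is there because next() throws NoSuchElementException on an empty file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmallFileFilter {
    // A line is wanted if its first character is neither whitespace nor '#'.
    // MULTILINE lets ^ and $ match at every line boundary in the input.
    static final Pattern WANTED =
            Pattern.compile("^[^\\s#].*$", Pattern.MULTILINE);

    // Reads the entire file into one String; only sensible for small files.
    static String readWholeFile(File file) throws IOException {
        try (Scanner scanner = new Scanner(file)) {
            scanner.useDelimiter("\\A"); // \A = start of input, so one token
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Collects every line of the content that the pattern accepts.
    static List<String> wantedLines(String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WANTED.matcher(content);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to the file to read.
        for (String line : wantedLines(readWholeFile(new File(args[0])))) {
            System.out.println(line);
        }
    }
}
```

Note that find() is used here rather than matches(), so the pattern does not have to cover the whole file content, only one line at a time.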
Regards,
Jonny