I have a complex multi-line string to parse, so I built a complex
regular expression by combining a bunch of simpler regular
expressions, like this:
private static final String WS = " +";
private static final String EOL = " *\n";
private static final String REST_OF_LINE = ".*\n";
private static final String REST_OF_BLOCK = REST_OF_LINE + "(?:" +
    WS + REST_OF_LINE + ")*";
private static final String AMOUNT = "\\d+\\.\\d+";
private static final String CURRENCY = "[A-Z]{3}" + AMOUNT;
private static final String FARE = "[A-Z]{3} +\\d*" + EOL
    + WS + CURRENCY + " +" + CURRENCY + EOL
    + WS + AMOUNT + REST_OF_LINE
    + WS + AMOUNT + "[A-Z]*" + EOL
    + " {7}" + REST_OF_LINE;
...
private static final java.util.regex.Pattern PAT =
    Pattern.compile( ... );
This works great for recognizing valid input, but extracting the
parsed data is not so easy. I wanted to capture it all with capturing
groups, but I ran into two problems: first, the Matcher only stores
the last match for each group, so a group inside a repeated
sub-expression loses all but its final value; and second, the groups
have to be accessed by index, which would require tracking each
group's position across the whole combined expression.
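To make the two problems concrete, here is a minimal sketch, assuming Java 7 or later (where java.util.regex supports named groups via (?<name>...)). The class name FareGroups, the group names "code" and "amount", and the sample line are made up for illustration; only AMOUNT and the shape of CURRENCY come from the code above. It shows that a group inside a repeated sub-expression keeps only its last match, and one possible workaround: match the repeated part as a whole, then walk it with a second, smaller pattern and find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FareGroups {
    // Building block taken from the post.
    static final String AMOUNT = "\\d+\\.\\d+";
    // CURRENCY with named groups; the names are illustrative assumptions.
    static final String CURRENCY = "(?<code>[A-Z]{3})(?<amount>" + AMOUNT + ")";

    // Matches a single currency entry.
    static final Pattern ONE = Pattern.compile(CURRENCY);
    // Matches a whole line of entries; the groups sit inside a repetition.
    static final Pattern MANY = Pattern.compile("(?: *" + CURRENCY + ")+");

    // Problem 1: after matches(), a repeated group holds only its last value.
    static String lastCodeOnly(String line) {
        Matcher m = MANY.matcher(line);
        return m.matches() ? m.group("code") : null;
    }

    // Workaround: iterate with find() so each repetition is seen in turn,
    // and read the groups by name instead of by index (problem 2).
    static List<String> currencies(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = ONE.matcher(line);
        while (m.find()) {
            out.add(m.group("code") + "=" + m.group("amount"));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "USD12.50 EUR9.75";          // hypothetical input
        System.out.println(lastCodeOnly(line));    // EUR
        System.out.println(currencies(line));      // [USD=12.50, EUR=9.75]
    }
}
```

Named groups remove the index bookkeeping, but a name may appear only once per compiled pattern, so a fragment like CURRENCY that is concatenated twice into FARE would need distinct names per use.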
Is there a more powerful regular expression class out there somewhere,
or a more powerful parsing technology that would help with this
problem? It would be a trivial matter in either Perl (by attaching
code to the sub-expressions) or in C++ (using the SPIRIT parsing
library), but in Java I'm pretty clueless.
Thanks for the help.
Kai Schwebke - 25 Apr 2007 03:51 GMT
kevin cline wrote:
> I have complex multi-line string to parse, so I created a complex
> regular expression by combining a bunch of simpler regular
> expressions, like this:
...
> Is there a more powerful regular expression class out there somewhere,
> or a more powerful parsing technology that would help with this
> problem?
You may want to have a look at JavaCC, a parser generator for Java,
like yacc or bison for C (https://javacc.dev.java.net/).
Kai