You got pretty close with your first attempt. The problem is that you're
mixing regex with a Java OR instead of staying within the regex.
Pattern p1 = Pattern.compile("ABC|NOP|HIJ");
If you run this you'll see 3 matches. If you only want the first,
replace the "while" with a simple "if".
Regards,
Bart
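To make the "while" versus "if" difference concrete, here is a minimal sketch. The sample input and the class and method names are assumptions made up for illustration; the string from the original question is not shown in this part of the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // The alternation lives inside the regex: one string, one pattern.
    static final Pattern P1 = Pattern.compile("ABC|NOP|HIJ");

    // Collects every match, which is what the "while" loop does.
    static List<String> allMatches(String seq) {
        List<String> out = new ArrayList<>();
        Matcher m = P1.matcher(seq);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Returns only the first match, which is what "if" does; null if none.
    static String firstMatch(String seq) {
        Matcher m = P1.matcher(seq);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String seq = "ABCDEFHIJKLMNOP"; // assumed sample input
        System.out.println(allMatches(seq));
        System.out.println(firstMatch(seq));
    }
}
```

Note that the order of the alternatives in the regex does not decide which match comes first; the position in the input does.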
> You got pretty close with your first attempt. The problem is you're
> mixing regex with java a java OR instead of sticking within the regex.
[quoted text clipped - 6 lines]
> Regards
> Bart
Thanks.
Consider this harder problem, which I'm not sure a regular expression can
solve.
Imagine the string below. Ignore the whitespace: it shouldn't be there, but I
deliberately put it there so you and others can see what I'm talking about.
I want to find the first instance of any of the three patterns (FHI,
HIJ, NOP).
But the first instance must fall within a block of three, which rules out
FHI, leaving HIJ and NOP. Since HIJ comes before NOP, HIJ is the output.
String seq = "ABC DEF HIJ KLM NOP";
Pattern p1 = Pattern.compile("FHI|"HIJ"|"NOP");
Matcher m = p1.matcher(seq);
if ( m.find())
{
//do something
}
EXPECT:
HIJ
(1) I have tried putting *** in front of each pattern, but that doesn't work
Pattern p1 = Pattern.compile("***FHI|"***HIJ"|"***NOP");
//compile error
(2) I have tried putting a single * in front, but that doesn't work either
Pattern p1 = Pattern.compile("*FHI|"*HIJ"|"*NOP");
// doesn't give me the right output
I know I'm close, like before.
Help appreciated.
Cheers
ST
Bart Cremers - 10 Apr 2006 13:38 GMT
You could simply combine the regex operation with a simple modulo
operation on the start of the match. It works in your simple example
case, but might not work for more complex cases:
String seq = "ABCDEFHIJKLMNOP";
Pattern p1 = Pattern.compile("FHI|HIJ|NOP");
Matcher m = p1.matcher(seq);
int start = 0;
while (m.find(start)) {
    System.out.printf("%3d - %s", m.start(),
            seq.substring(m.start(), m.end()));
    if (m.start() % 3 == 0) {
        System.out.println(" -> OK");
        // maybe break out here
    } else {
        System.out.println(" -> ignore");
    }
    start = m.start() + 1;
}
Bart
newsnet customer - 10 Apr 2006 14:40 GMT
> You could simply combine the regex operation with a simple modulo
> operation on the start of the match. It works in your simple example
[quoted text clipped - 20 lines]
>
> Bart
Cheers, Bart.
You have been really helpful.
If I can't get it to work without the modulus, that is, using just the
regular expression, then I will use your code.
ST
Jussi Piitulainen - 10 Apr 2006 14:49 GMT
> If i can't get it to work without using the modulus.
> that is, just using the regular expression [...]
The following prints the shortest prefix of triples in args[0] that
ends in one of FHI, HIJ and NOP.
Pattern p = Pattern.compile("(...)*?(FHI|HIJ|NOP)");
Matcher m = p.matcher(args[0]);
if (m.find()) {
    System.out.println(m.group(0));
}
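A self-contained variant of the same idea, as a sketch rather than a definitive answer. The class and method names are made up for illustration, and two details differ from the snippet above: it returns only the matched triple via a capturing group, and it uses lookingAt() instead of find(). As far as I can tell, find() is allowed to retry from index 1, 2, and so on when no aligned match exists, which would let a misaligned triple through; lookingAt() pins the match to index 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlignedTripleDemo {
    // Lazily skip whole three-character blocks, then require one of the targets.
    static final Pattern P = Pattern.compile("(?:...)*?(FHI|HIJ|NOP)");

    // Returns the first target that starts on a block boundary, or null.
    static String firstAligned(String seq) {
        Matcher m = P.matcher(seq);
        // lookingAt() anchors at index 0, so everything skipped is whole blocks.
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAligned("ABCDEFHIJKLMNOP")); // prints HIJ
        System.out.println(firstAligned("XFHI"));            // prints null
    }
}
```

In the first call FHI sits at index 5 and is skipped because it straddles two blocks; in the second, FHI exists only at index 1, so nothing aligned is found.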
Gordon Beaton - 10 Apr 2006 13:49 GMT
> (1) I have tried putting * but that doesnt work
> Pattern p1 = Pattern.compile("***FHI|"***HIJ"|"***NOP");
[quoted text clipped - 4 lines]
>
> I know im close like before.
First, the whole regex must be a single string, enclosed between one
pair of quotation marks. Try to remember this. Neither of your
examples is even compilable.
Second, quantifiers (such as * and ?) can't be used on their own; they
must be preceded by a pattern to modify. So .* will match 0 or more
characters, while \s* will match 0 or more whitespace characters, etc.
Instead of guessing, read the regex documentation and think about what
you're trying to do.
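A small sketch of the second point, with a made-up helper name (compiles) used only for illustration. Once the regex is written as a single string, the Java compiler accepts it, but Pattern.compile rejects a leading * at run time with a PatternSyntaxException because there is nothing for the quantifier to repeat.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuantifierDemo {
    // True when Pattern.compile accepts the regex, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A quantifier with nothing in front of it is rejected.
        System.out.println(compiles("*FHI|*HIJ|*NOP"));  // prints false
        // "." gives "*" something to repeat, so this one is accepted.
        System.out.println(compiles(".*(FHI|HIJ|NOP)")); // prints true
    }
}
```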
If you have optional whitespace among the stuff you really want to
match, try something like this (untested):
"\\s*((F\\s*H\\s*I)|(H\\s*I\\s*J)|(N\\s*O\\s*P))\\s*"
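One way to try this pattern out, as a rough sketch; the class and method names are assumptions for illustration. It strips whitespace from the captured text so the result can be compared with the plain triples. Note that on the spaced sample string it reports FHI, because the F at the end of one block and the HI at the start of the next are joined through \s*. So the pattern tolerates whitespace but does not by itself enforce the blocks-of-three rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDemo {
    // The whitespace-tolerant pattern from the post above, unchanged.
    static final Pattern P = Pattern.compile(
            "\\s*((F\\s*H\\s*I)|(H\\s*I\\s*J)|(N\\s*O\\s*P))\\s*");

    // Returns the first match with its whitespace removed, or null if none.
    static String firstMatch(String seq) {
        Matcher m = P.matcher(seq);
        return m.find() ? m.group(1).replaceAll("\\s", "") : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("ABC DEF HIJ KLM NOP")); // prints FHI
        System.out.println(firstMatch("ABC HIJ KLM"));         // prints HIJ
    }
}
```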
/gordon
