I do not understand why the regular expression string in the code
below is giving me lines of text and not paragraphs. I am trying to
get the start and end of the whole, repeated pattern.
The output I am getting is:
Chunk: one
What I was expecting was:
Chunk: one
two
three
Can anyone explain this to me? Thank you, Alan
import java.util.regex.*;

public class TextProcessor
{
    private static final Pattern PARA_PATTERN =
        Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

    public static void main(String[] args)
    {
        String TestString = "one \n two \n three \n\n Another Paragraph";
        System.out.println ( "Chunk: " + getChunk ( TestString, 0 ) );
        System.out.println ("\n");
    }

    public static String getChunk ( String InputString, int StartPosition )
    {
        String OutputString = "";
        Matcher matcher = PARA_PATTERN.matcher ( InputString );
        try
        {
            if ( matcher.find ( StartPosition ) )
            {
                OutputString = InputString.substring(matcher.start(), matcher.end());
            }
        }
        catch ( IndexOutOfBoundsException e ) { e.printStackTrace();}
        catch ( IllegalStateException e ) { e.printStackTrace();}
        return OutputString;
    }
}
Ingo Menger - 11 Dec 2007 11:07 GMT
> I do not understand why the regular expression string in the code
> below is giving me lines of text and not paragraphs. I am trying to
[quoted text clipped - 27 lines]
> private static final Pattern PARA_PATTERN = Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);
First, let me say that you gave a very nice problem description.
Therefore, I'll try to answer your question.
The answer "one" makes perfect sense, since you wanted a string that
- consists of one or more substrings that
- start at the beginning of the string OR just after a newline
- contain zero or more characters, followed by one or more non-whitespace
characters, followed by zero or more characters (none of them a newline,
since . does not match line terminators by default)
- end just before a newline OR the end of the string
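Your pattern matches only one line, because the anchors ^ and $ are
zero-width. According to the docs, in MULTILINE mode ^ matches just after
a line terminator and $ matches just before one, but the newline itself is
never matched. After "one ", the matcher is stuck in front of the "\n",
where ^ cannot match, so the group cannot repeat.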
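
One possible way around this, sketched here rather than taken from the thread, is to let the repeated part consume the newline explicitly. The class name ParagraphChunker, the constant LINE_PATTERN and the revised PARA_PATTERN are made up for illustration, and the sketch assumes plain "\n" line endings:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphChunker {
    // The pattern from the question: the group cannot repeat, because
    // nothing inside it consumes the newline between two lines.
    static final Pattern LINE_PATTERN =
            Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

    // Hypothetical revision: one non-blank line, followed by any number of
    // "newline + non-blank line" repetitions. Assumes "\n" line endings.
    static final Pattern PARA_PATTERN =
            Pattern.compile("^.*\\S.*$(?:\\n.*\\S.*$)*", Pattern.MULTILINE);

    // Returns the first match at or after startPosition, or "" if none.
    static String getChunk(Pattern pattern, String input, int startPosition) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find(startPosition) ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String text = "one \n two \n three \n\n Another Paragraph";
        // Brackets make the trailing spaces of each chunk visible.
        System.out.println("Original: [" + getChunk(LINE_PATTERN, text, 0) + "]");
        System.out.println("Revised:  [" + getChunk(PARA_PATTERN, text, 0) + "]");
    }
}
```

With the revised pattern the match stops at the blank line, because a line with no non-whitespace character cannot satisfy \S. The trailing newline is not part of the chunk, since each newline is only consumed when a non-blank line follows it.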