
Java Forum / General / December 2007


RegExp Group Headache

Alan - 11 Dec 2007 01:00 GMT
I do not understand why the regular expression string in the code
below is giving me lines of text and not paragraphs.  I am trying to
get the start and end of the whole, repeated pattern.

   The output I am getting is:

Chunk: one

What I was expecting was:

Chunk: one
two
three

    Can anyone explain this to me?     Thank you, Alan

import java.util.regex.*;

public class TextProcessor
{

    public static void main(String[] args)
    {
        String TestString = "one \n two \n three \n\n Another Paragraph";
        System.out.println ( "Chunk: " + getChunk ( TestString, 0 ) );
        System.out.println ("\n");
    }

    private static final Pattern PARA_PATTERN =
        Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

    public static String getChunk ( String InputString, int StartPosition )
    {
        String OutputString = "";

        Matcher matcher = PARA_PATTERN.matcher ( InputString );
        try
        {
            if ( matcher.find ( StartPosition ) )
            {
                OutputString = InputString.substring(matcher.start(), matcher.end());
            }
        }
        catch ( IndexOutOfBoundsException e ) { e.printStackTrace(); }
        catch ( IllegalStateException     e ) { e.printStackTrace(); }

        return OutputString;
    }

}

Ingo Menger - 11 Dec 2007 11:07 GMT
>    I do not understand why the regular expression string in the code
> below is giving me lines of text and not paragraphs.  I am trying to
[quoted text clipped - 27 lines]
> private static final Pattern PARA_PATTERN = Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

First, let me say that you gave a very nice problem description.
Therefore, I'll try to answer your question.
The answer "one" makes perfect sense, since your pattern asks for a string that
- consists of one or more substrings that
  - start at the beginning of the string OR just after a newline
  - contain zero or more characters (other than line terminators), followed by one or more non-whitespace characters, followed by zero or more characters
  - end just before a newline OR the end of the string
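
To see that reading in action, here is a minimal sketch that runs the pattern from the question over the same test string and collects every successive match. The class name LineByLineDemo and the helper findAll are made up for this illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class LineByLineDemo {
    // Same pattern as in the question, unchanged.
    static final Pattern PARA_PATTERN =
            Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

    // Collects every successive match of PARA_PATTERN in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PARA_PATTERN.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "one \n two \n three \n\n Another Paragraph";
        // Brackets make leading and trailing spaces visible.
        for (String match : findAll(text)) {
            System.out.println("[" + match + "]");
        }
    }
}
```

Each match is a single non-blank line, and the blank line is skipped because it contains no non-whitespace character. No match ever spans two lines.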

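The group in your pattern can repeat only once, because the anchors ^ and $
are zero-width. According to the docs, in MULTILINE mode $ matches just before
a newline and ^ matches just after one, but neither of them matches the
newline itself. After the first line the matcher is still sitting in front of
the newline, where ^ cannot match, so the repetition stops there.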
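
One possible way to get whole paragraphs, offered as a sketch rather than the only fix, is to match the newline between consecutive non-blank lines explicitly. The class name ParagraphChunker is hypothetical, the pattern is my own variation on the one in the question, and it assumes the text uses "\n" line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the TextProcessor class from the question.
class ParagraphChunker {
    // One non-blank line, followed by any number of repetitions of
    // "newline, then another non-blank line". The \n between lines is
    // consumed explicitly, so the repetition can cross line boundaries.
    // Assumption: lines are separated by "\n" only.
    static final Pattern PARA_PATTERN =
            Pattern.compile("^.*\\S.*$(?:\\n.*\\S.*$)*", Pattern.MULTILINE);

    // Returns the first paragraph found at or after startPosition,
    // or an empty string if there is none.
    static String getChunk(String input, int startPosition) {
        // Matcher.find(int) throws for out-of-range offsets, so guard first.
        if (startPosition < 0 || startPosition > input.length()) {
            return "";
        }
        Matcher matcher = PARA_PATTERN.matcher(input);
        return matcher.find(startPosition) ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String text = "one \n two \n three \n\n Another Paragraph";
        System.out.println("Chunk: " + getChunk(text, 0));
    }
}
```

The match stops at the blank line because the next repetition would need a non-whitespace character on it. The spaces inside each line are kept, so a caller who wants clean lines would still have to trim them.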


