Java Forum / General / April 2006

A big problem about regular expression

我爱自由 - 11 Apr 2006 03:05 GMT
I need to match a block of text in a text file (actually a stored procedure
file). The code is below:
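import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void test1()
   {
       String regex =
"(-{128})(\\s*\\r\\n\\s*\\r\\n)(-{2})(\\s*)ADD(\\s*)YOUR(\\s*)CODE(\\s*)HERE\\r\\n((.|\\r|\\n)*)(-{2})(\\s*)END(\\s*)OF(\\s*)YOUR(\\s*)CODE\\r\\n\\r\\n(-{128})";
       // Read the text file into a string (readTextFromFile is a helper not shown in this post).
       String text = readTextFromFile("C:\\test.txt");

       Pattern pattern = Pattern.compile(regex);
       Matcher matcher = pattern.matcher(text);

       if(matcher.find())
       {
           System.out.println("matches");
       }
       else
       {
           System.out.println("Not match");
       }
       // Wait for Enter so the console output stays visible.
       BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
       try
       {
           input.readLine();
       }
       catch(IOException ex)
       {
           // Ignored: the pause is only a convenience.
       }
   }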
Unfortunately, it fails with a stack overflow. I guess the JDK matches
regular expressions recursively, so when the regex is somewhat complicated
the stack overflows. Is that right? Can someone give me an explanation and
help me solve this problem? Thanks.
hiwa - 11 Apr 2006 04:15 GMT
Trying it with a simple text string like "asdfghjjkl" does not cause a
stack overflow.
However, I have some questions about your current regex string:
(1) Why so many capturing groups?
(2) \s already includes \r and \n, so how do you intend to handle that?
(3) Why not use DOTALL mode?
(4) The * quantifier is greedy, so ((.|\\r|\\n)*) matches as much as it
can. What is supposed to match after it?
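
Questions (1) to (3) point toward a simpler pattern. Below is a minimal sketch, not the original poster's code, assuming the markers look like the ones in the regex above (a line of 128 dashes, then "-- ADD YOUR CODE HERE", and so on). The class name DotAllSketch, the methods dashes and extractBody, and the sample text are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllSketch {
    // Only the body is captured. DOTALL lets '.' match \r and \n, so the
    // (.|\r|\n) alternation is not needed; the lazy .*? stops at the first
    // END marker it can reach.
    private static final Pattern BLOCK = Pattern.compile(
        "-{128}\\s*--\\s*ADD\\s*YOUR\\s*CODE\\s*HERE\\r?\\n"
        + "(.*?)"
        + "--\\s*END\\s*OF\\s*YOUR\\s*CODE\\s*-{128}",
        Pattern.DOTALL);

    // Returns the text between the two markers, or null if they are not found.
    static String extractBody(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Builds a run of n dashes for the sample input.
    static String dashes(int n) {
        char[] c = new char[n];
        Arrays.fill(c, '-');
        return new String(c);
    }

    public static void main(String[] args) {
        String d = dashes(128);
        String text = d + "\r\n\r\n-- ADD YOUR CODE HERE\r\n"
            + "SELECT 1\r\n"
            + "-- END OF YOUR CODE\r\n\r\n" + d;
        System.out.println(extractBody(text));
    }
}
```

Because the repeated element is a single '.' rather than a group containing an alternation, the matcher can loop over it instead of recursing once per character.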
我爱自由 - 13 Apr 2006 05:53 GMT
Actually, it is a JDK design problem, and somebody has filed a bug at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4675952
When a regex includes a pattern like "(a|b)*", a StackOverflowError
can occur on long input.
However, since Sun does not seem to plan to fix it, I have changed the
pattern to:
(DECLARE\s*@object\s*int\s*--declare\s*the\s*object\s*variable)([\w\W]*)(CAST\(@error\s*as\s*nvarchar\(200\)\))[\r\n\s]*(END)
and it works.
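
The difference between a repeated alternation group and a repeated character class can be tried out with a small sketch like the one below. The class AlternationVsClass and its methods are made up for illustration. Whether the "(a|b)*" case actually overflows depends on the JDK version and the thread stack size, so the program only reports what happened.

```java
import java.util.regex.Pattern;

class AlternationVsClass {
    // Builds a string of n characters alternating 'a' and 'b'.
    static String sample(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(i % 2 == 0 ? 'a' : 'b');
        }
        return sb.toString();
    }

    // Returns "match", "no match", or "StackOverflowError" for the given regex.
    static String tryMatch(String regex, String input) {
        try {
            return Pattern.matches(regex, input) ? "match" : "no match";
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        String input = sample(200000);
        // Repeated group with alternation: may recurse once per character.
        System.out.println("(a|b)* : " + tryMatch("(a|b)*", input));
        // Equivalent character class: the same language, no group to recurse into.
        System.out.println("[ab]*  : " + tryMatch("[ab]*", input));
    }
}
```

If the first case does overflow, a larger thread stack (for example via the -Xss JVM option) only moves the limit; rewriting the group as a character class removes the cause.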

(1) I want to match text like the following:
DECLARE @object int
--declare the object variable.
.................................
.................................
CAST(@error as nvarchar(200))
END
(2) That was my mistake; I had thought that \s could not match \r and \n.
(3) Maybe it is OK to use a dot to match all characters.
(4) I just want to match any run of characters, and I have now changed it to
[\w\W]*
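
As a rough check, the revised pattern can be written as a Java string literal and run against a generated sample. This is only a sketch: the class StoredProcMatch, the methods middle and buildSample, and the filler lines are made up for illustration and stand in for the real stored procedure text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StoredProcMatch {
    // The revised pattern from the post, with backslashes doubled for Java.
    static final Pattern PROC = Pattern.compile(
        "(DECLARE\\s*@object\\s*int\\s*--declare\\s*the\\s*object\\s*variable)"
        + "([\\w\\W]*)"
        + "(CAST\\(@error\\s*as\\s*nvarchar\\(200\\)\\))[\\r\\n\\s]*(END)");

    // Returns whatever [\w\W]* captured between the two anchors, or null.
    static String middle(String text) {
        Matcher m = PROC.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    // Builds a sample with the given number of made-up filler lines.
    static String buildSample(int fillerLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("DECLARE @object int\r\n--declare the object variable.\r\n");
        for (int i = 0; i < fillerLines; i++) {
            sb.append("-- filler line ").append(i).append("\r\n");
        }
        sb.append("CAST(@error as nvarchar(200))\r\nEND\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String captured = middle(buildSample(50000));
        System.out.println(captured == null
            ? "Not match"
            : "matches, captured " + captured.length() + " chars");
    }
}
```

Note that [\w\W]* is greedy: if a file contained several such blocks, group 2 would run from the first DECLARE to the last CAST. Using [\w\W]*? instead would stop at the first one.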

