Java Forum / General / November 2005

how can this happen?

jahhaj - 18 Nov 2005 12:53 GMT
Here's the message I get from a PatternSyntaxException

Unknown character category {Digit} near index 8
\p{Digit}{1,2}
        ^

How can this be? {Digit} is a valid character category, it's in the
javadoc, it's even in the source code. (Incidentally the single \ is
how java reports the error, in the source I have "\\p{Digit}{1,2}")

I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

john
Robert Klemme - 18 Nov 2005 13:13 GMT
> Here's the message I get from a PatternSyntaxException
>
[quoted text clipped - 7 lines]
>
> I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

Works for me.  Also 1.4.2_06, OS is Windows 2k Server, no app server.

   robert
jahhaj - 18 Nov 2005 13:47 GMT
> > Here's the message I get from a PatternSyntaxException
> >
[quoted text clipped - 11 lines]
>
>     robert

Works for me as well when I run as a standalone Java app, baffling.

john
Roedy Green - 18 Nov 2005 15:32 GMT
>Unknown character category {Digit} near index 8
>\p{Digit}{1,2}

you did not post your code so I wrote this SSCCE

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
* snippet ™ to demonstrate a problem with regex
*/
public class Regex4
  {
  private static final Pattern p =
      Pattern.compile("(\\p{Digit}){1,2}");

  /**
   * test harness
   *
   * @param args not used
   */
  public static void main ( String[] args )
     {

     // format 1
     Matcher m = p.matcher("89");

     m.matches();
     int count = m.groupCount() + 1;

     // display groups found
     for ( int i=0; i<count; i++ )
        {
        System.out.println(m.group(i));
        }

     }
  }

When I ran it on JDK 1.5.0_05
it gave the following results:
89
9

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

jahhaj - 18 Nov 2005 15:50 GMT
> >Unknown character category {Digit} near index 8
> >\p{Digit}{1,2}
>
> you did not post your code so I wrote this SSCCE

My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine. My interest is in suggestions for what could
possibly be going wrong for the JVM not to recognise a perfectly
standard character category.

If you look at the source for Pattern then the character categories are
looked up in a simple map, in a single place in the code. How could
this go wrong? That's my question.
Robert Klemme - 18 Nov 2005 16:09 GMT
>>> Unknown character category {Digit} near index 8
>>> \p{Digit}{1,2}
[quoted text clipped - 10 lines]
> are looked up in a simple map, in a single place in the code. How
> could this go wrong? That's my question.

Maybe some weird threading or class loading issue...  Just a wild guess.

   robert
jahhaj - 18 Nov 2005 17:01 GMT
> > If you look at the source for Pattern then the character categories
> > are looked up in a simple map, in a single place in the code. How
[quoted text clipped - 3 lines]
>
>     robert

Hmm, I'm no java expert but if you look at the code in Pattern you see
this

   private Node retrieveCategoryNode(String name) {
       if (categories == null) {
           int cns = categoryNodes.length;
           categories = new HashMap((int)(cns/.75) + 1);
           for (int x=0; x<cns; x++)
               categories.put(categoryNames[x], categoryNodes[x]);
       }
       Node n = (Node)categories.get(name);
       if (n != null)
           return n;

       return familyError(name, "Unknown character category {");
   }

categories is a HashMap of the known categories. It's a static member.
The thing that strikes me is that the creation of the map is not
synchronised, so is it possible that one thread could be in the process
of populating the categories when another thread comes along and uses
the partially populated map?

As I say, I'm no expert in java. Could someone with more expertise
confirm if this is plausible?
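
The interleaving described above can be forced deterministically. The following is a minimal sketch, not the JDK source: `LazyCategoryRace`, the `"digit-node"` value, and the two latches are hypothetical names introduced only to hold one thread in the middle of population while a second thread does its lookup. It uses an instance field where the quoted code uses a static one, so that the demo can be run repeatedly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the quoted lazy initialisation; not the JDK source.
class LazyCategoryRace {
    // The quoted field is static; an instance field keeps this demo repeatable.
    private Map<String, String> categories;
    private final CountDownLatch mapAssigned = new CountDownLatch(1);
    private final CountDownLatch readerDone = new CountDownLatch(1);

    // Same shape as the quoted code: publish the empty map, then fill it.
    private String lookup(String name, boolean pauseMidPopulation) {
        if (categories == null) {
            categories = new HashMap<String, String>();
            mapAssigned.countDown();
            if (pauseMidPopulation) {
                await(readerDone); // hold the map empty while the reader looks
            }
            categories.put("Digit", "digit-node");
        }
        return categories.get(name);
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns what a second thread sees during, and then after, population.
    static String[] demonstrate() {
        final LazyCategoryRace race = new LazyCategoryRace();
        Thread writer = new Thread(new Runnable() {
            public void run() {
                race.lookup("Digit", true);
            }
        });
        writer.start();
        await(race.mapAssigned);
        String during = race.lookup("Digit", false); // map exists but is empty
        race.readerDone.countDown();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        String after = race.lookup("Digit", false);
        return new String[] { during, after };
    }

    public static void main(String[] args) {
        String[] seen = demonstrate();
        System.out.println("during population: " + seen[0]);
        System.out.println("after population:  " + seen[1]);
    }
}
```

In the real code a null result at that point would go on to `familyError`, which matches the "Unknown character category" message. In production the window is tiny and the latches are absent, which would explain why the failure is intermittent and hard to reproduce standalone.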
Chris Uppal - 18 Nov 2005 17:43 GMT
> Hmm, I'm no java expert but if you look at the code in Pattern you see
> this
[quoted text clipped - 14 lines]
>
> categories is a HashMap of the known categories. It's a static member.

Ugh!  Unless there's something subtle that I've missed, that code is completely
broken.  It isn't even /nearly/ right (it could at least wait until the new
HashMap was populated before assigning it to the 'categories' variable -- which
would still be technically incorrect).

That code has been completely replaced in 1.5.0 by something that /is/ correct
(I think).

Can you force the Pattern initialisation to happen early (before any of your
real threads are running) by compiling a throwaway Regex during some sort of
system initialisation phase ?

   -- chris
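
One way to avoid publishing a half-built map at all is to build it in a static initialiser. This is a sketch of that general idiom only; it is not claimed to be what 1.5.0 actually does, and `SafeCategoryTable` and its two entries are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical category table; names and values are illustrative only.
class SafeCategoryTable {
    // Built fully in a static initialiser and never mutated afterwards.
    // The JVM runs class initialisation once and makes the result visible
    // to every thread that uses the class.
    private static final Map<String, String> CATEGORIES;

    static {
        Map<String, String> m = new HashMap<String, String>();
        m.put("Digit", "digit-node");
        m.put("Alpha", "alpha-node");
        CATEGORIES = Collections.unmodifiableMap(m);
    }

    static String lookup(String name) {
        String node = CATEGORIES.get(name);
        if (node == null) {
            throw new IllegalArgumentException(
                "Unknown character category {" + name + "}");
        }
        return node;
    }

    public static void main(String[] args) {
        System.out.println(lookup("Digit"));
    }
}
```

The map is populated through a local variable and only then assigned to the final field, so no thread can observe it empty or partly filled.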
Roedy Green - 18 Nov 2005 20:30 GMT
>As I say, I'm no expert in java. Could someone with more expertise
>confirm if this is plausible?

Try some code that "warms up" the Pattern class. You might even sleep.

Pattern dummy = Pattern.compile("a");
 
Pattern p = Pattern.compile("(\\p{Digit}){1,2}");
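
Judging only from the `retrieveCategoryNode` snippet quoted earlier in this thread, the categories map appears to be built on the first category lookup rather than on the first compile, so a warm-up pattern such as "a" may never touch it. A hedged variant is sketched below: `RegexWarmUp` and `warmUp()` are hypothetical names, and the sketch assumes it can be called once from start-up code before any worker threads compile patterns.

```java
import java.util.regex.Pattern;

// Hypothetical start-up hook; call it once before worker threads start.
class RegexWarmUp {
    static boolean warmUp() {
        // Uses a \p{...} category so the category lookup itself is exercised.
        Pattern warm = Pattern.compile("\\p{Digit}");
        return warm.matcher("7").matches();
    }

    public static void main(String[] args) {
        System.out.println("warm-up ok: " + warmUp());
        Pattern p = Pattern.compile("(\\p{Digit}){1,2}");
        System.out.println(p.matcher("89").matches());
    }
}
```

In an application server, other code may already be compiling patterns on other threads, so this narrows the window rather than guaranteeing anything.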


Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 18 Nov 2005 20:19 GMT
>> If you look at the source for Pattern then the character categories
>> are looked up in a simple map, in a single place in the code. How
>> could this go wrong? That's my question.

Did BEA reimplement Regex for speed and simply fail to test
adequately?
If you can get my code into BEA and get it to fail, you can submit it
as a bug report.

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Chris Uppal - 18 Nov 2005 16:52 GMT
> My real code is a few lines inside a large J2EE application. I know
> that if I extract the code and run it in a different environment then
[quoted text clipped - 5 lines]
> looked up in a simple map, in a single place in the code. How could
> this go wrong? That's my question.

The only thing I can think of is that your code is somehow picking up a
different implementation of Pattern when it's running in your J2EE environment.
Might be worth scanning all the directories, JARs, etc, to see if there are any
candidates for confusion.

   -- chris
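
Before scanning JARs by hand, the running JVM can be asked where it loaded `Pattern` from. A minimal sketch follows; `WhichPattern` is a hypothetical helper name, and it would need to run inside the same container as the failing code to be informative. It uses only `Class.getResource` and `Class.getClassLoader`.

```java
import java.net.URL;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; run it inside the failing environment.
class WhichPattern {
    // URL of the class file backing java.util.regex.Pattern, or "unknown".
    static String patternLocation() {
        URL url = Pattern.class.getResource("Pattern.class");
        return url == null ? "unknown" : url.toString();
    }

    // A null class loader means the class came from the bootstrap loader.
    static String patternLoader() {
        ClassLoader loader = Pattern.class.getClassLoader();
        return loader == null ? "bootstrap" : loader.toString();
    }

    public static void main(String[] args) {
        System.out.println("location: " + patternLocation());
        System.out.println("loader:   " + patternLoader());
    }
}
```

If the location points anywhere other than the JDK's own runtime classes, a replacement implementation is being picked up.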
Thomas G. Marshall - 18 Nov 2005 17:51 GMT
jahhaj coughed up:

>>> Unknown character category {Digit} near index 8
>>> \p{Digit}{1,2}
[quoted text clipped - 4 lines]
> that if I extract the code and run it in a different environment then
> it will work fine.

Two ideas pulled out of someplace fairly dark:

1. Don't run it in a different environment.  Extract it and keep it as much
as possible in the /same/ environment.

2. Don't "extract" it at all.  Instead /pare down/ the problem code as much
as you can, possibly by putting in the testing code around the issue, and
keep testing until you remove something and see the problem go away.

This is a technique that works very well to expose many things.  Even if
your pared down version ends up looking just like the extracted version you
already attempted, there might be a smidgeon of a detail missing that will
illuminate the problem.

I hope this applies to your issue.  YMM(ofcourse)V.

> My interest is in suggestions for what could
> possibly be going wrong for the JVM not to recognise a perfectly
[quoted text clipped - 3 lines]
> looked up in a simple map, in a single place in the code. How could
> this go wrong? That's my question.


Onedoctortoanother:"Ifthisismyrectalthermometer,wherethehell'smypen???"

Ravi - 21 Nov 2005 06:42 GMT
hi
u can use "\\d{1,2}"
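
A quick check that the two spellings accept the same input with default flags is sketched below. `DigitShorthand` is a hypothetical class name, and whether `\d` actually sidesteps the failing category lookup on 1.4.2 is an assumption that is not verified here.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the POSIX class and the \d shorthand.
class DigitShorthand {
    private static final Pattern POSIX = Pattern.compile("\\p{Digit}{1,2}");
    private static final Pattern SHORT = Pattern.compile("\\d{1,2}");

    static boolean posix(String s) {
        return POSIX.matcher(s).matches();
    }

    static boolean shorthand(String s) {
        return SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "7", "89", "123", "x" };
        for (int i = 0; i < inputs.length; i++) {
            System.out.println(inputs[i] + ": posix=" + posix(inputs[i])
                + " shorthand=" + shorthand(inputs[i]));
        }
    }
}
```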
jahhaj - 21 Nov 2005 09:46 GMT
> Here's the message I get from a PatternSyntaxException
>
> Unknown character category {Digit} near index 8
> \p{Digit}{1,2}
>         ^

Thanks to everyone who replied to my query. Turns out this is a known
bug

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6238699

Really, really poor coding by Sun.

john
Roedy Green - 21 Nov 2005 11:37 GMT
>Thanks to everyone who replied to my query. Turns out this is a known
>bug
>
>http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6238699
>
>Really, really poor coding by Sun.

So the bug is fixed in JDK 1.5 but not in BEA?

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.


