
Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
> >Unknown character category {Digit} near index 8
> >\p{Digit}{1,2}
>
> you did not post your code so I wrote this SSCCE
My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine. My interest is in suggestions for what could
possibly be going wrong for the JVM not to recognise a perfectly
standard character category.
If you look at the source for Pattern then the character categories are
looked up in a simple map, in a single place in the code. How could
this go wrong? That's my question.
Robert Klemme - 18 Nov 2005 16:09 GMT
>>> Unknown character category {Digit} near index 8
>>> \p{Digit}{1,2}
[quoted text clipped - 10 lines]
> are looked up in a simple map, in a single place in the code. How
> could this go wrong? That's my question.
Maybe some weird threading or class loading issue... Just a wild guess.
robert
jahhaj - 18 Nov 2005 17:01 GMT
> > If you look at the source for Pattern then the character categories
> > are looked up in a simple map, in a single place in the code. How
[quoted text clipped - 3 lines]
>
> robert
Hmm, I'm no java expert but if you look at the code in Pattern you see
this
    private Node retrieveCategoryNode(String name) {
        if (categories == null) {
            int cns = categoryNodes.length;
            categories = new HashMap((int)(cns/.75) + 1);
            for (int x=0; x<cns; x++)
                categories.put(categoryNames[x], categoryNodes[x]);
        }
        Node n = (Node)categories.get(name);
        if (n != null)
            return n;
        return familyError(name, "Unknown character category {");
    }
categories is a HashMap of the known categories. It's a static member.
The thing that strikes me is that the creation of the map is not
synchronised, so is it possible that one thread could be in the process
of populating the categories when another thread comes along and uses
the partly populated map?
As I say, I'm no expert in java. Could someone with more expertise
confirm if this is plausible?
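To make the suspected interleaving concrete, here is a minimal sketch that replays it step by step on a single thread. The class, field and method names are made up for illustration and are not the JDK's; the only thing carried over from the quoted code is the shape "assign the empty map to the shared field first, fill it afterwards".

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative names only; this is NOT the JDK's Pattern code.
final class RacyTableReplay {
    static final String[] NAMES = { "Lower", "Upper", "Digit" };

    // Shared and unsynchronised, like the static map in the quoted code.
    static Map<String, Integer> categories;

    // What a second thread would do: the map is non-null, so it is trusted.
    static boolean lookupSucceeds(String name) {
        return categories.get(name) != null;
    }

    // Replays one unlucky ordering of events deterministically.
    // Returns { result seen midway through population, result afterwards }.
    static boolean[] replay() {
        categories = new HashMap<String, Integer>(); // published while still empty
        categories.put(NAMES[0], 0);
        categories.put(NAMES[1], 1);
        boolean midway = lookupSucceeds("Digit");    // "thread B" looks up here
        categories.put(NAMES[2], 2);                 // "thread A" finishes
        boolean after = lookupSucceeds("Digit");
        return new boolean[] { midway, after };
    }

    public static void main(String[] args) {
        boolean[] r = replay();
        System.out.println("midway=" + r[0] + ", after=" + r[1]);
    }
}
```

In this replay the midway lookup fails and the later one succeeds, which would match an error that shows up only occasionally. With real threads an unsynchronised HashMap can misbehave in other ways too, so treat this as the mildest version of the problem.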
Chris Uppal - 18 Nov 2005 17:43 GMT
> Hmm, I'm no java expert but if you look at the code in Pattern you see
> this
[quoted text clipped - 14 lines]
>
> categories is a HashMap of the known categories. It's a static member.
Ugh! Unless there's something subtle that I've missed, that code is completely
broken. It isn't even /nearly/ right (it could at least wait until the new
HashMap was populated before assigning it to the 'categories' variable -- which
would still be technically incorrect).
That code has been completely replaced in 1.5.0 by something that /is/ correct
(I think).
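For comparison, one common way to make a table like this safe is to build it during class initialisation, which the JVM runs at most once and under its own lock. The sketch below uses hypothetical names and is not a claim about what the 1.5.0 code actually does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a category lookup table; names are illustrative.
final class CategoryTable {
    private static final String[] NAMES = { "Lower", "Upper", "Digit" };

    // Assigned exactly once during class initialisation, so no thread can
    // ever observe the map before it is fully populated.
    private static final Map<String, Integer> CATEGORIES = build();

    private static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (int i = 0; i < NAMES.length; i++) {
            m.put(NAMES[i], Integer.valueOf(i));
        }
        // Read-only view: later callers cannot modify the shared table.
        return Collections.unmodifiableMap(m);
    }

    // Returns the index registered for the name, or null if it is unknown.
    static Integer lookup(String name) {
        return CATEGORIES.get(name);
    }
}
```

The map is filled in a local variable and only then stored in a static final field, so readers need no locking at all.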
Can you force the Pattern initialisation to happen early (before any of your
real threads are running) by compiling a throwaway Regex during some sort of
system initialisation phase ?
-- chris
Roedy Green - 18 Nov 2005 20:30 GMT
>As I say, I'm no expert in java. Could someone with more expertise
>confirm if this is plausible?
Try some code that "warms up" the Pattern class. You might even sleep.
Pattern dummy = Pattern.compile("a");
Pattern p = Pattern.compile("(\\p{Digit}){1,2}");
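A slightly fuller sketch of the warm-up idea, under two assumptions: the application has some start-up hook that runs before any worker threads exist (the class and method names here are hypothetical), and the lazy map shown in the quoted retrieveCategoryNode code is the culprit. Since that code only builds the map when a category name is looked up, the throwaway pattern below uses \p{Digit} itself rather than a plain literal.

```java
import java.util.regex.Pattern;

// Hypothetical start-up helper; call it once before worker threads start.
final class RegexWarmUp {
    // Compiles a throwaway pattern that contains a character category, so
    // any lazy category table is built while only one thread is running.
    static boolean warmUp() {
        Pattern p = Pattern.compile("\\p{Digit}{1,2}");
        // Use the pattern once so the warm-up can be sanity-checked.
        return p.matcher("42").matches();
    }

    public static void main(String[] args) {
        System.out.println("warm-up ok: " + warmUp());
    }
}
```

If the error still appears after a warm-up like this, the threading theory becomes less likely and the class loading theory more so.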

Roedy Green - 18 Nov 2005 20:19 GMT
>> If you look at the source for Pattern then the character categories
>> are looked up in a simple map, in a single place in the code. How
>> could this go wrong? That's my question.
Did BEA reimplement Regex for speed and simply fail to test
adequately?
If you can get my code into BEA and get it to fail, you can submit it
as a bug report.

Chris Uppal - 18 Nov 2005 16:52 GMT
> My real code is a few lines inside a large J2EE application. I know
> that if I extract the code and run it in a different environment then
[quoted text clipped - 5 lines]
> looked up in a simple map, in a single place in the code. How could
> this go wrong? That's my question.
The only thing I can think of is that your code is somehow picking up a
different implementation of Pattern when it's running in your J2EE environment.
Might be worth scanning all the directories, JARs, etc, to see if there are any
candidates for confusion.
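As a complement to scanning directories by hand, a small diagnostic run from inside the application can report where the Pattern class actually in use came from. This is only a sketch: the helper class name is made up, and the exact text printed depends on the JVM and container.

```java
import java.net.URL;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; run it inside the J2EE application itself.
final class WhichPattern {
    // URL of the class file behind the Pattern class this code sees.
    static URL location() {
        return Pattern.class.getResource("Pattern.class");
    }

    // One-line summary suitable for a log statement.
    static String describe() {
        // A null class loader means the class came from the bootstrap loader.
        ClassLoader cl = Pattern.class.getClassLoader();
        return "loader=" + (cl == null ? "bootstrap" : cl.toString())
             + ", location=" + location();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported location is anything other than the JVM's own runtime library, some other JAR is supplying java.util.regex.Pattern.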
-- chris
Thomas G. Marshall - 18 Nov 2005 17:51 GMT
jahhaj coughed up:
>>> Unknown character category {Digit} near index 8
>>> \p{Digit}{1,2}
[quoted text clipped - 4 lines]
> that if I extract the code and run it in a different environment then
> it will work fine.
Two ideas pulled out of someplace fairly dark:
1. Don't run it in a different environment. Extract it and keep it as much
as possible in the /same/ environment.
2. Don't "extract" it at all. Instead /pare down/ the problem code as much
as you can, possibly by putting in the testing code around the issue, and
keep testing until you remove something and see the problem go away.
This is a technique that works very well to expose many things. Even if
your pared down version ends up looking just like the extracted version you
already attempted, there might be a smidgeon of a detail missing that will
illuminate the problem.
I hope this applies to your issue. YMM(ofcourse)V.
> My interest is in suggestions for what could
> possibly be going wrong for the JVM not to recognise a perfectly
[quoted text clipped - 3 lines]
> looked up in a simple map, in a single place in the code. How could
> this go wrong? That's my question.

Signature
Onedoctortoanother:"Ifthisismyrectalthermometer,wherethehell'smypen???"