Java Forum / General / April 2006
regular expressions
2rajesh.b@gmail.com - 24 Apr 2006 13:50 GMT The password is at least six characters long.
The password contains characters from at least three of the following five categories: · English uppercase characters (A - Z) · English lowercase characters (a - z) · Base 10 digits (0 - 9) · Non-alphanumeric (for example: !, $, #, or %) · Unicode characters
can u please help me in writing a regular expression for the above condition
Rhino - 24 Apr 2006 14:11 GMT If you read this - http://java.sun.com/docs/books/tutorial/extra/regex/index.html - you should be able to do your own homework.
If that doesn't work, post your best guess to comp.lang.java.help and tell us what parts of the code don't work and someone will probably give you a hint about what you need to do differently.
-- Rhino
Larry Barowski - 24 Apr 2006 17:45 GMT > If you read this - > http://java.sun.com/docs/books/tutorial/extra/regex/index.html - you > should be able to do your own homework. I doubt that it's homework since there isn't a concise regular expression that fits the bill. That would be a pretty goofy homework question. Either way, I would suggest that a regular expression is not the best way to validate that password.
Roedy Green - 24 Apr 2006 18:37 GMT >The password contains characters from at least three of the following >five categories: [quoted text clipped - 6 lines] >can u please help me in writing a regular expression for the above >condition I think it would be easier to solve this with a char loop than with a regex. Regexes are about the order of characters in a pattern; for you, order does not matter.
Proceed something like this:
Invent a Category enum with values UPPERCASE, LOWERCASE, DIGITS, PUNCT, UNICODE.
Write a method that categorises a char.
Now your code becomes:
int possibilities = Category.values().length;
boolean[] present = new boolean[ possibilities ];  // one flag per category
for ( int i = 0; i < pwlen; i++ )
    {
    char c = pw[ i ]; /* or pw.charAt( i ) */
    Category cat = Category.categorise( c );
    present[ cat.ordinal() ] = true;
    }

// count how many distinct categories appeared
int cats = 0;
for ( int possibility = 0; possibility < possibilities; possibility++ )
    {
    if ( present[ possibility ] ) cats++;
    }

if ( cats >= 3 ) System.out.println( "password sufficiently varied" );
For code to generate random passwords, see http://mindprod.com/applets/password.html
You might find it easier to generate them than test them.
-- Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
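The outline in the post above can be fleshed out into something that compiles. What follows is only a sketch under stated assumptions: the names Category, categorise, PasswordVariety and isAcceptable are made up for illustration, "Unicode characters" is read as "anything outside ASCII", and every ASCII character that is not a letter or digit (including space and control characters) is lumped into PUNCT. Length is counted in Java chars, so a supplementary character counts as two.

```java
import java.util.EnumSet;

// Hypothetical names throughout; the thread only describes the idea.
enum Category {
    UPPERCASE, LOWERCASE, DIGIT, PUNCT, UNICODE;

    // Classify one char. Treating everything at or above 128 as UNICODE is an
    // assumption about what the original requirement meant.
    static Category categorise(char c) {
        if (c >= 'A' && c <= 'Z') return UPPERCASE;
        if (c >= 'a' && c <= 'z') return LOWERCASE;
        if (c >= '0' && c <= '9') return DIGIT;
        if (c < 128) return PUNCT;   // remaining ASCII, including space and control chars
        return UNICODE;
    }
}

class PasswordVariety {
    // True if the password has at least six chars drawn from at least three categories.
    static boolean isAcceptable(String pw) {
        if (pw.length() < 6) return false;
        EnumSet<Category> seen = EnumSet.noneOf(Category.class);
        for (int i = 0; i < pw.length(); i++) {
            seen.add(Category.categorise(pw.charAt(i)));
        }
        return seen.size() >= 3;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Passw0rd"));  // upper + lower + digit
        System.out.println(isAcceptable("password"));  // lower only
    }
}
```

An EnumSet replaces the boolean array and the counting loop; the set's size is the number of distinct categories seen.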
Oliver Wong - 25 Apr 2006 20:08 GMT > The password is at least six characters long. > [quoted text clipped - 8 lines] > can u please help me in writing a regular expression for the above > condition All of the categories are mutually exclusive except for "Unicode characters". And any character that you can get in memory via a program written in Java is a "unicode character", so that last category seems pretty redundant. Perhaps you mean something like a character within Unicode, but outside of ASCII?
I'm asking you for clarification because it sounds like the above requirements were not dreamt up by you, and so you should in turn be asking whoever assigned you with this task for clarification.
- Oliver
Roedy Green - 25 Apr 2006 20:22 GMT > All of categories are mutually exclusive except for "Unicode >characters". And any character that you can get in memory via a program >written in Java is a "unicode character", so that last category seems pretty >redundant. Perhaps you mean something like a character within Unicode, but >outside of ASCII? I think that is what he meant, something like ó or ⇒ You just want to mix up the categories to foil a simple dictionary search.
You could do it pretty easily with a giant switch. Unfortunately switches don't implement ranges, so you have to code that manually if you don't want to spell it out longhand. default handles the unicode. You might add a control character category and reject such passwords. Putting whitespace on either end of a password is not a wise idea.
Oliver Wong - 25 Apr 2006 21:25 GMT >> All of categories are mutually exclusive except for "Unicode >>characters". And any character that you can get in memory via a program [quoted text clipped - 9 lines] > switches don't implement ranges, so you have have to code that > manually if you don't want to spell it out longhand. To test whether a given unicode character is outside of (or inside of, for that matter) ASCII, you could serialize it to ASCII, then re-read the ASCII data back into an in-memory Java string, and check if you still have the same original character that you started with. I believe what most ASCII encoders do for characters outside of ASCII is replace them with the '?' character.
> default handles > the unicode. You might add control character category and reject > such passwords. Putting whitespace on either end of a password is not > a wise idea. I suspect whitespace isn't that big of a problem, because any password validation system which performs a trim() on the password before processing it is probably very poorly designed. Allowing control characters (e.g. backspace, EOF, etc.) is probably a very bad idea, because different systems will handle them differently. Using outside-of-ASCII characters is also a bit risky for web based authentication, because one day you might be trying to access your site from a terminal which only supports ASCII. As Unicode support becomes more widespread, this will probably be less of an issue.
One particularly bad password system implementation is Microsoft's ".NET Passport" (which actually has very little to do with the .NET platform, to which C# usually compiles). When you create your passport account, your password is silently truncated to something like 12 or 14 characters; but when you validate your password, it doesn't get truncated.
So if I create a new account with the password "1234567890ABCDEF", the database will be updated to say that my password is "1234567890AB", but the website never mentions that truncation has occurred. Then when I try to log on with the password "1234567890ABCDEF", it compares "1234567890ABCDEF" (what I wrote) against "1234567890AB" (what's in the DB), sees that they are not equal, and tells me that my password is incorrect.
It took me several days to figure out why my 20 character password wasn't working.
- Oliver
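The mismatch described in the post above is easy to reproduce in miniature. This is a toy sketch, not a description of how any real service is implemented: the class and method names and the limit of 12 are invented for illustration, and a real system would store a hash, not the password text.

```java
class TruncationBug {
    static final int MAX_STORED = 12;  // hypothetical limit, for illustration only

    // What the sign-up path does: silently cut the password down.
    static String truncate(String pw) {
        return pw.length() > MAX_STORED ? pw.substring(0, MAX_STORED) : pw;
    }

    // Buggy login: compares the stored (truncated) value with the raw input.
    static boolean buggyLogin(String stored, String typed) {
        return stored.equals(typed);
    }

    // Consistent login: applies the same transformation on both paths.
    static boolean consistentLogin(String stored, String typed) {
        return stored.equals(truncate(typed));
    }

    public static void main(String[] args) {
        String stored = truncate("1234567890ABCDEF");
        System.out.println(stored);
        System.out.println(buggyLogin(stored, "1234567890ABCDEF"));
        System.out.println(consistentLogin(stored, "1234567890ABCDEF"));
    }
}
```

Whatever normalisation is applied when the password is stored has to be applied when it is checked. Rejecting an over-long password at sign-up, with a message, is arguably better than truncating at all.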
Chris Uppal - 26 Apr 2006 12:46 GMT > To test whether a given unicode character is outside of (or inside of, > for that matter) ASCII, you could serialize it to ASCII, then re-read the > ASCII data back into an in-memory Java string, and check if you still have > the same original character that you started with. What's wrong with just testing whether it's < 128 ?
> So if I create a new account with the password "1234567890ABCDEF", the > database will be updated to say that my password is "1234567890AB", but [quoted text clipped - 3 lines] > DB), sees that they are not equal, and tell me that my password is > incorrect. I think it was Sun who (inspired to a display of the very highest technical standards), mapped my user-id and password to lower case before entering them into the database, but didn't perform the same mapping when checking them later...
-- chris
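For comparison, here are the two ASCII tests discussed in the posts above, side by side. This is a sketch with made-up method names; it assumes Java 7 or later for StandardCharsets (on older versions, Charset.forName("US-ASCII") would serve). String.getBytes(Charset) is documented to substitute the charset's replacement bytes for unmappable characters, which for US-ASCII is '?', so the round trip only returns the original for genuine ASCII.

```java
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // Direct test: ASCII occupies the values 0 to 127.
    static boolean isAsciiByRange(char c) {
        return c < 128;
    }

    // Round trip: encode to US-ASCII, decode again, and compare with the original.
    // Anything outside ASCII is replaced during encoding, so it comes back changed.
    static boolean isAsciiByRoundTrip(char c) {
        String original = String.valueOf(c);
        byte[] bytes = original.getBytes(StandardCharsets.US_ASCII);
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        return decoded.equals(original);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiByRange('?') + " " + isAsciiByRoundTrip('?'));
        System.out.println(isAsciiByRange('\u00f3') + " " + isAsciiByRoundTrip('\u00f3'));
    }
}
```

Comparing the decoded string with the original, not checking for '?', keeps a literal question mark from being misclassified.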
Oliver Wong - 26 Apr 2006 14:46 GMT >> To test whether a given unicode character is outside of (or inside >> of, [quoted text clipped - 4 lines] > > What's wrong with just testing whether it's < 128 ? Erm, er... I was trying to write code that didn't depend on the internal encoding being UTF-16. Yeah, that's it. More robust and all that. I mean, what if in Java 7, they decide to switch to EBCDIC internally, huh?
- Oliver
Morten Alver - 26 Apr 2006 15:26 GMT >>> To test whether a given unicode character is outside of (or >>> inside of, [quoted text clipped - 10 lines] > that. I mean, what if in Java 7, they decide to switch to EBCDIC > internally, huh? You can also query a CharsetEncoder (which you can get from the newEncoder() method of a Charset) whether it can encode a char or a CharSequence, using the canEncode() method. This is useful in general for detecting whether the charset you are using supports all the characters you'd like to write.
-- Morten
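A small illustration of the CharsetEncoder approach from the post above. The helper name fitsIn is invented for this sketch; Charset.forName, newEncoder() and canEncode() are the standard java.nio.charset calls. A fresh encoder is created per call because canEncode may change the encoder's internal state.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EncoderCheck {
    // True if every character of text can be encoded in the named charset.
    // (Hypothetical helper, not part of the JDK.)
    static boolean fitsIn(String charsetName, CharSequence text) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = Charset.forName("US-ASCII").newEncoder();
        System.out.println(ascii.canEncode('A'));                 // single char overload
        System.out.println(ascii.canEncode('\u00f3'));            // o with acute accent
        System.out.println(fitsIn("ISO-8859-1", "h\u00f3la"));    // Latin-1 has that letter
        System.out.println(fitsIn("ISO-8859-1", "a\u21d2b"));     // but not the double arrow
    }
}
```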
Roedy Green - 26 Apr 2006 20:35 GMT > Erm, er... I was trying to write code that didn't depend on the internal >encoding being UTF-16. Yeah, that's it. More robust and all that. I mean, >what if in Java 7, they decide to switch to EBCDIC internally, huh? Surely the use of Unicode is cast in stone in the JLS somewhere. If they changed that encoding, thousands of programs would break because Java uses \uxxxx to encode literals.
Oliver Wong - 27 Apr 2006 17:26 GMT >> Erm, er... I was trying to write code that didn't depend on the internal >>encoding being UTF-16. Yeah, that's it. More robust and all that. I mean, [quoted text clipped - 3 lines] > they changed that encoding, thousands of programs would break because > Java uses \uxxxx to encode literals. The javac compiler could still accept input of the form \uxxxx, and translate to some sort of EBCDIC representation to be emitted to the classfiles. But yes, I believe somewhere in the JLS, Unicode is explicitly mentioned (though I'm too lazy to verify this right now).
- Oliver
Roedy Green - 27 Apr 2006 19:44 GMT > The javac compiler could still accept input of the form \uxxxx, and >translate to some sort of EBCDIC representation to be emitted to the >classfiles. But yes, I believe somewhere in the JLS, Unicode is explicitly >mentioned (though I'm too lazy to verify this right now). Java carefully specifies the language so that the internal representation of anything is none of your business, and you can't find out by writing a program (e.g. they could use UTF-8 for strings). However, the Unicodeness is built into the language in that \uxxxx in the source code will come out with DataOutputStream.writeChar as that same binary number, and that \u0xxx will map onto the right ASCII subset of Unicode to produce all the Java keywords.
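The observable behaviour described above can be checked with a few lines. This sketch uses an invented helper name, writeCharBytes; DataOutputStream.writeChar is documented to write the char as two bytes, high byte first, so the numeric value in a \uxxxx escape is exactly what reaches the stream, whatever the JVM does internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UnicodeEscapeDemo {
    // Returns the bytes that DataOutputStream.writeChar emits for one char.
    // (Hypothetical helper for this illustration.)
    static byte[] writeCharBytes(char c) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeChar(c);
            out.flush();
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked if it somehow does.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // An escape in the ASCII range is the same char as the plain literal.
        System.out.println('\u0041' == 'A');

        byte[] bytes = writeCharBytes('\u00f3');
        System.out.printf("%02x %02x%n", bytes[0] & 0xFF, bytes[1] & 0xFF);
    }
}
```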
Chris Uppal - 27 Apr 2006 13:11 GMT [me:]
> > What's wrong with just testing whether it's < 128 ? > > Erm, er... I was trying to write code that didn't depend on the > internal encoding being UTF-16. Yeah, that's it. More robust and all > that. ;-)
>I mean, what if in Java 7, they decide to switch to EBCDIC > internally, huh? Like the way they changed from "it's pure Unicode data without any encoding" to "Ha-ha! Fooled you! It's actually encoded as UTF-16"...
-- chris