Java Forum / General / March 2006
Parsing "February 24th, 2006" to java.util.Date
stevengarcia@yahoo.com - 28 Mar 2006 19:40 GMT
How does one write a SimpleDateFormat pattern to take into account the "th" or the "nd" that might be present on any date?

March 1st, 2006
March 2nd, 2006
March 3rd, 2006
March 4th, 2006

I'm not sure how to write a mask that can take into account "st", "nd", "rd", "th".
Thanks for your help.
Dave Mandelin - 28 Mar 2006 20:08 GMT I don't think SimpleDateFormat can do it. I'd use a regexp to remove those characters.
-- Need to get from a Foo object to a Bar object in Java? Ask Prospector: http://snobol.cs.berkeley.edu Want to play tabletop RPGs over the internet? Check out Koboldsoft RPZen: http://www.koboldsoft.com
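The regexp suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name OrdinalDateParser and its parse method are hypothetical, and the input is assumed to use English month names and English suffixes. The suffix is removed only when it directly follows a digit, so the "st" in "August" survives.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper, not from the thread: strips English ordinal suffixes before parsing. */
class OrdinalDateParser {

    /** Parses dates such as "February 24th, 2006" (English month names and suffixes assumed). */
    static Date parse(String text) throws ParseException {
        // Drop "st", "nd", "rd" or "th" only when it immediately follows a digit.
        String cleaned = text.replaceAll("(?<=\\d)(st|nd|rd|th)", "");
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH);
        format.setLenient(false); // reject impossible dates such as "February 30, 2006"
        return format.parse(cleaned);
    }

    public static void main(String[] args) throws ParseException {
        String[] samples = {
            "March 1st, 2006", "March 2nd, 2006", "March 3rd, 2006", "February 24th, 2006"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

Stripping the suffix, rather than rewriting it to one fixed suffix, keeps the SimpleDateFormat pattern free of quoted literals. The regexp does not check that the suffix matches the number, so "March 1th, 2006" would also be accepted.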
stevengarcia@yahoo.com - 28 Mar 2006 22:46 GMT Anyone other takers?
Roedy Green - 28 Mar 2006 23:25 GMT
>Anyone other takers?

// Parsing a Date of the form: "February 24th, 2006"

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ParseDate {
    // The day suffix is normalized to "th" before parsing, so the pattern only needs 'th'.
    // Locale.ENGLISH keeps the month names English regardless of the default locale.
    private static final SimpleDateFormat pattern =
        new SimpleDateFormat( "MMM dd'th', yyyy", Locale.ENGLISH );

    /**
     * test harness
     *
     * @param args not used
     */
    public static void main ( String[] args ) {
        String dateString = "February 24th, 2006";
        int where;
        // Rewrite "st,", "nd," or "rd," as "th," ("th," itself needs no change).
        if ( ( where = dateString.indexOf( "st," ) ) >= 0 ) {
            dateString = dateString.substring( 0, where ) + "th," + dateString.substring( where + 3 );
        } else if ( ( where = dateString.indexOf( "nd," ) ) >= 0 ) {
            dateString = dateString.substring( 0, where ) + "th," + dateString.substring( where + 3 );
        } else if ( ( where = dateString.indexOf( "rd," ) ) >= 0 ) {
            dateString = dateString.substring( 0, where ) + "th," + dateString.substring( where + 3 );
        }
        Date d = null;
        try {
            d = pattern.parse( dateString );
        } catch ( ParseException e ) {
            System.err.println( "oops:" + dateString );
        }

        System.out.println( d );
    }
}
With JDK 1.5 you could use String.replace( "nd," ,"th," );
-- Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
James McGill - 29 Mar 2006 01:23 GMT
> >Anyone other takers?
> With JDK 1.5 you could use String.replace( "nd," ,"th," );

Localizing it to handle e.g. "-ieme, -ere", or "-zig"... seems like there's a case to be made for i18n-ized ordinal number parsing. Hard-coding strings for "-st", "-nd", "-rd", "-th" just smells bad, in a language that puts such emphasis on i18n.
Oliver Wong - 29 Mar 2006 21:40 GMT
>> >Anyone other takers?
> [quoted text clipped - 4 lines]
> Hard-coding strings for "-st", "-nd", "-rd", "-th" just smells bad, in a
> language that puts such emphasis on i18n.

In some languages, the entire word is changed when going from number to ordinal, rather than just having a suffix added. It's like how the word "one" changes to "first" in English (note that the two words have zero letters in common).
So yeah, this is a non-trivial problem, and it'd probably be a great boon to programmers if a standardized i18n API call existed for this. But the syntax wouldn't be as simple as "MMM dd[ordinal-suffix], yyyy", but rather, something like "MMM [pure-ordinal-or-number-followed-by-ordinal-suffix], yyyy".
- Oliver
Twisted - 29 Mar 2006 22:18 GMT Even though "first" is utterly different from "one", "1st" is just "1" with a suffix.
RFE: add getSuffixFor(int) and getWordFor(int) to Locale? Typically, there'll be some special cases for small enough integers (and an illegal argument exception if argument <= 0?) and a simple algorithm for larger integers. (In the case of English, starting at 20.) The English algorithm for suffixes is especially simple, as it's just
    int lastTwo = arg % 100;
    if (lastTwo > 10 && lastTwo < 14) return "th";
    switch (arg % 10) {
        case 1: return "st";
        case 2: return "nd";
        case 3: return "rd";
        default: return "th";
    }

(The only special cases are numbers ending in 11, 12, or 13: 11th, 12th, and 13th instead of 11st, 12nd, and 13rd, and likewise 111th, 112th, and 113th.)
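As a rough, self-contained sketch, the suffix rule could be wrapped up as below. The class EnglishOrdinals and its methods are hypothetical names chosen for illustration (no such API exists on Locale). The sketch rejects non-positive arguments, as the RFE suggests, and checks the last two digits so that 111, 112, and 113 also take "th".

```java
/** Illustrative sketch of the English suffix rule; the names here are not a JDK API. */
class EnglishOrdinals {

    /** Returns "st", "nd", "rd" or "th" for a positive integer. */
    static String suffixFor(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("expected a positive integer, got " + n);
        }
        int lastTwo = n % 100;
        // Numbers ending in 11, 12 or 13 always take "th".
        if (lastTwo >= 11 && lastTwo <= 13) {
            return "th";
        }
        switch (n % 10) {
            case 1:  return "st";
            case 2:  return "nd";
            case 3:  return "rd";
            default: return "th";
        }
    }

    /** Returns the number followed by its suffix, e.g. 24 gives "24th". */
    static String ordinal(int n) {
        return n + suffixFor(n);
    }

    public static void main(String[] args) {
        // Print the ordinals for every possible day of the month.
        for (int day = 1; day <= 31; day++) {
            System.out.println(ordinal(day));
        }
    }
}
```

For days of the month only 1 through 31 matter, so the check on the last two digits makes no difference there. It matters only if the method is used for larger numbers.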
The word version in English is similar -- you special-case 11, 12, and 13 ("eleventh, twelfth, thirteenth" -- note you can't just add "th" or you get "twelveth" for 12), and for the rest, you turn the least significant digit into an ending "-first", "-second", "-third", or "-" + number's name + "th", and the remaining digits into a beginning, e.g. "three hundred and seventy", generating e.g. "three hundred and seventy-sixth".
Doing this for other languages is left as an exercise for the reader.
:) -- I am the terror that flaps in the net! I am the leaky faucet in the kitchen of crime! I am TWISTED!
Oliver Wong - 29 Mar 2006 22:32 GMT
> Even though "first" is utterly different from "one", "1st" is just "1"
> with a suffix.

Yeah, my point was that English (and most Latin/European languages) have this "feature" that you can add a suffix to an Arabic numeral (e.g. '1', '2', '3') to turn it into an ordinal (e.g. '1st', '2nd', '3rd'), but this is not true for ALL languages.
Then I tried to give an analogy, but took an example from English. Admittedly, that might be confusing, but I felt if I had used any other language, I could not expect most readers here to relate to the example.
> RFE: add getSuffixFor(int) and getWordFor(int) to Locale?

Would getSuffixFor() return null, or throw an exception, for a Locale in which the concept of a suffix doesn't exist?

Also, with "getWordFor(int)", there exist some languages where "the word for a number" changes depending on what you are counting. For example, in French, you might say "un homme" to mean "one man", but "une femme" to mean "one woman". The word varies depending on the gender of the thing you are counting. In Japanese, you vary the word depending on whether you're counting something round, something flat, something pointy, etc.
- Oliver
James McGill - 29 Mar 2006 22:40 GMT
> Also, with "getWordFor(int)", there exists some languages where
> "the
[quoted text clipped - 7 lines]
> you're
> counting something round, something flat, something pointy, etc.

Yes, this is the kind of stuff I think about whenever I notice that people believe we've reached some sort of plateau in technology. We have MUCH further left to go than we've come. I hope the comfortable equilibrium compromise we're in right now doesn't destroy us with complacency.
Chris Uppal - 30 Mar 2006 10:23 GMT
> Yeah, my point was that English (and most Latin/European languages)
> have this "feature" that you can add a suffix to a arabic numeral (e.g.
> '1', '2', '3') to turn them into ordinals (e.g. '1st', '2nd', '3rd'), but
> this is not true for ALL languages.

And even in English the pattern isn't uniform. I would feel very odd talking about the "thousand and first dalmatian" -- at some point (at least in British English) the pattern reverts to "zillion-and-oneth".
But for the case in question -- where we are talking about number names for days in a month -- I don't see why the whole lot can't be hard-wired into the language/calendar-specific localisation.
Maybe there are languages where the number names for days don't follow a (feasibly) computable pattern, and don't fit into a table-driven approach either, but they must surely be in the tiny minority.
-- chris
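A hard-wired table for day-of-month names might look something like the sketch below. Everything here is an assumption made for illustration: DayOfMonthLabels is a hypothetical class, not part of the JDK, and only English and French tables are filled in. The French table reflects the convention that dates use an ordinal only for the first of the month ("1er mars") and plain numbers otherwise.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical table-driven lookup; only English and French are filled in. */
class DayOfMonthLabels {

    // One 31-entry table per language code; index 0 holds the label for day 1.
    private static final Map<String, String[]> TABLES = new HashMap<String, String[]>();

    static {
        TABLES.put("en", new String[] {
            "1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th",
            "11th", "12th", "13th", "14th", "15th", "16th", "17th", "18th", "19th", "20th",
            "21st", "22nd", "23rd", "24th", "25th", "26th", "27th", "28th", "29th", "30th",
            "31st"
        });
        // French: ordinal for the first of the month, plain numbers for the rest.
        String[] french = new String[31];
        french[0] = "1er";
        for (int day = 2; day <= 31; day++) {
            french[day - 1] = Integer.toString(day);
        }
        TABLES.put("fr", french);
    }

    private static String[] tableFor(Locale locale) {
        String[] table = TABLES.get(locale.getLanguage());
        if (table == null) {
            throw new IllegalArgumentException("no day-of-month table for " + locale);
        }
        return table;
    }

    /** Formatting direction: day number to label. */
    static String label(Locale locale, int day) {
        if (day < 1 || day > 31) {
            throw new IllegalArgumentException("day out of range: " + day);
        }
        return tableFor(locale)[day - 1];
    }

    /** Parsing direction: label back to day number. */
    static int dayFor(Locale locale, String label) {
        String[] table = tableFor(locale);
        for (int i = 0; i < table.length; i++) {
            if (table[i].equals(label)) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("unknown day label: " + label);
    }
}
```

With 31 entries per language a linear scan is enough for the parsing direction. A language whose day names follow no computable pattern would simply get a fully spelled-out table.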
Twisted - 31 Mar 2006 06:45 GMT Bloody hell. Might as well just go ahead and solve the NLP first then.
-- I am the terror that flaps in the net! I am the bent prong on the power cable of crime! I am TWISTED!
Roedy Green - 29 Mar 2006 22:19 GMT
> So yeah, this is a non-trivial problem, and it'd probably be a great
>boon to programmers if a standardized i18n API call existed for this. But
>the syntax wouldn't be as simple as "MMM dd[ordinal-suffix], yyyy", but
>rather, something like "MMM
>[pure-ordinal-or-number-followed-by-ordinal-suffix], yyyy".

This is related to the problem of expressing numbers in words.
See http://mindprod.com/applets/inwords.html
It handles English ordinals in words.
Ordinals are used much less frequently than I remember them being used as a child. Perhaps the irregularity discouraged their use in computers.
-- Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Twisted - 29 Mar 2006 00:48 GMT S'funny you should pick my 30th birthday as your example date (in the Subject)...
-- I am the terror that flaps in the net! I am the broken software with the awful user interface that the boss forces everyone to use! I am TWISTED!