Hello,
I have a text file like this:
2005-10-17;AXC dfgh k;29,26;275 2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17
I'd like to extract the values like this:
2005-10-17
AXC dfgh k
29,26
275
2005-10-17
KLCACM Rfhekksn Allerg FGH
But the code below only produces:
Found a match: 2005-10-17;AXC
g1: 2005-10-17
g2:
Found a match: 2005-10-17;KLCACM
g1: 2005-10-17
g2:
Found a match: 2005-10-17;TYhdkdkj
g1: 2005-10-17
g2:
Any idea how I could achieve this? That is, a record in the file is
<date>;<name>;<value>;<value><space>
<date>; and so on...
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "([0-9]{4}-[0-9]{2}-[0-9]{2});(\\w*)*";
// A Java string literal cannot span lines, so the data is joined with +.
String targetString = "2005-10-17;AXC dfgh k;29,26;275 "
        + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 2005-10-17;TYhdkdkj "
        + "F12;50,5;276 2005-10-17";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(targetString);
while (matcher.find()) {
    System.out.println("Found a match: " + matcher.group(0)
            + "\ng1: " + matcher.group(1)
            + "\ng2: " + matcher.group(2)
            //+ "\ng3: " + matcher.group(3)
    );
}
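One possible fix, offered as a sketch rather than an answer from the thread, is to give each field of <date>;<name>;<value>;<value> its own capturing group. In the original pattern, \w does not match the space in "AXC dfgh k", so the match stops at "AXC". Because (\w*) is itself repeated with *, group(2) reports the group's last, empty, iteration, which is what the output above shows. The class name RecordRegex is hypothetical, and the sketch assumes that no field contains ';' and that the last value is an integer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordRegex {
    // One capturing group per field: <date>;<name>;<value>;<value>.
    // Assumption: no field contains ';' and the last value is an integer.
    static final Pattern RECORD = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2});([^;]*);([^;]*);(\\d+)");

    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        // Each find() resumes after the previous match, so it yields the next record.
        while (m.find()) {
            records.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        // Prints each field on its own line, as in the desired output above.
        for (String[] record : parse(data)) {
            for (String field : record) {
                System.out.println(field);
            }
        }
    }
}
```

The trailing lone date in the sample has no ';' after it, so it is not reported as a record.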
Roedy Green - 20 Oct 2005 15:27 GMT
>Any idea on how I could achieve this? i.e. a record in the file is
><date>;<name>;<value>;<value><space>
><date>; and so on...
You could do it with CSVReader, telling it that the separator is ';'.
See http://mindprod.com/products1.html#CSV

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.
carlos@gkpwdun.com - 20 Oct 2005 15:36 GMT
>I have a text file like that:
>2005-10-17;AXC dfgh k;29,26;275 2005-10-17;KLCACM Rfhekksn Allerg
[quoted text clipped - 7 lines]
>2005-10-17
>KLCACM Rfhekksn Allerg FGH
String data = "2005-10-17;AXC dfgh k;29,26;275 2005-10-17;" +
        "KLCACM Rfhekksn Allerg FGH;9,65;434 2005-10-17;" +
        "TYhdkdkj F12;50,5;276 2005-10-17";
String[] dataArray = data.split(";");
for (int i = 0; i < dataArray.length; i++) {
    System.err.println(dataArray[i]);
}
gbgkille69@spray.se - 21 Oct 2005 06:20 GMT
Many thanks, but that won't work, because a record ends with a number and a
space, not a ';'.
With split, the output would be:
29,26
275 2005-10-17
KLCACM...
In other words, every record starts with a date. I'd like to use a
regexp in order to keep the program simple, i.e. so that I can pass a
regexp as one single input parameter. Furthermore, the regexp will do a
syntax check for me, which split wouldn't do.
Any ideas?
Roedy Green - 21 Oct 2005 08:18 GMT
>or in other words every record starts with a date. I'd like to use a
>regexp in order to keep the program simple, i.e. so that I can pass a
>regexp as one single input parameter. furthermore the regexp will do a
>syntax check for me which split wouldn't do.
You can take it apart yourself with repeated indexOf calls in only a few
lines of code. I am presuming there are no fields with an embedded ';'
protected by some quoting convention.
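Here is a minimal sketch of the repeated-indexOf approach. It assumes there is no quoting, that no field contains ';', and that the fourth field of each record ends at the next space. IndexOfSplitter is a hypothetical name. An incomplete trailing record, such as the lone date at the end of the sample, is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplitter {
    // Assumptions: no quoting, no ';' inside a field, and the fourth
    // field of each record ends at the next space (or at the end of text).
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            String[] fields = new String[4];
            int start = pos;
            // The first three fields each end at the next ';'.
            for (int i = 0; i < 3; i++) {
                int semi = text.indexOf(';', start);
                if (semi < 0) {
                    return records; // incomplete trailing record: stop here
                }
                fields[i] = text.substring(start, semi);
                start = semi + 1;
            }
            // The fourth field ends at the next space.
            int space = text.indexOf(' ', start);
            int end = (space < 0) ? text.length() : space;
            fields[3] = text.substring(start, end);
            records.add(fields);
            pos = end + 1; // skip the space that separates records
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        for (String[] record : parse(data)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```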
You can also take it apart char by char with a finite state automaton.
See http://mindprod.com/jgloss/finitestate.html
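A char-by-char finite state automaton for the same format might look like the sketch below. It makes the same assumptions as above. There is one state per field; a ';' closes the current field and advances to the next state, and a space in the last field ends the record. RecordStateMachine and its state names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordStateMachine {
    // One state per field of a record: <date>;<name>;<value>;<value><space>
    enum State { DATE, NAME, VALUE1, VALUE2 }

    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        String[] fields = new String[4];
        StringBuilder current = new StringBuilder();
        State state = State.DATE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (state != State.VALUE2 && c == ';') {
                // ';' closes the current field and moves to the next state.
                fields[state.ordinal()] = current.toString();
                current.setLength(0);
                state = State.values()[state.ordinal() + 1];
            } else if (state == State.VALUE2 && c == ' ') {
                // A space after the last value ends the record.
                fields[3] = current.toString();
                records.add(fields);
                fields = new String[4];
                current.setLength(0);
                state = State.DATE;
            } else {
                current.append(c);
            }
        }
        // Emit a final record only if the text ended inside the last field.
        if (state == State.VALUE2 && current.length() > 0) {
            fields[3] = current.toString();
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        for (String[] record : parse(data)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```

Spaces inside the name field are kept because only the last-field state treats a space as a terminator.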
If you were of the habit of using hammers to kill mosquitoes, you
could write a parser. See http://mindprod.com/jgloss/parser.html

carlos@gkpwdun.com - 21 Oct 2005 11:32 GMT
>many thanks but that won't work as the record ends with a number and a
>space and not a ;
[quoted text clipped - 8 lines]
>regexp as one single input parameter. furthermore the regexp will do a
>syntax check for me which split wouldn't do.
OK, so ' 2005-' could be used as a delimiter.
I came up with targetString.split(" [0-9]{4}-"); to match
this, but unfortunately the pieces it returns do not include the
delimiter, so the year gets lost.
Does anyone know how to alter this expression in such a
way that it keeps the delimiter?
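One way to keep the delimiter is a zero-width lookahead, (?=...). This technique was not posted in the thread. The regex engine checks that a year follows the space without consuming it, so split removes only the space. LookaheadSplit is a hypothetical name. Because the pattern uses [0-9]{4} rather than a literal 2005, it does not depend on the current year.

```java
public class LookaheadSplit {
    // Split at a space only when a year and '-' follow it. The lookahead
    // (?=...) is zero-width: it checks the year without consuming it, so
    // the year stays at the start of the next piece.
    static String[] records(String text) {
        return text.split(" (?=[0-9]{4}-)");
    }

    public static void main(String[] args) {
        String targetString = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        for (String record : records(targetString)) {
            System.out.println(record);
        }
    }
}
```

With the sample string this yields four pieces. The last piece is just the lone trailing date, because the sample is cut off mid-record.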
wang - 21 Oct 2005 11:37 GMT
Use java.util.StringTokenizer.
Alan Krueger - 23 Oct 2005 00:08 GMT
> use java.util.StringTokenizer.
No, you don't want to do that. By default, StringTokenizer treats
consecutive delimiters as a single token break, so it never returns an
empty token. In a delimited format like the one the OP described, this
breaks as soon as any field can be an empty string.
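The difference is easy to see with a quick test. The record below, with an empty name field, is made up for illustration. StringTokenizer silently drops the empty field, while String.split with a negative limit keeps it (and also keeps trailing empty fields). TokenizerVsSplit is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // StringTokenizer skips runs of delimiters, so empty fields vanish.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ";");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // split keeps empty fields; the negative limit also keeps trailing ones.
    static List<String> split(String s) {
        return Arrays.asList(s.split(";", -1));
    }

    public static void main(String[] args) {
        String record = "2005-10-17;;29,26;275"; // hypothetical record, empty name
        System.out.println(tokenize(record)); // [2005-10-17, 29,26, 275]
        System.out.println(split(record));    // [2005-10-17, , 29,26, 275]
    }
}
```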
Alan Krueger - 23 Oct 2005 00:09 GMT
> Ok, so ' 2005-' could be used as delimiter.
That will work for a couple of months, if that: ' 2005-' stops matching once the year changes.
Nigel Wade - 21 Oct 2005 11:32 GMT
> Hello,
>
[quoted text clipped - 42 lines]
> );
> }
Is the content of the file entirely contained within one line? Or are there
multiple records per line, or one record per line?
It would make more sense to me to split the file/string into individual records,
then process each record. This should be much simpler to handle, and the code
should be easier to understand. After creating a set of records, you can split
each one into fields with the ";" field separator.
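The following is a two-stage sketch of that suggestion. It first cuts the string into records, then cuts each record into fields and keeps only records with exactly four fields. The record boundary (a space followed by a date) and the class name TwoStageParse are assumptions. The limit -1 is passed to split so that an empty field is kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageParse {
    static List<String[]> parse(String text) {
        List<String[]> result = new ArrayList<>();
        // Stage 1: a new record starts wherever a space is followed by a date.
        for (String record : text.split(" (?=\\d{4}-\\d{2}-\\d{2})")) {
            // Stage 2: split one record into fields; the -1 limit keeps empty fields.
            String[] fields = record.split(";", -1);
            if (fields.length == 4) {
                result.add(fields);
            }
            // Anything else (e.g. the lone trailing date in the sample) is skipped;
            // a real program might report it instead.
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        for (String[] fields : parse(data)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```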

Signature
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw@ion.le.ac.uk
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
Roedy Green - 22 Oct 2005 02:00 GMT
>Is the content of the file entirely contained within one line? Or are there
>multiple records per line, or one record per line?
[quoted text clipped - 3 lines]
>should be easier to understand. After creating a set of records, you can split
>each one into fields with the ";" field separator.
Don't be afraid to use some custom logic to handle the parts that are
awkward in a regex, and let the regex do only what it does naturally.
Similarly, don't be afraid to use two or more regexes, one to find the
big pattern and others to take that pattern apart, rather than trying
to do it all in one does-everything-but-eat regex.
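Here is a sketch of the two-regex idea. One pattern locates a whole record, and smaller patterns check individual fields afterwards. This also gives the kind of syntax check the OP asked for. The field rules are guesses from the sample data: a decimal comma in the first value and a plain integer in the second. TwoRegexes is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoRegexes {
    // Regex 1: finds a whole record; it only needs to know where one ends.
    static final Pattern RECORD =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2};[^;]*;[^;]*;[^;\\s]+");
    // Regexes 2 and 3: check individual fields. Assumption: the first value
    // may use a decimal comma (e.g. 29,26) and the second is an integer.
    static final Pattern DECIMAL = Pattern.compile("\\d+(,\\d+)?");
    static final Pattern INTEGER = Pattern.compile("\\d+");

    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            // RECORD guarantees exactly three ';', so this yields four fields.
            String[] f = m.group().split(";", -1);
            boolean valid = DECIMAL.matcher(f[2]).matches()
                    && INTEGER.matcher(f[3]).matches();
            if (valid) {
                records.add(f);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "2005-10-17;AXC dfgh k;29,26;275 "
                + "2005-10-17;KLCACM Rfhekksn Allerg FGH;9,65;434 "
                + "2005-10-17;TYhdkdkj F12;50,5;276 2005-10-17";
        for (String[] fields : parse(data)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

Keeping the field checks separate means each pattern stays short, and a record with a malformed value is rejected without complicating the pattern that finds records.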
