
Java Forum / General / July 2007


newbie Java regexp question

mitchmcc@yahoo.com - 02 Jul 2007 19:31 GMT
Below is a small test program I wrote to try to
do a simple parse of an XML expression, where I
can extract the tag(s) and the data on a single
line.  Yes, I know about the other ways to parse
real XML, but I am only trying to learn Java.  My
test case is very simple (see below).  The problem
seems to be something tricky about the fact that
I am reading the input from the console.

I have tried the regexp in all of the following forms:

    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

In Windows cmd.exe, none of these match when I enter

    <t1>foo</t1>

as standard input.
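A minimal sketch of what the program actually receives, assuming the console line ends in a Windows `\r\n` terminator (simulated here with a `StringReader`). `BufferedReader.readLine()` returns the line without its terminator, so of the three patterns only the first can match. The class and method names are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class PatternCheck {
    // The three patterns tried in the question.
    static final String[] PATTERNS = {
        "<(\\S+)>(\\S+)</\\1>",
        "<(\\S+)>(\\S+)</\\1>\n",
        "<(\\S+)>(\\S+)</\\1>\r\n"
    };

    // Reads one line the way the test program does. A StringReader stands in
    // for console input that ends with a Windows \r\n terminator.
    static String readFirstLine(String input) {
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            return in.readLine(); // the terminator is NOT part of the result
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the whole first line of input matches regex.
    static boolean matchesFirstLine(String regex, String input) {
        return Pattern.compile(regex).matcher(readFirstLine(input)).matches();
    }

    public static void main(String[] args) {
        String input = "<t1>foo</t1>\r\n";
        for (int i = 0; i < PATTERNS.length; i++) {
            System.out.println("pattern " + (i + 1) + " matches: "
                    + matchesFirstLine(PATTERNS[i], input));
        }
    }
}
```

Running it should report `true` for pattern 1 and `false` for patterns 2 and 3, because the string handed to the matcher is just `<t1>foo</t1>`.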

Any advice would be greatly appreciated.

Mitch

-----------------------------------------------------------------------------------------------

import java.io.*;
import java.util.regex.*;

public class test {
    public static void main(String[] args) throws IOException {

        BufferedReader stdIn = new BufferedReader(new InputStreamReader(System.in));
        String server = "";
        String userInput;

        // read arguments
        if (args.length == 1) {
            server = args[0];
        } else {
            System.out.println("no args");
        }

        // this one works, but is not really what I want
        // Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)<(\\S+)>");

        // this one is the correct one that won't match unless the closing tag
        // matches the opening tag, but I cannot get it to work with input from
        // the console...
        Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

        Matcher m1 = p1.matcher("<t1>foo</t1>\r\n");
        System.out.println("matched test string = " + m1.matches());

        while ((userInput = stdIn.readLine()) != null) {

            System.out.println("got user input: " + userInput + " length "
                    + userInput.length());

            // Now see if the pattern matches (attempt the match only once)
            Matcher m = p1.matcher(userInput);
            boolean matched = m.matches();
            System.out.println("matched = " + matched);
            System.out.println("numGroups found: " + m.groupCount() + "\n");

            // If there was a match, print out the groups found
            if (matched) {
                for (int j = 1; j <= m.groupCount(); j++) {
                    System.out.println("group " + m.group(j) + " found\n");
                }  // end for
            }  // end if

        }  // end while

        stdIn.close();

    }  // end main

}  // end class test
david.karr - 02 Jul 2007 21:45 GMT
On Jul 2, 11:31 am, "mitch...@yahoo.com" <mitch...@yahoo.com> wrote:
> Below is a small test program I wrote to try and
> do a simple parse of an XML expression, where I
> can extract the tag(s) and the data on a single
> line.  Yes, I know about the other ways to parse
> real XML, but I am trying to learn Java only.

You're going to be following all sorts of gnarly twisty passages if
you try to avoid learning XML.  The functionality for parsing XML
is readily available in the standard Java libraries.

Feel free to explore regular expressions as an intellectual exercise,
but it's a waste of time if you're actually trying to produce real
code to parse XML.
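For comparison, a minimal sketch of the same extraction using the JDK's built-in DOM parser (`javax.xml.parsers`). The class and method names are hypothetical, and error handling is reduced to a single catch that wraps every parser exception.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class DomParseSketch {
    // Parses a one-element document such as "<t1>foo</t1>" and returns
    // { tag name, text content }. Any parse failure becomes an IllegalArgumentException.
    static String[] tagAndText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new String[] { root.getTagName(), root.getTextContent() };
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] result = tagAndText("<t1>foo bar</t1>");
        System.out.println("tag = " + result[0] + ", data = " + result[1]);
    }
}
```

Whitespace in the element text is no problem here, and a mismatched closing tag such as `<t1>foo</t2>` is rejected by the parser itself, which is the check the `\1` backreference was meant to perform. The default parser may also print a `[Fatal Error]` line to stderr in that case.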
timjowers - 02 Jul 2007 21:46 GMT
On Jul 2, 2:31 pm, "mitch...@yahoo.com" <mitch...@yahoo.com> wrote:
> Below is a small test program I wrote to try and
> do a simple parse of an XML expression, where I
[quoted text clipped - 85 lines]
>
> }  // end class test

It works.

       Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");

You may be putting whitespace in the text of the element. Try
revising the regexp to match anything other than the terminator. E.g. this
works as is:
<i>test</i>

Yet this does not:
<i>test two</i>
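Tim's two cases, checked directly in a minimal sketch (the class name is made up). The middle `(\S+)` group cannot match the space, so the second line fails.

```java
import java.util.regex.Pattern;

public class WhitespaceCheck {
    // The pattern reported as working above.
    static final Pattern TAG = Pattern.compile("<(\\S+)>(\\S+)</\\1>");

    // True if the whole line has the form <tag>text</tag> with no whitespace in text.
    static boolean matches(String line) {
        return TAG.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("<i>test</i>"));      // true
        System.out.println(matches("<i>test two</i>"));  // false: \S+ cannot cross the space
    }
}
```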

TimJOwers
kaldrenon - 02 Jul 2007 21:58 GMT
> E.g. this
> works as is:
[quoted text clipped - 4 lines]
>
> TimJOwers

Which could easily be fixed by replacing the (\\S+) in the middle with
(.+) or the reluctant (.+?), I believe.
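A quick check of that suggestion, assuming the tag group stays `(\S+)`: both `(.+)` and the reluctant `(.+?)` accept element text containing spaces, while `(.?)` allows at most one character. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementText {
    // Returns the element text (group 2), or null if the whole line does not match regex.
    static String text(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "<i>test two</i>";
        System.out.println(text("<(\\S+)>(.+)</\\1>", line));   // test two
        System.out.println(text("<(\\S+)>(.+?)</\\1>", line));  // test two
        System.out.println(text("<(\\S+)>(.?)</\\1>", line));   // null: .? allows at most one character
    }
}
```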
Roedy Green - 02 Jul 2007 22:18 GMT
On Mon, 02 Jul 2007 11:31:08 -0700, "mitchmcc@yahoo.com"
<mitchmcc@yahoo.com> wrote, quoted or indirectly quoted someone who
said :

>    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
>    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
>    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

You have 4 things that have to work for your regex as a whole to work.
Chop your pattern down to just match <t1>, then when you get that
working, add the next bit.
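One way to follow that advice mechanically, as a minimal sketch: split the question's third pattern into pieces (an arbitrary split chosen for this illustration, not necessarily the "4 things" meant above), append them one at a time, and report how many still match. `Matcher.lookingAt()` is used so each partial pattern only has to match a prefix of the line. Class and method names are made up.

```java
import java.util.regex.Pattern;

public class IncrementalRegex {
    // The question's third pattern, split into pieces. The last piece is the
    // literal \r\n terminator.
    static final String[] PIECES = { "<(\\S+)>", "(\\S+)", "</\\1>", "\r\n" };

    // Appends one piece at a time and returns how many leading pieces still
    // match the start of input; lookingAt() only requires a prefix match.
    static int piecesThatMatch(String input) {
        StringBuilder regex = new StringBuilder();
        int count = 0;
        for (String piece : PIECES) {
            regex.append(piece);
            if (!Pattern.compile(regex.toString()).matcher(input).lookingAt()) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A line as returned by readLine(): no terminator at the end.
        String line = "<t1>foo</t1>";
        System.out.println(piecesThatMatch(line) + " of " + PIECES.length + " pieces match");
    }
}
```

For the console line this prints `3 of 4 pieces match`, which points straight at the `\r\n` piece as the one that fails.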

Instead of trying all possibilities of \n, have a look at your string
and see what is on the end. Use charAt to examine it.
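A minimal sketch of the `charAt` check, assuming Windows-style `\r\n` input (simulated with a `StringReader`; the class and method names are made up). It compares the raw text with what `readLine()` returns by printing each string's length and the numeric code of its last character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LineEnding {
    // Numeric code of the last character of s, or -1 for an empty string.
    static int lastCharCode(String s) {
        return s.isEmpty() ? -1 : s.charAt(s.length() - 1);
    }

    public static void main(String[] args) throws IOException {
        String raw = "<t1>foo</t1>\r\n"; // simulated console input with a Windows line ending
        BufferedReader in = new BufferedReader(new StringReader(raw));
        String line = in.readLine();

        System.out.println("raw:      length " + raw.length() + ", last char code " + lastCharCode(raw));
        System.out.println("readLine: length " + line.length() + ", last char code " + lastCharCode(line));
    }
}
```

The raw text is 14 characters long and ends in code 10 (`\n`); the `readLine()` result is 12 characters long and ends in code 62 (`>`), so there is no terminator left for the pattern to match.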

see http://mindprod.com/jgloss/regex.html

--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com


©2008 Advenet LLC
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.