Java Forum / General / December 2007

Regex Replacement: Replacing text with an empty string

Hal Vaughan - 24 Dec 2007 04:33 GMT
I'm trying to clean up some comments in web pages.  I'm using regexes to do
a lot of the work, but I've run into a problem.  Toward the end of the
process, I'm trying to replace any remaining HTML tags with an empty
string, as in no spaces, nothing, just "".  If I replace the HTML tags with
a space or other characters it works, but it won't work with an empty
string.  (I also tried at mindprod.com, one of the first places for Java
info, but the site is down.)

Here's a snippet to explain what I'm doing:

    // sDesc is the string with the text I'm working on
    String sTag = "<.*?>";
    Pattern pTag = Pattern.compile(sTag);
    Matcher lineMatch = pTag.matcher(sDesc);
    sDesc = lineMatch.replaceAll("");

If I use " " in that last line, it works fine, but whenever I use "", the
HTML tags are NOT replaced.

I know it deeply offends people if any code is posted that isn't ready to be
compiled and run as is, but I think this is more about how regexes work
than a specific piece of code.  I've searched for "empty string" in
connection with regex replacement (and using different terms), but I
haven't found anything about this.  In most cases, I find something talking
about accidentally matching empty strings.  I would also think there's a
better term than empty string to apply to this.  Is there?

Why is it that a replace with a space works but with an empty string it
doesn't?

Thanks!

Hal
Patricia Shanahan - 24 Dec 2007 04:44 GMT
...
> Here's a snippet to explain what I'm doing:
>
[quoted text clipped - 10 lines]
> compiled and run as is, but I think this is more about how regexes work
> than a specific piece of code.  
...

I'm afraid you do need to prepare a test case that is a complete
program. I took your code and attempted to reproduce the problem, but it
works perfectly. There is something else about your program that is not
inherent in the snippet that is causing the problem.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplaceTest {
  public static void main(String[] args) {
    String sDesc = "XXX<I'm a Tag>YYY";
    System.out.println(sDesc);
    String sTag = "<.*?>";
    Pattern pTag = Pattern.compile(sTag);
    Matcher lineMatch = pTag.matcher(sDesc);
    sDesc = lineMatch.replaceAll("");
    System.out.println(sDesc);
  }
}

output:

XXX<I'm a Tag>YYY
XXXYYY
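
One variation on this test that may be worth trying, offered as a sketch and not as a diagnosis: by default `.` does not match line terminators, so `<.*?>` leaves a tag untouched if the tag is split across two lines of input. Compiling with `Pattern.DOTALL` changes that. The class name, the helper `stripTags`, and the sample input are invented for illustration.

```java
import java.util.regex.Pattern;

class MultiLineTagDemo {
    // Removes everything matched by <.*?> using the given Pattern flags.
    static String stripTags(String text, int flags) {
        return Pattern.compile("<.*?>", flags).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: the tag contains a line break.
        String input = "XXX<a\nhref=\"x\">YYY";

        // Default flags: '.' stops at the newline, so the tag is not matched.
        System.out.println(stripTags(input, 0));

        // DOTALL: '.' also matches line terminators, so the tag is removed.
        System.out.println(stripTags(input, Pattern.DOTALL)); // prints XXXYYY
    }
}
```

If the real input comes from web pages, tags that wrap across lines are common, so this is one thing to check when some tags seem to survive.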

Patricia
Hal Vaughan - 24 Dec 2007 04:51 GMT
> ...
>> Here's a snippet to explain what I'm doing:
[quoted text clipped - 17 lines]
> works perfectly. There is something else about your program that is not
> inherent in the snippet that is causing the problem.

Okay.  No problem.  I haven't touched regexes in Java until this past week
and, try as I could, the empty string just would not work.  I've found that
there are a LOT of things in any language that are often taken as
understood by people working in it but can trip up someone who hasn't
worked with that feature before.  I figured this was one of them.  All the
matching I did and experimented with worked great, until I used empty
strings.

What I can't see is how other code would affect a regex, but I'll play
around and see what I get.

Thanks!

Hal
Hal Vaughan - 24 Dec 2007 05:17 GMT
> ...
>> Here's a snippet to explain what I'm doing:
[quoted text clipped - 17 lines]
> works perfectly. There is something else about your program that is not
> inherent in the snippet that is causing the problem.

All I needed, and this was a BIG help, was to find out there was no issue
with using an empty string.

Believe it or not, I just added this line:

       String newDesc = sDesc;

Then I used newDesc in every place where I used sDesc before.

It works now.  No idea why that makes a difference and, honestly, I don't
have time to pursue it.

Thanks for the verification this isn't just some obscure point I had never
heard of.

Hal
Daniel Pitts - 25 Dec 2007 17:38 GMT
>> ...
>>> Here's a snippet to explain what I'm doing:
[quoted text clipped - 34 lines]
>
> Hal
I doubt that was all you needed to do, and I suspect that it didn't
actually fix your problem.  My guess is that somewhere along the line
you fixed the underlying problem, and didn't know it.
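
As one guess at how a symptom like this can come and go (an assumption, since the surrounding code was never posted): String is immutable, so every replace call returns a new String. A result that is never assigned, or is assigned to a different variable than the one read later, looks exactly like a replacement that did nothing. The class and method names below are invented for illustration.

```java
class ImmutableStringDemo {
    // Returns a new String with the tags removed; the argument is unchanged.
    static String stripTags(String text) {
        return text.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String sDesc = "XXX<b>YYY";

        // Result discarded: sDesc still refers to the original text.
        sDesc.replaceAll("<.*?>", "");
        System.out.println(sDesc); // prints XXX<b>YYY

        // Result assigned back: sDesc now refers to the cleaned text.
        sDesc = stripTags(sDesc);
        System.out.println(sDesc); // prints XXXYYY
    }
}
```

This would not by itself explain why a space replacement appeared to work, so it is only one candidate among several.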

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Hal Vaughan - 26 Dec 2007 02:33 GMT
...
>> Believe it or not, I just added this line:
>>
[quoted text clipped - 12 lines]
> actually fix your problem.  My guess is that somewhere along the line
> you fixed the underlying problem, and didn't know it.

I wouldn't be surprised, but this is "time off" programming to mess with a
few things I've never tried before (regexes being just one of them).  If it
were for work, I'd be busting my tail to go through every line and see what
fixed it, but since this project will be cut up into pieces for other ones,
I'll see what happens later.

Hal
James - 26 Dec 2007 04:15 GMT
> ...
>>> Believe it or not, I just added this line:
[quoted text clipped - 5 lines]
>>> It works now.  No idea why that makes a difference and, honestly, I
>>> don't have time to pursue it.
[snip]

In my limited experience, when changing a variable name solves a problem,
the problem tends to be related to scope.
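
A hypothetical illustration of that kind of scope problem: a local variable with the same name as a field shadows the field, so the cleaned text ends up in the local and the field never changes. Nothing here is taken from the original program; the class `ShadowDemo` and its methods are assumptions made up for the example.

```java
class ShadowDemo {
    private String sDesc = "XXX<b>YYY";

    // The local sDesc shadows the field, so only the local is updated.
    void cleanShadowed() {
        String sDesc = this.sDesc;
        sDesc = sDesc.replaceAll("<.*?>", "");
    }

    // No local declaration here, so the field itself is updated.
    void cleanField() {
        sDesc = sDesc.replaceAll("<.*?>", "");
    }

    String getDesc() {
        return sDesc;
    }

    public static void main(String[] args) {
        ShadowDemo demo = new ShadowDemo();
        demo.cleanShadowed();
        System.out.println(demo.getDesc()); // prints XXX<b>YYY
        demo.cleanField();
        System.out.println(demo.getDesc()); // prints XXXYYY
    }
}
```

Renaming the variable in a case like this removes the shadowing, which would make a rename look like the fix.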

James
*Note: Remove every other letter for correct email address

SadRed - 24 Dec 2007 04:52 GMT
> I'm trying to clean up some comments in web pages.  I'm using regexes to do
> a lot of the work, but I've run into a problem.  Toward the end of the
[quoted text clipped - 29 lines]
>
> Hal

> this is more about how regexes work than a specific piece of code
That is the other way around. This statement has a flavor of arrogance.
replaceAll() with an empty string works flawlessly. The fault is in your
code or input, not in the Java regex. Post an SSCCE with a small example
input. See: http://homepage1.nifty.com/algafield/sscce.html
Roedy Green - 24 Dec 2007 09:01 GMT
On Sun, 23 Dec 2007 23:33:26 -0500, Hal Vaughan
<hal@thresholddigital.com> wrote, quoted or indirectly quoted someone
who said :

> I would also think there's a
>better term than empty string to apply to this.

A String of 0 chars is called an "empty String", as distinct from
null.  The difference is the source of all manner of bugs in
professionally written code.  Programmers writing Javadoc tend to be
fuzzy about whether a method can accept/produce an empty/null String.

Since either flavour of String is often rare, the bug won't show up in
routine testing.

Eiffel has design by contract to formally describe assertions on
method inputs and outputs. See
http://mindprod.com/jgloss/designbycontract.html

I would like to have formal ways of describing String types with
whether they can be null or empty, with exceptions if they are when
they shouldn't be.
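
Short of language support, a rough approximation is a guard method that states the contract in code and throws when it is violated. This is only a sketch: `StringContracts` and `requireNonEmpty` are hypothetical names, not part of the JDK, and it assumes a Java version that has `java.util.Objects` and `String.isEmpty()`.

```java
import java.util.Objects;

class StringContracts {
    // Returns value unchanged if it is neither null nor empty.
    // Throws NullPointerException for null and
    // IllegalArgumentException for the empty String.
    static String requireNonEmpty(String value, String name) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(requireNonEmpty("XXXYYY", "sDesc")); // prints XXXYYY
        try {
            requireNonEmpty("", "sDesc");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints sDesc must not be empty
        }
    }
}
```

Calling the guard at the top of a method makes the null/empty expectation visible at the point of failure instead of leaving it to the Javadoc.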

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Hal Vaughan - 24 Dec 2007 20:33 GMT
> On Sun, 23 Dec 2007 23:33:26 -0500, Hal Vaughan
> <hal@thresholddigital.com> wrote, quoted or indirectly quoted someone
[quoted text clipped - 7 lines]
> professionally written code.  Programmers writing Javadoc  tend to be
> fuzzy about whether a method can accept/produce an empty/null String.

I knew null was the wrong term, since that's an entirely different thing (I
can do (if myString == null) but not (if myString == "")).  I was not sure
if "empty string" was the actual technical term.  I've heard people
say "null string" and I know what they mean, but I also know that's wrong.
I wasn't sure if there was a better term to Google than "empty string."
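
For what it is worth, the reason `== ""` is unreliable is that `==` compares object references, not contents. A small sketch of the usual checks follows; the class name and the helper `isNullOrEmpty` are made up for the example.

```java
class EmptyCheckDemo {
    // True when s is null or has length 0.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        // A distinct String object that happens to contain no characters.
        String empty = new String("");

        System.out.println(empty == "");          // prints false: different objects
        System.out.println(empty.equals(""));     // prints true: same contents
        System.out.println(empty.isEmpty());      // prints true: length is 0
        System.out.println(isNullOrEmpty(null));  // prints true
        System.out.println(isNullOrEmpty(" "));   // prints false: one space is not empty
    }
}
```

Checking for null first matters, because calling `isEmpty()` or `equals()` on a null reference throws a NullPointerException.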

> Since either flavour of String is often rare, the bug won't show up in
> routine testing.
>
> Eiffel has design by contract to formally describe assertions on
> method inputs and outputs. See
> http://mindprod.com/jgloss/designbycontract.html

I don't know why, but for several hours yesterday, your site was down, or at
least inaccessible from my location.  It was the first place I tried
looking for an answer.  I figured if there were a quirk I needed to know
about with regexes and empty strings, I'd find it mentioned there.

> I would like to have formal ways of describing String types with
> whether they can be null or empty, with exceptions if they are when
> they shouldn't be.

I can certainly see the need for that!

Thanks!  As always, you have some good and useful information!

Hal


©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.