Java Forum / General / January 2006

Scanning Strings to replace with links.

questionmarc420@msn.com - 24 Jan 2006 20:31 GMT
hi again, :D
i have large strings containing many paragraphs. The string is to be
displayed. I want to have all substrings that start with "http" and
"www." wrapped in <a></a> tags.

here is the code i have:

data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
"<A TARGET=\"_new\" HREF=\"$0\">$0</a>");

data_body = data_body.replaceAll("(www.)[^\\s<]+",
"<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

my problem is:
if a link is "http://www." the substring is replaced twice. meaning the
link would appear something like this:
< a HREF=http://<a HREFwww.>http://www.</a/a>>
or something along those lines.
i tried putting the www. in the first expression but that does not work
because the link then only points to localhost.

i also tried adding a space before the "www." so it would be like "
www."
this almost worked however there came to be spaces in the link so it
would not work in the browser.

if anyone understands and can help, please do.
if you are unclear on anything in particular please tell me to explain.
thanks
-morc

oh and also sometimes if a link is placed in parentheses in the text
then it tends to add them to the link. if anyone knows of a way to
exclude the parentheses please share. :D thanks
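
The double replacement described above can be reproduced with a short sketch. The class name `DoubleReplaceDemo`, the helper names, and the `example.com` address are hypothetical and only for illustration; the two `replaceAll` calls are the ones from the post. The second pass finds the `www.` that the first pass already placed inside the HREF attribute and wraps it again.

```java
final class DoubleReplaceDemo {
    // The two passes from the post, applied in the original order.
    static String originalTwoPass(String text) {
        String s = text.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
                "<A TARGET=\"_new\" HREF=\"$0\">$0</a>");
        // This pass also matches the "www." inside the HREF written by the first pass.
        return s.replaceAll("(www.)[^\\s<]+",
                "<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");
    }

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        for (int i = haystack.indexOf(needle); i >= 0;
                i = haystack.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String out = originalTwoPass("http://www.example.com");
        System.out.println(out);
        // One link in the input, but two opening tags in the output.
        System.out.println(count(out, "<A "));
    }
}
```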
javabuddha@gmail.com - 24 Jan 2006 22:59 GMT
Good point, I didn't consider that when I posted the HTMLEncode
function. Anyway, the topic is updated now. This is the part that is
relevant to your questions. You are on the right track, you just need to
switch the order:

str = str.replaceAll("([\\s\\(])www\\.", "$1http://www.");
str = str.replaceAll("((https?|ftp)://|mailto:)[^\\s<\\(\\)]+",
"<A HREF=\"$0\">$0</a>");

Good luck,

Matt
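
For anyone who wants to try the two steps above, here is a minimal runnable sketch. The class name `LinkifyDemo`, the method name `linkify`, and the `example.com` / `example.org` addresses are hypothetical; the two patterns are the ones from the reply.

```java
final class LinkifyDemo {
    static String linkify(String text) {
        // Step 1: give a bare "www." an explicit scheme, but only when it
        // follows whitespace or an opening parenthesis.
        String withScheme = text.replaceAll("([\\s\\(])www\\.", "$1http://www.");
        // Step 2: wrap every scheme-prefixed URL in an anchor tag, stopping
        // at whitespace, '<', or a parenthesis.
        return withScheme.replaceAll(
                "((https?|ftp)://|mailto:)[^\\s<\\(\\)]+",
                "<A HREF=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(
                linkify("See http://www.example.com and (www.example.org) today."));
    }
}
```

Two things to watch for with these exact patterns: a "www." at the very start of the string is not preceded by whitespace or "(", so it is left alone, and punctuation directly after a link (such as a trailing period) ends up inside the link.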
morc - 25 Jan 2006 20:39 GMT
thanks. what javabud wrote worked flawlessly.
thanks a lot guys.
-morc
Roedy Green - 25 Jan 2006 02:00 GMT
>data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
>"<A TARGET=\"_new\" HREF=\"$0\">$0</a>");
>
>  data_body = data_body.replaceAll("(www.)[^\\s<]+",
>"<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

The simplest way would be to convert naked www. to http://www. first,
then apply your http: -> <a href transform. You want to search for
"www." that is not preceded by "//". You seem to have a good grasp of regex
already so I will leave you to compose the string. If you have
trouble, see http://mindprod.com/jgloss/regex.html
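
One possible way to compose that search is a negative lookbehind. This is only a sketch of the idea, not necessarily the pattern intended here: the class and method names and the `example.com` addresses are hypothetical, and the `\b` word boundary is an added assumption so that "www." in the middle of a word is skipped. The second step reuses the first `replaceAll` from the original question.

```java
final class LookbehindLinkify {
    // Prefix "http://" to any "www." that starts a word and is not
    // already preceded by "//" (as in "http://www." or "ftp://www.").
    static String addScheme(String text) {
        return text.replaceAll("(?<!//)\\bwww\\.", "http://www.");
    }

    // Then apply the http -> <a href> transform from the question.
    static String linkify(String text) {
        return addScheme(text).replaceAll(
                "((https?|ftp)://|mailto:)[^\\s<]+",
                "<A TARGET=\"_new\" HREF=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(addScheme("www.example.com and http://www.example.org"));
        System.out.println(linkify("visit www.example.com now"));
    }
}
```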

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 25 Jan 2006 02:01 GMT
>data_body = data_body.replaceAll("((https?|ftp)://|mailto:)[^\\s<]+",
>"<A TARGET=\"_new\" HREF=\"$0\">$0</a>");
>
>  data_body = data_body.replaceAll("(www.)[^\\s<]+",
>"<A TARGET=\"_new\" HREF=\"http://$0\">$0</a>");

the other way to do it, which you might find easier, is to scan for
strings with indexOf, and compose your results in a StringBuilder as
you go.
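
A rough sketch of that scanning approach follows. Everything in it is an assumption made for illustration: the class and method names, the list of prefixes, the choice to end a link at the first whitespace or '<', and the `example.com` addresses. Because the scan jumps past each link once it has been written, the "www." inside "http://www." is never visited a second time.

```java
final class ScanLinkify {
    // Prefixes that start a link; chosen for this sketch only.
    static final String[] PREFIXES = {"http://", "https://", "ftp://", "www."};

    // Index of the earliest prefix at or after 'from', or -1 if none is left.
    static int nextStart(String text, int from) {
        int best = -1;
        for (String p : PREFIXES) {
            int i = text.indexOf(p, from);
            if (i >= 0 && (best < 0 || i < best)) {
                best = i;
            }
        }
        return best;
    }

    // Index of the first whitespace or '<' at or after 'from', or text.length().
    static int endOfUrl(String text, int from) {
        int i = from;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c) || c == '<') {
                break;
            }
            i++;
        }
        return i;
    }

    static String linkify(String text) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int start;
        while ((start = nextStart(text, pos)) >= 0) {
            int end = endOfUrl(text, start);
            String shown = text.substring(start, end);
            // A bare "www." link needs a scheme, or the browser treats it as relative.
            String href = shown.startsWith("www.") ? "http://" + shown : shown;
            out.append(text, pos, start); // plain text before the link
            out.append("<a href=\"").append(href).append("\">")
               .append(shown).append("</a>");
            pos = end; // continue after the link, never inside it
        }
        out.append(text, pos, text.length()); // remaining plain text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                linkify("go to www.example.com or http://www.example.org now"));
    }
}
```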


