Hi,
Can anyone help me solve this problem?
I have an example input:
sometext<b><i>some text</i></b>
The input may vary, e.g. a tag may be opened but never closed, or the
opening and closing tags may be mismatched.
To do:
1. Check for a few HTML tags such as b, i, u.
2. Opening and closing of tags must be in the proper order, without
overlapping.
I have to write Java code to validate this.
Can anyone help me?
Thanks in advance.
Regards,
Pradeep.
Mark Thomas - 17 Apr 2006 12:13 GMT
> Hi,
>
[quoted text clipped - 15 lines]
> Regards,
> Pradeep.
I'd use a finite state machine - googling that might get you started.
Mark
Martin Gregorie - 17 Apr 2006 12:42 GMT
>> Hi,
>>
[quoted text clipped - 17 lines]
>>
> I'd use a finite state machine - googling that might get you started.
Or a stack.

Signature
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Tim Smith - 18 Apr 2006 05:57 GMT
> > 2. Opening and closing of tags must be in the proper order, without
> > overlapping.
...
> I'd use a finite state machine - googling that might get you started.
Wait a second...isn't checking for closing tags being in the right order
and for tags not overlapping equivalent to the problem of recognizing
palindromes? And isn't that one of the classic examples of something
that you can't do with a finite state machine?

Signature
--Tim Smith
Oliver Wong - 18 Apr 2006 14:27 GMT
>> > 2. Opening and closing of tags must be in the proper order, without
>> > overlapping.
[quoted text clipped - 6 lines]
> palindromes? And isn't that one of the classic examples of something
> that you can't do with a finite state machine?
You're right. It can't be done with a finite state machine, because the
nesting depth is unbounded. You'd need an infinite state machine (or a
stack machine, i.e. a pushdown automaton, or something equally powerful).
- Oliver
Venkatesh - 17 Apr 2006 13:42 GMT
You can just use a stack and the Java regular-expression package
(java.util.regex).
Here is code to find the tags in a given HTML string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The members below belong inside a class.
// Matches anything that looks like a tag: '<', any non-'>' characters, then '>'.
private static final String HTML_TAG_PATTERN = "<[^>]*>";
private static final Pattern searchPattern = Pattern.compile(HTML_TAG_PATTERN);

private Matcher m = null;
private String m_htmlStr = null;
private boolean m_initDone = false;

// Call once with the HTML string before calling getNextTag().
public void init(String htmlStr) {
    m_htmlStr = htmlStr;
    m = searchPattern.matcher(m_htmlStr);
    m_initDone = true;
}

// Returns the next tag (e.g. "<b>" or "</b>"), or null when no tags remain.
private String getNextTag() {
    if (!m_initDone) {
        throw new IllegalStateException("Not yet initialized");
    }
    return m.find() ? m.group() : null;
}
So, use a stack: push each start tag, and whenever you find an end tag,
pop the top of the stack and compare the two to check that the start and
end tags match.
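To make that concrete, here is a minimal sketch of the regex-plus-stack idea, not a definitive implementation. It assumes only the tags b, i and u matter (as in the original question), that they carry no attributes, and that any other text is ignored. The class and method names (TagNestingValidator, isWellNested) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread: checks that <b>, <i>, <u> nest properly.
final class TagNestingValidator {
    // Group 1 is "/" for an end tag (empty for a start tag); group 2 is the tag name.
    // Assumption: tags have no attributes or inner whitespace.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([biu])>", Pattern.CASE_INSENSITIVE);

    private TagNestingValidator() {}

    static boolean isWellNested(String html) {
        Deque<String> open = new ArrayDeque<>();   // currently open tags, innermost on top
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {
                open.push(name);                   // start tag: remember it
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;                      // end tag with no matching innermost start tag
            }
        }
        return open.isEmpty();                     // leftovers mean a tag was never closed
    }
}
```

Calling TagNestingValidator.isWellNested("sometext<b><i>some text</i></b>") should return true, while overlapping input such as "<b><i>x</b></i>" or an unclosed "<b>x" should return false.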
Hope this helps
-Venkatesh
Greg R. Broderick - 17 Apr 2006 15:07 GMT
[posted and mailed]
> To do:
> 1. Check for a few HTML tags such as b, i, u.
[quoted text clipped - 3 lines]
> I have to write Java code to validate this.
> Can anyone help me?
Use a stack data structure.
Scan through the text looking for HTML tags.
When you encounter a start tag, push it on the stack.
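When you encounter an end tag, pop the top element from the stack and
compare it to the end tag. If they differ, or if the stack was already
empty, the input is invalid.
When you reach the end of the text, the stack must be empty; anything
left on it is a tag that was never closed.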
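The steps above could be sketched roughly as follows, this time scanning by hand and reporting the first problem rather than just returning yes or no. This is only an illustration under some assumptions: every <...> run is treated as a tag, tags have no attributes, and names are compared case-sensitively. TagScanner and firstProblem are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not from the thread.
final class TagScanner {
    private TagScanner() {}

    /** Returns null if the tags nest properly, otherwise a message for the first problem. */
    static String firstProblem(String text) {
        Deque<String> stack = new ArrayDeque<>();
        int pos = 0;
        // Scan through the text looking for tags.
        while ((pos = text.indexOf('<', pos)) >= 0) {
            int end = text.indexOf('>', pos);
            if (end < 0) {
                return "unterminated tag at index " + pos;
            }
            String tag = text.substring(pos + 1, end).trim();
            if (tag.startsWith("/")) {
                // End tag: pop the top element and compare.
                String name = tag.substring(1).trim();
                if (stack.isEmpty()) {
                    return "unexpected </" + name + "> at index " + pos;
                }
                String expected = stack.pop();
                if (!expected.equals(name)) {
                    return "expected </" + expected + "> but found </" + name
                            + "> at index " + pos;
                }
            } else {
                // Start tag: push it on the stack.
                stack.push(tag);
            }
            pos = end + 1;
        }
        // Anything still on the stack was opened but never closed.
        return stack.isEmpty() ? null : "unclosed <" + stack.peek() + ">";
    }
}
```

For the example input from the original post, firstProblem returns null. Restricting the check to b, i and u would only need a test on the tag name before pushing or popping.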
Cheers
GRB

Signature
---------------------------------------------------------------------
Greg R. Broderick [rot13] terto@oynpxubyvb.qlaqaf.bet
A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------
Oliver Wong - 17 Apr 2006 18:04 GMT
> Hi,
>
[quoted text clipped - 10 lines]
> I have to write a java code to validate this.
> Can anyone help me..
You might be interested in HTML Tidy:
http://www.w3.org/People/Raggett/tidy/
- Oliver
Martin Gregorie - 17 Apr 2006 18:36 GMT
>> Hi,
>>
[quoted text clipped - 13 lines]
> You might be interested in HTML Tidy:
> http://www.w3.org/People/Raggett/tidy/
Agreed. If you're writing HTML you should not be without it. However, I
think you'll find the latest versions are here:
http://tidy.sourceforge.net/
The new C version is worth having and there's a Java version too.

Signature
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |