
Signature
Paul Lutus
http://www.arachnoid.com
> On the contrary. Those methods are more than powerful enough to handle the
> described task. How do I know? That is how the class responsible for this
> task does it.
Which class? How do you handle:
<img
src="pippo"
/>
>>On the other hand, XML parsing turns out to introduce
>>even more problems than I am trying to solve
>
> Name them.
I just did. The stupid parser tries to open an HTTP connection to retrieve the DTD.
>>(as an aside, wasn't
>>XML supposed to be simple?)
>
> No, that is a myth.
If this is a myth, it is one that the XML industry has helped
to fuel. Have a look at the first line of:
http://www.w3.org/XML/
"Extensible Markup Language (XML) is a simple, very flexible text format
derived from SGML"
> XML is supposed to eliminate unnecessary duplication and
> provide a way to standardize data structures. If the data structures are
> complex, so is the XML representation.
The problem is that a lot of complexity is there even for mega-simple stuff.
>>Is there an easy way to achieve my goal? XML parsing or regexps?
>
> What can I say? Yes? XML parsing and regular expressions seem to be part of
> the same topic.
So, how do you handle:
<img
src="pippo"
/>
with regexps in Java?
Luca
Stefan Schulz - 28 Sep 2004 12:42 GMT
>>> On the other hand, XML parsing turns out to introduce
>>> even more problems than I am trying to solve
>> Name them.
>
> I just did. The stupid parser tries to open an HTTP connection to retrieve
> the DTD.
Use a non-validating parser then. :)

Signature
Whom the gods wish to destroy they first call promising.
luca - 28 Sep 2004 14:46 GMT
> Use a non-validating parser then. :)
Which one? Even SAX goes for the DTD!
Also, be careful: what I found out by discussing this
with XML gurus is that, according to the XML specs, even non-validating
parsers are required to go after the DTD if they see one!
Luca
Tor Iver Wilhelmsen - 28 Sep 2004 19:06 GMT
> Which one? Even SAX goes for the DTD!
What does your DOCTYPE look like?
Try setting a custom EntityResolver that doesn't return null:
http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html
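A minimal sketch of that suggestion, assuming the JDK's built-in JAXP parser. The class name NoDtdFetch and the sample DOCTYPE are made up for illustration; the resolver hands back an empty document instead of null, so the parser has nothing to fetch:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NoDtdFetch {
    // Parses the given markup without opening a connection for the DTD.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Returning an empty InputSource (rather than null) tells the parser
            // that the external entity has been resolved, so it does not go
            // looking for the system identifier itself.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: an XHTML DOCTYPE in front of the tag from the question.
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html><body><img\nsrc=\"pippo\"\n/></body></html>";
        Document doc = parse(xml);
        System.out.println(doc.getElementsByTagName("img").getLength());
    }
}
```

One side effect to keep in mind: with an empty DTD, entities that the real DTD would have declared (such as &nbsp; in XHTML) are no longer defined.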
Stefan Schulz - 28 Sep 2004 12:55 GMT
> So, how do you handle:
>
> <img
> src="pippo"
> />
Off the top of my head:
"<\p{Space}*img\p{Space}+src=\"\p{Graph}+\"\p{Space}*>" should match pretty
much any img tag that has no alt, height, etc. attributes. To add them,
look at the alternation operator (it is the | ).
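As a rough sketch, here is that pattern written as a Java string literal (backslashes doubled). Two things are assumptions on my part: the class name ImgSrcOnly, and the "/?" added before the closing ">" so that the self-closing form from the question also matches:

```java
import java.util.regex.Pattern;

class ImgSrcOnly {
    // The pattern from the post, plus an optional "/" before the final ">".
    static final Pattern IMG = Pattern.compile(
            "<\\p{Space}*img\\p{Space}+src=\"\\p{Graph}+\"\\p{Space}*/?>");

    // True if the whole string is an img tag whose only attribute is src.
    static boolean matches(String tag) {
        return IMG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("<img\nsrc=\"pippo\"\n/>"));            // true
        System.out.println(matches("<img src=\"pippo\">"));                // true
        System.out.println(matches("<img alt=\"x\" src=\"pippo\" />"));    // false
    }
}
```

Because \p{Space} includes the newline, the tag matches even when it is spread over several lines.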

Signature
Whom the gods wish to destroy they first call promising.
luca - 28 Sep 2004 14:49 GMT
> Off the top of my head:
>
> "<\p{Space}*img\p{Space}+src=\"\p{Graph}+\"\p{Space}*>" should match pretty
> much any img tag that has no alt, height, etc. attributes. To add them,
> look at the alternation operator (it is the | ).
But this is not good enough for me (this is why I went for XML parsing
in the first place). All I know about my mark-up is that it's well-formed,
but I don't know anything about the order or the availability
of other attributes:
<img
src="pippo"
/>
<img alt="pippo"
height="25"
src="pippo"
/>
<img src="pippo" alt="pippo"
height="25" />
<img height="35" src="pippo" />
These are all good. BTW the XML guys claimed confidently that RegExps
are, generally speaking, not powerful enough to parse XML!
Luca
Stefan Schulz - 28 Sep 2004 15:14 GMT
>> Off the top of my head:
>> "<\p{Space}*img\p{Space}+src=\"\p{Graph}+\"\p{Space}*>" should match
[quoted text clipped - 8 lines]
> but I don't know anything about the order or the availability
> of other attributes:
Well, in that case do what I said: within the tag, make an alternation of
all the possible attributes (refer to the DTD for the list of allowed
attributes).
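A hedged sketch of what that could look like in Java. The class name ImgAnyOrder and the four attribute names are assumptions for illustration; the real list would come from the DTD. It only handles double-quoted values and lower-case names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgAnyOrder {
    // One attribute: a known name, "=", then a double-quoted value.
    // Assumption: only these four names occur in the markup.
    static final String ATTR =
            "(src|alt|height|width)\\p{Space}*=\\p{Space}*\"([^\"]*)\"";

    // A whole img tag: one or more attributes in any order, optional "/".
    static final Pattern IMG =
            Pattern.compile("<img(?:\\p{Space}+" + ATTR + ")+\\p{Space}*/?>");

    static final Pattern ONE_ATTR = Pattern.compile(ATTR);

    // Returns the src value of the first img tag that has one, or null.
    static String firstSrc(String text) {
        Matcher tag = IMG.matcher(text);
        while (tag.find()) {
            // Walk the attributes of this tag one by one.
            Matcher attr = ONE_ATTR.matcher(tag.group());
            while (attr.find()) {
                if (attr.group(1).equals("src")) {
                    return attr.group(2);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstSrc(
                "<img alt=\"pippo\"\nheight=\"25\"\nsrc=\"pippo\"\n/>"));
        System.out.println(firstSrc("<img height=\"35\" src=\"pippo\" />"));
    }
}
```

The alternation decides which attributes are allowed, and the repetition around it is what makes their order irrelevant.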
> These are all good. BTW the XML guys claimed confidently that RegExps
> are, generally speaking, not powerful enough to parse XML!
Generally speaking, this is true. In this particular case, however, you can
do it, since the only thing XML can do that regular expressions cannot is
build trees. img tags, however, are necessarily leaves of the document tree.

Signature
Whom the gods wish to destroy they first call promising.
Daniel Sjöblom - 28 Sep 2004 16:00 GMT
> BTW the XML guys claimed confidently that RegExps
> are, generally speaking, not powerful enough to parse XML!
They aren't. Parsing XML requires a stack (or more precisely, the parser
needs to remember all the previous states that led to the current
state). Regular languages can be recognized with a fixed, finite amount
of state, without remembering how the current state was reached.
However, some of the available regular expression packages contain
constructs that are quite a bit more powerful than real regular expressions.
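To illustrate the point, a small sketch (the class name NestingCheck is made up, and the tag pattern is deliberately simplified: it ignores comments, CDATA sections and ">" inside attribute values). A regular expression picks out the individual tags, but pairing each closing tag with its opener takes a stack:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingCheck {
    // group 1: "/" for a closing tag; group 2: element name;
    // group 3: "/" for a self-closing tag.
    static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>");

    // True if every opening tag is closed in the right order.
    static boolean isBalanced(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(1).isEmpty()) {
                // Closing tag: it must match the most recently opened element.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (m.group(3).isEmpty()) {
                // Opening tag that is not self-closing: remember it.
                open.push(name);
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("<a><b><img src=\"pippo\"/></b></a>")); // true
        System.out.println(isBalanced("<a><b></a></b>"));                     // false
    }
}
```

The depth of the stack is unbounded, which is exactly what a finite automaton cannot keep track of.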

Signature
Daniel Sjöblom
Remove _NOSPAM to reply by mail
Paul Lutus - 28 Sep 2004 16:51 GMT
> > On the contrary. Those methods are more than powerful enough to handle
> > the
[quoted text clipped - 6 lines]
> src="pippo"
> />
To what does "you" refer? Existing classes, or your own classes? The answer
in both cases is "easily", but that is beside the point.
>>>On the other hand, XML parsing turns out to introduce
>>>even more problems than I am trying to solve
[quoted text clipped - 3 lines]
> I just did. The stupid parser tries to open an HTTP connection to retrieve
> the DTD.
How long will this take? I already told you -- write your own parsing class.
>>>(as an aside, wasn't
>>>XML supposed to be simple?)
[quoted text clipped - 8 lines]
> "Extensible Markup Language (XML) is a simple, very flexible text format
> derived from SGML"
And simple languages can be used to convey complex ideas. If that were not
true, the language would be abandoned.
>> XML is supposed to eliminate unnecessary duplication and
>> provide a way to standardize data structures. If the data structures are
>> complex, so is the XML representation.
>
> The problem is that a lot of complexity is there even for mega-simple
> stuff.
No, not really. Simple tasks can be handled using simple XML. Complex tasks
require complex XML.
>>>Is there an easy way to achieve my goal? XML parsing or regexps?
>>
[quoted text clipped - 8 lines]
>
> with regexps in Java?
Trivially:
String result = original.replaceAll("\\n+"," ");
Working example:
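public class Test {
    public static void main(String[] args) {
        // The tag from the question, spread over three lines.
        String a = "<img\n"
                + "src=\"pippo\"\n"
                + "/>";
        // Collapse each run of newlines into a single space.
        String b = a.replaceAll("\\n+", " ");
        System.out.println(a + " -> " + b);
    }
}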
Result:
<img
src="pippo"
/> -> <img src="pippo" />
Wow, that was really hard!

Signature
Paul Lutus
http://www.arachnoid.com
jmm-list-gn - 28 Sep 2004 19:12 GMT
> Which class? How do you handle:
>
> <img
> src="pippo"
> />
Use the getAttribute() method for Element (org.w3c.dom).
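Something along these lines, as a sketch only: the class name ImgSources and the wrapper element in the sample are invented, and the input is assumed to be well-formed with no DOCTYPE. The DOM does not care how the tag is laid out or in what order the attributes appear:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ImgSources {
    // Returns the src attribute of every img element, in document order.
    static List<String> imgSources(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList imgs = doc.getElementsByTagName("img");
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < imgs.getLength(); i++) {
                Element img = (Element) imgs.item(i);
                // getAttribute returns "" when the attribute is absent.
                sources.add(img.getAttribute("src"));
            }
            return sources;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<div><img\nsrc=\"pippo\"\n/>"
                + "<img height=\"35\" src=\"pippo\" /></div>";
        System.out.println(imgSources(xml));
    }
}
```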

Signature
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)