
Java Forum / General / May 2006


XML Not good for Big Files (vs Flat Files)

Homer - 04 Apr 2006 16:27 GMT
I am a little bit tired of the obsession people have with XML and XML
technology. Please share your thoughts and let me know if I am thinking
about this the wrong way. I believe some people are overusing XML all over
the place. Nowadays the Canadian Government is pushing XML on its
organizations as the standard for data/file transfer. Huge files moving
between companies now include tons of XML tags repeated all over the file,
slowing down networks and crashing applications because of their size.
I am not objecting to the whole technology. I know the advantages of XML
and use it all the time for config files and our web-oriented
applications, but using it as the standard for moving big files is going too
far. Here is an example:

John,Smith,5555555,37 Finch Ave.

Is now:

<FirstName>John</FirstName>
<LastName>Smith</LastName>
<PhoneNum>5555555</PhoneNum>
<Address>37 Finch Ave.</Address>

And Tags are repeating and repeating:

<FirstName>....</FirstName>
<LastName>....</LastName>
<PhoneNum>....</PhoneNum>
<Address>....</Address>

<FirstName>....</FirstName>
<LastName>....</LastName>
<PhoneNum>....</PhoneNum>
<Address>....</Address>
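
To put a rough number on the overhead described above, here is a minimal Python sketch that builds the same record both ways and compares byte counts. The helper names (`as_csv_line`, `as_xml_fragment`) are made up for this illustration, and the CSV writer does no quoting, which is only safe for this sample data.

```python
# Illustrative only: compare the byte size of one record as a CSV line and as
# the tagged XML fragment from the post. Field names mirror the post's example.
record = {
    "FirstName": "John",
    "LastName": "Smith",
    "PhoneNum": "5555555",
    "Address": "37 Finch Ave.",
}

def as_csv_line(rec):
    """Join the values with commas (no quoting; fine for this sample data)."""
    return ",".join(rec.values()) + "\n"

def as_xml_fragment(rec):
    """Wrap each value in an opening and a closing tag, one element per line."""
    return "".join(f"<{name}>{value}</{name}>\n" for name, value in rec.items())

csv_size = len(as_csv_line(record).encode("utf-8"))
xml_size = len(as_xml_fragment(record).encode("utf-8"))
print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes, ratio: {xml_size / csv_size:.1f}x")
```

For short field values like these, the tags account for most of the bytes, so the ratio grows as the values get shorter and shrinks as they get longer.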

Please let me know what you think.

Regards,

Homer
James McGill - 04 Apr 2006 16:50 GMT
> And Tags are repeating and repeating:

XML markup does tend to bloat the data.  

I personally believe you should use serializable objects that can be
represented according to an XML schema when that's appropriate, but that
also can be serialized into a tightly packed format when that is
appropriate as well.  So I should be able to marshal/unmarshal the
serialized object to and from XML, but I should also be able to stream
that object without marshalling it -- and the other end should be able
to unmarshal to xml, validate according to the schema, etc.  

Likewise, database bindings should be informed by the xml schema, but
the XML markup shouldn't be what you store in the db.  
mtp - 04 Apr 2006 17:01 GMT
> I am a little bit tired of this obsession people have with XML and XML
> technology. Please share your thoughts and let me know if I am thinking
[quoted text clipped - 3 lines]
> now include tones of XML Tags repeating all over the file and slowing
> down networks and crashing applications because of size.

You can use indexing, binary XML, or compression.

> I am not objecting to the whole technology. I know advantages of XML
> and using it all the times for Config files or our web oriented
[quoted text clipped - 23 lines]
>
> Please let me know what you think.

Maybe one of the computing service providers wanted more money for its
services on this big project?

Maybe everybody thinks "newer is better"?
cherukan@gmail.com - 04 Apr 2006 17:06 GMT
> I am a little bit tired of this obsession people have with XML and XML
> technology. Please share your thoughts and let me know if I am thinking
[quoted text clipped - 34 lines]
>
> Homer

Yes, that does seem like a network killer. It depends on the intended
use of the file on the other end and on the client receiving it. If
they *have to* use XML, certain optimizations can be made for
just the transfer part...

<header>
 <firstName>A15</firstName>
 <lastName>A15</lastName>
 <phone>A10</phone>
 <address>A10</address>
</header>
<data>
<![CDATA[
 <!-- fixed width data goes here -->
]]>
</data>

OR

<header>
 <fieldSeparator>;</fieldSeparator>
 <field>firstName</field>
 <field>lastName</field>
 <field>phone</field>
 <field>address</field>
</header>
<data>
<![CDATA[
 <!-- delimited data goes here -->
]]>
</data>

OR  a combination of the above.
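
A minimal Python sketch of how a receiver might read the delimited variant. The `<file>` root element, the sample rows, and the `read_records` helper are assumptions added for illustration; the post does not specify them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a <file> root is assumed here so that the header and
# data elements form one well-formed XML document. The rows are sample data.
DOCUMENT = """<file>
<header>
 <fieldSeparator>;</fieldSeparator>
 <field>firstName</field>
 <field>lastName</field>
 <field>phone</field>
 <field>address</field>
</header>
<data><![CDATA[
John;Smith;5555555;37 Finch Ave.
Jane;Doe;5551234;12 Main St.
]]></data>
</file>"""

def read_records(xml_text):
    """Return one dict per data line, keyed by the header's field names."""
    root = ET.fromstring(xml_text)
    separator = root.findtext("header/fieldSeparator")
    names = [field.text for field in root.findall("header/field")]
    lines = root.findtext("data").strip().splitlines()
    return [dict(zip(names, line.split(separator))) for line in lines]

records = read_records(DOCUMENT)
print(records[0])
```

Note that the XML parser only sees the data block as one opaque string, so a schema cannot validate the individual fields in this layout.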

In short, XML should be preferred only if documentation and
discoverability are more important than performance.
James McGill - 04 Apr 2006 17:19 GMT
> OR  a combination of the above.

You're almost touching on the big problem:  Misconception of what it
means to be "standard".

XML has (several) standardized markup frameworks, but it is silent as to
content or utilization.  It is ridiculous for a government entity to
demand that "XML" be "the standard" for data interchange.  They need to
bless certain schemas if that's their goal, but it also needs to be
abstract enough that systems can be designed efficiently.  

In your examples, the designers can claim that they are using "XML", and
therefore "are standardized" on it, but the three examples we've seen so
far are not at all interchangeable...
RC - 04 Apr 2006 17:11 GMT
> Please let me know what you think.

XML was never designed to replace a database server.

You can use an XML file to transfer a portion of the data
from a database,
 e.g.

SELECT lastname,firstname,phonenumber,address
FROM phonebook
WHERE state = 'NY' AND city = 'somewhere';

Take a flat file like this:

William|John|12345678|84 5th Ave

I can't tell which column is the last name and which is the first name.
Is the 3rd column a person ID or a phone number?

You need to let the programmers know which column is which.

Next time, if someone changes the flat file format to

85 5th Ave|John|William|12345678

then your database will be incorrect after the update.

True, XML creates large files,
but it makes our lives easier.

You can make up your own tags
<lastName> or <Last_Name>, etc.
the tags can be in English, Spanish, French, Russian, Japanese, etc.
Alex Hunsley - 05 Apr 2006 00:04 GMT
>> Please let me know what you think.
>
[quoted text clipped - 14 lines]
> I don't know which column is last name, first name.
> 3rd column is person ID or phone number?

That's what a header field would be for.

> You need let the programmers know what column is what.
>
[quoted text clipped - 3 lines]
>
> Then your database will incorrect after updated.

Presumably the header field will reflect the change.
Yeah, it's an extra thing to go wrong, admittedly...
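
A minimal Python sketch of the header-row idea, assuming a pipe-delimited file whose first line names the columns. The column names and the `read_rows` helper are illustrative, not taken from any real feed.

```python
import csv
import io

# Two versions of the same pipe-delimited file; the second reorders the
# columns. The header names are assumptions made for this illustration.
OLD_LAYOUT = "lastname|firstname|phone|address\nWilliam|John|12345678|84 5th Ave\n"
NEW_LAYOUT = "address|firstname|lastname|phone\n84 5th Ave|John|William|12345678\n"

def read_rows(text):
    """Parse a pipe-delimited file whose first line names the columns."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

old_rows = read_rows(OLD_LAYOUT)
new_rows = read_rows(NEW_LAYOUT)
print(old_rows[0]["lastname"], new_rows[0]["lastname"])
```

Because the reader looks fields up by name, the reordered file yields the same values. A renamed or missing header would still break it, which is the "extra thing to go wrong".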

> True XML creates large file size.
> But it makes our life easier.
>
> You can make up your own tags
> <lastName> or <Last_Name>, etc.
> the tags can be in English, Spanish, French, Russian, Japanese, etc.
Monique Y. Mudama - 05 Apr 2006 05:24 GMT
> Presumably the header field will reflect the change.  Yeah, it's an
> extra thing to go wrong, admittedly...

Yeah ... the markup format is nice if partial data is considered
better than no data at all ...

Signature

monique

Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html

Timbo - 04 Apr 2006 17:39 GMT
> John,Smith,5555555,37 Finch Ave.
>
[quoted text clipped - 4 lines]
> <PhoneNum>5555555</PhoneNum>
> <Address>37 Finch Ave.</Address>

It's true that the XML data in your example is bulky, but what it
has that the unstructured version doesn't is meta-level information,
such as "John" being someone's first name. If the parties involved
(i.e. the sender and receiver of this information) have an
agreement as to the meaning of "FirstName", then they are sharing
more than just text... it has some implicit meaning. If you send
it unstructured, then the receiver has to know how to parse the
data into this agreed meaning, which means it needs to know the
format of the data.

Then, on the other hand, if the data is just stored in a database
or something with no definition of what the tags mean, then I
agree with you... using XML is of little use.
Homer - 04 Apr 2006 19:08 GMT
I guess these responses prove my point. You all know that the
best solution for transferring huge files between two parties is a simple
flat file in a format that both sender and receiver have agreed upon,
sent over a secure line. But you still defend adding tons of tags to a file
whose format both sender and receiver already know. I believe lots
of people are using XML because it's cool and new. And these people
give advice to companies and organizations.

Some points about your suggestions:

1- Marshalling/Object Stream: Too advanced for places like government.
2- Mixed XML/raw data: Then what is the point of having XML at the
top? Unless you are sending the file to an unknown place that doesn't
know what it is getting.
3- Compression: There is no good standard for compression (Unix is not
really ZIP friendly unless you add some open source or buy a Zip product)
and the mainframe is another story. Even for Windows you need to buy the
product (or use open source, which most companies don't like). Also, why
triple the file size and then compress it?

Let me give you another example of coolness (sorry, it's a bit off
the topic but it's about coolness):

I got a job at a telecommunications company (cell phones) to convert their
code from C to C++ because OO was so cool in those days, even though the
application was working with no problems.
I did my job, spent a year converting the code and building a class library,
and left the company.

One year later they hired a bunch of other people to come and convert the
whole thing to Java because Java was the Best.

3 years later they hired me again to convert everything to J2EE
because J2EE is (guess what) the Best.

Regards,

Homer
James McGill - 04 Apr 2006 19:32 GMT
> I believe lots
> of people are using XML because it's cool and new.

It's anything but "cool".  And as for it being "new", XML isn't old
enough to vote, but SGML is.  If you aren't seeing the benefits of
logical structure and validation, standardized processing, etc.,
that may be because you aren't exploiting those things in your
application.  

One of your complaints is directly counter to an explicit design goal,
from the beginning of the XML spec: "Terseness in XML markup is of
minimal importance."

XML markup is deliberately intended to favor clarity over conciseness.

But most of your complaint seems to derive from the fact that you work
in a bureaucratic government situation, where you have no authority to
make decisions, and where there is a limited backchannel for your
recommendations.  That is unfortunate, but isn't it a choice you made
when you went to work for a government?

I've always been led to believe that the Canadian government is a
prototype of efficiency and reason, one that should make Americans feel
ashamed.  Are you suggesting that it too may be clogged with
bureaucratic nonsense?  I would be shocked to hear that!
Homer - 04 Apr 2006 20:06 GMT
Very good guess but no, I don't work for the government. All I am saying
is that in these cases the sender and receiver both know the file format by
heart. They know and their applications know. That's how they were
moving files in the past, and if they want to establish a new file transfer
they will let each other know about the upcoming file format for sure.
There is no reason to send the file format along with each file every
time they have a file transfer (unless you wear a name tag at
home so your family knows your name).
James McGill - 04 Apr 2006 20:25 GMT
> All I am saying
> is in these cases sender and receiver both knows the file format by
> heart.  They know and their application knows.

The interesting thing with XML is that in its case, the *document*
knows.  In a well designed system, the DTD can change and applications
can cope.   

>There is no reason to send the file format along with each file every
>time they have a file transfer

But you aren't sending the file format.  You're sending a notice with a
URI that locates the format (schema, DTD, etc.), and then sending data
that's marked up according to that format.  

>(unless you are wearing name tag in your
>home so your family know your name).

Or like wearing a badge at a workplace, perhaps?
Jon Martin Solaas - 05 Apr 2006 07:25 GMT
> Very good guess but no, I don't work for government. All I am saying
> is in these cases sender and receiver both knows the file format by
[quoted text clipped - 4 lines]
> time they have a file transfer (unless you are wearing name tag in your
> home so your family know your name).

Of course, but in other cases, when the file format has to be
communicated, nobody knows it by heart, the data needs to be
hierarchical, and the receiver needs to validate it and perhaps transform
it to another format, not to mention implement the apps to do so, XML
is useful. When a new file format is to be used, XSD comes in handy, and
also allows for automatic validation. In many organisations
misunderstandings occur, bugs are made and so on, so validation is nice.

XML was cool when I was a student 10 years ago. Now it's just convenient.

Maybe you should get out more. It's the people outside who don't know
your name :-)
Martin Gregorie - 04 Apr 2006 20:45 GMT
> I guess these responses are proving of my point. You know all that the
> best solution for transferring huge files between two parties is simple
[quoted text clipped - 3 lines]
> of people are using XML because it's cool and new. And these people
> give advise to companies and organizations.

Here's another thought: use ASN.1 encoding. Have a look here
<http://asn1.elibel.tm.fr/> if you haven't heard of it.

It does virtually everything XML does in terms of tagged fields and the
ability to completely omit optional fields and structures, but it uses
binary tags and can encapsulate binary data. Like XML you can take a
data description (written in BNF notation) and use it to generate file
encoders and decoders, or you can write fast interpretive decoders (as I
have). It's a standard in the telecoms industry, where it's routinely used
to transfer multi-megabyte files as well as individual short messages.
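
To give a feel for binary tags and omitted optional fields, here is a deliberately simplified tag-length-value sketch in Python. It is not ASN.1 and does not follow BER or DER; the tag numbers, the fixed 1-byte tag and 2-byte length layout, and the `encode`/`decode` helpers are all assumptions made for this illustration.

```python
import struct

# Hypothetical tag numbers. A real ASN.1 module would define the fields, and an
# encoding rule set such as BER or DER would define the exact byte layout.
TAG_FIRST_NAME, TAG_LAST_NAME, TAG_PHONE, TAG_ADDRESS = 1, 2, 3, 4

def encode(fields):
    """Encode {tag: text}: 1-byte tag, 2-byte big-endian length, then value."""
    out = bytearray()
    for tag, text in fields.items():
        value = text.encode("utf-8")
        out += struct.pack(">BH", tag, len(value)) + value
    return bytes(out)

def decode(blob):
    """Walk the buffer and rebuild the {tag: text} mapping."""
    fields, offset = {}, 0
    while offset < len(blob):
        tag, length = struct.unpack_from(">BH", blob, offset)
        offset += 3
        fields[tag] = blob[offset:offset + length].decode("utf-8")
        offset += length
    return fields

# The optional address is simply left out; no empty placeholder is sent.
record = {TAG_FIRST_NAME: "John", TAG_LAST_NAME: "Smith", TAG_PHONE: "5555555"}
packed = encode(record)
print(len(packed), decode(packed))
```

Each field costs three bytes of overhead in this toy layout, compared with a pair of named tags in XML, and the receiver still needs the tag table to interpret the result.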

Java ASN.1 schema compilers are available.

Translating a file between ASN.1 and XML should be a doddle: the site I
mentioned has a tool for doing just that.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Roedy Green - 04 Apr 2006 22:47 GMT
On Tue, 04 Apr 2006 20:45:14 +0100, Martin Gregorie
<martin@see.sig.for.address> wrote, quoted or indirectly quoted
someone who said :

>Translating a file between ASN.1 and XML should be a doddle

what part of the world does "doddle" derive from?  It just means
"easy"?
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Monique Y. Mudama - 05 Apr 2006 05:28 GMT
> On Tue, 04 Apr 2006 20:45:14 +0100, Martin Gregorie
><martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 4 lines]
> what part of the world does "doddle" derive from?  It just means
> "easy"?

I had a mental image of a toddler, er, toddling along.  No idea if
that's actually what was meant.  In the context of my brain, it meant
"so easy a toddler could do it."


Chris Uppal - 05 Apr 2006 11:49 GMT
> > what part of the world does "doddle" derive from?  It just means
> > "easy"?
>
> I had a mental image of a toddler, er, toddling along.  No idea if
> that's actually what was meant.  In the context of my brain, it meant
> "so easy a toddler could do it."

The word's common in British English.  I don't know about other
dialects/flavours.

The word "doddle" does derive from "toddle", according to the OED, where
"toddle" means the halting walk of an infant or elderly/infirm person.  A
doddle, however, is just something that is easy -- as the OED puts it: "a
'walk-over'".

   -- chris
Chris Uppal - 05 Apr 2006 11:45 GMT
> Here's another thought: use ASN.1 encoding. Have a look here
> <http://asn1.elibel.tm.fr/> if you haven't heard of it.

I can't understand why something as simple as data exchange (not /information/
exchange which is vastly more difficult) should require nine standards
documents which between them add up to book length.  Nor why it should require
a book written about it.   Why do people have to make things so /complicated/ ?

XML is, if anything, even worse.

Even YAML is way too complicated, albeit not in the same league as ASN.1 or
XML.

   -- chris
Joe Attardi - 04 Apr 2006 20:56 GMT
> I believe lots of people are using XML because it's cool and new. And these people
> give advise to companies and organizations.
XML isn't new. It's been around almost ten years. The first working
draft for the XML spec was put together in November of 1996.

> 3- Compression: There is no good standard for compression (Unix is not
> really ZIP friendly unless you add some opensource or buy Zip product)
Gzip? In fact, IIRC, the gzip algorithm takes advantage of strings that
are repeated over and over (like the tag names), which helps its
compression.
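
A minimal Python sketch of that effect, using the standard `gzip` module on generated sample records. The record contents and the `make_xml`/`make_csv` helpers are invented for this illustration, so the exact sizes will differ for real data.

```python
import gzip

def make_xml(n):
    """Build n records with the same four tags repeated for every record."""
    rows = []
    for i in range(n):
        rows.append(
            f"<FirstName>First{i}</FirstName>\n"
            f"<LastName>Last{i}</LastName>\n"
            f"<PhoneNum>{5550000 + i}</PhoneNum>\n"
            f"<Address>{i} Finch Ave.</Address>\n"
        )
    return "".join(rows).encode("utf-8")

def make_csv(n):
    """Build the same n records as comma-separated lines."""
    return "".join(
        f"First{i},Last{i},{5550000 + i},{i} Finch Ave.\n" for i in range(n)
    ).encode("utf-8")

raw_xml, raw_csv = make_xml(1000), make_csv(1000)
gz_xml, gz_csv = gzip.compress(raw_xml), gzip.compress(raw_csv)

for label, raw, packed in (("XML", raw_xml, gz_xml), ("CSV", raw_csv, gz_csv)):
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Running it shows how much of the tag overhead compression removes for this sample. It does not reduce the cost of parsing the expanded document on the receiving side.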

> (or use open source that most companies don't like).
That most companies don't like? I don't think you researched this much
before making this statement. Look how many of the huge players (Sun,
IBM, etc.) have strong support for open source. In addition, open
source is being adopted all over the place.

> Let me give you another example of coolness (sorry, it's a bit off
> the topic but it's about coolness):
It's not just because XML is "the cool thing". It's perfectly suited
for the exchange of data like this. The data describes itself!
Monique Y. Mudama - 04 Apr 2006 21:11 GMT
> I guess these responses are proving of my point. You know all that
> the best solution for transferring huge files between two parties is
> simple flat file that both sender/receiver have agreed upon file
> format and using secure line. But you still defend adding tons of
> tags to a file that both sender/receiver are familiar with the
> format.

I guess that you are wrong.  I guess that the word "best" is meaningless
unless it is qualified by something.  If you want a format that is best
at clarity, then flat files lose.  I guess that you don't really
understand when to use XML, and that it doesn't really matter because
you don't have the authority to change things in the environment in
which it's causing you trouble, so you've developed a grudge against
XML rather than against whoever decided to use it inappropriately or
whoever decided to create an excessively verbose schema.

> I believe lots of people are using XML because it's cool and
> new. And these people give advise to companies and organizations.

XML isn't new enough to offer the glamour factor you think it has.


Chris Uppal - 05 Apr 2006 11:45 GMT
> XML isn't new enough to offer the glamour factor you think it has.

Remember that we are talking about a government here.  Being only a decade
behind the times is damned impressive !

   -- chris
Monique Y. Mudama - 05 Apr 2006 14:57 GMT
>> XML isn't new enough to offer the glamour factor you think it has.
>
> Remember that we are talking about a government here.  Being only a
> decade behind the times is damned impressive !

Now, now.  In 1999 I worked on a US govt project (I think it was DoD, or
maybe DISA) to create an XML repository to share across govt branches.

I also spent 1998 through, erm, a couple of years ago working on Java
systems for some defense related stuff.  I think when we started we
were using 1.1.7, and it did take a looooong time to convince the
customer to upgrade, but after that it wasn't too hard to keep moving.
I remember getting bitten by glob imports + that new List class,
engendering a hatred of glob imports that continues to this day.

Some govt customers are very into new technology (almost to the point
of silliness -- they want to reimplement in the new stuff even if
there's no direct benefit and resources would be better spent
improving the rest of the app).


James McGill - 05 Apr 2006 19:44 GMT
> Remember that we are talking about a government here.  

The Canadian government, which I've been led to understand is the most
progressive on Earth, etc.  
Roedy Green - 05 Apr 2006 22:17 GMT
On Wed, 05 Apr 2006 11:44:13 -0700, James McGill
<jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone
who said :

>The Canadian government, which I've been led to understand is the most
>progressive on Earth, etc.  

A government with a smaller population to serve has a huge
advantage when it comes to being light on its feet.  I worked for a
Canadian crown corporation writing an RFP for about a million dollars
worth of computer equipment.  I was in Seattle for a New Year's eve
party and met a guy doing something similar there. We both bitched
about all the silly regulations and petty legalities. We decided to
swap RFPs to see who had it worse. His was ten times thicker.

The thing that blows my mind about the US bureaucracy is that crooks
have managed to embezzle trillions of dollars over the last decade and
hardly anyone even knows about it.  See
http://mindprod.com/politics/iraqeconomics.html near the bottom.
Mastermind crooks pulled off the heist of the century and it did not
even make the front page.

The amount of activity and the amounts of money are so huge that nobody
stays on top of what is going on. Further the amounts of money are so
huge that corruption and coverup are guaranteed.


James McGill - 05 Apr 2006 22:48 GMT
> The thing that blows my mind about the US bureacracy is that crooks
> have managed to embezzle trillions of dollars over the last decade and
> hardly anyone even knows about it.

Controversial opinion, informed by partisan bias, and not one that I
necessarily disagree with.  Take it to alt.politics (where I read your
posts and often correspond).

So, what's the ASN.1 equivalent of JAXB?  
Roedy Green - 06 Apr 2006 01:53 GMT
On Wed, 05 Apr 2006 14:48:20 -0700, James McGill
<jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone
who said :

>So, what's the ASN.1 equivalent of JAXB?  
since XML and ASN.1 are interconvertible, if you have something that
needs XML, you fluff and use it.

Timbo - 05 Apr 2006 15:03 GMT
> I guess these responses are proving of my point. You know all that the
> best solution for transferring huge files between two parties is simple
> flat file that both sender/receiver have agreed upon file format and
> using secure line. But you still defend adding tons of tags to a file
> that both sender/receiver are familiar with the format.

My guess is that you don't really understand either my post or
XML. It's not the FORMAT of XML, it's the fact that it contains
MEANING. So, if the sender and receiver have a shared ontology
that says that FirstName is someone's first name, then the data
<FirstName>John</FirstName> is more than just some text with the
value "John"... it is saying that "John" is his first name. So
rather than just having raw data, you have information that is
useful to the receiver. Moreover, for a third party to use this
information, you need only give them the shared definitions,
rather than give them the format and the meaning.
Chris Uppal - 05 Apr 2006 16:06 GMT
> My guess is that you don't really understand either my post, or
> XML. It's not the FORMAT of XML, it's the fact that it contains
> MEANING.

But it doesn't.  The meaning comes from the /interpretation/ of the data, not
from its transmission form.  The parties sharing data must come to an agreement
about the meaning before they can share information.  Once they have done that,
deciding on a shared format is pretty trivial whether they use XML, ASN.1,
YAML, CSV, or a custom format.

   -- chris
Oliver Wong - 05 Apr 2006 17:00 GMT
>> My guess is that you don't really understand either my post, or
>> XML. It's not the FORMAT of XML, it's the fact that it contains
[quoted text clipped - 8 lines]
> deciding on a shared format is pretty trivial whether they use XML, ASN.1,
> YAML, CSV, or a custom format.

   [In this post, I will group "XML", "ASN.1", "YAML", and "CSV with
headers" all under a single group which I will call "XML"; basically, this
"XML" group means data with metadata tags. As for "CSV without headers" and
"custom format", I'm going to group them together as "typical binary file".]

   I'd say it's somewhere in between Timbo's and Chris's claims [with the
distortion of Chris's claim as described above]. If you plonked a typical
"binary" file onto my desktop (e.g. perhaps ripping a random file from a
Playstation DVD), and told me to try to interpret it, I could get out my hex
editor, and look around for human-readable strings, and from there maybe
look for end-of-string markers, or some sort of length-of-string headers,
and then from there try to figure out markers for other datatypes, but I
probably wouldn't get very far.

   Give me a typical XML file though, and I could probably come up with an
interpretation that is near the original, depending on how the elements and
attributes are named. If the file contains a reference to a DTD or XSD,
then I could navigate over to that URL and gain even more information.

   - Oliver
Chris Uppal - 06 Apr 2006 10:30 GMT
>     Give me a typical XML file though, and I could probably come up with
> an interpretation that is near the original, depending on how the
> elements and attributes are named.

Difficult to see how this is an advantage for production purposes.

> If they file contains a reference to a
> DTD or XSD, then I could navigate over to that URL and gain even more
> information.

Now that is a real advantage.  Note that the XML is not "self-describing", but
it's certainly a good attribute of the format that it can include a link to a
description.

   -- chris
Mark Thornton - 06 Apr 2006 17:18 GMT
>>    Give me a typical XML file though, and I could probably come up with
>>an interpretation that is near the original, depending on how the
>>elements and attributes are named.
>
> Difficult to see how this is an advantage for production purposes.

Some data suppliers change their format very regularly. Using XML gives
fewer surprises of this kind and it is then easier to guess the meaning
of a change and easier to ignore irrelevant changes.
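
A minimal Python sketch of why such changes are easier to ignore when fields are looked up by name. The `<Person>` documents, the added `<Fax>` element, and the `full_name` helper are hypothetical examples, not a real supplier format.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the second adds an element and reorders the others.
OLD = "<Person><FirstName>John</FirstName><LastName>Smith</LastName></Person>"
NEW = ("<Person><LastName>Smith</LastName><Fax>5550000</Fax>"
       "<FirstName>John</FirstName></Person>")

def full_name(xml_text):
    """Look elements up by name, so order and unknown extras do not matter."""
    person = ET.fromstring(xml_text)
    return f"{person.findtext('FirstName')} {person.findtext('LastName')}"

print(full_name(OLD), "|", full_name(NEW))
```

A positional flat-file reader would need a code change for either modification, whereas this reader keeps working as long as the elements it uses keep their names.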

I get geographic mapping information in a variety of formats. Although
bulky, the XML based data is the easiest to use. The bulk is usually
dealt with by compression, which in the case of gzip is trivial to
handle in Java.

Mark Thornton
Timbo - 05 Apr 2006 17:01 GMT
>>My guess is that you don't really understand either my post, or
>>XML. It's not the FORMAT of XML, it's the fact that it contains
[quoted text clipped - 3 lines]
> from its transmission form.  The parties sharing data must come to an agreement
> about the meaning before they can share information.

??? Which was exactly what I said in the sentence after the one
you quoted! :-) In hindsight, MEANING wasn't the correct word...
and I'm not sure of what IS the correct word...

> Once they have done that,
> deciding on a shared format is pretty trivial whether they use XML, ASN.1,
> YAML, CSV, or a custom format.

Sure, you can send it in a CSV format, but to keep the meta-data,
then it would be:
  FirstName=John, LastName=Smith, Phone=55555, etc,

where you basically have the tags in the CSV, and you are then
facing the same problems as the original poster was complaining
about. It's not the syntax of XML that is useful (frankly, I find
it tediously difficult to follow when I am forced to), it's the
fact that it provides an easy way to store meta-data, and there
are lots of nice tools to support this. It's this meta-information
that the original poster does not like.
Chris Uppal - 06 Apr 2006 10:35 GMT
> > > My guess is that you don't really understand either my post, or
> > > XML. It's not the FORMAT of XML, it's the fact that it contains
[quoted text clipped - 7 lines]
> ??? Which was exactly what I said in the sentence after the one
> you quoted! :-)

Then you shouldn't have shouted so loud -- my ears were still ringing and I
missed the next few words you said ;-)

> In hindsight, MEANING wasn't the correct word...
> and I'm not sure of what IS the correct word...

I think "formatting" is probably the right word.  There's no meaning in the
tags -- it might /look/ as if there's meaning, and well-chosen tags certainly
help if you are ever in the unfortunate position of having to read or edit XML
by hand, but there's nothing real there.

Perhaps I'd accept "mnemonics"...

   -- chris
Timbo - 06 Apr 2006 12:11 GMT
>>>>My guess is that you don't really understand either my post, or
>>>>XML. It's not the FORMAT of XML, it's the fact that it contains
[quoted text clipped - 10 lines]
> Then you shouldn't have shouted so loud -- my ears were still ringing and I
> missed the next few words you said ;-)

I wanted to emphasise those two words, and many people still use
text-based newsreaders, so I don't use italics :)

>>In hindsight, MEANING wasn't the correct word...
>>and I'm not sure of what IS the correct word...
[quoted text clipped - 3 lines]
> help if you are ever in the unfortunate position of having to read or edit XML
> by hand, but there's nothing real there.

Ah, ok... we have actually got our shared definitions crossed :-)

"Formatting" is definitely not the word I want. I think "meaning"
is the correct word, but "contains" is misleading. When I say that
using XML format "contains" meaning, I mean that it "has a"
meaning, not that the meaning is self-evident from the tags. That
is, the XML that is passed has a meaning that can be interpreted
by the receiver, if it shares the same definitions as the sender.

In ontological terms, "John, Smith, 555,.." is just a list of
instances of concepts, with no relation to their concepts. This
makes their meaning, at worst, impossible to derive, at best,
ambiguous. Whereas <Person> ... </Person> is an instance of a
concept, and tagging it with its concept Person allows the
receiver to derive meaning and reason about this information.

How this information is formatted is not really relevant, as long
as the "is-a" relations (and others) are present.
Stefan Ram - 06 Apr 2006 14:15 GMT
>How this information is formated is not really relevant, as
>long as the "is-a" relations (and others) are present.

 When a new document type is to be defined, when should one
 choose child elements and when attributes?

 The criterion that makes sense regarding the meaning can not
 be used in XML due to syntactic restrictions.

 An element is describing something. A description is an
 assertion. An assertion might contain unary predicates or
 binary relations.

 Comparing this structure of assertions with the structure
 of XML, it seems to be natural to represent unary predicates
 with types and binary relations with attributes.

 Say, "x" is a rose and belongs to Jack. The assertion is:

rose( x ) ^ owner( x, "Jack" )

 This is written in XML as:

<rose owner="Jack" />

 Thus, my answer would be: use element types for unary
 predicates and attributes for binary relations.

 Unfortunately, in XML, this is not always possible, because in
 XML:

   - there might be at most one type per element,

   - there might be at most one attribute value per attribute
     name, and

   - attribute values are not allowed to be structured in
     XML.

 Therefore, the designers of XML document types are forced to
 abuse element /types/, to describe the /relation/ of an
 element to its parent element.

 This /is/ an abuse, because the designation "element type"
 obviously is supposed to give the /type of an element/,
 i.e., a property which is intrinsic to the element alone
 and has nothing to do with its relation to other elements.

 The document type designers, however, are being forced to
 commit this abuse, to reinvent poorly the missing structured
 attribute values using the means of XML. If a rose has two
 owners, the following element is not allowed in XML:
 
<rose owner="Jack" owner="Jill" />
 
 One is made to use representations such as the following:

<rose>
 <owner>Jack</owner>
 <owner>Jill</owner>
</rose>

 Here the notion "element type" suggests that it is marked that
 Jack is "an owner", in the sense that "owner" is supposed to
 be the type (the kind) of Jack.
 
 The intention of the author, however, is that "owner" is
 supposed to give the /relation/ to the containing element
 "rose".  This is the natural field of application for
 attributes, as the meaning of the word "attribute" outside of
 XML makes clear, but it is not possible to use them for this
 purpose in XML.
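
 The restriction discussed above can be checked with the parser that ships
 with the JDK. What follows is only a minimal sketch: the class and method
 names (RoseOwners, parseOrNull, ownerNames) are made up for illustration
 and come from nowhere in this thread. It assumes the standard JAXP
 DocumentBuilder, which should reject the repeated attribute as not
 well-formed and accept the child-element form.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class RoseOwners {

    /** Parses the text; returns null if it is not well-formed XML. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null; // e.g. the same attribute name used twice on one element
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the text of every <owner> element in the document. */
    static List<String> ownerNames(Document rose) {
        List<String> names = new ArrayList<>();
        NodeList owners = rose.getElementsByTagName("owner");
        for (int i = 0; i < owners.getLength(); i++) {
            names.add(owners.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) {
        Document rejected = parseOrNull("<rose owner=\"Jack\" owner=\"Jill\" />");
        System.out.println("duplicate attribute accepted: " + (rejected != null));

        Document accepted = parseOrNull("<rose><owner>Jack</owner><owner>Jill</owner></rose>");
        System.out.println("owners as child elements: " + ownerNames(accepted));
    }
}
```

 Note that the sketch only demonstrates the syntactic restriction; it says
 nothing about whether "owner" is better understood as a type or a relation.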

 An alternative solution might be the following notation.

<rose owner="Jack Jill" />

 Here a /new/ mini language (not XML anymore) is used within an
 attribute value, which, of course, can not be checked anymore
 by XML validators. This is actually done in practice, for example
 in XHTML, where classes are written this way.

 So in its main language XHTML, the W3C has to abandon XML
 even to write class attributes. This is not such a good
 accomplishment given that the W3C was able to draw on the
 experience gained with SGML and HTML when designing XML and that
 XHTML is one of the most prominent XML applications.

 The needless restrictions of XML inhibit the meaningful use of
 syntax. This leaves many document type designers wondering
 when attributes and when elements are supposed to be used,
 which is actually evidence of a weakness in the design of
 XML, which does not have many more notations than attributes
 and elements. And yet the W3C failed to give even these two
 notations a clear and meaningful purpose!

 Without the restrictions described, XML alone would have
 nearly the expressive power of RDF/XML, which has to repair
 painfully some of the errors made in the XML-design.

 Now, some recommend /always/ using subelements, because one
 can never know whether an attribute value that seems to be
 unstructured today might need to become structured tomorrow.
 (Or it is recommended to use attributes only when one is quite
 confident that they never will need to be structured.) Now, this
 recommendation does not even try to make sense of
 attributes, but just explains how to circumvent the obstacles
 the W3C has built into XML.
 
 Others recommend using attributes for something they
 call "metadata".

 Others use an XML editor that happens to make the input of
 attributes more comfortable than the input of elements and
 seriously suggest, therefore, to use as many attributes as
 possible.

 Still others have studied how to use CSS to format XML
 documents and are using this to give recommendations about
 when to use attributes and when to use subelements.

 Of course: Mixing all these criteria (structured vs.
 unstructured, data vs. "metadata", by CSS, by the ease of
 editing, ...) often will give conflicting recommendations.

 Notations other than XML have solved the problem by either
 omitting attributes altogether or by allowing structured
 attributes. I believe that notations with structured
 attributes, which also allow multiple element types and
 multiple attribute values for the same attribute name,
 are helpful.
Oliver Wong - 06 Apr 2006 17:29 GMT
>  Say, "x" is a rose and belongs to Jack. The assertion is:
>
[quoted text clipped - 3 lines]
>
> <rose owner="Jack" />

[...]
>  If a rose has two
>  owners, the following element is not allowed in XML:
[quoted text clipped - 17 lines]
>  XML makes clear, but it is not possible to use them for this
>  purpose in XML.

   How about something like:

<rose id="x" ownedBy="Jack"/>
<rose id="x" ownedBy="Jill"/>

or

<ownership owned="rose" owner="Jack"/>
<ownership owned="rose" owner="Jill"/>

or

<Person id="Jack">
 <belongings>
   <rose id="x"/>
   <!--Possibly other stuff-->
 </belongings>
</Person>
<Person id="Jill">
 <belongings>
   <rose id="x"/>
   <!--Possibly other stuff-->
 </belongings>
</Person>

depending on what exactly is the main message being conveyed (i.e. the
different XML documents here all say the same thing, but they put emphasis on
different things: the roses, the persons, or the ownership-relationships
themselves).

   - Oliver
Stefan Ram - 07 Apr 2006 05:14 GMT
>><rose owner="Jack" owner="Jill" />
><rose id="x" ownedBy="Jack"/>
><rose id="x" ownedBy="Jill"/>

 While your suggestion might be possible for Prolog-like
 databases of assertions, it might be difficult to apply
 it to text markup, where one actually would like to write:

     <p>He met
     <span class="name" class="person">Peter Miller</span> in
     <span class="name" class="town">London</span>.</p>

 It could be written in XML as:

     <p>He met
     <span id="563">Peter Miller</span> in
     <span id="564">London</span>.</p>
     <attribute idref="563" class="name"/>
     <attribute idref="563" class="person"/>
     <attribute idref="564" class="name"/>
     <attribute idref="564" class="town"/>

 But this looks as if it might be more difficult to maintain.

 NB: If "id" was declared as an »ID attribute« in the DTD, then

><rose id="x" ownedBy="Jack"/>
><rose id="x" ownedBy="Jill"/>

 might not be valid XML, because in XML »ID values must
 uniquely identify the elements which bear them« is a validity
 constraint. But here, »id« might be declared as an »IDREF
 attribute«.

>depending on what exactly is the main message being conveyed
>(i.e. the different XML documents here all say the same thing,
>but they put emphasis on different things: the roses, the
>persons, or the ownership-relationships themselves).

 ... and some of these choices then will be restricted by the
 restrictions of XML. For example, when one wants to put
 emphasis on the roses by mapping each rose to an XML element,
 some of the restrictions mentioned in my previous post apply.
Oliver Wong - 07 Apr 2006 15:08 GMT
>  NB: If "id" was declared as an »ID attribute« in the DTD, then
>
[quoted text clipped - 5 lines]
>  constraint. But here, »id« might be declared as an »IDREF
>  attribute«.

   Right, sorry.

>>depending on what exactly is the main message being conveyed
>>(i.e. the different XML documents here all say the same thing,
[quoted text clipped - 5 lines]
>  emphasis on the roses by mapping each rose to an XML element,
>  some of the restrictions mentioned in my previous post apply.

   You could "declare" a rose "x", and then start describing it, e.g.

<rose id="x"/>
<roseOwnership idref="x" owner="Jack"/>
<roseOwnership idref="x" owner="Jill"/>

   You seem not to like having information implied via parent-child
relationship, but I didn't quite understand why. I suspect the
rose-emphasized XML would more likely traditionally be written as something
like

<rose>
 <owners>
   <person idref="Jack"/>
   <person idref="Jill"/>
 </owners>
 <!-- perhaps other elements describing the rose here -->
</rose>

   - Oliver
Stefan Ram - 07 Apr 2006 20:16 GMT
>You seem not to like having information implied via
>parent-child relationship, but I didn't quite understand why.

 I have no problem with the parent-child relationship, but with
 the (ab)use of the /type/ of the child to name the /relation/
 to its parent (instead of the type of the child as the
 designation »type« implies). Using the /type/ to name the
 /relation/ contradicts its designation »type«.

>I suspect the rose-emphasized XML would more likely
>traditionally be written as something like
[quoted text clipped - 5 lines]
>  <!-- perhaps other elements describing the rose here -->
></rose>

 Possibly I can clarify my intentions by using another
 language with structured attributes. In my language »Unotal«
 one can write:

< &rose owner=< &person Jack > owner=< &person Jill >>

 Here, »owner« can be recognized as the name of a /binary/
 relation by the following »=«, while »rose« can be recognized
 as the name of a /unary/ relation (like a type) by the
 preceding »&«. In Unotal, this is always so, so it is
 easier to read.

 In XML, element types are sometimes used for /unary/ relations
 (sometimes for real types, as the name implies), but sometimes
 (ab)used for /binary/ relations (to specify the parent-child
 relationship). So when reading a child element type in XML,
 one does not know, whether it gives the type of this element
 or names the relationship to its parent.

                           ~~~

 I am working on an implementation of a reader and writer for
 Unotal in Java, and have a small application that uses this to
 implement Unotal as its file storage format in Java:

http://www.purl.org/stefan_ram/pub/joodo
 
 The Java source code for the Unotal implementation will be
 released later, but a description of Unotal is available at:

http://www.purl.org/stefan_ram/pub/unotal_en

 This page also contains the Unotal syntax specification, which
 is written in Unotal itself and then was automatically
 translated to HTML and ASCII from there.
Roedy Green - 05 Apr 2006 22:24 GMT
>My guess is that you don't really understand either my post, or
>XML. It's not the FORMAT of XML, it's the fact that it contains
>MEANING. So, if the sender and receiver have a shared ontology
>that says that FirstName is someone's first name, then the data
><FirstName>John<FirstName> i

Even a CSV file with a first line using field names contains the same
amount of information for a file like the one shown as the obese XML.
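
A minimal sketch of the header-line idea, assuming plain comma-separated
fields with no quoting or escaping (real CSV needs more care). The class
name HeaderCsv and the field names in the usage example are illustrative
assumptions, not taken from any real export.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader, for illustration only.
final class HeaderCsv {

    /**
     * Treats the first line as field names and maps every later line to
     * name/value pairs. Naive split on commas: no quoted or escaped fields.
     */
    static List<Map<String, String>> read(List<String> lines) {
        String[] names = lines.get(0).split(",", -1);
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < names.length && i < values.length; i++) {
                record.put(names[i], values[i]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
                "FirstName,LastName,City,State",
                "Lawrence,David,Maynard,MA");
        for (Map<String, String> record : read(file)) {
            System.out.println(record);
        }
    }
}
```

With the names stated once in the first line, each record carries the
same labels as the tagged form without repeating them per row.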

What the raw XML provides is not particularly useful information. You
can glean that by inspecting the file. The information you want, which is
missing, is how each of the fields is validated: what guarantees
exist on values, what the complete set of possibilities of each
enumeration is, and what they mean.

Since the early DOS days I have been exporting data to people in
several formats: SQL, CSV, and fixed-length ASCII fields.  I generate
a separate human-readable "schema" file that describes each field,
including its limits, length, and offset.

Nobody has ever had trouble interpreting one of the files.

For a FLAT file there is no need to use tags.  That is only needed when you
have a structured file.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Timbo - 06 Apr 2006 08:48 GMT
> for a FLAT file there is no need to use tags.  That is only when you
> have a structured file.

Yes, sure. For tables etc, XML is of little value. I absolutely
agree, and I would use something like CSV for that.
Oliver Wong - 04 Apr 2006 17:44 GMT
>I am a little bit tired of this obsession people have with XML and XML
> technology. Please share your thoughts and let me know if I am thinking
[quoted text clipped - 30 lines]
>
> Please let me know what you think.

   If your complaint is file size during network transfer, compress the
file before sending it.

   If your complaint is file size during parsing, use SAX instead of DOM,
and don't keep the whole file in memory at once.
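
   A rough sketch of the SAX approach, assuming the JDK's built-in
SAXParser. The class name StreamingCount and the <People>/<Person> tags
are hypothetical; the point is only that the handler sees one event at a
time and never builds a tree of the whole file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example, for illustration only.
final class StreamingCount {

    /** Counts elements with the given tag name while streaming through the input. */
    static int countElements(InputStream in, String tag) {
        int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    // Only a counter is kept; nothing else is retained in memory.
                    if (qName.equals(tag)) {
                        count[0]++;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<People>"
                + "<Person><FirstName>John</FirstName><LastName>Smith</LastName></Person>"
                + "<Person><FirstName>Jane</FirstName><LastName>Doe</LastName></Person>"
                + "</People>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("records: " + countElements(in, "Person"));
    }
}
```

   For a real transfer file the InputStream would come from the file or
socket rather than from a string in memory.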

   Use the right tool for the job. If for whatever problem you're trying to
solve, you've got a better tool than XML, then use it. But if the problem is
"The government requires me to use XML", then I can't think of a better tool
than XML to solve that particular problem (except maybe emigration ;)).

   - Oliver
James McGill - 04 Apr 2006 17:56 GMT
> except maybe emigration

You say that as though anyone would ever leave the utopian paradise that
is Canada...
Lasse Reichstein Nielsen - 04 Apr 2006 17:58 GMT
> I am a little bit tired of this obsession people have with XML and XML
> technology.

Hear hear!  
Seems some people think XML is the solution to all problems.  
I'd rather classify it as the lowest common denominator for exchanging
tree-structured data - and definitely not something fit for humans to
read or write directly.

> John,Smith,5555555,37 Finch Ave.
>
[quoted text clipped - 6 lines]
>
> And Tags are repeating and repeating:

> Please let me know what you think.

Apart from what everybody else has said, zipping such a file
should yield a *very* high compression factor.
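
A quick way to try this is java.util.zip. The sketch below is only an
illustration: the class name TagCompression and the generated records are
made up, and synthetic, highly repetitive data like this will compress
better than most real files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical experiment, for illustration only.
final class TagCompression {

    /** Compresses the bytes with gzip and returns the compressed form. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // read after the gzip stream is closed
    }

    /** Builds n made-up records using the tag names from the original post. */
    static String sampleRecords(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("<FirstName>John</FirstName>\n")
              .append("<LastName>Smith</LastName>\n")
              .append("<PhoneNum>").append(5550000 + i).append("</PhoneNum>\n")
              .append("<Address>").append(i).append(" Finch Ave.</Address>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(10_000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```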

/L
Signature

Lasse Reichstein Nielsen  -  lrn@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
 'Faith without judgement merely degrades the spirit divine.'

Joe Attardi - 04 Apr 2006 18:29 GMT
> John,Smith,5555555,37 Finch Ave.
>
[quoted text clipped - 4 lines]
> <PhoneNum>5555555</PhoneNum>
> <Address>37 Finch Ave.</Address>

Yes but, now we know what all the data means. Your example is quite
clear, but what about this one:

Lawrence,David,Maynard,MA

Could mean several things:
(1) Lawrence David lives in Maynard, MA.
(2) David Lawrence lives in Maynard, MA
(3) David Maynard lives in Lawrence, MA
(4) Maynard David lives in Lawrence, MA
etc. You see where I'm going with this.

Where
<FirstName>Lawrence</FirstName>
<LastName>David</LastName>
<City>Maynard</City>
<State>MA</State>

leaves no question.

Yes, we as humans know intuitively that city and state go together. But
for an application using this data, there has to be some specification
defined and all systems that use it must be aware of it.
Oliver Wong - 04 Apr 2006 22:24 GMT
>> John,Smith,5555555,37 Finch Ave.
>>
[quoted text clipped - 9 lines]
>
> Lawrence,David,Maynard,MA

   Ah, obviously a list of 4 arbitrary strings, i.e. (in SQL terms):

CREATE TABLE foo (
 bar VARCHAR(255)
);

INSERT INTO foo VALUES ('Lawrence'),('David'),('Maynard'),('MA');

> Could mean several things:
> (1) Lawrence David lives in Maynard, MA.

Oops, okay, it's one record. Well, maybe it means:

Lawrence D. Maynard, who has a Master of Arts. (Or perhaps it uses last
name first, i.e. David M. Lawrence, Master of Arts).

   Or maybe (s)he's a Medical Assistant? Or (s)he lives in Madagascar?

> (2) David Lawrence lives in Maynard, MA
> (3) David Maynard lives in Lawrence, MA
> (4) Maynard David lives in Lawrence, MA
> etc. You see where I'm going with this.

   Hmm, looks like I was way off... Not being an American, I am not
familiar with American city names, nor American State abbreviations. If only
you had used XML!

   - Oliver
Steve Wampler - 04 Apr 2006 22:44 GMT
>    Hmm, looks like I was way off... Not being an American, I am not
> familiar with American city names, nor American State abbreviations. If
> only you had used XML!

No problem:

   <f1>John</f1>
   <f2>Smith</f2>
   <f3>5555555</f3>
   <f4>37 Finch Ave.</f4>

There, that should make people happy :)
(Of course, given this group, maybe the tags should be in Klingon...)
Chris Uppal - 05 Apr 2006 11:43 GMT
> No problem:
>
[quoted text clipped - 4 lines]
>
> There, that should make people happy :)

Slightly OT, but I believe that the Best Practice for handling addresses is
to just have line1, line2, line3 and so on, rather than trying to identify the
"meaning" of each line.  There is much less consistency across address formats
than most programmers (or schema designers) realise.  So an XML format like
yours might be the best you can (or should) do.

   -- chris
Oliver Wong - 05 Apr 2006 15:32 GMT
>>    Hmm, looks like I was way off... Not being an American, I am not
>> familiar with American city names, nor American State abbreviations. If
[quoted text clipped - 9 lines]
> There, that should make people happy :)
> (Of course, given this group, maybe the tags should be in Klingon...)

   Well, at least with this notation, I wouldn't have made my initial
mistake of thinking I was dealing with 4 records which seemed to be
arbitrary strings.

   Given the tag names, I can see I am dealing with a single record with 4
fields.

   So we're making progress here, but perhaps the tag names could have been
better chosen.

   And if there were an XSD along with this, I could check whether f3 was
purely numeric, or if it could contain arbitrary string data as well.
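
   As a sketch of what such a check could look like with the JDK's
javax.xml.validation API: the schema below is invented for this example
(nobody in the thread supplied one), the <record> wrapper element is an
assumption added so the fields have a single root, and the class name
RecordCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example, for illustration only.
final class RecordCheck {

    // Invented schema: f3 must be an integer, the other fields are free text.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='record'><xs:complexType><xs:sequence>"
          + "<xs:element name='f1' type='xs:string'/>"
          + "<xs:element name='f2' type='xs:string'/>"
          + "<xs:element name='f3' type='xs:integer'/>"
          + "<xs:element name='f4' type='xs:string'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    /** Returns true if the document is well-formed and satisfies the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String numeric = "<record><f1>John</f1><f2>Smith</f2><f3>5555555</f3><f4>37 Finch Ave.</f4></record>";
        String mixed = "<record><f1>John</f1><f2>Smith</f2><f3>555-5555</f3><f4>37 Finch Ave.</f4></record>";
        System.out.println("numeric f3 valid: " + isValid(numeric));
        System.out.println("mixed f3 valid:   " + isValid(mixed));
    }
}
```

   The schema constrains the lexical form of f3; it still does not say
that f3 is a phone number, which remains a matter of agreement between
sender and receiver.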

   - Oliver
Steve Wampler - 05 Apr 2006 16:13 GMT
>>>    Hmm, looks like I was way off... Not being an American, I am not
>>> familiar with American city names, nor American State abbreviations. If
[quoted text clipped - 16 lines]
>    Given the tag names, I can see I am dealing with a single record with
> 4 fields.

Really?  I wouldn't have thought so.  What makes you think 'f' stands
for 'field'?  Maybe these are four new flavours of Ben&Jerry's ice cream.
(Not that I'd buy any of them...)

The point is that the tag names are, ultimately, just strings.  We might
think we understand what they mean (and can be right a high percentage of
the time if the strings are well chosen), but in the end, they mean
whatever the code at each end defines the semantics (not the syntax)
to be.  That code *still* has to agree at both ends, just as it does
with "John,Smith,5555555,37 Finch Ave.".  I haven't seen anything in XML
that does more than provide a guarantee that the syntax is right.
Joe Attardi - 05 Apr 2006 16:27 GMT
> I haven't seen anything in XML
> that does more than provide a guarantee that the syntax is right.

Hierarchical data, dude. What if someone has more than one phone
number? With the comma-delimited flat file approach, it's not readily
apparent how you could implement that.

<Person>
     <PhoneNumber>...</PhoneNumber>
     <PhoneNumber>...</PhoneNumber>
...
</Person>

we can have as many PhoneNumbers as we want that are associated with a
person, and because it's all hierarchical we can just walk up the
hierarchy to see who these PhoneNumbers belong to.
Steve Wampler - 05 Apr 2006 16:38 GMT
>> I haven't seen anything in XML
>> that does more than provide a guarantee that the syntax is right.
[quoted text clipped - 12 lines]
> person, and because it's all hierarchical we can just walk up the
> hierarchy to see who these PhoneNumbers belong to.

Eh?  That's still syntax.  Are you saying all syntax is non-hierarchical?

People have represented hierarchical data in many ways *well before XML*,
including, yes, flat files - and it's not that hard.  It's still a syntax issue.
Heck, even arbitrary graph data (hardly "hierarchical") has many syntactic
representations, including flat files.

Look, I *like* XML *for some things*, but wish people would take the time
to recognize what it is and what it isn't, please.
Roedy Green - 05 Apr 2006 22:31 GMT
>Hierarchical data, dude. What if someone has more than one phone
>number? With the comma-delimited flat file approach, it's not readily
[quoted text clipped - 3 lines]
>      <PhoneNumber>...</PhoneNumber>
>      <PhoneNumber>...</PhoneNumber>

You use a comma to represent any field which is not present.  You
don't just have a list of phone numbers, you assign them specific
functions.. You have something like this:

cell
home
work
800
fax
messages
emergency

The other way you do it is to have a separate phone numbers file (this
is SQL-think). Then you can have an arbitrary number of phone numbers.

The phone number file has the form

account#, phone

If you are exporting data only to import SQL again, this is a much
more convenient format than XML hierarchy.  SQL does not handle
variable numbers of things well directly, so you end up having to
write a complicated mess of XML export and import handling code, as
well as the process taking 100 times longer than it need do.
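
A minimal sketch of the two-file layout described above, joined in memory
on the account number. The class name PhoneJoin, the layout of the person
file ("account#,first,last") and the sample values are assumptions made
for illustration; only the "account#, phone" form comes from the post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example, for illustration only.
final class PhoneJoin {

    /** Groups "account#, phone" lines by account number. */
    static Map<String, List<String>> phonesByAccount(List<String> phoneLines) {
        Map<String, List<String>> byAccount = new LinkedHashMap<>();
        for (String line : phoneLines) {
            String[] parts = line.split(",", 2);
            byAccount.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                     .add(parts[1].trim());
        }
        return byAccount;
    }

    /** Joins assumed "account#,first,last" lines with their phone numbers. */
    static List<String> join(List<String> personLines, List<String> phoneLines) {
        Map<String, List<String>> phones = phonesByAccount(phoneLines);
        List<String> joined = new ArrayList<>();
        for (String line : personLines) {
            String[] f = line.split(",", -1);
            List<String> numbers = phones.getOrDefault(f[0].trim(), List.of());
            joined.add(f[1].trim() + " " + f[2].trim() + ": " + numbers);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> persons = List.of("1001,John,Smith", "1002,Jane,Doe");
        List<String> phones = List.of("1001, 5555555", "1001, 5550000");
        join(persons, phones).forEach(System.out::println);
    }
}
```

A person with no rows in the phone file simply gets an empty list, so
the number of phone numbers per person is not fixed by the format.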

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Andrew McDonagh - 05 Apr 2006 22:48 GMT
>> Hierarchical data, dude. What if someone has more than one phone
>> number? With the comma-delimited flat file approach, it's not readily
[quoted text clipped - 3 lines]
>>      <PhoneNumber>...</PhoneNumber>
>>      <PhoneNumber>...</PhoneNumber>
        <Pet>
            <Type>Dog</Type>
            <CuteName>Spot</CuteName>

> You use a comma to represent any field which is not present.  You
> don't just have a list of phone numbers, you assign them specific
> functions.. You have something like this:

One of XML's greatest advantages over CSV, flat files, etc., is that it
supports schema evolution without requiring code changes.

Because applications look only for the XML nodes they know
about, they ignore all other nodes.  So in the Person node example,
should we need to add a child node <Pets>, we can do so without harming the
existing app.
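
A small sketch of that behaviour, assuming a reader written against the
original Person layout with the JDK's DOM parser. The class name
PhoneReader and the sample documents are made up for illustration. The
reader asks only for the PhoneNumber elements it knows about, so an added
Pet node does not affect it; a reader that depended on child positions
would not be so lucky.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical "existing app", for illustration only.
final class PhoneReader {

    /** Returns the text of every PhoneNumber element; all other nodes are ignored. */
    static List<String> phoneNumbers(String personXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(personXml)));
            NodeList nodes = doc.getElementsByTagName("PhoneNumber");
            List<String> numbers = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                numbers.add(nodes.item(i).getTextContent());
            }
            return numbers;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "<Person><PhoneNumber>555-1111</PhoneNumber>"
                + "<PhoneNumber>555-2222</PhoneNumber></Person>";
        String evolved = "<Person><PhoneNumber>555-1111</PhoneNumber>"
                + "<PhoneNumber>555-2222</PhoneNumber>"
                + "<Pet><Type>Dog</Type><CuteName>Spot</CuteName></Pet></Person>";
        System.out.println(phoneNumbers(original));
        System.out.println(phoneNumbers(evolved));
    }
}
```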
Jhair Tocancipa Triana - 08 Apr 2006 14:04 GMT
>> I haven't seen anything in XML
>> that does more than provide a guarantee that the syntax is right.

> Hierarchical data, dude. What if someone has more than one phone
> number? With the comma-delimited flat file approach, it's not readily
> apparent how you could implement that.

> <Person>
>       <PhoneNumber>...</PhoneNumber>
>       <PhoneNumber>...</PhoneNumber>
> ...
> </Person>

> we can have as many PhoneNumbers as we want that are associated with a
> person, and because it's all hierarchical we can just walk up the
> hierarchy to see who these PhoneNumbers belong to.

For decades it has been possible to achieve the same result in the
example you state using two files (one for the persons and another
for the phone numbers) and joining their contents (e.g. after
loading them into a relational database).

So XML offers nothing new in the scenario you describe...

Signature

--Jhair

Oliver Wong - 10 Apr 2006 18:35 GMT
>>> I haven't seen anything in XML
>>> that does more than provide a guarantee that the syntax is right.
[quoted text clipped - 19 lines]
>
> So XML offers nothing new in the scenario you describe...

   To be fair, Joe Attardi's example wasn't meant to show something "new",
but rather to show XML providing something more than a guarantee that the
syntax is right. In this respect, I think Joe's example is successful (in
that it demonstrates hierarchical data in addition to syntax).

   - Oliver
Steve Wampler - 10 Apr 2006 21:42 GMT
>    To be fair, Joe Attardi's example wasn't meant to show something
> "new", but rather to show XML providing something more than a guarantee
> that the syntax is right. In this respect, I think Joe's example is
> successful (in that it demonstrates hierarchical data in addition to syntax).

Eh? (again)  Are you really claiming that you cannot syntactically represent
hierarchical data?  Please explain how context-free grammars represent
arithmetic expressions if hierarchy isn't syntax.
Oliver Wong - 10 Apr 2006 22:10 GMT
>>    To be fair, Joe Attardi's example wasn't meant to show something
>> "new", but rather to show XML providing something more than a guarantee
[quoted text clipped - 3 lines]
>
> Eh? (again)

   Whether the "syntax is right" and whether the data is hierarchical are two
orthogonal concepts, IMHO. I should have said "in addition to guarantee of
correct syntax" instead of just "in addition to syntax".

> Are you really claiming that you cannot syntactically represent
> hierarchical data?

   No.

> Please explain how context-free grammars represent
> arithmetic expressions if hierarchy isn't syntax.

   Isn't syntax simply the list of allowable keywords and their parameters?
I don't think syntax in itself is sufficient to represent hierarchy. You
need something like grammatical rules that can reference each other.

E.g., this, syntax, is not enough:

'(', ')', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'

You also need this, a grammar:

EXP -> INT | INT OP INT | '(' EXP ')'
INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
OP -> '+' | '-'
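
The three rules above can be turned into a small recursive-descent
recognizer. This is only a sketch under the assumption that the grammar is
taken literally (single-digit INT, at most one OP per EXP, no whitespace);
the class name ExpRecognizer is made up. The nesting comes from exp()
calling itself for the parenthesized case.

```java
// Hypothetical recognizer for the grammar above, for illustration only.
final class ExpRecognizer {
    private final String input;
    private int pos = 0;

    private ExpRecognizer(String input) {
        this.input = input;
    }

    /** Returns true if the whole text can be derived from EXP. */
    static boolean matches(String text) {
        ExpRecognizer r = new ExpRecognizer(text);
        return r.exp() && r.pos == text.length();
    }

    // EXP -> INT | INT OP INT | '(' EXP ')'
    private boolean exp() {
        if (accept('(')) {
            return exp() && accept(')'); // the recursive rule gives the hierarchy
        }
        if (!digit()) {
            return false;
        }
        int afterInt = pos;
        if (op() && digit()) {
            return true;                 // INT OP INT
        }
        pos = afterInt;                  // fall back to a plain INT
        return true;
    }

    // INT -> '0' | ... | '9'
    private boolean digit() {
        if (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
            return true;
        }
        return false;
    }

    // OP -> '+' | '-'
    private boolean op() {
        return accept('+') || accept('-');
    }

    private boolean accept(char c) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"7", "1+2", "((3-4))", "1+", "1+2+3"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```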

   - Oliver
Steve Wampler - 10 Apr 2006 22:17 GMT
>    Isn't syntax simply the list of allowable keywords and their
> parameters? I don't think syntax in itself is sufficient to represent
[quoted text clipped - 10 lines]
> INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
> OP -> '+' | '-'

No.  Syntax *is* grammar.  You're mixing lexics and syntax.  Semantics
is the meaning attached to a syntax.  (Lexics is one aspect of syntax,
corresponding to the leaf nodes in the grammar.)
Timbo - 05 Apr 2006 16:55 GMT
> I haven't seen anything in XML
> that does more than provide a guarantee that the syntax is right.

Ok, so say you are writing an application that deploys an agent to
find you the best prices for CDs on the web. If you share the same
ontological definition of CD attributes, you could have the
following album embedded in a webpage:

<Album>
  <Artist> Stevie Wonder </Artist>
  <Title> Innervisions </Title>
  <Producer> .. </Producer>
  <Track number="1" name=".."/>
  <Track number="2" name=".."/>
  ... etc..
  <Price> £5</Price>
</Album>

Compare that to the text:

Stevie Wonder, Innervisions, 1: ..., 2: ..., £5

You can see that clearly, any online CD store that follows the XML
definition in the first one (which could be defined in a schema)
would be easier to browse than one that has free text, especially
if some CDs have data that others don't, such as accompanying
musicians. You could find the grammar for the free text, write a
parser for it (or download one), and interpret the parsed data,
but simply sharing the set of definitions is more straightforward.
Steve Wampler - 05 Apr 2006 18:01 GMT
>> I haven't seen anything in XML
>> that does more than provide a guarantee that the syntax is right.
[quoted text clipped - 25 lines]
> one), and interpret the parsed data, but simply sharing the set of
> definitions is more straightforward.

Hmmm, I, as a human, find the second form *much* easier to browse.  I can pick
out the actual content *much* faster.  Granted, I might prefer something like:

   Stevie Wonder: Innervisions ($9.25)
        1: ....
        2: ....
        3: ....

but that would depend on whether I'm more interested in the artist and album or
the details of the album content.  (Great price, by the way!)

Of course, you're talking about computer handling of the data, where your points
are more valid.  That's *still* syntax though.
Oliver Wong - 05 Apr 2006 19:19 GMT
>>> I haven't seen anything in XML
>>> that does more than provide a guarantee that the syntax is right.
[quoted text clipped - 43 lines]
> points
> are more valid.  That's *still* syntax though.

   I find Timbo's XML version as easy to read as Timbo's CSV version.
However, I do find Steve's "custom" version easier to read over the other
two, as a human.

   However, another nice thing about XML over the other two formats is that
there is a standardized escaping mechanism. Artists are... well...
artistic... and they sometimes do crazy things. In CSV, or the custom
format, how do you distinguish between an album whose name is the empty
string, and an album whose name is the single space character? What if the
album name contains a colon? What if the artist name contains a colon?
What if the album name contains an open-parenthesis and dollar sign,
but no close-parenthesis? Etc.

   As purely digital music becomes more popular (e.g. songs existing only
as OGG or MP3 files, and no physical albums, so no cover art necessary),
you could have tech-savvy artists define the names of their tracks to be the
newline character for some specific platform, for example. Maybe I'll go
write a song right now whose name is the value of the Java literal String
expression "\u0000\r\n\u0008\r\n\n". For clarity, the name of my song is 7
characters long, and is not intended to be pronounced (there will be no
lyrics in the song).

   With XML, it's possible to express unambiguously any possible string of
characters (using, e.g., entity-references). With CSV or the custom format,
you'd have to invent an escaping-system, and then I, as a human, would have
to learn about your escaping system to either be able to read the data
myself, or to implement a program which can parse the data.
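
   A sketch of the standard escaping at work, using the StAX writer and
reader from the JDK. The class name TitleRoundTrip and the sample titles
are invented for illustration. One caveat the sketch does not cover: XML
1.0 does not allow certain control characters (U+0000, for example) at
all, not even as character references, so strings containing them would
still need some other encoding.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example, for illustration only.
final class TitleRoundTrip {

    /** Writes the title as the content of a <Title> element; the writer does the escaping. */
    static String toXml(String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("Title");
            writer.writeCharacters(title);
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    /** Reads the text of the first element back; the reader undoes the escaping. */
    static String fromXml(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything before the first element
            }
            return reader.getElementText();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] titles = {"", " ", "Tom & Jerry: <Live> ($5"};
        for (String title : titles) {
            String xml = toXml(title);
            System.out.println(xml + "  round-trips: " + title.equals(fromXml(xml)));
        }
    }
}
```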

   - Oliver
Roedy Green - 05 Apr 2006 22:36 GMT
>    With XML, it's possible to express unambiguously any possible string of
>characters (using, e.g., entity-references).

You have made a much better case for binary strings that don't need
fancy XML escaping than you have for XML.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Oliver Wong - 05 Apr 2006 23:26 GMT
>>    With XML, it's possible to express unambiguously any possible string
>> of
>>characters (using, e.g., entity-references).
>
> You have made a much better case for binary strings that don't need
> fancy XML escaping than you have for XML.

   The problem with a "straight-to-binary" approach is that you'd have to
use custom tools to process the data. With XML, you can use a generic XML
editor, or, worst case, a simple text-editor.

   I don't "mind" ASN.1 so much if only the editors were more readily
available. From my perspective, it's almost the same as using gzip to unzip
a file yielding an XML document, and then using an XML Editor on the
resulting XML document.

   - Oliver
Roedy Green - 06 Apr 2006 02:12 GMT
>    The problem with a "straight-to-binary" approach is that you'd have to
>use custom tools to process the data. With XML, you can use a generic XML
>editor, or, worst case, a simple text-editor.

No you don't. You use an ASN schema and a binary parser.  It is just
like XML only compact.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

James McGill - 06 Apr 2006 07:00 GMT
> No you don't. You use an ASN schema and a binary parser.  It is just
> like XML only compact.

Nobody is going to use ASN just for fun.  It's so obviously a product of
some 1980s multi-tiered management bureaucracy, it's not even funny.
Don't get me wrong -- I appreciate the strong typing and hard guarantees
that are possible within the framework.  There are ASN constructs for
things that would be a major pain in any representation (like the stuff
dealing with Sets -- I understand the value in data binding
applications).

But it's not *fun*.  At no level is it easy to work with.  It's
something you use because your boss pays you to work with it, and it's
NOT something you use simply because you enjoy it.  
Chris Uppal - 06 Apr 2006 10:39 GMT
> > No you don't. You use an ASN schema and a binary parser.  It is just
> > like XML only compact.
>
> Nobody is going to use ASN just for fun.  It's so obviously a product of
> some 1980s multi-tiered management bureaucracy, it's not even funny.

Doesn't the same thing apply to XML ?

   -- chris
Oliver Wong - 06 Apr 2006 16:05 GMT
>> Nobody is going to use ASN just for fun.  It's so obviously a product of
>> some 1980s multi-tiered management bureaucracy, it's not even funny.
>
> Doesn't the same thing apply to XML ?

   I use XML "just for fun", in the sense that I've used it in situations
where my boss isn't paying me to use it (including the situations where I'm
my own boss). See many of my postings to this newsgroup for example. I'll
often use "xml-like" syntax to show what's Java code versus what's prose.

   - Oliver
Chris Uppal - 07 Apr 2006 08:52 GMT
>     I use XML "just for fun", in the sense that [...]

And I thought /I/ was strange !

;-)

   -- chris
James McGill - 07 Apr 2006 10:14 GMT
> >     I use XML "just for fun", in the sense that [...]
>
> And I thought /I/ was strange !

Well, my point was that I use XML schema for things like configuring
games, communication between online game clients, the save game format,
the parameters of the model, etc.  Strictly for fun.  I know that ASN.1
(for example) offers some very formal grammars that happen to be
accepted as industry standards; but I am quite certain that it's
anything but a pleasant framework to design with.  But I'm biased, since
pretty much all my messages are a few Kilobytes, and really, no amount
of bloat that results from the markup is going to make enough difference
that it overtakes RPC over HTTP or File IO as the limiting factor.

To be fair, the discussion of ASN.1 started in response to a proposition
to use XML for a degenerate case where it's probably not the appropriate
markup encoding to use.

Also, it's quite likely that when someone's golden hammer fails, he
might be tempted to reinvent the wheel (badly), rather than use a
different hammer for that problem.  And that's why an amateur might need
to be nudged in the direction of another alternative that he might never
have heard about otherwise.  I can respect that.  

Now somebody is going to come out of the woodwork claiming that yacc is
fun.  
Chris Uppal - 07 Apr 2006 11:17 GMT
[me:]
> > And I thought /I/ was strange !
[...]
> Now somebody is going to come out of the woodwork claiming that yacc is
> fun.

Yacc /is/ fun.

(I said I was strange ;-)

   -- chris
Roedy Green - 07 Apr 2006 18:49 GMT
On Fri, 07 Apr 2006 02:14:42 -0700, James McGill
<jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone
who said :

>.  I know that ASN.1
>(for example) offers some very formal grammars that happen to be
>accepted as industry standards; but I am quite certain that it's
>anything but a pleasant framework to design with.

The claim is you don't have to. You can use an XML schema.

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

James McGill - 07 Apr 2006 19:30 GMT
> the claim is you don't have to. You can use an XML schema.

I guess the question is, why would you then add another layer of
complexity, if you've already got an XSD that models your data to your
satisfaction?  I realize that if I was working for you, you would insist
on a tightly packed, formalized wire format.  That's cool.  I've had to
do similar things to map between an XML representation of DNS data, and
the IETF wire format for the records.  I don't think an ASN model would
be any weirder than that.
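
For a rough sense of the trade-off discussed here, a minimal sketch that encodes the record from the top of the thread once as XML and once as a packed, length-prefixed byte stream. The class name WireSizeDemo and the packed layout (plain DataOutputStream.writeUTF, a 2-byte length followed by the string bytes) are assumptions for illustration only; this is neither ASN.1 nor the DNS wire format, and the XML side skips escaping because these values do not need it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WireSizeDemo {
    // Hypothetical record, borrowed from the example at the top of the thread.
    static final String[] NAMES  = {"FirstName", "LastName", "PhoneNum", "Address"};
    static final String[] VALUES = {"John", "Smith", "5555555", "37 Finch Ave."};

    // One element per field: <Name>value</Name>, encoded as UTF-8.
    static byte[] asXml() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NAMES.length; i++) {
            sb.append('<').append(NAMES[i]).append('>')
              .append(VALUES[i])
              .append("</").append(NAMES[i]).append('>');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Field order is fixed by convention; each value is written as a
    // 2-byte length followed by its bytes (what writeUTF does).
    static byte[] asPacked() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String value : VALUES) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:    " + asXml().length);
        System.out.println("packed bytes: " + asPacked().length);
    }
}
```

The packed form is smaller only because both ends must already agree on the field order, which is exactly the job a schema (XSD or ASN.1) does. For messages of a few kilobytes the difference is unlikely to matter next to network or file I/O.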
Chris Uppal - 06 Apr 2006 10:38 GMT
>     However, another nice thing about XML over the other two formats is
> that there is a standardized escaping mechanism. Artists are... well...
> artistic... and they sometimes do crazy things.

All the file formats I can think of have well-defined escape mechanisms (in
CSV, unfortunately, you have a choice of about 10 and it's difficult to be sure
that all parties are agreed on which is in use).  XML has one too.  That's
hardly an advantage for XML (especially when its mechanism is so crappy).
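
To make the comparison concrete, a minimal sketch of the two escaping styles. The CSV side assumes the quote-doubling convention described in RFC 4180, which is only one of the dialects in circulation; the XML side escapes element text only. The class and method names (EscapeDemo, csvField, xmlText) are made up for this example.

```java
public class EscapeDemo {
    // CSV, assuming the RFC 4180 convention: wrap the field in double quotes
    // if it contains a comma, quote, or line break, and double any quotes inside.
    static String csvField(String s) {
        boolean needsQuotes = s.indexOf(',') >= 0 || s.indexOf('"') >= 0
                || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return s;
        }
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // XML element text: replace the markup characters with entity references.
    // The ampersand must be handled first so the other entities are not re-escaped.
    static String xmlText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String address = "37 Finch Ave., Apt \"B\" & <rear>";
        System.out.println(csvField(address));
        System.out.println("<Address>" + xmlText(address) + "</Address>");
    }
}
```

A reader that expects backslash escapes instead of doubled quotes would misparse the CSV output, which is the agreement problem described above.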

What the world needed, but didn't get, was a well-designed, standardised[*]
escape mechanism which could be used in almost any file format....

([*] if only by convention)

   -- chris
Oliver Wong - 06 Apr 2006 16:10 GMT
>>     However, another nice thing about XML over the other two formats is
>> that there is a standardized escaping mechanism. Artists are... well...
[quoted text clipped - 5 lines]
> sure that all parties are agreed on which is in use).

   So to me, this means that CSV does NOT have well-defined escape
mechanisms. That is, if your requirements are "support an 'export to CSV'
functionality", it wouldn't be unusual to forbid "crazy things" appearing in
your document model (or else just not worrying about it and letti