Java Forum / General / May 2006
XML Not good for Big Files (vs Flat Files)
Homer - 04 Apr 2006 16:27 GMT I am a little bit tired of this obsession people have with XML and XML technology. Please share your thoughts and let me know if I am thinking about this the wrong way. I believe some people are overusing XML all over the place. Nowadays the Canadian Government is pushing XML on its organizations as the standard for data/file transfer. Huge files moving between companies now include tons of XML tags repeating all over the file, slowing down networks and crashing applications because of their size. I am not objecting to the whole technology. I know the advantages of XML and use it all the time for config files and for our web-oriented applications, but using it as the standard for moving big files is going too far. Here is an example:
John,Smith,5555555,37 Finch Ave.
Is now:
<FirstName>John</FirstName> <LastName>Smith</LastName> <PhoneNum>5555555</PhoneNum> <Address>37 Finch Ave.</Address>
And Tags are repeating and repeating:
<FirstName>....</FirstName> <LastName>....</LastName> <PhoneNum>....</PhoneNum> <Address>....</Address>
<FirstName>....</FirstName> <LastName>....</LastName> <PhoneNum>....</PhoneNum> <Address>....</Address>
Please let me know what you think.
Regards,
Homer
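A rough way to put numbers on the overhead described above is to build the same record both ways and count bytes. The sketch below is only an illustration: the class and method names are made up for this example, the element names are the ones from the post, and no escaping of commas or XML special characters is attempted.

```java
import java.nio.charset.StandardCharsets;

final class RecordSizeDemo {
    // Flat form of one record. Assumes no field contains a comma (no quoting/escaping here).
    static String toCsv(String first, String last, String phone, String address) {
        return String.join(",", first, last, phone, address);
    }

    // Tagged form using the element names from the post.
    // Assumes the values contain no '<', '>' or '&' (no XML escaping here).
    static String toXml(String first, String last, String phone, String address) {
        return "<FirstName>" + first + "</FirstName>"
             + "<LastName>" + last + "</LastName>"
             + "<PhoneNum>" + phone + "</PhoneNum>"
             + "<Address>" + address + "</Address>";
    }

    // Size of the text once encoded as UTF-8, which is what would travel over the wire.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String csv = toCsv("John", "Smith", "5555555", "37 Finch Ave.");
        String xml = toXml("John", "Smith", "5555555", "37 Finch Ave.");
        System.out.println("CSV bytes: " + utf8Length(csv));
        System.out.println("XML bytes: " + utf8Length(xml));
    }
}
```

The ratio depends entirely on how long the tag names are compared with the values, so a single record like this is a worst-case flavour rather than a benchmark.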
James McGill - 04 Apr 2006 16:50 GMT > And Tags are repeating and repeating: XML markup does tend to bloat the data.
I personally believe you should use serializable objects that can be represented according to an XML schema when that's appropriate, but that also can be serialized into a tightly packed format when that is appropriate as well. So I should be able to marshal/unmarshal the serialized object to and from XML, but I should also be able to stream that object without marshalling it -- and the other end should be able to unmarshal to xml, validate according to the schema, etc.
Likewise, database bindings should be informed by the xml schema, but the XML markup shouldn't be what you store in the db.
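The idea of one object with two wire forms can be sketched roughly as follows. This is not JAXB or any particular binding framework; the Person class, its fields and its method names are hypothetical, the XML form does no escaping, and the packed form simply uses the length-prefixed strings that DataOutputStream.writeUTF produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Person {
    final String firstName;
    final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Verbose form for when readability and validation matter (no escaping; sketch only).
    String toXml() {
        return "<Person><FirstName>" + firstName + "</FirstName>"
             + "<LastName>" + lastName + "</LastName></Person>";
    }

    // Compact form: each string is written as a 2-byte length followed by its bytes.
    byte[] toPacked() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(firstName);
            out.writeUTF(lastName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from the compact form; field order is fixed by convention.
    static Person fromPacked(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Person(in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("John", "Smith");
        Person copy = Person.fromPacked(p.toPacked());
        System.out.println(p.toPacked().length + " packed bytes vs " + p.toXml().length() + " XML chars");
        System.out.println(copy.toXml());
    }
}
```

The receiving end can unpack the compact bytes and call toXml() whenever it needs a document to validate or hand to XML tooling.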
mtp - 04 Apr 2006 17:01 GMT > I am a little bit tired of this obsession people have with XML and XML > technology. Please share your thoughts and let me know if I am thinking [quoted text clipped - 3 lines] > now include tones of XML Tags repeating all over the file and slowing > down networks and crashing applications because of size.
You can use indexing, binary XML, or compression.
> I am not objecting to the whole technology. I know advantages of XML > and using it all the times for Config files or our web oriented [quoted text clipped - 23 lines] > > Please let me know what you think.
Maybe one of the computing service providers wanted more money for its services on this big project?
Maybe everybody thinks "newer is better"?
cherukan@gmail.com - 04 Apr 2006 17:06 GMT > I am a little bit tired of this obsession people have with XML and XML > technology. Please share your thoughts and let me know if I am thinking [quoted text clipped - 34 lines] > > Homer Yes that does seem like a network killer. It depends on what the intended use of the file is, on the other end and the client receiving it, if they *have to* use XML, certain optimizations can be done for just the transfer part...
<header> <firstName>A15</firstName> <lastName>A15</lastName> <phone>A10</phone> <address>A10</address> </header> <data> <![CDATA[ fixed width data goes here ]]> </data>
OR
<header> <fieldSeparator>;</fieldSeparator> <field>firstName</field> <field>lastName</field> <field>phone</field> <field>address</field> </header> <data> <![CDATA[ delimited data goes here ]]> </data>
OR a combination of the above.
In short, XML should be preferred only if documentation and discoverability are more important than performance.
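For the fixed-width variant above, the receiving side only needs the field names and widths from the header to slice each data line. The sketch below assumes that "A15" means "15 characters wide" (the post does not define it), uses hypothetical widths and names, and skips parsing the XML envelope itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FixedWidthDemo {
    // Hypothetical widths, standing in for what would be read from the <header> element.
    static Map<String, Integer> exampleWidths() {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("firstName", 15);
        widths.put("lastName", 15);
        widths.put("phone", 10);
        widths.put("address", 20);
        return widths;
    }

    // Cuts one fixed-width line into named fields, in header order, trimming the padding.
    static Map<String, String> slice(String line, Map<String, Integer> widths) {
        Map<String, String> record = new LinkedHashMap<>();
        int position = 0;
        for (Map.Entry<String, Integer> field : widths.entrySet()) {
            // Clamp both ends so a short final line does not throw.
            int start = Math.min(position, line.length());
            int end = Math.min(position + field.getValue(), line.length());
            record.put(field.getKey(), line.substring(start, end).trim());
            position += field.getValue();
        }
        return record;
    }

    public static void main(String[] args) {
        String line = String.format("%-15s%-15s%-10s%-20s", "John", "Smith", "5555555", "37 Finch Ave.");
        System.out.println(slice(line, exampleWidths()));
    }
}
```

The header is sent once per file, so the per-record cost is only the padding.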
James McGill - 04 Apr 2006 17:19 GMT > OR a combination of the above. You're almost touching on the big problem: Misconception of what it means to be "standard".
XML has (several) standardized markup frameworks, but it is silent as to content or utilization. It is ridiculous for a government entity to demand that "XML" be "the standard" for data interchange. They need to bless certain schemas if that's their goal, but it also needs to be abstract enough that systems can be designed efficiently.
In your examples, the designers can claim that they are using "XML", and therefore "are standardized" on it, but the three examples we've seen so far are not at all interchangeable...
RC - 04 Apr 2006 17:11 GMT > Please let me know what you think. XML was never designed to replace a database server.
You can use an XML file to transfer a portion of the data from a database, e.g.
SELECT lastname,firstname,phonenumber,address FROM phonebook WHERE state = 'NY' AND city = 'somewhere';
A flat file like this
William|John|12345678|84 5th Ave
From this alone I can't tell which column is the last name and which is the first name. Is the 3rd column a person ID or a phone number?
You need to let the programmers know which column is which.
If someone later changes the flat file format to
84 5th Ave|John|William|12345678
then your database will be incorrect after the update.
True, XML creates large files. But it makes our life easier.
You can make up your own tags <lastName> or <Last_Name>, etc. the tags can be in English, Spanish, French, Russian, Japanese, etc.
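One way to guard a flat file against the column-order problem described above is to carry a single header row and look fields up by name rather than by position. This is a sketch under that assumption (the example file above has no header); the class name and the header spellings are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class HeaderedFlatFile {
    private static final String SEPARATOR = Pattern.quote("|");

    // Pairs each value with the column name found at the same position in the header line.
    static Map<String, String> parse(String headerLine, String dataLine) {
        String[] names = headerLine.split(SEPARATOR, -1);   // -1 keeps trailing empty fields
        String[] values = dataLine.split(SEPARATOR, -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                "expected " + names.length + " fields but found " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Same record in two column orders; lookups by name give the same answer.
        Map<String, String> before = parse("lastname|firstname|phonenumber|address",
                                           "William|John|12345678|84 5th Ave");
        Map<String, String> after = parse("address|firstname|lastname|phonenumber",
                                          "84 5th Ave|John|William|12345678");
        System.out.println(before.get("lastname") + " / " + after.get("lastname"));
    }
}
```

The cost is one extra line per file instead of a pair of tags per field, at the price of the header being one more thing that has to be kept in step with the data.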
Alex Hunsley - 05 Apr 2006 00:04 GMT >> Please let me know what you think. > [quoted text clipped - 14 lines] > I don't know which column is last name, first name. > 3rd column is person ID or phone number? That's what a header field would be for.
> You need let the programmers know what column is what. > [quoted text clipped - 3 lines] > > Then your database will incorrect after updated. Presumably the header field will reflect the change. Yeah, it's an extra thing to go wrong, admittedly...
> True XML creates large file size. > But it makes our life easier. > > You can make up your own tags > <lastName> or <Last_Name>, etc. > the tags can be in English, Spanish, French, Russian, Japanese, etc.
Monique Y. Mudama - 05 Apr 2006 05:24 GMT > Presumably the header field will reflect the change. Yeah, it's an > extra thing to go wrong, admittedly... Yeah ... the markup format is nice if partial data is considered better than no data at all ...
 Signature monique
Ask smart questions, get good answers: http://www.catb.org/~esr/faqs/smart-questions.html
Timbo - 04 Apr 2006 17:39 GMT > John,Smith,5555555,37 Finch Ave. > [quoted text clipped - 4 lines] > <PhoneNum>5555555</PhoneNum> > <Address>37 Finch Ave.</Address> It's true that the XML data in your example is bulky, but what it has that the unstructured data doesn't have is meta-level information, such as the fact that "John" is someone's first name. If the parties involved (i.e. the sender and receiver of this information) have an agreement as to the meaning of "FirstName", then they are sharing more than just text... it has some implicit meaning. If you send it unstructured, then the receiver has to know how to parse the data into this agreed meaning, which means it needs to know the format of the data.
Then, on the other hand, if the data is just stored in a database or something with no definition of what the tags mean, then I agree with you... using XML is of little use.
Homer - 04 Apr 2006 19:08 GMT I guess these responses are proving my point. You all know that the best solution for transferring huge files between two parties is a simple flat file in a format that both sender and receiver have agreed upon, sent over a secure line. But you still defend adding tons of tags to a file whose format both sender and receiver already know. I believe lots of people are using XML because it's cool and new. And these people give advice to companies and organizations.
Some points about your suggestions:
1- Marshalling/Object Stream: Too advanced for places like government.
2- Mixed XML/raw data: Then what is the point of having XML at the top? Unless you are sending the file to an unknown place that doesn't know what it is getting.
3- Compression: There is no good standard for compression (Unix is not really ZIP friendly unless you add some open source or buy a Zip product) and the mainframe is another story. Even for Windows you need to buy the product (or use open source, which most companies don't like). Also, why triple the file size and then compress it?
Let me give you another example of coolness (sorry, it's a bit off the topic but it's about coolness):
I got a job at a telecommunications company (cell phones) to convert their code from C to C++ because OO was so cool in those days, even though the application was working with no problems. I did my job, spent a year converting the code and building a class library, and left the company.
One year later they hired a bunch of other people to come and convert the whole thing to Java because Java was the Best.
3 years later they hired me again to convert everything again to J2EE because J2EE is (guess what) the Best.
Regards,
Homer
James McGill - 04 Apr 2006 19:32 GMT > I believe lots > of people are using XML because it's cool and new. It's anything but "cool". And as for it being "new", XML isn't old enough to vote, but SGML is. If you aren't seeing the benefits of logical structure and validation, standardized processing, etc., that may be because you aren't exploiting those things in your application.
One of your complaints is directly counter to an explicit design goal, from the beginning of the XML spec: "Terseness in XML markup is of minimal importance."
XML markup is deliberately intended to favor clarity over conciseness.
But most of your complaint seems to derive from the fact that you work in a bureaucratic government situation, where you have no authority to make decisions, and where there is a limited backchannel for your recommendations. That is unfortunate, but isn't it a choice you made when you went to work for a government?
I've always been led to believe that the Canadian government is a prototype of efficiency and reason, one that should make Americans feel ashamed. Are you suggesting that it too may be clogged with bureaucratic nonsense? I would be shocked to hear that!
Homer - 04 Apr 2006 20:06 GMT Very good guess but no, I don't work for the government. All I am saying is that in these cases sender and receiver both know the file format by heart. They know it and their applications know it. That's how they were moving files in the past, and if they want to establish a new file transfer they will let each other know about the upcoming file format for sure. There is no reason to send the file format along with each file every time they have a file transfer (unless you are wearing a name tag in your home so your family knows your name).
James McGill - 04 Apr 2006 20:25 GMT > All I am saying > is in these cases sender and receiver both knows the file format by > heart. They know and their application knows. The interesting thing with XML is that in its case, the *document* knows. In a well designed system, the DTD can change and applications can cope.
>There is no reason to send the file format along with each file every >time they have a file transfer But you aren't sending the file format. You're sending a notice with a URI that locates the format (schema, DTD, etc.), and then sending data that's marked up according to that format.
>(unless you are wearing name tag in your >home so your family know your name). Or like wearing a badge at a workplace, perhaps?
Jon Martin Solaas - 05 Apr 2006 07:25 GMT > Very good guess but no, I don't work for government. All I am saying > is in these cases sender and receiver both knows the file format by [quoted text clipped - 4 lines] > time they have a file transfer (unless you are wearing name tag in your > home so your family know your name). Of course, but in other cases XML is useful: when the file format has to be communicated, when nobody knows it by heart, when the data needs to be hierarchical, when the receiver needs to validate it and perhaps transform it to another format, not to mention when the apps to do all that have to be implemented. When a new file format is to be used, XSD comes in handy, and also allows for automatic validation. In many organisations misunderstandings occur, bugs are made and so on, so validation is nice.
XML was cool when I was a student 10 years ago. Now it's just convenient.
Maybe you should get out more. It's the people outside who don't know your name :-)
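The automatic validation mentioned above can be sketched with the JDK's javax.xml.validation API. The schema here is invented for the example (a Person with three string children in a fixed order) and is not taken from any real interchange format; the class and method names are likewise hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdValidationDemo {
    // Hypothetical schema: <Person> must contain FirstName, LastName, PhoneNum in that order.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Person'><xs:complexType><xs:sequence>"
        + "<xs:element name='FirstName' type='xs:string'/>"
        + "<xs:element name='LastName' type='xs:string'/>"
        + "<xs:element name='PhoneNum' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiles the schema text; a broken schema is a programming error, not a data error.
    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Returns true when the document is well-formed and conforms to the schema.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the default error handling reports violations as exceptions
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<Person><FirstName>John</FirstName><LastName>Smith</LastName>"
                    + "<PhoneNum>5555555</PhoneNum></Person>";
        String swapped = "<Person><LastName>Smith</LastName><FirstName>John</FirstName>"
                       + "<PhoneNum>5555555</PhoneNum></Person>";
        System.out.println("good: " + isValid(good) + ", swapped: " + isValid(swapped));
    }
}
```

A receiver that runs this check before loading anything turns a silently wrong import into an explicit rejection, which is the practical benefit being argued for here.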
Martin Gregorie - 04 Apr 2006 20:45 GMT > I guess these responses are proving of my point. You know all that the > best solution for transferring huge files between two parties is simple [quoted text clipped - 3 lines] > of people are using XML because it's cool and new. And these people > give advise to companies and organizations. Here's another thought: use ASN.1 encoding. Have a look here <http://asn1.elibel.tm.fr/> if you haven't heard of it.
It does virtually everything XML does in terms of tagged fields and the ability to completely omit optional fields and structures, but it uses binary tags and can encapsulate binary data. Like XML you can take a data description (written in BNF notation) and use it to generate file encoders and decoders, or you can write fast interpretive decoders (as I have). It's a standard in the telecoms industry, where it's routinely used to transfer multi-megabyte files as well as individual short messages.
Java ASN.1 schema compilers are available.
Translating a file between ASN.1 and XML should be a doddle: the site I mentioned has a tool for doing just that.
 Signature martin@ | Martin Gregorie gregorie. | Essex, UK org |
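To give a feel for what "binary tags" means, here is a hand-rolled tag-length-value sketch in the style of ASN.1 BER. It is not output from an ASN.1 compiler and covers only the short-form length (content up to 127 bytes); the class and method names are hypothetical. The tag bytes used are the standard universal ones for SEQUENCE (0x30) and UTF8String (0x0C).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class BerSketch {
    static final int TAG_UTF8_STRING = 0x0C; // universal, primitive, tag number 12
    static final int TAG_SEQUENCE = 0x30;    // universal, constructed, tag number 16

    // One tag byte, one length byte, then the content. Long-form lengths are out of scope.
    static byte[] tlv(int tag, byte[] content) {
        if (content.length > 127) {
            throw new IllegalArgumentException("long-form lengths are not handled in this sketch");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);
        out.write(content.length);
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    // Encodes the fields as a SEQUENCE of UTF8String values, in the order given.
    static byte[] encodeRecord(String... fields) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] encoded = tlv(TAG_UTF8_STRING, field.getBytes(StandardCharsets.UTF_8));
            body.write(encoded, 0, encoded.length);
        }
        return tlv(TAG_SEQUENCE, body.toByteArray());
    }

    public static void main(String[] args) {
        byte[] record = encodeRecord("John", "Smith", "5555555", "37 Finch Ave.");
        System.out.println("encoded bytes: " + record.length);
    }
}
```

Each field costs two bytes of overhead here instead of a pair of named tags, and which field is which comes from the agreed data description rather than from names in the stream.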
Roedy Green - 04 Apr 2006 22:47 GMT On Tue, 04 Apr 2006 20:45:14 +0100, Martin Gregorie <martin@see.sig.for.address> wrote, quoted or indirectly quoted someone who said :
>Translating a file between ASN.1 and XML should be a doddle what part of the world does "doddle" derive from? It just means "easy"?
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Monique Y. Mudama - 05 Apr 2006 05:28 GMT > On Tue, 04 Apr 2006 20:45:14 +0100, Martin Gregorie ><martin@see.sig.for.address> wrote, quoted or indirectly quoted [quoted text clipped - 4 lines] > what part of the world does "doddle" derive from? It just means > "easy"? I had a mental image of a toddler, er, toddling along. No idea if that's actually what was meant. In the context of my brain, it meant "so easy a toddler could do it."
Chris Uppal - 05 Apr 2006 11:49 GMT > > what part of the world does "doddle" derive from? It just means > > "easy"? > > I had a mental image of a toddler, er, toddling along. No idea if > that's actually what was meant. In the context of my brain, it meant > "so easy a toddler could do it." The word's common in British English. I don't know about other dialects/flavours.
The word "doddle" does derive from "toddle", according to the OED, where "toddle" means the halting walk of an infant or elderly/infirm person. A doddle, however, is just something that is easy -- as the OED puts it: "a 'walk-over'".
-- chris
Chris Uppal - 05 Apr 2006 11:45 GMT > Here's another thought: use ASN.1 encoding. Have a look here > <http://asn1.elibel.tm.fr/> if you haven't heard of it. I can't understand why something as simple as data exchange (not /information/ exchange which is vastly more difficult) should require nine standards documents which between them add up to book length. Nor why it should require a book written about it. Why do people have to make things so /complicated/ ?
XML is, if anything, even worse.
Even YAML is way too complicated, albeit not in the same league as ASN.1 or XML.
-- chris
Joe Attardi - 04 Apr 2006 20:56 GMT > I believe lots of people are using XML because it's cool and new. And these people > give advise to companies and organizations. XML isn't new. It's been around almost ten years. The first working draft for the XML spec was put together in November of 1996.
> 3- Compression: There is no good standard for compression (Unix is not > really ZIP friendly unless you add some opensource or buy Zip product) Gzip? In fact IIRC, the gzip algorithm takes advantage of strings that are repeated over and over (like the tag names) that help with its compression.
> (or use open source that most companies don't like). That most companies don't like? I don't think you researched this much before making this statement. Look how many of the huge players (Sun, IBM, etc.) have strong support for open source. In addition, open source is being adopted all over the place.
> Let me give you another example of coolness (sorry, it's a bit off > the topic but it's about coolness): It's not just because XML is "the cool thing". It's perfectly suited for the exchange of data like this. The data describes itself!
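The gzip point above is easy to check with the JDK's own java.util.zip classes. The sketch below generates a batch of repetitive XML records (the generator and its field values are made up for the example), gzips them in memory, and prints both sizes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipTagDemo {
    // Compresses a byte array in memory with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds n records that repeat the same four tags with slightly different values.
    static byte[] repeatedRecords(int n) {
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < n; i++) {
            xml.append("<FirstName>First").append(i).append("</FirstName>")
               .append("<LastName>Last").append(i).append("</LastName>")
               .append("<PhoneNum>").append(5550000 + i).append("</PhoneNum>")
               .append("<Address>").append(i).append(" Finch Ave.</Address>\n");
        }
        return xml.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = repeatedRecords(1000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes: " + raw.length + ", gzipped bytes: " + packed.length);
    }
}
```

Synthetic data like this compresses better than real names and addresses would, so the figures printed are an upper bound on the benefit rather than a prediction.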
Monique Y. Mudama - 04 Apr 2006 21:11 GMT > I guess these responses are proving of my point. You know all that > the best solution for transferring huge files between two parties is > simple flat file that both sender/receiver have agreed upon file > format and using secure line. But you still defend adding tons of > tags to a file that both sender/receiver are familiar with the > format. I guess that you are wrong. I guess that the word "best" is meaningless unless it is qualified by something. If you want a format that is best at clarity, then flat files lose. I guess that you don't really understand when to use XML, and that it doesn't really matter because you don't have the authority to change things in the environment in which it's causing you trouble, so you've developed a grudge against XML rather than against whoever decided to use it inappropriately or whoever decided to create an excessively verbose schema.
> I believe lots of people are using XML because it's cool and > new. And these people give advise to companies and organizations. XML isn't new enough to offer the glamour factor you think it has.
Chris Uppal - 05 Apr 2006 11:45 GMT > XML isn't new enough to offer the glamour factor you think it has. Remember that we are talking about a government here. Being only a decade behind the times is damned impressive !
-- chris
Monique Y. Mudama - 05 Apr 2006 14:57 GMT >> XML isn't new enough to offer the glamour factor you think it has. > > Remember that we are talking about a government here. Being only a > decade behind the times is damned impressive ! Now, now. In 1999 I worked on a US govt project (I think it was DoD, or maybe DISA) to create an XML repository to share across govt branches.
I also spent 1998 through, erm, a couple of years ago working on Java systems for some defense-related stuff. I think when we started we were using 1.1.7, and it did take a looooong time to convince the customer to upgrade, but after that it wasn't too hard to keep moving. I remember getting bitten by glob imports + that new List class, engendering a hatred of glob imports that continues to this day.
Some govt customers are very into new technology (almost to the point of silliness -- they want to reimplement in the new stuff even if there's no direct benefit and resources would be better spent improving the rest of the app).
James McGill - 05 Apr 2006 19:44 GMT > Remember that we are talking about a government here. The Canadian government, which I've been led to understand is the most progressive on Earth, etc.
Roedy Green - 05 Apr 2006 22:17 GMT On Wed, 05 Apr 2006 11:44:13 -0700, James McGill <jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone who said :
>The Canadian government, which I've been led to understand is the most >progressive on Earth, etc. A government with a smaller population to serve has a huge advantage when it comes to being light on its feet. I worked for a Canadian crown corporation writing an RFP for about a million dollars worth of computer equipment. I was in Seattle for a New Year's eve party and met a guy doing something similar there. We both bitched about all the silly regulations and petty legalities. We decided to swap RFPs to see who had it worse. His was ten times thicker.
The thing that blows my mind about the US bureaucracy is that crooks have managed to embezzle trillions of dollars over the last decade and hardly anyone even knows about it. See http://mindprod.com/politics/iraqeconomics.html near the bottom. Mastermind crooks pulled off the heist of the century and it did not even make the front page.
The amount of activity and the amounts of money are so huge that nobody stays on top of what is going on. Further, the amounts of money are so huge that corruption and coverup are guaranteed.
James McGill - 05 Apr 2006 22:48 GMT > The thing that blows my mind about the US bureacracy is that crooks > have managed to embezzle trillions of dollars over the last decade and > hardly anyone even knows about it. Controversial opinion, informed by partisan bias, and not one that I necessarily disagree with. Take it to alt.politics (where I read your posts and often correspond).
So, what's the ASN.1 equivalent of JAXB?
Roedy Green - 06 Apr 2006 01:53 GMT On Wed, 05 Apr 2006 14:48:20 -0700, James McGill <jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone who said :
>So, what's the ASN.1 equivalent of JAXB? since XML and ASN.1 are interconvertible, if you have something that needs XML, you fluff and use it.
Timbo - 05 Apr 2006 15:03 GMT > I guess these responses are proving of my point. You know all that the > best solution for transferring huge files between two parties is simple > flat file that both sender/receiver have agreed upon file format and > using secure line. But you still defend adding tons of tags to a file > that both sender/receiver are familiar with the format. My guess is that you don't really understand either my post, or XML. It's not the FORMAT of XML, it's the fact that it contains MEANING. So, if the sender and receiver have a shared ontology that says that FirstName is someone's first name, then the data <FirstName>John</FirstName> is more than just some text with the value "John"... it is saying that "John" is his first name. So rather than just having raw data, you have information that is useful to the receiver. Moreover, for a third party to use this information, you need only give them the shared definitions, rather than give them both the format and the meaning.
Chris Uppal - 05 Apr 2006 16:06 GMT > My guess is that you don't really understand either my post, or > XML. It's not the FORMAT of XML, it's the fact that it contains > MEANING. But it doesn't. The meaning comes from the /interpretation/ of the data, not from its transmission form. The parties sharing data must come to an agreement about the meaning before they can share information. Once they have done that, deciding on a shared format is pretty trivial whether they use XML, ASN.1, YAML, CSV, or a custom format.
-- chris
Oliver Wong - 05 Apr 2006 17:00 GMT >> My guess is that you don't really understand either my post, or >> XML. It's not the FORMAT of XML, it's the fact that it contains [quoted text clipped - 8 lines] > deciding on a shared format is pretty trivial whether they use XML, ASN.1, > YAML, CSV, or a custom format. [In this post, I will group "XML", "ASN.1", "YAML", and "CSV with headers" all under a single group which I will call "XML"; basically, this "XML" group means data with metadata tags. As for "CSV without headers" and "custom format", I'm going to group them together as "typical binary file".]
I'd say it's somewhere in between Timbo's and Chris's claims [with the distortion of Chris's claim as described above]. If you plonked a typical "binary" file onto my desktop (e.g. perhaps ripping a random file from a Playstation DVD), and told me to try to interpret it, I could get out my hex editor, and look around for human-readable strings, and from there maybe look for end-of-string markers, or some sort of length-of-string headers, and then from there try to figure out markers for other datatypes, but I probably wouldn't get very far.
Give me a typical XML file though, and I could probably come up with an interpretation that is near the original, depending on how the elements and attributes are named. If the file contains a reference to a DTD or XSD, then I could navigate over to that URL and gain even more information.
- Oliver
Chris Uppal - 06 Apr 2006 10:30 GMT > Give me a typical XML file though, and I could probably come up with > an interpretation that is near the original, depending on how the > elements and attributes are named. Difficult to see how this is an advantage for production purposes.
> If they file contains a reference to a > DTD or XSD, then I could navigate over to that URL and gain even more > information. Now that is a real advantage. Note that the XML is not "self-describing", but it's certainly a good attribute of the format that it can include a link to a description.
-- chris
Mark Thornton - 06 Apr 2006 17:18 GMT >> Give me a typical XML file though, and I could probably come up with >>an interpretation that is near the original, depending on how the >>elements and attributes are named. > > Difficult to see how this is an advantage for production purposes. Some data suppliers change their format very regularly. Using XML gives fewer surprises of this kind and it is then easier to guess the meaning of a change and easier to ignore irrelevant changes.
I get geographic mapping information in a variety of formats. Although bulky, the XML based data is the easiest to use. The bulk is usually dealt with by compression, which in the case of gzip is trivial to handle in Java.
Mark Thornton
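Reading gzipped XML in Java amounts to wrapping the input in a GZIPInputStream before handing it to the parser. The sketch below uses the JDK's StAX reader so the document is streamed rather than loaded whole; the class name, the helper that produces test input, and the element names are all invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GzipXmlReader {
    // Streams through gzipped XML and collects the text of every element with the given name.
    static List<String> readElementText(byte[] gzippedXml, String elementName) {
        List<String> values = new ArrayList<>();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzippedXml))) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    values.add(reader.getElementText()); // leaves the reader on the end tag
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    // Helper for the demo only: gzips a string so there is something to read back.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<People><Person><LastName>Smith</LastName></Person>"
                   + "<Person><LastName>Jones</LastName></Person></People>";
        System.out.println(readElementText(gzip(xml), "LastName"));
    }
}
```

With a file on disk the ByteArrayInputStream would be replaced by a file stream; the rest stays the same, and unknown elements are simply skipped, which matches the point about ignoring irrelevant format changes.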
Timbo - 05 Apr 2006 17:01 GMT >>My guess is that you don't really understand either my post, or >>XML. It's not the FORMAT of XML, it's the fact that it contains [quoted text clipped - 3 lines] > from its transmission form. The parties sharing data must come to an agreement > about the meaning before they can share information. ??? Which was exactly what I said in the sentence after the one you quoted! :-) In hindsight, MEANING wasn't the correct word... and I'm not sure of what IS the correct word...
> Once they have done that, > deciding on a shared format is pretty trivial whether they use XML, ASN.1, > YAML, CSV, or a custom format. Sure, you can send it in a CSV format, but to keep the meta-data, then it would be: FirstName=John, LastName=Smith, Phone=55555, etc,
where you basically have the tags in the CSV, and you are then facing the same problems as the original poster was complaining about. It's not the syntax of XML that is useful (frankly, I find it tediously difficult to follow when I am forced to), it's the fact that it provides an easy way to store meta-data, and there are lots of nice tools to support this. It's this meta-information that the original poster does not like.
Chris Uppal - 06 Apr 2006 10:35 GMT > > > My guess is that you don't really understand either my post, or > > > XML. It's not the FORMAT of XML, it's the fact that it contains [quoted text clipped - 7 lines] > ??? Which was exactly what I said in the sentence after the one > you quoted! :-) Then you shouldn't have shouted so loud -- my ears were still ringing and I missed the next few words you said ;-)
> In hindsight, MEANING wasn't the correct word... > and I'm not sure of what IS the correct word... I think "formatting" is probably the right word. There's no meaning in the tags -- it might /look/ as if there's meaning, and well-chosen tags certainly help if you are ever in the unfortunate position of having to read or edit XML by hand, but there's nothing real there.
Perhaps I'd accept "mnemonics"...
-- chris
Timbo - 06 Apr 2006 12:11 GMT >>>>My guess is that you don't really understand either my post, or >>>>XML. It's not the FORMAT of XML, it's the fact that it contains [quoted text clipped - 10 lines] > Then you shouldn't have shouted so loud -- my ears were still ringing and I > missed the next few words you said ;-) I wanted to emphasise those two words, and many people still use text-based newsreaders, so I don't use italics :)
>>In hindsight, MEANING wasn't the correct word... >>and I'm not sure of what IS the correct word... [quoted text clipped - 3 lines] > help if you are ever in the unfortunate position of having to read or edit XML > by hand, but there's nothing real there. Ah, ok... we have actually got our shared definitions crossed :-)
"Formatting" is definitely not the word I want. I think "meaning" is the correct word, but "contains" is misleading. When I say that using XML format "contains" meaning, I mean that it "has a" meaning, not that the meaning is self-evident from the tags. That is, the XML that is passed has a meaning that can be interpreted by the receiver, if it shares the same definitions as the sender.
In ontological terms, "John, Smith, 555,.." is just a list of instances of concepts, with no relation to their concepts. This makes their meaning, at worst, impossible to derive and, at best, ambiguous. Whereas <Person> ... </Person> is an instance of a concept, and tagging it with its concept Person allows the receiver to derive meaning and reason about this information.
How this information is formatted is not really relevant, as long as the "is-a" relations (and others) are present.
Stefan Ram - 06 Apr 2006 14:15 GMT >How this information is formated is not really relevant, as >long as the "is-a" relations (and others) are present. When a new document type is to be defined, when should one choose child elements and when attributes?
The criterion that makes sense regarding the meaning can not be used in XML due to syntactic restrictions.
An element is describing something. A description is an assertion. An assertion might contain unary predicates or binary relations.
Comparing this structure of assertions with the structure of XML, it seems to be natural to represent unary predicates with types and binary relations with attributes.
Say, "x" is a rose and belongs to Jack. The assertion is:
rose( x ) ^ owner( x, "Jack" )
This is written in XML as:
<rose owner="Jack" />
Thus, my answer would be: use element types for unary predicates and attributes for binary relations.
Unfortunately, in XML, this is not always possible, because in XML:
- there might be at most one type per element,
- there might be at most one attribute value per attribute name, and
- attribute values are not allowed to be structured in XML.
Therefore, the designers of XML document types are forced to abuse element /types/, to describe the /relation/ of an element to its parent element.
This /is/ an abuse, because the designation "element type" obviously is supposed to give the /type of an element/, i.e., a property which is intrinsic to the element alone and has nothing to do with its relation to other elements.
The document type designers, however, are being forced to commit this abuse, to reinvent poorly the missing structured attribute values using the means of XML. If a rose has two owners, the following element is not allowed in XML:
<rose owner="Jack" owner="Jill" />
One is made to use representations such as the following:
<rose> <owner>Jack</owner> <owner>Jill</owner></rose>
Here the notion "element type" suggests that it is marked that Jack is "an owner", in the sense that "owner" is supposed to be the type (the kind) of Jack. The intention of the author, however, is that "owner" is supposed to give the /relation/ to the containing element "rose". This is the natural field of application for attributes, as the meaning of the word "attribute" outside of XML makes clear, but it is not possible to use them for this purpose in XML.
An alternative solution might be the following notation.
<rose owner="Alexander Marie" />
Here a /new/ mini language (not XML anymore) is used within an attribute value, which, of course, can not be checked anymore by XML validators. This is really done so, for example, in XHTML, where classes are written this way.
So in its main language XHTML, the W3C has to abandon XML even to write class attributes. This is not such a good accomplishment given that the W3C was able to use the experience made with SGML and HTML when designing XML and that XHTML is one of the most prominent XML applications.
The needless restrictions of XML inhibit the meaningful use of syntax. This leaves many document type designers wondering when attributes and when elements are supposed to be used, which is actually evidence of a design failure in XML, a language that does not have many more notations than attributes and elements. And the W3C has failed to give even these two notations a clear and meaningful purpose!
Without the restrictions described, XML alone would have nearly the expressive power of RDF/XML, which has to repair painfully some of the errors made in the XML-design.
Now, some recommend to /always/ use subelements, because one can never know, whether an attribute value that seems to be unstructured today might need to become structured tomorrow. (Or it is recommended to use attributes only when one is quite confident that they never will need to be structured.) Now, this recommendation does not even try to make a sense out of attributes, but just explains how to circumvent the obstacles the W3C has built into XML. Others recommend to use attributes for something they call "metadata".
Others use an XML editor that happens to make the input of attributes more comfortable than the input of elements and seriously suggest, therefore, to use as many attributes as possible.
Still others have studied how to use CSS to format XML documents and are using this to give recommendations about when to use attributes and when to use subelements.
Of course: Mixing all these criteria (structured vs. unstructured, data vs. "metadata", by CSS, by the ease of editing, ...) often will give conflicting recommendations.
Other notations than XML have solved the problem by either omitting attributes altogether or by allowing structured attributes. I believe that notations with structured attributes, which also allow multiple element types and multiple attribute values for the same attribute name, are helpful.
Oliver Wong - 06 Apr 2006 17:29 GMT > Say, "x" is a rose and belongs to Jack. The assertion is: > [quoted text clipped - 3 lines] > > <rose owner="Jack" /> [...]
> If a rose has two > owners, the following element is not allowed in XML: [quoted text clipped - 17 lines] > XML makes clear, but it is not possible to use them for this > purpose in XML. How about something like:
<rose id="x" ownedBy="Jack"/> <rose id="x" ownedBy="Jill"/>
or
<ownership owned="rose" owner="Jack"/> <ownership owned="rose" owner="Jill"/>
or
<Person id="Jack">
  <belongings>
    <rose id="x"/>
    <!--Possibly other stuff-->
  </belongings>
</Person>
<Person id="Jill">
  <belongings>
    <rose id="x"/>
    <!--Possibly other stuff-->
  </belongings>
</Person>
depending on what exactly is the main message being conveyed (i.e. the different XML documents here all say the same thing, but they put emphasis on different things: the roses, the persons, or the ownership-relationships themselves).
- Oliver
Stefan Ram - 07 Apr 2006 05:14 GMT >><rose owner="Jack" owner="Jill" /> ><rose id="x" ownedBy="Jack"/> ><rose id="x" ownedBy="Jill"/> While your suggestion might be possible for Prolog-like databases of assertions, it might be difficult to apply it to text markup, where one actually would like to write:
<p>He met <span class="name" class="person">Peter Miller</span> in <span class="name" class="town">London</span>.</p>
It could be written in XML as:
<p>He met <span id="563">Peter Miller</span> in <span id="564">London</span>.</p>
<attribute idref="563" class="name"/>
<attribute idref="563" class="person"/>
<attribute idref="564" class="name"/>
<attribute idref="564" class="town"/>
But this looks as if it might be more difficult to maintain.
NB: If "id" was declared as an »ID attribute« in the DTD, then
><rose id="x" ownedBy="Jack"/> ><rose id="x" ownedBy="Jill"/> might not be valid XML, because in XML »ID values must uniquely identify the elements which bear them« is a validity constraint. But here, »id« might be declared as an »IDREF attribute«.
>depending on what exactly is the main message being conveyed >(i.e. the XML different documents here all say the same thing, >but they put emphasis on different things: the roses, the >persons, or the ownership-relationships themselves). ... and some of these choices then will be restricted by the restrictions of XML. For example, when one wants to put emphasis on the roses by mapping each rose to an XML element, some of the restrictions mentioned in my previous post apply.
Oliver Wong - 07 Apr 2006 15:08 GMT > NB: If "id" was declared as an »ID attribute« in the DTD, then > [quoted text clipped - 5 lines] > constraint. But here, »id« might be declared as an »IDREF > attribute«. Right, sorry.
>>depending on what exactly is the main message being conveyed >>(i.e. the XML different documents here all say the same thing, [quoted text clipped - 5 lines] > emphasis on the roses by mapping each rose to an XML element, > some of the restrictions mentioned in my previous post apply. You could "declare" a rose "x", and then start describing it, e.g.
<rose id="x"/> <roseOwnership idref="x" owner="Jack"/> <roseOwnership idref="x" owner="Jill"/>
You seem not to like having information implied via parent-child relationship, but I didn't quite understand why. I suspect the rose-emphasized XML would more likely traditionally be written as something like
<rose>
  <owners>
    <person idref="Jack"/>
    <person idref="Jill"/>
  </owners>
  <!-- perhaps other elements describing the rose here -->
</rose>
- Oliver
Stefan Ram - 07 Apr 2006 20:16 GMT >You seem not to like having information implied via >parent-child relationship, but I didn't quite understand why. I have no problem with the parent-child relationship, but with the (ab)use of the /type/ of the child to name the /relation/ to its parent (instead of the type of the child as the designation »type« implies). Using the /type/ to name the /relation/ contradicts its designation »type«.
>I suspect the rose-emphasized XML would more likely >traditionally be written as something like [quoted text clipped - 5 lines] > <!-- perhaps other elements describing the rose here --> ></rose> Possibly I can clarify my intentions by using another language with structured attributes. In my language »Unotal« one can write:
< &rose owner=< &person Jack > owner=< &person Jill >>
Here, »owner« can be recognized as the name of a /binary/ relation by the following »=«, while »rose« can be recognized as the name of a /unary/ relation (like a type) by the preceding »&«. In Unotal, this is always so, so it is easier to read.
In XML, element types are sometimes used for /unary/ relations (sometimes for real types, as the name implies), but sometimes (ab)used for /binary/ relations (to specify the parent-child relationship). So when reading a child element type in XML, one does not know, whether it gives the type of this element or names the relationship to its parent.
~~~
I am working on an implementation of a reader and writer for Unotal in Java, and have a small application that uses this to implement Unotal as its file storage format in Java:
http://www.purl.org/stefan_ram/pub/joodo The Java source code for the Unotal implementation will be released later, but a description of Unotal is available at:
http://www.purl.org/stefan_ram/pub/unotal_en
This page also contains the Unotal syntax specification, which is written in Unotal itself and then was automatically translated to HTML and ASCII from there.
Roedy Green - 05 Apr 2006 22:24 GMT >My guess is that you don't really understand either my post, or >XML. It's not the FORMAT of XML, it's the fact that it contains >MEANING. So, if the sender and receiver have a shared ontology >that says that FirstName is someone's first name, then the data ><FirstName>John<FirstName> i Even a CSV file with a first line of field names contains the same amount of information, for a file like the one shown, as the obese XML.
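To make the header-line point concrete, here is a minimal sketch in Java. It assumes a simple CSV dialect with no quoted or escaped commas, and the class name CsvHeaderDemo and the method toRecord are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; assumes fields never contain commas or quotes.
final class CsvHeaderDemo {
    /** Pairs each value of a data line with the field name at the same position in the header line. */
    static Map<String, String> toRecord(String headerLine, String dataLine) {
        String[] names = headerLine.split(",", -1);   // -1 keeps trailing empty fields
        String[] values = dataLine.split(",", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException("field count does not match header");
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> record = toRecord(
                "FirstName,LastName,PhoneNum,Address",
                "John,Smith,5555555,37 Finch Ave.");
        System.out.println(record.get("FirstName") + " " + record.get("LastName"));
    }
}
```

The field names are stated once for the whole file instead of once per value, which is where the size difference to the tagged form comes from.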
What the raw XML provides is not particularly useful information. You can glean that by inspecting the file. The information you want, and which is missing, is how each field has been validated: what guarantees exist on the values, what the complete set of possibilities of each enumeration is, and what they mean. Since the early DOS days I have been exporting data to people in several formats: SQL, CSV, and fixed-length ASCII fields. I generate a separate human-readable "schema" file that describes each field, including limits and its length and offset.
Nobody has ever had trouble interpreting one of the files.
For a FLAT file there is no need to use tags. That is only when you have a structured file.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Timbo - 06 Apr 2006 08:48 GMT > for a FLAT file there is no need to use tags. That is only when you > have a structured file. Yes, sure. For tables etc., XML is of little value. I absolutely agree, and I would use something like CSV for that.
Oliver Wong - 04 Apr 2006 17:44 GMT >I am a little bit tired of this obsession people have with XML and XML > technology. Please share your thoughts and let me know if I am thinking [quoted text clipped - 30 lines] > > Please let me know what you think. If your complaint is file size during network transfer, compress the file before sending it.
If your complaint is file size during parsing, use SAX instead of DOM, and don't keep the whole file in memory at once.
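A rough sketch of the SAX approach in Java, using the JDK's javax.xml.parsers.SAXParserFactory. The class name SaxCountDemo, the method countElements and the <People>/<Person> wrapper elements are assumptions made only for this example; for a real file you would pass a FileReader or similar instead of a StringReader.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts elements while streaming, without building a tree.
final class SaxCountDemo {
    static int countElements(Reader xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per start tag as the parser moves through the input.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<People>"
                + "<Person><FirstName>John</FirstName><LastName>Smith</LastName></Person>"
                + "<Person><FirstName>Jane</FirstName><LastName>Doe</LastName></Person>"
                + "</People>";
        System.out.println(countElements(new StringReader(xml), "Person"));
    }
}
```

The handler sees one event at a time, so memory use should depend on what the handler chooses to keep rather than on the size of the file.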
Use the right tool for the job. If for whatever problem you're trying to solve, you've got a better tool than XML, then use it. But if the problem is "The government requires me to use XML", then I can't think of a better tool than XML to solve that particular problem (except maybe emigration ;)).
- Oliver
James McGill - 04 Apr 2006 17:56 GMT > except maybe emigration You say that as though anyone would ever leave the utopian paradise that is Canada...
Lasse Reichstein Nielsen - 04 Apr 2006 17:58 GMT > I am a little bit tired of this obsession people have with XML and XML > technology. Hear hear! Seems some people think XML is the solution to all problems. I'd rather classify it as the lowest common denominator for exchanging tree-structured data - and definitely not something fit for humans to read or write directly.
> John,Smith,5555555,37 Finch Ave. > [quoted text clipped - 6 lines] > > And Tags are repeating and repeating:
> Please let me know what you think. Apart from what everybody else has said, zipping such a file should yield a *very* high compression factor.
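Anyone who wants to measure this can do so with a few lines of Java. The sketch below generates synthetic records (the names, numbers and the <People>/<Person> wrapper are invented for the test, not real data) and compresses them with java.util.zip.GZIPOutputStream; the class name XmlGzipDemo is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: compares raw and gzipped size of repetitive XML.
final class XmlGzipDemo {
    /** Builds a synthetic document with the tag layout from the original post. */
    static String sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<People>");
        for (int i = 0; i < records; i++) {
            sb.append("<Person>")
              .append("<FirstName>John").append(i).append("</FirstName>")
              .append("<LastName>Smith").append(i).append("</LastName>")
              .append("<PhoneNum>").append(5550000 + i).append("</PhoneNum>")
              .append("<Address>").append(i).append(" Finch Ave.</Address>")
              .append("</Person>");
        }
        return sb.append("</People>").toString();
    }

    /** Compresses the bytes in memory with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();   // the gzip stream is closed, so the trailer has been written
    }

    public static void main(String[] args) {
        byte[] raw = sampleXml(10_000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```

The exact ratio depends on the data, so the numbers from a synthetic sample like this are only an indication; real records with more varied values will compress less well.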
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Joe Attardi - 04 Apr 2006 18:29 GMT > John,Smith,5555555,37 Finch Ave. > [quoted text clipped - 4 lines] > <PhoneNum>5555555</PhoneNum> > <Address>37 Finch Ave.</Address> Yes but, now we know what all the data means. Your example is quite clear, but what about this one:
Lawrence,David,Maynard,MA
Could mean several things: (1) Lawrence David lives in Maynard, MA. (2) David Lawrence lives in Maynard, MA (3) David Maynard lives in Lawrence, MA (4) Maynard David lives in Lawrence, MA etc. You see where I'm going with this.
Where <FirstName>Lawrence</FirstName> <LastName>David</LastName> <City>Maynard</City> <State>MA</State>
leaves no question.
Yes, we as humans know intuitively that city and state go together. But for an application using this data, there has to be some specification defined and all systems that use it must be aware of it.
Oliver Wong - 04 Apr 2006 22:24 GMT >> John,Smith,5555555,37 Finch Ave. >> [quoted text clipped - 9 lines] > > Lawrence,David,Maynard,MA Ah, obviously a list of 4 arbitrary strings, i.e. (in SQL terms):
CREATE TABLE foo ( bar VARCHAR(255) );
INSERT INTO foo VALUES ('Lawrence'),('David'),('Maynard'),('MA');
> Could mean several things: > (1) Lawrence David lives in Maynard, MA. Oops, okay, it's one record. Well, maybe it means:
Lawrence D. Maynard, who has a Master of Arts. (Or perhaps it uses last name first, i.e. David M. Lawrence, Master of Arts).
Or maybe (s)he's a Medical Assistant? Or (s)he lives in Madagascar?
> (2) David Lawrence lives in Maynard, MA > (3) David Maynard lives in Lawrence, MA > (4) Maynard David lives in Lawrence, MA > etc. You see where I'm going with this. Hmm, looks like I was way off... Not being an American, I am not familiar with American city names, nor American State abbreviations. If only you had used XML!
- Oliver
Steve Wampler - 04 Apr 2006 22:44 GMT > Hmm, looks like I was way off... Not being an American, I am not > familiar with American city names, nor American State abbreviations. If > only you had used XML! No problem:
<f1>John</f1> <f2>Smith</f2> <f3>5555555</f3> <f4>37 Finch Ave.</f4>
There, that should make people happy :) (Of course, given this group, maybe the tags should be in Klingon...)
Chris Uppal - 05 Apr 2006 11:43 GMT > No problem: > [quoted text clipped - 4 lines] > > There, that should make people happy :) Slightly OT, but I believe that the Best Practise for handling addresses is just have line1, line2, line3 and so on, rather than trying to identify the "meaning" of each line. There is much less consistency across address formats than most programmers (or schema designers) realise. So an XML format like yours might be the best you can (or should) do.
-- chris
Oliver Wong - 05 Apr 2006 15:32 GMT >> Hmm, looks like I was way off... Not being an American, I am not >> familiar with American city names, nor American State abbreviations. If [quoted text clipped - 9 lines] > There, that should make people happy :) > (Of course, given this group, maybe the tags should be in Klingon...) Well, at least with this notation, I wouldn't have made my initial mistake of thinking I was dealing with 4 records which seemed to be arbitrary strings.
Given the tag names, I can see I am dealing with a single record with 4 fields.
So we're making progress here, but perhaps the tag names could have been better chosen.
And if there were an XSD along with this, I could check whether f3 was purely numeric, or if it could contain arbitrary string data as well.
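As a sketch of what such a check could look like in Java, using the JDK's javax.xml.validation API. The schema below is invented for this example (nobody in the thread supplied one), the <record> wrapper element is an assumption needed to get a single root, and the class name XsdCheckDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates a record against an invented schema.
final class XsdCheckDemo {
    // Assumed schema: four fields in order, with f3 restricted to digits only.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='record'><xs:complexType><xs:sequence>"
      + "<xs:element name='f1' type='xs:string'/>"
      + "<xs:element name='f2' type='xs:string'/>"
      + "<xs:element name='f3'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:pattern value='[0-9]+'/></xs:restriction>"
      + "</xs:simpleType></xs:element>"
      + "<xs:element name='f4' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the demo schema itself is broken", e);
        }
    }

    /** True if the document conforms to the demo schema. */
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // validation error, e.g. letters in f3
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<record><f1>John</f1><f2>Smith</f2><f3>5555555</f3><f4>37 Finch Ave.</f4></record>"));
        System.out.println(isValid(
            "<record><f1>John</f1><f2>Smith</f2><f3>555-CALL</f3><f4>37 Finch Ave.</f4></record>"));
    }
}
```

The schema only constrains the shape and lexical form of the fields; what f3 means is still something both ends have to agree on separately.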
- Oliver
Steve Wampler - 05 Apr 2006 16:13 GMT >>> Hmm, looks like I was way off... Not being an American, I am not >>> familiar with American city names, nor American State abbreviations. If [quoted text clipped - 16 lines] > Give the tag names, I can see I am dealing with a single record with > 4 fields. Really? I wouldn't have thought so. What makes you think 'f' stands for 'field'? Maybe these are four new flavours of Ben&Jerry's ice cream. (Not that I'd buy any of them...)
The point is that the tag names are, ultimately, just strings. We might think we understand what they mean (and can be right a high percentage of the time if the strings are well chosen), but in the end, they mean whatever the code at each end defines the semantics (not the syntax) to be. That code *still* has to agree at both ends, just as it does with "John,Smith,5555555,37 Finch Ave.". I haven't seen anything in XML that does more than provide a guarantee that the syntax is right.
Joe Attardi - 05 Apr 2006 16:27 GMT > I haven't seen anything in XML > that does more than provide a guarantee that the syntax is right. Hierarchical data, dude. What if someone has more than one phone number? With the comma-delimited flat file approach, it's not readily apparent how you could implement that.
<Person> <PhoneNumber>...</PhoneNumber> <PhoneNumber>...</PhoneNumber> ... </Person>
we can have as many PhoneNumbers as we want that are associated with a person, and because it's all hierarchical we can just walk up the hierarchy to see who these PhoneNumbers belong to.
Steve Wampler - 05 Apr 2006 16:38 GMT >> I haven't seen anything in XML >> that does more than provide a guarantee that the syntax is right. [quoted text clipped - 12 lines] > person, and because it's all hierarchical we can just walk up the > hierarchy to see who these PhoneNumbers belong to. Eh? That's still syntax. Are you saying all syntax is non-hierarchical?
People have represented hierarchical data in many ways *well before XML*, including, yes, flat files - and it's not that hard. It's still a syntax issue. Heck, even arbitrary graph data (hardly "hierarchical") has many syntactic representations, including flat files.
Look, I *like* XML *for some things*, but wish people would take the time to recognize what it is and what it isn't, please.
Roedy Green - 05 Apr 2006 22:31 GMT >Hierarchical data, dude. What if someone has more than one phone >number? With the comma-delimited flat file approach, it's not readily [quoted text clipped - 3 lines] > <PhoneNumber>...</PhoneNumber> > <PhoneNumber>...</PhoneNumber> You use a comma to represent any field which is not present. You don't just have a list of phone numbers, you assign them specific functions.. You have something like this:
cell home work 800 fax messages emergency
the other way you do it is to have a separate phone numbers file (this is SQL-think). Then you can have an arbitrary number of phone numbers.
the phone number file has the form
account#, phone
If you are exporting data only to import SQL again, this is a much more convenient format than XML hierarchy. SQL does not handle variable numbers of things well directly, so you end up having to write a complicated mess of XML export and import handling code, as well as the process taking 100 times longer than it need do.
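A minimal sketch of the separate phone-numbers file in Java. The account numbers and phone numbers are made-up sample values, the "account#,phone" layout is taken from the description above, and the class name PhoneJoinDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: groups the lines of a flat phone file by account number.
final class PhoneJoinDemo {
    /** Each input line has the form "account#,phone"; the result maps account# to its phones. */
    static Map<String, List<String>> phonesByAccount(List<String> phoneLines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : phoneLines) {
            String[] fields = line.split(",", 2);
            if (fields.length != 2) {
                throw new IllegalArgumentException("expected account#,phone but got: " + line);
            }
            result.computeIfAbsent(fields[0].trim(), k -> new ArrayList<>()).add(fields[1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the two exported files.
        List<String> accounts = List.of("1001,John,Smith", "1002,Jane,Doe");
        List<String> phones = List.of("1001,555-0100", "1001,555-0101", "1002,555-0199");

        Map<String, List<String>> byAccount = phonesByAccount(phones);
        for (String account : accounts) {
            String id = account.split(",", 2)[0];
            System.out.println(account + " -> " + byAccount.getOrDefault(id, List.of()));
        }
    }
}
```

In a database the same join would be a one-line SELECT; the in-memory map is only there to keep the example self-contained.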
Andrew McDonagh - 05 Apr 2006 22:48 GMT >> Hierarchical data, dude. What if someone has more than one phone >> number? With the comma-delimited flat file approach, it's not readily [quoted text clipped - 3 lines] >> <PhoneNumber>...</PhoneNumber> >> <PhoneNumber>...</PhoneNumber> <Pet> <Type>Dog</Type> <CuteName>Spot</CuteName>
> You use a comma to represent any field which is not present. You > don't just have a list of phone numbers, you assign them specific > functions.. You have something like this: One of XML's greatest advantages over CSV, flat files, etc., is that it supports schema evolution without requiring code changes.
Because applications look only for the XML nodes they know about, they ignore all other nodes. So in the Person node example, should we need to add a child node <Pets>, we can do so without harming the existing app.
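A small Java sketch of that behaviour, assuming the reading application looks elements up by name with the JDK DOM parser and does not validate against a strict schema. The class name IgnoreUnknownDemo and the sample documents are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: an "old" reader that only knows about <PhoneNumber>.
final class IgnoreUnknownDemo {
    static List<String> phoneNumbers(String personXml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(personXml)))
                    .getElementsByTagName("PhoneNumber");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String oldFormat = "<Person><PhoneNumber>555-0100</PhoneNumber>"
                + "<PhoneNumber>555-0101</PhoneNumber></Person>";
        // Same person after the format grew a <Pets> child the reader has never heard of.
        String newFormat = "<Person><PhoneNumber>555-0100</PhoneNumber>"
                + "<Pets><Pet><Type>Dog</Type><CuteName>Spot</CuteName></Pet></Pets>"
                + "<PhoneNumber>555-0101</PhoneNumber></Person>";
        System.out.println(phoneNumbers(oldFormat));
        System.out.println(phoneNumbers(newFormat));
    }
}
```

A reader that indexes children by position, or one that validates against an older schema, would not get this for free.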
Jhair Tocancipa Triana - 08 Apr 2006 14:04 GMT >> I haven't seen anything in XML >> that does more than provide a guarantee that the syntax is right.
> Hierarchical data, dude. What if someone has more than one phone > number? With the comma-delimited flat file approach, it's not readily > apparent how you could implement that.
> <Person> > <PhoneNumber>...</PhoneNumber> > <PhoneNumber>...</PhoneNumber> > ... > </Person>
> we can have as many PhoneNumbers as we want that are associated with a > person, and because it's all hierarchical we can just walk up the > hierarchy to see who these PhoneNumbers belong to. For decades it has been possible to achieve the same result in the example you state using two files (one for the persons and another for the phone numbers) and joining their contents (e.g. after loading them into a relational database).
So XML offers nothing new in the scenario you describe...
 Signature --Jhair
Oliver Wong - 10 Apr 2006 18:35 GMT >>> I haven't seen anything in XML >>> that does more than provide a guarantee that the syntax is right. [quoted text clipped - 19 lines] > > So XML offers nothing new in the scenario you describe... To be fair, Joe Attardi's example wasn't meant to show something "new", but rather to show XML providing something more than a guarantee that the syntax is right. In this respect, I think Joe's example is successful (in that it demonstrates hierarchical data in addition to syntax).
- Oliver
Steve Wampler - 10 Apr 2006 21:42 GMT > To be fair, Joe Attardi's example wasn't meant to show something > "new", but rather to show XML providing something more than a guarantee > that the syntax is right. In this respect, I think Joe's example is > successful (in that it demonstrates hierarchal data in addition to syntax). Eh? (again) Are you really claiming that you cannot syntactically represent hierarchical data? Please explain how context-free grammars represent arithmetic expressions if hierarchy isn't syntax.
Oliver Wong - 10 Apr 2006 22:10 GMT >> To be fair, Joe Attardi's example wasn't meant to show something >> "new", but rather to show XML providing something more than a guarantee [quoted text clipped - 3 lines] > > Eh? (again) Whether the "syntax is right" and whether the data is hierarchical are two orthogonal concepts, IMHO. I should have said "in addition to a guarantee of correct syntax" instead of just "in addition to syntax".
> Are you really claiming that you cannot syntactically represent > hierarchical data? No.
> Please explain how context-free grammars represent > arithmetic expressions if hierarchy isn't syntax. Isn't syntax simply the list of allowable keywords and their parameters? I don't think syntax in itself is sufficient to represent hierarchy. You need something like grammatical rules that can reference each other.
E.g., this, syntax, is not enough:
'(', ')', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
You also need this, a grammar:
EXP -> INT | INT OP INT | '(' EXP ')'
INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
OP -> '+' | '-'
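To show how the rules referencing each other give rise to nesting, here is a small recursive-descent recognizer for exactly this grammar, sketched in Java. The class name ExpRecognizer is made up; it accepts no whitespace and only single-digit INTs, because that is all the three rules allow.

```java
// Hypothetical demo class: recognizes strings derivable from EXP in the grammar above.
final class ExpRecognizer {
    private final String input;
    private int pos = 0;

    private ExpRecognizer(String input) {
        this.input = input;
    }

    /** True if the whole input can be derived from EXP. */
    static boolean matches(String input) {
        ExpRecognizer r = new ExpRecognizer(input);
        return r.exp() && r.pos == input.length();
    }

    // EXP -> INT | INT OP INT | '(' EXP ')'
    private boolean exp() {
        if (peek() == '(') {
            pos++;
            if (!exp()) {              // the rule refers to itself: this is where nesting comes from
                return false;
            }
            if (peek() != ')') {
                return false;
            }
            pos++;
            return true;
        }
        if (!isInt(peek())) {
            return false;
        }
        pos++;                         // INT
        if (isOp(peek())) {            // optional OP INT
            pos++;
            if (!isInt(peek())) {
                return false;
            }
            pos++;
        }
        return true;
    }

    // INT -> '0' | ... | '9'
    private static boolean isInt(char c) {
        return c >= '0' && c <= '9';
    }

    // OP -> '+' | '-'
    private static boolean isOp(char c) {
        return c == '+' || c == '-';
    }

    /** Current character, or '\0' at end of input. */
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2", "((4-5))", "12", "(1+2"}) {
            System.out.println(s + " : " + matches(s));
        }
    }
}
```

The list of characters alone could not tell "((4-5))" from "(4-5))"; the recursive rule for EXP can.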
- Oliver
Steve Wampler - 10 Apr 2006 22:17 GMT > Isn't syntax simply the list of allowable keywords and their > parameters? I don't think syntax in itself is sufficient to represent [quoted text clipped - 10 lines] > INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' > OP -> '+' | '-' No. Syntax *is* grammar. You're mixing lexics and syntax. Semantics is the meaning attached to a syntax. (Lexics is one aspect of syntax, corresponding to the leaf nodes in the grammar.)
Timbo - 05 Apr 2006 16:55 GMT > I haven't seen anything in XML > that does more than provide a guarantee that the syntax is right. Ok, so say you are writing an application that deploys an agent to find you the best prices for CDs on the web. If you share the same ontological definition of CD attributes, you could have the following album embedded in a webpage:
<Album>
  <Artist> Stevie Wonder </Artist>
  <Title> Innervisions </Title>
  <Producer> .. </Producer>
  <Track number="1" name=".."/>
  <Track number="2" name=".."/>
  ... etc..
  <Price> £5</Price>
</Album>
Compare that to the text:
Stevie Wonder, Innervisions, 1: ..., 2: ..., £5
You can see that clearly, any online CD store that follows the XML definition in the first one (which could be defined in a schema) would be easier to browse than one that has free text, especially if some CDs have data that others don't, such as accompanying musicians. You could find the grammar for the free text, write a parser for it (or download one), and interpret the parsed data, but simply sharing the set of definitions is more straightforward.
Steve Wampler - 05 Apr 2006 18:01 GMT >> I haven't seen anything in XML >> that does more than provide a guarantee that the syntax is right. [quoted text clipped - 25 lines] > one), and interpret the parsed data, but simply sharing the set of > definitions is more straightforward. Hmmm, I, as a human, find the second form *much* easier to browse. I can pick out the actual content *much* faster. Granted, I might prefer something like:
Stevie Wonder: Innervisions ($9.25) 1: .... 2: .... 3: ....
but that would depend on whether I'm more interested in the artist and album or the details of the album content. (Great price, by the way!)
Of course, you're talking about computer handling of the data, where your points are more valid. That's *still* syntax though.
Oliver Wong - 05 Apr 2006 19:19 GMT >>> I haven't seen anything in XML >>> that does more than provide a guarantee that the syntax is right. [quoted text clipped - 43 lines] > points > are more valid. That's *still* syntax though. I find Timbo's XML version as easy to read as Timbo's CSV version. However, I do find Steve's "custom" version easier to read than the other two, as a human.
However, another nice thing about XML over the other two formats is that there is a standardized escaping mechanism. Artists are... well... artistic... and they sometimes do crazy things. In CSV, or the custom format, how do you distinguish between an album whose name is the empty string, and an album whose name is the single space character? What if the album name contains a colon? What if the artist name contains a colon? What if the album name contains an open-parenthesis and dollar sign, but no close-parenthesis? Etc.
As purely digital music becomes more popular (e.g. songs existing only as OGG or MP3 files, and no physical albums, so no cover art necessary), you could have tech-savvy artists define the names of their tracks to be the newline character for some specific platform, for example. Maybe I'll go write a song right now whose name is the value of the Java literal String expression "\u0000\r\n\u0008\r\n\n". For clarity, the name of my song is 7 characters long, and is not intended to be pronounced (there will be no lyrics in the song).
With XML, it's possible to express unambiguously any possible string of characters (using, e.g., entity-references). With CSV or the custom format, you'd have to invent an escaping-system, and then I, as a human, would have to learn about your escaping system to either be able to read the data myself, or to implement a program which can parse the data.
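A minimal Java sketch of that round trip, using the JDK's javax.xml.stream.XMLStreamWriter to write a title and the DOM parser to read it back. The class name XmlEscapeDemo, the <Title> element and the sample title are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.InputSource;

// Hypothetical demo class: lets the XML library do the escaping and unescaping.
final class XmlEscapeDemo {
    /** Writes the title as the content of a <Title> element, escaped by the library. */
    static String toXml(String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("Title");
            writer.writeCharacters(title);   // markup characters such as '<' and '&' are escaped here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    /** Parses the element again and returns its text content. */
    static String fromXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String title = "Rock & Roll: <Live> ($9.25";
        String xml = toXml(title);
        System.out.println(xml);
        System.out.println(title.equals(fromXml(xml)));
    }
}
```

One caveat worth checking against the XML specification: XML 1.0 does not permit most control characters, U+0000 among them, even as character references, so a title containing them would still need an application-level encoding on top.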
- Oliver
Roedy Green - 05 Apr 2006 22:36 GMT > With XML, it's possible to express unambiguously any possible string of >characters (using, e.g., entity-references). You have made a much better case for binary strings that don't need fancy XML escaping than you have for XML.
Oliver Wong - 05 Apr 2006 23:26 GMT >> With XML, it's possible to express unambiguously any possible string >> of >>characters (using, e.g., entity-references). > > You have made a much better case for binary strings that don't need > fancy XML escaping than you have for XML. The problem with a "straight-to-binary" approach is that you'd have to use custom tools to process the data. With XML, you can use a generic XML editor, or worst case, a simple text-editor.
I don't "mind" ASN.1 so much if only the editors were more readily available. From my perspective, it's almost the same as using gzip to unzip a file yielding an XML document, and then using an XML Editor on the resulting XML document.
- Oliver
Roedy Green - 06 Apr 2006 02:12 GMT > The problem with a "straight-to-binary" approach is that you'd have to >use custom tools to process the data. With XML, you can use a generic XML >editor, or worse case, a simple text-editor. No you don't. You use an ASN schema and a binary parser. It is just like XML only compact.
James McGill - 06 Apr 2006 07:00 GMT > No you don't. You use an ASN schema and a binary parser. It is just > like XML only compact. Nobody is going to use ASN just for fun. It's so obviously a product of some 1980s multi-tiered management bureaucracy, it's not even funny. Don't get me wrong -- I appreciate the strong typing and hard guarantees that are possible within the framework. There are ASN constructs for things that would be a major pain in any representation (like the stuff dealing with Sets -- I understand the value in data binding applications).
But it's not *fun*. At no level is it easy to work with. It's something you use because your boss pays you to work with it, and it's NOT something you use simply because you enjoy it.
Chris Uppal - 06 Apr 2006 10:39 GMT > > No you don't. You use an ASN schema and a binary parser. It is just > > like XML only compact. > > Nobody is going to use ASN just for fun. It's so obviously a product of > some 1980s multi-tiered management bureaucracy, it's not even funny. Doesn't the same thing apply to XML ?
-- chris
Oliver Wong - 06 Apr 2006 16:05 GMT >> Nobody is going to use ASN just for fun. It's so obviously a product of >> some 1980s multi-tiered management bureaucracy, it's not even funny. > > Doesn't the same thing apply to XML ? I use XML "just for fun", in the sense that I've used it in situations where my boss isn't paying me to use it (including the situations where I'm my own boss). See many of my postings to this newsgroup for example. I'll often use "xml-like" syntax to show what's Java code versus what's prose.
- Oliver
Chris Uppal - 07 Apr 2006 08:52 GMT > I use XML "just for fun", in the sense that [...] And I thought /I/ was strange !
;-)
-- chris
James McGill - 07 Apr 2006 10:14 GMT > > I use XML "just for fun", in the sense that [...] > > And I thought /I/ was strange ! Well, my point was that I use XML schema for things like configuring games, communication between online game clients, the save game format, the parameters of the model, etc. Strictly for fun. I know that ASN.1 (for example) offers some very formal grammars that happen to be accepted as industry standards; but I am quite certain that it's anything but a pleasant framework to design with. But I'm biased, since pretty much all my messages are a few Kilobytes, and really, no amount of bloat that results from the markup is going to make enough difference that it overtakes RPC over HTTP or File IO as the limiting factor.
To be fair, the discussion of ASN.1 started in response to a proposition to use XML for a degenerate case where it's probably not the appropriate markup encoding to use.
Also, it's quite likely that when someone's golden hammer fails, he might be tempted to reinvent the wheel (badly), rather than use a different hammer for that problem. And that's why an amateur might need to be nudged in the direction of another alternative that he might never have heard about otherwise. I can respect that.
Now somebody is going to come out of the woodwork claiming that yacc is fun.
Chris Uppal - 07 Apr 2006 11:17 GMT [me:]
> > And I thought /I/ was strange ! [...]
> Now somebody is going to come out of the woodwork claiming that yacc is > fun. Yacc /is/ fun.
(I said I was strange ;-)
-- chris
Roedy Green - 07 Apr 2006 18:49 GMT On Fri, 07 Apr 2006 02:14:42 -0700, James McGill <jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone who said :
>. I know that ASN.1 >(for example) offers some very formal grammars that happen to be >accepted as industry standards; but I am quite certain that it's >anything but a pleasant framework to design with. the claim is you don't have to. You can use an XML schema.
James McGill - 07 Apr 2006 19:30 GMT > the claim is you don't have to. You can use an XML schema. I guess the question is, why would you then add another layer of complexity, if you've already got an XSD that models your data to your satisfaction? I realize that if I was working for you, you would insist on a tightly packed, formalized wire format. That's cool. I've had to do similar things to map between an XML representation of DNS data, and the IETF wire format for the records. I don't think an ASN model would be any weirder than that.
Chris Uppal - 06 Apr 2006 10:38 GMT > However, another nice thing about XML over the other two formats is > that there is a standardize escaping mechanism. Artists are... well... > artistic... and they sometimes do crazy things. All the file formats I can think of have well-defined escape mechanisms (in CSV, unfortunately, you have a choice of about 10 and it's difficult to be sure that all parties are agreed on which is in use). XML has one too. That's hardly an advantage for XML (especially when its mechanism is so crappy).
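For illustration, here is one of those CSV conventions sketched in Java: fields containing a comma, a double quote or a line break are wrapped in double quotes, and embedded double quotes are doubled. This is the convention described in RFC 4180; other tools use other rules, which is exactly the problem. The class name CsvQuoteDemo is hypothetical.

```java
// Hypothetical demo class: one CSV escaping convention among several in use.
final class CsvQuoteDemo {
    /** Quotes a field only when needed, doubling any embedded double quotes. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("Innervisions"));
        System.out.println(escape("Wonder, Stevie"));
        System.out.println(escape("The \"Live\" Album"));
    }
}
```

A reader that instead expects backslash escapes, or no quoting at all, will misparse this output, so both ends still have to agree on the convention out of band.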
What the world needed, but didn't get, was a well-designed, standardised[*] escape mechanism which could be used in almost any file format....
([*] if only by convention)
-- chris
Oliver Wong - 06 Apr 2006 16:10 GMT >> However, another nice thing about XML over the other two formats is >> that there is a standardize escaping mechanism. Artists are... well... [quoted text clipped - 5 lines] > sure > that all parties are agreed on which is in use). So to me, this means that CSV does NOT have well-defined escape mechanisms. That is, if your requirements are "support an 'export to CSV' functionality", it wouldn't be unusual to forbid "crazy things" appearing in your document model (or else just not worrying about it and letti