2009年9月2日 星期三

How XML Threatens Big Data : Dataspora Blog

Three Reasons Why XML Fails for Big Data

I. XML Spawns Data Bureaucracy

In its natural habitat, data lives in relational databases or as data structures in programs. The common import and export formats of these environments do not resemble XML, so much effort is dedicated to making XML fit. When more time is spent on inter-converting data — serializing, parsing,translating — than in using it, you’ve created a data bureaucracy.
Indeed, it was what Doug Crockford called “impedance mismatch inefficiencies” that sparked him to create JSON - standardizing Javascript’s object notation as a portable data container.

II. Yes, Size Matters for Data

Size matters for data in a way it does not for documents. Documents are intended for human consumption and have human-sized upper bounds (a lifetime’s worth of reading fits on a thumb drive). Data designed for machine consumption is bounded only by bandwidth and storage.

XML’s expansiveness — for even when compressed, the genie must be let out the bottle at some point — imposes memory, storage, and CPU costs.

III. Complexity Carries a Cost

I never fail to sigh when I open a data file and discover an army of tags, several ranks deep, surrounding the data I need. XML’s complexity imposes costs without commensurate benefits, specifically:

In-line, element-by-element tagging is redundant. Far preferable is stating the data model separately, and using a lightweight delimiter (such as a comma or a tab).

Text tags are purported to be self-documenting, but textual meaning is a slippery thing: it’s rare that one can be sure of a tag’s data type without consulting its DTD (in a separate document).

End-tags support nested structures (such as an aside (within (an aside)). But to facilitate data exchange, flattened out structures are preferable, and arbitrary levels of nesting are best using sparingly.

XML’s complexity inflicts misery on both sides of the data divide: on the publishing side, developers struggle to comply with the latest edicts of a fussy standards group. While data suitors labor to quickly unravel that XML format into something they can use.

[From How XML Threatens Big Data : Dataspora Blog]

沒有留言:

張貼留言