In defense of XML

There seems to be a lot of hatred for XML. It’s not hard to find blog posts and articles where the author rants about the deficiencies and inefficiencies of XML and promotes the beauty of JSON, YAML, or something else. Is this level of vitriol really deserved?

When I talk to people who dislike XML, they’re quick to point out examples where it’s been used in spectacularly poor fashion. These aren’t strawman arguments but genuine situations where the files are no more useful than a proprietary, undocumented binary format. Given that one of the promises of XML was effective data interchange, this is a shame.

However, should poor use of XML, even if widespread, be enough for us to abandon it completely, especially when the very flexibility (the extensibility) of XML is what allowed it to be misused in the first place?

Element Normal Form

There’s a certain group of developers who look at the pedigree of XML and conclude that its only proper use is to mark up text with elements. They see XML as a specialization of Standard Generalized Markup Language (SGML) and a cousin of HyperText Markup Language (HTML), and conclude that attributes are to be avoided at all costs, except (perhaps) for the occasional internal identifier.

This results in XML documents that look like this:

<person id="966">
  <fullName>John Doe</fullName>
  <knownAs>John</knownAs>
  <familyName>Doe</familyName>
  <birth>1966-03-31</birth>
  <addresses>
    <address>
        <street>1313 Mockingbird Lane</street>
        <city>Mockingbird Heights</city>
        <from>1966-03-31</from>
        <until>1999-12-31</until>
    </address>
    <address>
        <street>1600 Pennsylvania Avenue</street>
        <city>Washington DC</city>
        <from>2000-01-01</from>
        <until>2003-12-31</until>
    </address>
    <address>
        <street>1 Skid Row</street>
        <city>Hicksville</city>
        <from>2004-01-01</from>
    </address>
  </addresses>
</person>

There’s a lot to dislike about this. It’s verbose, repetitive, and inefficient. Of the roughly 690 characters in the file, about 516 are dedicated to syntax and whitespace; only about 174 characters of data (or 25%) would differ from one person to another.
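A figure like this can be checked by letting a parser separate the data from the markup. The following is a minimal Python sketch, not the method used to produce the numbers above: `data_share` is a hypothetical helper, and it assumes that "data" means attribute values plus non-whitespace element text. Exact totals will also shift with line endings and indentation.

```python
import xml.etree.ElementTree as ET


def data_share(xml_text: str) -> tuple[int, int]:
    """Return (data_chars, total_chars) for an XML document.

    Assumption: "data" is every attribute value plus the stripped text of
    every element; everything else counts as syntax and whitespace.
    Text that follows a child element (mixed content) is not counted.
    """
    root = ET.fromstring(xml_text)
    data_chars = 0
    for element in root.iter():
        # Attribute values such as id="966" are data.
        data_chars += sum(len(value) for value in element.attrib.values())
        # Element text such as <city>Hicksville</city> is data; indentation is not.
        if element.text and element.text.strip():
            data_chars += len(element.text.strip())
    return data_chars, len(xml_text)


sample = '<person id="966"><knownAs>John</knownAs></person>'
data, total = data_share(sample)
print(f"{data} of {total} characters are data ({data / total:.0%})")
```

Running the same function over both versions of a document gives a like-for-like comparison, provided both are saved with the same line endings.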

It doesn’t have to be this way.

By focussing on where XML came from, supporters of element normal form have lost sight of what it is: a well-defined serialization format for hierarchical data.

Here’s the same exact information, serialized in a better way, making use of attributes:

<person id="966"
  fullName="John Doe"
  knownAs="John"
  familyName="Doe"
  birth="1966-03-31">
  <address street="1313 Mockingbird Lane"
    city="Mockingbird Heights"
    from="1966-03-31"
    until="1999-12-31" />
  <address street="1600 Pennsylvania Avenue"
    city="Washington DC"
    from="2000-01-01"
    until="2003-12-31" />
  <address street="1 Skid Row"
    city="Hicksville"
    from="2004-01-01" />
</person>

We’ve also lost the unnecessary wrapper around the multiple addresses.
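Reading the attribute form back is no harder than reading the element form. As a rough illustration, here is a Python sketch using the standard library’s `xml.etree.ElementTree`; the `PERSON_XML` string is a trimmed copy of the document above, and the `"present"` default is an assumption made for display only.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the attribute-based document above.
PERSON_XML = """
<person id="966" fullName="John Doe" knownAs="John" familyName="Doe" birth="1966-03-31">
  <address street="1313 Mockingbird Lane" city="Mockingbird Heights"
    from="1966-03-31" until="1999-12-31" />
  <address street="1 Skid Row" city="Hicksville" from="2004-01-01" />
</person>
"""

person = ET.fromstring(PERSON_XML)
print(person.get("fullName"), "born", person.get("birth"))

# Repeated <address> children can be iterated without an <addresses> wrapper.
for address in person.findall("address"):
    # A missing attribute (no "until") falls back to the default we pass in.
    until = address.get("until", "present")
    print(f'{address.get("city")}: {address.get("from")} to {until}')
```

The optional `until` value needs no special handling: an absent attribute simply yields the default.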

This file has about 440 characters, of which roughly 266 are dedicated to syntax and whitespace. The same 174 characters of content now make up 40% of the file.

For what it’s worth, this compares most favourably with 530 characters of JSON (where the same 174 characters of data would comprise just 33% of the file) and with a 423-character YAML file (where the data would be 41% of the file).
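The JSON and YAML files behind those numbers aren’t shown here, so the following Python sketch only illustrates how such a comparison might be set up. The `person_to_json` helper, the `addresses` key, and the two-space indentation are all assumptions, and a different layout would give different counts.

```python
import json
import xml.etree.ElementTree as ET


def person_to_json(xml_text: str) -> str:
    """Render attribute-based person XML as indented JSON (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)
    # JSON objects should not repeat a key, so the addresses need a named array.
    record["addresses"] = [dict(a.attrib) for a in root.findall("address")]
    return json.dumps(record, indent=2)


xml_text = (
    '<person id="966" knownAs="John">\n'
    '  <address city="Hicksville" from="2004-01-01" />\n'
    '</person>'
)
json_text = person_to_json(xml_text)
print(json_text)
print(len(xml_text), "characters of XML,", len(json_text), "characters of JSON")
```

Note that the JSON form has to reintroduce a named container for the addresses, which is part of why it spends more characters on structure.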

XML data serialization doesn’t have to be repetitive and inefficient - it can be as good as more recent formats such as JSON and YAML. Who knew?
