Expanding Your Toolkit: JSON vs. XML

Back in the stone age (the early 1980s), SGML was created to express document structure, so that complex documents could be shared electronically and rendered by cooperating government agencies and companies: "This part of the content is the title," "this part is a figure," "this is a legend under a figure," and so on.

Still in the stone age, in the 1990s, HTML arose to express less complex documents in a web-browser context. At the dawn of the modern era, in the late 1990s, XML was created with the intent of finding a middle ground between the two: one that was both human-readable (plain text with Unicode support) and machine-readable (sufficiently structured to be efficiently and unambiguously parsed). It was intended to be a practical compromise between completeness and simplicity.

Like its parents, XML was conceived as a means of expressing a document. However, it was understood and desired that it could be used to express non-document data as well. XML's elegance comes from not having a rigidly fixed lexicon. It's all well and good to say <title> is a title, and <p> is a paragraph, but how cool is it to say that <something> is a something? Without having to define an unworkably large universe of possibilities in advance, and then deal with the missing features and force-fit messes that a fixed vocabulary causes, we can suddenly talk about <inventoryControlNumber> and <favoriteIceCream> just as easily as anything universally predefined.

If the call-a-thing-a-thing philosophy reminds you of JSON, it's not an accident. JSON was positioned as a lighter-weight alternative to XML. We get to decide what terminology best suits the task at hand in JSON because that was one of the aspects that was kept from XML. This of course raises the question: why do we need both XML and JSON? Good question.

The JSON and XML snippets in [Figure 1] convey the same core payload.

JSON:

{
    "employees": [
        {
            "firstName" : "Albert",
            "lastName"  : "Einstein"
        },
        {
            "firstName" : "Albert",
            "lastName"  : "Schweitzer"
        }
    ]
}

XML:

<employees>
    <employee>
        <firstName>Albert</firstName>
        <lastName>Einstein</lastName>
    </employee>
    <employee>
        <firstName>Albert</firstName>
        <lastName>Schweitzer</lastName>
    </employee>
</employees>

You can see that JSON is a bit more compact without being any more obscure, so that looks like a win. In situations represented by the above example, it is. A huge set of real-life use cases are well served by JSON. It's terrific. No surprise that it is the data packaging system most in use on the web today. You might easily spend your entire career not needing anything more.
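To make the "same core payload" point concrete, here is a minimal Python sketch that reads both snippets with the standard library and checks that they yield the same names. The helper names `names_from_json` and `names_from_xml` are made up for this illustration, and the XML side assumes the exact element layout shown above.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = """
{
    "employees": [
        {"firstName": "Albert", "lastName": "Einstein"},
        {"firstName": "Albert", "lastName": "Schweitzer"}
    ]
}
""".strip()

XML_TEXT = """
<employees>
    <employee>
        <firstName>Albert</firstName>
        <lastName>Einstein</lastName>
    </employee>
    <employee>
        <firstName>Albert</firstName>
        <lastName>Schweitzer</lastName>
    </employee>
</employees>
""".strip()


def names_from_json(text):
    """Return (firstName, lastName) pairs from the JSON form (hypothetical helper)."""
    data = json.loads(text)
    return [(e["firstName"], e["lastName"]) for e in data["employees"]]


def names_from_xml(text):
    """Return (firstName, lastName) pairs from the XML form (hypothetical helper)."""
    root = ET.fromstring(text)  # root is the <employees> element
    return [
        (e.findtext("firstName"), e.findtext("lastName"))
        for e in root.findall("employee")
    ]


# Both representations carry the same two records.
assert names_from_json(JSON_TEXT) == names_from_xml(XML_TEXT)
```

Neither reader is much harder to write than the other for data this flat, which is why the more compact JSON form tends to win here.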

But.

For binary values such as images, sound, and video, JSON requires encoding to a string on the way in and decoding from a string on the way out. That is easy enough for an occasional bit of multimedia content, but at scale it adds up: the encoding costs computation, the encoded text is larger than the original bytes (Base64, the usual choice, adds about a third), and the extra size consumes transmission resources.
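The round trip described above can be sketched in a few lines of Python. This assumes Base64 as the to-string encoding, which is a common choice rather than anything JSON itself mandates; the function names, the `name` and `data` field names, and the stand-in payload are all invented for the example.

```python
import base64
import json


def pack_binary(name, raw):
    """Wrap raw bytes in a JSON document by Base64-encoding them to text."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"name": name, "data": encoded})


def unpack_binary(document):
    """Recover the name and the original bytes from the JSON document."""
    record = json.loads(document)
    return record["name"], base64.b64decode(record["data"])


# 3072 arbitrary bytes standing in for an image or sound clip.
raw = bytes(range(256)) * 12

document = pack_binary("sample.bin", raw)
name, restored = unpack_binary(document)

# Base64 emits 4 text characters for every 3 input bytes.
encoded_size = len(base64.b64encode(raw))

print(len(raw), encoded_size)  # 3072 4096
print(restored == raw)         # True
```

The payload survives intact, but every transfer pays for the encode step, the decode step, and roughly one third more bytes on the wire.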

XML can handle binary data expressed as text (just like JSON), or referred to by URL or other external access methods (just like JSON). It can also carry multimedia content in line, in a CDATA section or in an element that a schema types as binary. Be aware that a CDATA section only suspends markup parsing; its content must still consist of legal XML characters, so the bytes are normally Base64-encoded first, and packaging standards such as XOP/MTOM exist for shipping the raw bytes alongside the XML. Also, there is something called XPath, which traverses the more complex trees common to XML for you, so you can find things like "/book/published/year/text()" without having to write any custom code. And XML supports namespace segregation, metadata (attributes), and externally or internally defined schemas that describe how a payload is to be understood. Hmm. Getting interesting for more complex data scenarios, isn't it?
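As a rough illustration of path queries, attributes, and namespaces, here is a sketch using Python's standard `xml.etree.ElementTree`. Two caveats: ElementTree implements only a subset of XPath, so the `text()` step is replaced by `findtext()`, and paths are written relative to the root `<book>` element; a full XPath engine would need a third-party library. The sample document, the `isbn` attribute value, and the `urn:example:inventory` namespace are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute, namespace URI, and values are made up.
BOOK_XML = """
<book isbn="hypothetical-0001" xmlns:inv="urn:example:inventory">
    <title>An Example Title</title>
    <published>
        <year>2017</year>
    </published>
    <inv:inventoryControlNumber>A-42</inv:inventoryControlNumber>
</book>
""".strip()

root = ET.fromstring(BOOK_XML)  # root is the <book> element

# Path query: the ElementTree counterpart of /book/published/year/text().
year = root.findtext("./published/year")

# Metadata carried as an attribute on the element itself.
isbn = root.get("isbn")

# Namespace-qualified lookup: the prefix is mapped to its URI explicitly.
ns = {"inv": "urn:example:inventory"}
control_number = root.findtext("inv:inventoryControlNumber", namespaces=ns)

print(year, isbn, control_number)  # 2017 hypothetical-0001 A-42
```

The query strings do the tree walking; no loops or hand-written traversal code are needed to reach a nested value.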

Oh, and you can use XSL (specifically XSLT) to filter and transform one XML "document" into another one better suited to a particular task.

And...

You get the picture. Applications moving binary data or complex data hierarchies are usually better served by XML. When those requirements are not present, JSON is likely the better pick, due to its compactness, simplicity, familiarity, and widespread use by many of the web services you may interact with.

This brief article cannot provide a primer on XML. For those not familiar with it, I merely hope to have alerted you to the fact that JSON has a more powerful ally able to do the heavy lifting that lies beyond JSON's reach. When your work calls for custom packaging of binary and/or deeply hierarchical data, a solution leveraging XML will most likely be better than either JSON or a completely proprietary mechanism.

Bennett Barouch

Bennett Barouch, an executive at eBay, a Fortune 500 company, has been VP of Engineering at a half-dozen startup companies. Work he led is in the permanent collection of the Smithsonian, for Outstanding Achievement in Information Technology.

Jul/Aug 2017