XML vs JSON

So what do you think? Which one is better? You may not believe it, but a comparison like this can stir up a lot of heat among geeks.

Before we get deeper into this article, let's bring up to speed those who are not sure what those two acronyms are. XML (Extensible Markup Language) and JSON (JavaScript Object Notation) are two different plain-text formats for structured data. You can use either of them to build a structured document that contains keys and values.

Here is an example of RSS, a very popular XML format for news feeds, followed by the same feed in JSON. RSS is an XML format, but we converted it to JSON so that we can compare the two throughout this article.

Example of an XML document:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
    <channel>
        <title>My News</title>
        <description>This is my news feed</description>
        <link>http://www.example.com</link>
        <lastBuildDate>Wed, 04 Dec 2013 12:00:00 GMT</lastBuildDate>
        <item>
            <pubDate>Wed, 04 Dec 2013 12:00:00 GMT</pubDate>
            <title>Title 3</title>
            <description>Description of news 3</description>
        </item>
        <item>
            <pubDate>Tue, 03 Dec 2013 12:00:00 GMT</pubDate>
            <title>Title 2</title>
            <description>Description of news 2</description>
        </item>
        <item>
            <pubDate>Mon, 02 Dec 2013 12:00:00 GMT</pubDate>
            <title>Title 1</title>
            <description>Description of news 1</description>
        </item>
    </channel>
</rss>

Example of a JSON document:

{
	"channel": {
		"title": "My News",
		"description": "This is my news feed",
		"link": "http://www.example.com",
		"lastBuildDate": "Wed, 04 Dec 2013 12:00:00 GMT",
		"items": [
			{
				"pubDate": "Wed, 04 Dec 2013 12:00:00 GMT",
				"title": "Title 3",
				"description": "Description of news 3"
			},
			{
				"pubDate": "Tue, 03 Dec 2013 12:00:00 GMT",
				"title": "Title 2",
				"description": "Description of news 2"
			},
			{
				"pubDate": "Mon, 02 Dec 2013 12:00:00 GMT",
				"title": "Title 1",
				"description": "Description of news 1"
			}
		]
	}
}
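To make the side-by-side comparison concrete, here is a minimal Python sketch that reads the item titles out of both documents using only the standard library (`xml.etree.ElementTree` and `json`). It assumes the feeds are held in strings, trimmed to two items for brevity, and the function names are only illustrative. Notice that XML expresses the list of news items as repeated `<item>` elements, while JSON needs an explicit `items` array.

```python
import json
import xml.etree.ElementTree as ET

# The XML feed from above, trimmed to two items (declaration omitted).
FEED_XML = """<rss version="2.0">
    <channel>
        <title>My News</title>
        <item><title>Title 3</title></item>
        <item><title>Title 2</title></item>
    </channel>
</rss>"""

# The equivalent JSON feed, trimmed the same way.
FEED_JSON = """{
    "channel": {
        "title": "My News",
        "items": [
            {"title": "Title 3"},
            {"title": "Title 2"}
        ]
    }
}"""

def titles_from_xml(text):
    root = ET.fromstring(text)  # the <rss> root element
    # Items are repeated sibling elements, so visit every <item>.
    return [item.findtext("title") for item in root.iter("item")]

def titles_from_json(text):
    feed = json.loads(text)
    # Items live in an explicit array under "items".
    return [item["title"] for item in feed["channel"]["items"]]

print(titles_from_xml(FEED_XML))    # ['Title 3', 'Title 2']
print(titles_from_json(FEED_JSON))  # ['Title 3', 'Title 2']
```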

For many years XML was the undisputed leader in its field. In recent years JSON has rapidly become a serious competitor, and many programmers now prefer it over XML.

Anyone can see at a glance how much cleaner JSON is. But is that all there is? Can you judge how superior a format is based on just one factor?

If we take a closer look, XML may be the fat cousin of JSON, but it does have two key structural features that JSON lacks.

Everything is a list
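In XML every child node is an item in a list. In our example every news item has a title. Well, in XML you don't have to have just one title: if you ever need to extend your protocol, you can add more nodes with the same name. A JSON object, by contrast, is expected to have unique keys, so repeating the "title" key is not a reliable way to add another value.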
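Here is a minimal Python sketch of that difference, using only the standard library; the second title is a hypothetical value added for illustration. ElementTree's `findall` returns every matching child, whereas Python's `json` module, by default, silently keeps only the last value when a key is repeated. RFC 8259 notes that other parsers may instead report an error or return all the duplicate pairs, which is why duplicated keys are not a safe extension mechanism.

```python
import json
import xml.etree.ElementTree as ET

# A news item with two <title> children (the second title is hypothetical).
ITEM_XML = "<item><title>Title 3</title><title>Another title</title></item>"

# The same idea in JSON: the "title" key appears twice.
ITEM_JSON = '{"title": "Title 3", "title": "Another title"}'

# XML keeps both nodes; findall returns them in document order.
xml_titles = [t.text for t in ET.fromstring(ITEM_XML).findall("title")]
print(xml_titles)  # ['Title 3', 'Another title']

# Python's json module keeps only the last value for a repeated key.
json_item = json.loads(ITEM_JSON)
print(json_item)  # {'title': 'Another title'}
```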

Attributes
XML has attributes. JSON doesn't, although a JSON format can of course be adapted to offer something similar.

Overall, these two differences make XML a lot more extensible than JSON. Let's see an example. The news items in our feed have a title and a description. What if one day you decide to enhance your news feed with translated titles and descriptions? In XML that is easy, and it could look like this:

<item>
    <pubDate>Wed, 04 Dec 2013 12:00:00 GMT</pubDate>
    <title lang="en-US">Title 3</title>
    <title lang="fr-FR">Titre 3</title>
    <title lang="el-GR">Τίτλος 3</title>
    <description lang="en-US">Description of news 3</description>
    <description lang="fr-FR">Description de l'actualité 3</description>
    <description lang="el-GR">Περιγραφή είδησης 3</description>
</item>

What you can see here is a lang attribute holding a language locale. Because XML allows multiple nodes with the same name, the attribute makes all the difference. Old readers will still work, because they typically look for the first title node, while new readers will be able to use the translations.
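As a rough illustration of the old-reader/new-reader point, here is a minimal Python sketch using `xml.etree.ElementTree`. Both reader functions are hypothetical: the old reader is assumed to simply take the first `<title>` it finds, and the new reader picks a title by its `lang` attribute. Descriptions are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The translated item from the example above (descriptions omitted).
ITEM_XML = """<item>
    <pubDate>Wed, 04 Dec 2013 12:00:00 GMT</pubDate>
    <title lang="en-US">Title 3</title>
    <title lang="fr-FR">Titre 3</title>
    <title lang="el-GR">Τίτλος 3</title>
</item>"""

def old_reader_title(item):
    # Hypothetical old reader: knows nothing about lang, takes the first <title>.
    return item.findtext("title")

def new_reader_title(item, lang):
    # Hypothetical new reader: select the <title> whose lang attribute matches,
    # falling back to the first title if that language is missing.
    for title in item.findall("title"):
        if title.get("lang") == lang:
            return title.text
    return old_reader_title(item)

item = ET.fromstring(ITEM_XML)
print(old_reader_title(item))           # Title 3
print(new_reader_title(item, "el-GR"))  # Τίτλος 3
```

Falling back to the first title is just one design choice; a real reader might fall back to a configured default locale instead.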

JSON can do the same, but not as easily: its format has to change a lot. This is one way to change the JSON format to get a similar result:

{
	"pubDate": "Wed, 04 Dec 2013 12:00:00 GMT",
	"title-en-US": "Title 3",
	"title-fr-FR": "Titre 3",
	"title-el-GR": "Τίτλος 3",
	"description-en-US": "Description of news 3",
	"description-fr-FR": "Description de l'actualité 3",
	"description-el-GR": "Περιγραφή είδησης 3"
}

This is another way:

{
	"pubDate": "Wed, 04 Dec 2013 12:00:00 GMT",
	"title": {
		"en-US": "Title 3",
		"fr-FR": "Titre 3",
		"el-GR": "Τίτλος 3"
	},
	"description": {
		"en-US": "Description of news 3",
		"fr-FR": "Description de l'actualité 3",
		"el-GR": "Περιγραφή είδησης 3"
	}
}

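However, both of these alternatives change the format in a way that may break old readers: the first removes the plain "title" and "description" keys altogether, and the second turns their values from strings into objects. So maybe a better alternative would be: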

{
	"pubDate": "Wed, 04 Dec 2013 12:00:00 GMT",
	"title": "Title 3",
	"title-translated": {
		"en-US": "Title 3",
		"fr-FR": "Titre 3",
		"el-GR": "Τίτλος 3"
	},
	"description": "Description of news 3",
	"description-translated": {
		"en-US": "Description of news 3",
		"fr-FR": "Description de l'actualité 3",
		"el-GR": "Περιγραφή είδησης 3"
	}
}
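To see which of the three JSON alternatives survive an old reader, here is a minimal Python sketch. The `old_reader_title` function is a hypothetical reader written before translations existed; it is assumed to expect `item["title"]` to be a plain string, so a missing key or a non-string value counts as a failure. The item documents are trimmed to their titles.

```python
import json

def old_reader_title(item):
    # Hypothetical old reader: expects "title" to exist and be a plain string.
    title = item["title"]  # raises KeyError if the key is gone
    if not isinstance(title, str):
        raise TypeError(f"expected a string title, got {type(title).__name__}")
    return title

# Trimmed versions of the three alternatives (titles only).
FLAT_KEYS = '{"title-en-US": "Title 3", "title-fr-FR": "Titre 3"}'
NESTED = '{"title": {"en-US": "Title 3", "fr-FR": "Titre 3"}}'
SIDE_BY_SIDE = ('{"title": "Title 3", '
                '"title-translated": {"en-US": "Title 3", "fr-FR": "Titre 3"}}')

def check(text):
    """Return the old reader's title, or the name of the error it hits."""
    try:
        return old_reader_title(json.loads(text))
    except (KeyError, TypeError) as err:
        return type(err).__name__

print(check(FLAT_KEYS))     # KeyError
print(check(NESTED))        # TypeError
print(check(SIDE_BY_SIDE))  # Title 3
```

A reader in a more permissive language might not raise at all on the nested variant and would instead display an object where a string was expected, which is arguably worse.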

With the examples above we want to illustrate how complex it can be to extend a JSON format without breaking existing readers.

How to choose the best format

When neither choice is clearly better, it is smart to decide based on your circumstances. If you are building an open protocol for general use, I would recommend XML. If you are building a protocol for a closed system that only a limited number of applications will access, JSON's lightness may well be the deciding factor.

In the end, the choice should come after walking through the likely use cases in your head and recalling problems you have run into in the past.