JSON and RDF: Inward and outward facing data structures and their consequences

On the surface, JSON and RDF might seem interchangeable (no pun intended ;). Both are formats for exchanging (or “packaging”) data, whether to provide feeds or the response from, say, a website’s or service’s API. Having worked with RDF for a while now, I’m struck by how significant the seemingly subtle differences between these two “interchange” formats are, and by how these differences relate to a larger drift of the web from an inherently decentralized nature to an increasingly centralized one.

JSON comes from JavaScript’s “object notation”, the syntax for writing “literal” data structures in code. In JSON, the bread-and-butter data structures of code (number, string (text), list, and hash (or associative array, basically a text-indexed collection of other data)) are directly and simply expressed. In RDF, URLs are more or less the primary form of representation, and they are “first-class” entities (i.e. a URL is a URL, and not a string that happens to have a URL in it, as in JSON). While it’s easy enough to “detect” URL values in JSON strings by simply sniffing for, say, the leading protocol (xxxx://), JSON-oriented tools tend not to do so. For example, in a browser-based JSON viewer (such as the excellent JSONivich, which I regularly use), URLs are not (by default) clickable. While this could be a “feature” added to the particular plugin, its absence follows from the fact that the JSON standard says nothing about URLs, nor indeed does it “need” or “want” to, given the problem it’s addressing.

In JSON, text and URLs are uniformly handled as “strings”
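To make the point concrete, here is a minimal Python sketch, assuming a made-up payload with hypothetical field names. It shows that a parsed JSON document hands back a name and a URL as the same type, and that recognising the URL is left to a heuristic of the consumer's own choosing (here, a regular expression for a leading scheme).

```python
import json
import re

# Hypothetical payload; the field names and values are made up for illustration.
payload = '{"name": "Example Person", "homepage": "http://example.org/"}'
record = json.loads(payload)

# Both values come back as plain strings; nothing marks one of them as a URL.
types = {key: type(value).__name__ for key, value in record.items()}

# A simple "sniffing" heuristic: a scheme name followed by "://".
URL_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def looks_like_url(value):
    """Guess whether a JSON value is a URL by checking for a leading scheme."""
    return isinstance(value, str) and URL_PATTERN.match(value) is not None

url_fields = [key for key, value in record.items() if looks_like_url(value)]

print(types)
print(url_fields)
```

The guess lives entirely in the consuming code: another consumer might sniff differently, or not at all.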

URLs are the basis of RDF’s “graph”-oriented structure. In RDF, data is represented as “flavored links” between things. These links are often called “triples”, as they connect three things in the form (S–P–O) where, in linguistics-soaked terms, a “subject” is linked by virtue of a “predicate” to an “object”. Predicates are always URLs. Subjects are either URLs or else “temporary” nodes (a provision to allow for some degree of ad hoc structuring), and objects are URLs, temporary nodes, or “literal” values (like numbers and strings). Additionally, literal values may be qualified with a language code or carry an explicit data type (itself represented by, you guessed it, a URL). Typically, data types are used to explicitly mark values such as dates.
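The shape of a triple can be sketched with a few small Python types. This is an illustration of the data model only, not the API of any RDF library: the class names (`IRI`, `BlankNode`, `Literal`, `Triple`) are my own, the subject URL is hypothetical, and the date is a placeholder. The predicate and datatype URLs are real Dublin Core and XML Schema terms.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class IRI:
    """A URL-like identifier, kept distinct from an ordinary string."""
    value: str

@dataclass(frozen=True)
class BlankNode:
    """A "temporary" node with no global name."""
    label: str

@dataclass(frozen=True)
class Literal:
    """A value such as a string, number or date."""
    lexical: str
    language: Optional[str] = None   # e.g. "en"
    datatype: Optional[IRI] = None   # the data type is itself a URL

@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BlankNode]
    predicate: IRI                   # predicates are always URLs
    obj: Union[IRI, BlankNode, Literal]

XSD_DATE = IRI("http://www.w3.org/2001/XMLSchema#date")
DC_TITLE = IRI("http://purl.org/dc/terms/title")
DC_CREATED = IRI("http://purl.org/dc/terms/created")
DC_CREATOR = IRI("http://purl.org/dc/terms/creator")

# Hypothetical subject and object URLs; the date is a placeholder.
post = IRI("http://example.org/posts/json-and-rdf")

triples = [
    Triple(post, DC_TITLE, Literal("JSON and RDF", language="en")),
    Triple(post, DC_CREATED, Literal("2000-01-01", datatype=XSD_DATE)),
    Triple(post, DC_CREATOR, IRI("http://example.org/people/author")),
]
```

Because `IRI` and `Literal` are different types, a viewer or tool never has to guess whether a value is a link.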

While JSON’s beauty lies in its simplicity and direct relationship to code, the cost of that generality is that it extends the limitations of data in code (its detached context of execution, the lack of sharable semantics beyond basic data types) to a networked context. Said another way, JSON is an excellent solution to a relatively easy problem, the cost of which is a shifting of difficult problems elsewhere. As a programmer, I admire how JSON cuts through the nonsense of an approach that might claim inherent superiority for, say, an XML representation. (If you’ve got a list of numbers, well then a JSON list of numbers is hard to beat.) The cost of that “directness”, however, is that in a networked context JSON’s generality reinforces the practice of distributing data with “inward-looking” structures, making code written to receive and work with this data (from the “outside”) inherently dependent on those particularities. As a consequence, the code is both fragile (it “breaks” when the structure changes) and difficult to combine with other sources (as it’s up to the programmer to bridge any differences in structure).
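A small Python sketch of that fragility, using two hypothetical feeds whose shapes and key names are invented for illustration: each source needs its own “unpacking” function, and a harmless-looking restructuring on the publisher’s side breaks the consumer.

```python
import json

# Two hypothetical feeds describing the same kind of thing in different shapes.
feed_a = json.loads(
    '{"user": {"name": "Ada", "links": {"home": "http://example.org/ada"}}}'
)
feed_b = json.loads(
    '{"people": [{"fullName": "Bob", "homepage": "http://example.org/bob"}]}'
)

def homepage_from_a(doc):
    """Unpacking code that only knows feed A's internal layout."""
    return doc["user"]["links"]["home"]

def homepage_from_b(doc):
    """Separate unpacking code for feed B's layout."""
    return doc["people"][0]["homepage"]

homepages = [homepage_from_a(feed_a), homepage_from_b(feed_b)]

# The publisher of feed A flattens its structure; the old accessor now fails.
reshaped_a = {"user": {"name": "Ada", "homepage": "http://example.org/ada"}}
try:
    homepage_from_a(reshaped_a)
    broke = False
except KeyError:
    broke = True

print(homepages, broke)
```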

RDF’s explicit use of URLs leads not only to convenient clickability in a viewer but, more significantly, to an inherent mixability and non-hierarchical access. Mixability means that different RDF sources (or graphs) can be trivially combined. Non-hierarchical means that these “graph structures” (a concept from mathematics referring to data structured as nodes and links) have no “top” from which to start unpacking, but rather require pattern-oriented access methods (like SPARQL, but also simplified schemes like the “where” clause implemented by the helpful JavaScript library rdfquery). While this introduces a slight “indirectness” to the accessing code, it means an end to ad hoc “unpacking” code that breaks as soon as the packing order changes, and it enables reusable code that can potentially function with data from other sources.
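Both ideas can be sketched in a few lines of plain Python, with no RDF library involved. This is a deliberate simplification: triples are bare (subject, predicate, object) tuples, URLs and literals are both plain strings for brevity, and temporary nodes (which need extra care when merging) are left out. The `match` helper is a hypothetical stand-in for pattern-oriented access, not SPARQL and not rdfquery's API. The example.org URLs are made up; the predicates are real FOAF terms.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Two graphs from (hypothetically) different sources.
graph_a = {
    ("http://example.org/ada", FOAF + "name", "Ada"),
    ("http://example.org/ada", FOAF + "homepage", "http://example.org/ada/home"),
}
graph_b = {
    ("http://example.org/bob", FOAF + "name", "Bob"),
    ("http://example.org/ada", FOAF + "knows", "http://example.org/bob"),
}

# Mixability: combining two graphs is just a set union of their triples.
merged = graph_a | graph_b

def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Pattern-oriented access: "everything that has a name", wherever it came from.
names = [o for (_, _, o) in match(merged, p=FOAF + "name")]

# The same helper answers a different question without new unpacking code.
about_ada = match(merged, s="http://example.org/ada")

print(names)
print(len(about_ada))
```

The query code never mentions where a triple sits inside a source document, so it keeps working if a source reorders its output or if a third graph is merged in.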

The differences between JSON and RDF are linked to a larger trend in which the APIs of web services drift away from standards-oriented formats (like RSS and Atom) toward increasingly “programmer-friendly” JSON (giving you “just what you need”). Significantly, these APIs are also embracing stronger means of “authorizing” developer access (via mechanisms like OAuth), whereby using the API requires the programmer to enter into a legal agreement with the website, one that typically limits the kinds of uses of the data. This stands in contrast to the typically open nature of RSS feeds, which tend to be publicly accessible like other web resources and place no conditions on their use, and it reflects an increasingly centralizing and normalizing web.





1 Comment

  1. Interesting post.
    However, you can’t compare JSON with RDF, because JSON is just a data structure format, while RDF is much more than that: it has data structure formats such as XML or Turtle, plus the formality of semantic representation (URIs, S-P-O, ontologies, etc.). In that sense, of course, a direct comparison cannot be made.
    You could compare JSON with XML or Turtle or CSV, but not with RDF.

    That said, JSON is great because it has a very light footprint/overhead compared to XML; it is more like the Turtle format. Moreover, JSON works very nicely for exchanging data that is to be consumed by JavaScript web clients.

    JSON can be used to carry “RDF-like data” the same way RDF does. Have you looked at the work done for JSON-LD? See http://json-ld.org/ and http://www.w3.org/TR/json-ld/

    for example this JSON:

    {
      "name": "Manu Sporny",
      "homepage": "http://manu.sporny.org/",
      "image": "http://manu.sporny.org/images/manu.png"
    }

    is translated into JSON-LD like:

    {
      "http://schema.org/name": "Manu Sporny",
      "http://schema.org/url": { "@id": "http://manu.sporny.org/" },
      "http://schema.org/image": { "@id": "http://manu.sporny.org/images/manu.png" }
    }

    This works because JSON does not prevent you from using it with a bit more formalism, just as RDF/XML has done on top of plain XML.

    So JSON is just a format and can be used to express semantics as well; we just need to follow the JSON-LD W3C recommendation if we want more context and meaning in our JSON.

