bag of words
irreversibility / stripping away the writer's process...
book of words
arbitrariness of the digital link of language to economy of representation / separation from the body (pronounceable)
illusion of a universal language
what if the separation is not so easy to make
Brin and Page's RESOURCES.
ECONOMIES / Trade offs
In announcing Google's impending data center in Mons, Belgian prime minister Di Rupo invoked the link between the history of the mining industry in the region and the present and future interest in "data mining" as practiced by Google.
Whether bales of cotton, barrels of oil, or bags of words, what links these processes is the way in which the notion of "raw material" obscures the labor and power structures employed to secure them. "Raw" is always relative: "purity" depends on processes of "refinement" that typically carry social/ecological impact.
Stripping language of order is an act of "disembodiment", detaching it from the acts of writing and reading. The shift from (human) reading to machine reading involves a shift of responsibility from the individual human body to the obscured responsibilities and seemingly inevitable forces of the "machine", be it the machine of a market or the machine of an algorithm.
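What this stripping of order amounts to in practice can be shown with a minimal sketch. It assumes the simplest possible "bag": lowercase the text, split on whitespace, and count. The function name `bag_of_words` and the two example sentences are illustrative and not taken from any particular system.

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to word counts, discarding the order of the words."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings (hypothetical examples).
first = bag_of_words("the dog bit the man")
second = bag_of_words("the man bit the dog")

# Once order is discarded, the two sentences are indistinguishable.
print(first == second)
```

The comparison holds because both sentences contain the same words the same number of times. Nothing in the bag records who bit whom, and the original sentence cannot be recovered from it.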
The (computer scientist's) view of textual content as "unstructured", be it in a webpage or the OCR-scanned pages of a book, reflects a neglect of the processes and labor of writing, editing, design, layout, typesetting, and eventually publishing, collecting and cataloging.
"Unstructured", to the computer scientist, then means non-conformant to particular forms of machine reading. "Structuring" is then a social process by which particular (additional) conventions are agreed upon and employed. The computer scientist often views a text through the eyes of their particular reading algorithm, and in the process (voluntarily) blinds themselves to the work practices which have produced and maintain these "resources".
Berners-Lee, in urging his audience of web publishers not only to publish online but to release "unadulterated" data, betrays a lack of imagination in considering how language is itself structured, and a blindness to the need for more than additional technical standards to connect to existing publishing practices.