My old friend Stephen Marche, the renowned Shakespearean, is at it again, this time with an impassioned piece preaching the massively controversial credo that “Literature is not Data.” It’s an attack on authors and academics. Or on digital humanists. Or on algorithms (which are, saith Marche, fascist). Or something. It’s a very strange, very ill-informed, very incoherent essay, and demands a more in-depth response from someone who is more immersed in current Digital Humanities practices than a mere dabbler such as myself.

But there are a couple of characteristic blunders in Marche’s article that I feel compelled to write about.

First of all, there is the weird narrative of Google Books he spins. In that story, “the openness and honest labor of engineers” comes face to face with the “closed ranks” of the “priestly class”: poor old Google just wants to make all the books it’s digitized freely available, or at least searchable, while “literary people” selfishly reject “the gift of digitization.” If Marche is to be believed, the conflict over Google Books was fought between a benign team of practically-minded innovators and a coterie of “writers and professors,” who, far from being “liberals, hedonists, bohemians,” are “in fact, profoundly, deeply, organically conservative.” He mentions, but then quickly ignores, that the legal case against Google was brought not just by the Authors Guild, but also the Association of American Publishers. Corporations, in Marche’s story, are good: they solve problems. Writers and thinkers, on the other hand, are bad: they squabble, and they “create problems rather than solving them.” Publishers, somehow, don’t appear to have a dog in this fight.

This is, of course, nonsense. It’s similarly nonsense to claim that “professors” were especially active in fighting Google’s noble mission to democratize knowledge. Very few professors make any money at all from their publications, as Marche must know. I recall some colleagues reacting with trepidation to the prospect of their books becoming available in full on Google, but that, of course, never happened; I would think that the vast majority of us are quite happy to have our work more widely accessible than it is when contained in the pages of $100-plus volumes and locked away in university libraries. The authors who objected most strenuously to Google’s project were those who stood to lose royalties — and, of course, their publishers.

Most revealingly, however, Marche claims that the idea of a digitized library of the world’s books was Google’s idea. It wasn’t. I wouldn’t presume to offer an authoritative alternative history, but I will point out that the Internet Archive, now containing almost 3 million public-domain texts, started in 1996, well before Google got on the bandwagon. Nor is it true that “the world’s five largest libraries signed on as partners” in the Google Books project. They didn’t, and they haven’t. Some very large libraries were among the initial partners (Harvard and the NYPL, currently ranked 3rd and 4th in the US). But none of the major National Libraries have joined in, and the project remains extremely Anglocentric in focus.

Marche thinks Google should never have engaged with authors, because that way, well, lies madness. Instead, he proposes, “In hindsight, perhaps, Google should have followed the law for ‘fair use’ of copyright, come to agreements with the world’s major libraries to provide the Book Search to public institutions in perpetuity, and stepped aside.” Sounds good. Except that what “fair use” means in this context is far from a settled legal issue. It was the question at the heart of the lawsuit, and a question left open when Google and the parties in the lawsuit came to a settlement in 2009. That settlement was rejected by a judge in 2011, and the case is currently pending.

However, none of this has stopped Google digitizing books: the collection is steadily growing. Nor has it made conducting full-text searches harder. Google won’t display copyrighted material from books whose publishers have not signed an agreement, but the text is still being searched. And in any case, this only concerns material still in copyright. All older texts are fully and freely available.

All of which is to say, I have no idea exactly why Marche thinks Google Books has been a “failure” — or why he claims that scholars simply refused to engage with the kind of work Google is doing:

Academia could have done what humanists have done throughout history and tried to add to Google’s mandate: make the texts legible and available. They could have tried to bring out the contemporary relevance that only historical context, knowledge of literary tradition, and scholarly standards can provide. But this ancient task was anathema, for the simple reason that it would have involved honest work. Much easier to remain in the safe irrelevance of mass publication in the old mode, what Kingsley Amis called “the pseudo-light it threw on non-problems.”

The central sneer here appears to be that academics don’t like “honest work” and prefer “mass publication in the old mode” — a mode that apparently does not involve making legible texts available. I honestly have no idea what Marche is talking about. The past twenty years have seen an astonishing wealth of academic, not-for-profit undertakings that make texts available in reliable versions all over the place all the time — independently of or in cooperation with commercial enterprises such as Google’s. That Marche would locate the true scholarly spirit so emphatically inside the hallowed halls of the Googleplex speaks volumes.

Secondly, Marche talks a bit about the supposed impact of the digital revolution on academic research. His prime example is EEBO (Early English Books Online). He does not seem to be aware that EEBO is an expensive subscription service, nor does he seem to realize that the vast majority of the books it contains are simply digitized from microfilms that were available long before the World Wide Web changed everything. Here’s how he imagines Renaissance scholars worked in the bad old days:

Before EEBO arrived, every English scholar of the Renaissance had to spend time at the Bodleian library in Oxford; that’s where one found one’s material. But actually finding the material was only a part of the process of attending the Bodleian, where connections were made at the mother university in the land of the mother tongue. Professors were relics; they had snuffboxes and passed them to the right after dinner, because port is passed left. EEBO ended all that, because the merely practical reason for attending the Bodleian was no longer justifiable when the texts were all available online.

No British Library in Stephen Marche’s world; no Huntington, no Houghton, no Beinecke, no Folger, no Newberry, no Library of Congress; no Cambridge University Library, no National Library of Scotland. Renaissance scholars all flocked to the Bod — and now, one supposes, the Bod stands empty, while we all stare at our screens. I’m glad Stephen Marche was treated to snuff in hall at whatever college he was staying at in Oxford — I never have been, though I can report that professors still eat dinner there, and still pass the port. Some of them may fairly be considered relics, though, I expect, no more or fewer of them than in the pre-EEBO days. And the Bodleian remains busy, as do all the other excellent and well-stocked research libraries I mentioned.

It is certainly true that things have changed. Scholars fortunate enough to work at institutions with an EEBO subscription can read far more materials at home, just as those whose libraries owned full runs of the old STC microfilms could. But that hasn’t spelled the end of research trips to archives. What is true is that there is greater interest in manuscript work now than for a long time, and there is doubtless a connection between that shift in focus and the wider availability of digital texts. Cynically, one might suggest that scholars need to justify research trips somehow, and looking at manuscripts, or at individual copies of works, is a great way of doing that. More idealistically, one might argue that services such as EEBO have freed up more time for archival exploits that were simply not manageable for most scholars before. Either way, the scene Marche describes still plays out, all around the world (not just in Oxford). Though without the snuff. Same as it ever was.

And then there’s this bit of weirdness: “Stylometry, the analysis of definable patterns in literary styles, has also been a mode of desacralization.” Sure, I suppose. But of course stylometry has nothing to do with Google Books. Or, for that matter, inherently with the internet (as I imagine Lorenzo Valla would point out if he still could). Marche’s single example of the triumph of stylometry — the addition of Middleton’s name to the title page of Timon of Athens — has its basis in R. V. Holdsworth’s unpublished 1982 PhD thesis. In published form, the most prominent summary of the arguments can be found in Brian Vickers’ Shakespeare, Co-Author, which appeared in 2002: two years before Google Books put a single digitized volume online.

All of this forms the long opening salvo to Marche’s essay, which, given its ostensible purpose of arguing against the Digital Humanities, may seem a little odd. So far, he’s singing the praises of Google Books, highlighting the virtues of EEBO and of the new internetified science of Stylometry, and castigating crusty old lazy scholars for refusing to do their bit to make the media revolution happen. Sounds like a grand defence of DH to me, or at least a heavily corporatized version of DH.

But then Marche switches from one imaginary target to another — if academics aren’t loathsome in their retrograde attachment to paper, they are vile in their refusal to acknowledge the special status of the literary: “Literature cannot meaningfully be treated as data. The problem is essential rather than superficial: literature is not data. Literature is the opposite of data.”

To which one obvious answer is: well, duh. And another obvious response may be “Well, only if you don’t understand what ‘data’ means.”

On one level, Marche is naturally right, though it’s a little absurd that he thinks this is a great insight: “The experience of the mystery of language is the original literary sensation. The exuberance of ancient literature — whether it is in the simple, inscrutable lyrics of Sappho or Oedipus’s tragic misunderstanding of the oracles — contains a furiously distressed joy that words mean so much more than they mean.” As so often in Marche, it’s all expressed in too absolute terms, too, if you will, exuberantly, but the ideas are anything but new. Or controversial.

I don’t know, to be honest, what text-mining DHers would have to say in response. I doubt anyone has come up with software that can explain how great literature works. I hope no one has. And if anyone ever were to develop a program that can deliver the ultimate analysis of any text we feed it, our jobs as teachers of literature would probably be over. But so would the jobs of literary authors. And as far as I know, no one is actively trying to destroy literature through electronic demystification.

Marche writes as if all scholarship were engaged in acts of literary interpretation — more specifically, in acts of close reading. As he must know well enough, given his academic background, nothing could be further from the truth. Criticism is one kind of literary scholarship, but it’s only part of the larger enterprise; and I suspect it’s the part DH is least good at. Literary history, on the other hand, is far more likely to benefit from the broad-based, distant view data-rich approaches can offer — although Marche, bizarrely, thinks that “the process of turning literature into data removes … the history of the reception of works.” He’s right that a data-centric approach is less likely to be influenced by “taste” or “refinement,” but for my money, that’s a good thing. History dictated by taste is history written by the winners. And that’s bad history. “Meaning is mushy,” Marche writes, not inaptly. But whereas the meaning of a line of poetry may emerge more clearly, or more richly, simply through contemplation, through critical engagement, the meaning of a historical development is just as likely to become more apparent through a process of accumulating more data — of stepping back and seeing the development in the broadest possible context, the kind of context data analysis can provide with a clarity and a neutrality likely lost in a critical endeavour propelled by questions of taste and a desire for refinement. (I can’t be the only one who’s finding it difficult to reconcile Marche-in-Matthew-Arnold-mode with Marche-in-Google-Books-acolyte-mode.)

Finally, Marche seems to think, rather puzzlingly, that “data” implies “completeness.” “Literature is terminally incomplete,” he notes. He means that not every text ever written has survived, as far as I can tell, though he quickly moves from this discussion of literature’s archival fragmentation to the (unrelated) challenge of the fragmentation of meaning in the literary text. He appears to concede that this problem of partial transmission does not afflict literature alone (“The information we have about the past is, in almost every case, fragmentary”), so that it is presumably not literature alone, but all human existence that is “haunted by such oblivion, by incipient decay.” But it’s unclear why any of this should matter in any case. No data set is ever complete. Marche’s counterexamples are baseball stats and case law. He doesn’t seem to be aware that in both those cases, we’re dealing with flawed and incomplete sets of data. Baseball stats have become ever more detailed and fine-grained in recent years, and many of the analyses now possible (of pitching data in particular) cannot be undertaken for historical figures, as the numbers aren’t available. And the idea that it’s possible to “establish a complete database for all of the legislation and case law in the world” is just preposterous. Like any other human activity, law cases are subject to transcription and transmission, to conventional editing and pruning and to archival loss. There is nothing special about literature’s transmission challenges. Working with incomplete and unreliable data sets is an entirely familiar and common experience for analysts of all kinds.

It’s thus not news to anyone that “there are always masses of data which are simply missing or which cannot be untangled,” though some of us may be surprised to learn that “the most obvious and relevant example is Shakespeare.” Why would he be? Obvious, perhaps — but relevant? How? To whom? And in what sense? What’s clear is that Marche himself finds the Shakespearean data set confusing, so let me clarify: “There are nine different versions of Richard III; there are three versions of Hamlet, each with missing sections or added sections,” writes Marche. Well, no, there aren’t. There are eight quartos of Richard III, though they don’t differ much (if at all) from edition to edition after the third quarto. And then there are four reprints in folio form, but the second through fourth folio aren’t usually considered to have independent authority. So that’s either four different versions or thirteen. Hamlet exists in three different texts, two in quarto, one in folio; the second quarto was reprinted three times, but there are later quartos from the second half of the seventeenth century, five in total. So Hamlet, counting by the method Marche seems to use, half-heartedly, for Richard III, exists in somewhere between nine and fourteen “versions.”

If anything, we have too much literary data in these cases. What we don’t have, and what’s challenging, is non-literary data, supplementary information that would elucidate the status and the genesis of all these texts. The problem is not the indeterminacy of the literary work, or its incomplete transmission as such — it’s the absence of metadata. The challenge, that is to say, is not literary: it’s historical. And the mystery is not the mystery of language, but that of commercial publishing practices, playhouse conventions, censorship decisions, archival and collecting decisions, and so on. Algorithms won’t be the ultimate solution to those challenges and mysteries. But who ever said they would be?

17 Responses to Stephen Marche? Again?

  1. […] digital humanities; Holger Syme, professor of Elizabethan drama at the University of Toronto, appeared for the defense. (Syme has a pleasing facility at deflating blowhards of the Canadian intelligentsia—witness his […]

  3. […] is not the first time Stephen has screwed up. He has a reputation for being pointless, incoherent, and rambling. He admits that, despite being a former professor of Shakespeare, his book on the […]

  4. […] the Bodleian? I never could make sense of his rant and others including Holger Syme have written much more eloquent defenses. But still – according to Marche we’re a bunch of immoral hooligans who threaten real […]

  5. Holger Syme says:

    Sadat:

    you don’t think the member of the “commentariat” (your word) started this particular “pissing match”? And you don’t think he makes it quite clear whom he respects and whom he despises in the contrast between Google’s forward-looking engineers and the problem-creating academics?

    As for your second point: you might want to re-read both Marche’s and my essay. It would be foolish of me to claim that libraries have not been affected by the move to digital publishing. I said no such thing. But it’s equally foolish to claim that research libraries, and in particular the kind of research library Marche refers to, stand empty now that everyone just uses EEBO. That’s simply not true. The weird nostalgic image is his, as is the doom-and-gloom narrative. I don’t know what places such as the Bodleian will be like 20 years from now. All major archives are involved in a host of digitization projects at the moment, and those will doubtless affect their user numbers. We’ll see. But in the real world, academics and scholars are heavily involved in all those projects. I don’t know where you (or Marche) get the idea that universities are fighting some kind of wrong-headed ultra-conservative battle against change — we’re not.

  6. SadatUofT says:

    The best example I can come up with? This one. YOU, clinging to the pole around which the wind is screaming, the wind Marche describes, and is simply calling on you to acknowledge, and build structures to contain, control, harness: “And the Bodleian remains busy, as do all the other excellent and well-stocked research libraries I mentioned.” Library statistics tell a different story, but if the mass music, publishing and media businesses could have their heads in the sand for 10-15 years, academia will be buried in the muck for what? 20? 30? 50 years? You mistake Marche’s criticisms for shots … no. He wants academia and its missions to survive. But first clearing your dogma is essential. And had you read a little more critically, and a little less indignantly, you’d have noticed Marche’s inherent distrust of Google, and the notion of engineered knowledges. But you were too busy being insulted. Shame.

  7. SadatUofT says:

    Truly sad to see a professor at an important school and literature department turn this into a pissing match between the cultural commentariat and academia, but typical of the small minds that seem to gravitate to the Ivory Tower these days. Scary to think you are also responsible for helping open the minds of young people, too. You could learn a lot from Marche if you took a short sojourn from being indignant and self-congratulatory.

  8. Neil Howlett says:

    In England & Wales we still don’t have a database of statute law (legislation) that is up to date and publicly accessible. Many millions have been spent so far on two databases of recent statute law but even they are subject to caveats that they are not up to date.
    There are commercial offerings; they are more open than the scholarly EEBO (and most journals), to which I would be unable to get access even if I paid.
    The latest proposal for statutes seems to be a wiki-style system which relies on a community of experts to keep it up to date.
    As for case law, that depends on a charitable foundation for modern cases and a separate one for historic cases. The law in Scotland, Northern Ireland and the Republic of Ireland is different, and proper interpretation of UK statutes requires access to European Community law, which is online for free, but there are then problems of translation.

  9. Todd Butler says:

    Speaking as someone who has waded through collections of case law, depositions, and the like, the idea that we could compile anything close to a complete compendium of world law (let alone Anglo-American) is astonishingly naive. Apart from disregarding the profound gaps in historical records equivalent to those in literary history (what? the London Fire didn’t burn any court records?), Marche unquestioningly swallows the law’s representation of its own objectivity and completeness. That a decision of a case represents the final word on a subject is a powerful, and perhaps even practically necessary, fiction, but it’s a fiction nonetheless.

    And what you left out of his description of the halcyon days of early modernists at the Bodleian is the phrase that precedes it: “the decline of the sacred.” What is sacred in this context seems not the books but a clubbish interchange of well-heeled (and undoubtedly male, employed by major private universities, etc.) folks.

  10. Bob Grumman says:

    A small disagreement: professors certainly do make money from their publications, they just don’t make it directly, they make it indirectly from increased salaries, tenure, eligibility for the many prizes available to professors but not to non-professors (in actuality, though most such prizes are technically available to non-professors), payment for speaking engagements–even if only travel expenses. . . .

    • Holger Syme says:

      I’d qualify that and say some professors do make money that way (though the majority of those are probably also part of the minority that earns significant royalties etc). But with few exceptions, that earning potential is not affected by having one’s books available (let alone full-text searchable) in the public domain.

  11. Carolyn Sale says:

    Appropriately scathing, Holger! The shallowness of the argument is most acute when Marche complains about others not defining their terms (“Even a relatively casual examination of the fundamental assumptions underlying the argument reveals the mushiness of the words beneath the hard equations. What is a ‘classic’? What is ‘influence’?”), only to proceed, in the very next paragraph, to the claim “Literature is not data” without defining “data.” Further along comes the beaut “Data precedes written literature,” a gem made possible because he has also failed to define literature. And then, of course, there’s the self-important catachresis with which he concludes, “Insight remains handmade.” The “meaning” is “mushy,” all right! The article is cold ham-and-pea soup with dry tofu in place of the ham and chili flakes sprinkled on top for fake heat. The wonder is that it got served at all . . . .

  12. Arno Bosse says:

    Never mind Google Books. Project Gutenberg (http://www.gutenberg.org) was founded in 1971.

  13. Alex Gillespie says:

    “The experience of the mystery of language is the original literary sensation” – does he really write that? I guess he does. Thanks for this excellent response to a big pile of nonsense, Holger.

  14. Simon Tanner says:

    Bravo, well said. An informed and well-thought through examination of a very strange article indeed. Thank you for this.