Since the July 19th indictment of Aaron Swartz for surreptitiously whooshing nearly five million JSTOR documents onto a laptop concealed in an MIT network closet, there's been a lot of codswallop written about JSTOR, about Aaron Swartz and about the public's right to access documents in the public domain. A 24-year-old computer prodigy and political activist, Swartz has been caricatured as either a hero or a villain; likewise JSTOR. The U.S. Attorney for Massachusetts, Carmen M. Ortiz, who brought the charges against Swartz: she might be a bit of a villain, okay. Information wants to be free, it's been said. But whether this means free of charge or merely liberated from its confines is a distinction most often left unmade.
What we know so far, if the allegations in the indictment are true: late last year Swartz busted into the MIT network in order to conduct his download in secret, though he has been working at nearby Harvard for many years and has no direct affiliation with MIT. At Harvard, as at pretty much any U.S. university, Swartz would automatically have had full access to JSTOR. It's been widely asserted that Swartz intended to distribute the material he downloaded from JSTOR to the public, e.g. by posting the lot onto a file-sharing site like The Pirate Bay. And it's no wonder that people are saying this, because the government's indictment alleges it directly, but the indictment provides not a single shred of evidence to support these claims.
In a statement released the day the indictment was unsealed, U.S. Attorney Ortiz said: “Stealing is stealing, whether you use a computer command or a crowbar and whether you take documents, data or dollars. It is equally harmful to the victim, whether you sell what you have stolen or give it away.” Stealing may be stealing, but exactly what is the theft here? There were a few tweets around the time of the press release pointing out the absurdity of Ortiz's remark: "JSTOR is empty!" "I sincerely hope JSTOR will be able to recover the documents that were stolen from them."
If we want to understand the fix that Aaron Swartz is in, a full understanding of the details is in order. Let's start with JSTOR. What is it?
JSTOR (for Journal Storage) is a nonprofit organization founded in 1994 by the Andrew W. Mellon Foundation for the purpose of digitizing and distributing academic journals over the wire. The project was hatched at the University of Michigan with an initial $700,000 grant for hardware and software, plus an extra $1.2 million to pay for scanning just "ten core journals in history and economics." (Hardware, scanning and software development were astronomically expensive in those days.)
Seventeen years on, JSTOR digitizes and distributes over 1,400 journals, mostly to schools and libraries. The journals are divided up into different "collections" ("Arts & Sciences VIII," for example, offers 140 titles, including a series of rare 19th- and 20th-century art magazines). The price for access to these collections varies wildly, according to the size, nature and location of the subscriber. Access is free for any nonprofit institution on the African continent, for example, and in a number of developing countries in other parts of the world. In the U.S., though, it might cost a four-year college over $50,000 for top-tier access; if you teach or study at a participating institution, you can read all the stuff your institution subscribes to for free.
In order to make these documents available, JSTOR has to license the content from publishers. Negotiating these licenses is a tricky business, not least because an academic journal has generally got a backlog of older content that, though it may be in the public domain, will still have to be scanned and archived. Some publishers charge web users outside the institutional subscription system a per-article fee for access to their stuff, a practice that has enraged many, given that quite a lot of this material is in the public domain, and the publisher's right to paywall such material seems therefore questionable.
A lot of people seem to believe that it doesn't cost anything to make documents available online, but that is absolutely not so. Yes, you can digitize an academic journal and put it online, but if you mean to offer reliable, permanent availability, it costs a huge amount of money just to keep up with the entropy. Plus you have to index the material to make it searchable, not a small job. Everything has to be backed up. When a hard drive fries, when servers or database software become obsolete or break down, when new anti-virus software is required, all this stuff requires a stable and permanent infrastructure and that does not come cheap. Finally, the more traffic you have, the more it costs to maintain fast, uninterrupted server access; you can see this whenever some little blog is mentioned in a newspaper and its server crashes five seconds later. In the case of JSTOR you are looking at many millions of hits every month, and they can't afford any mistakes.
So is JSTOR uniformly a good guy? Maybe not; it would certainly be nice if they would make their public-domain materials available to the general public. But if you are an academic librarian, JSTOR, a nonprofit, probably isn't making your blood boil the way for-profit publishers like NPG (Nature) and Reed Elsevier (The Lancet) do, the latter of which I have seen referred to online as "Lord Voldemort Elsevier" owing to the company's greed and general rapacity (though at least Lord Voldemort E. finally ordered a halt to the arms fairs traditionally thrown by one of its subsidiary companies, after top brass at The Lancet et al. kept screaming their pointy heads off).
Another thing to consider is that academic writers are paid through salaries and grants; they aren't paid (not directly, anyway) for the publication of their work. The whole system of compensation for academic content is very different from commercial publishing. When you pay for a JSTOR article online, none of the money goes to the author, it goes to the publisher.
Why Swartz? Why Now?
Once Swartz had been collared, JSTOR declined to pursue charges against him. (Politico reports that MIT plans to press charges, but university officials have not confirmed.) Indeed JSTOR immediately made a public statement to the effect that they have no beef with Swartz. There are two obvious reasons why the Feds decided to pursue criminal charges anyway. The first is that the Feds were already pissed off at Swartz and were just waiting for a chance to go after him.
In 2008, Swartz, taking advantage of a free trial of PACER, a government database of court records, cleverly automated a download of nearly 20 million pages of documents. This was in response to the call of information activist Carl Malamud for donations of downloaded PACER documents, which ordinarily cost eight cents per page. Malamud's position is that since the public owns these documents, access to them should be easy and free of charge online. In the event, Swartz hadn't broken any laws, so the Feds were forced to drop their investigation. Perhaps a certain resentment lingered.
The other reason for going after Swartz is that he is a progressive activist and passionate champion of the free Internet and of open access. He has been so outspoken about open access in particular that his 2008 "Guerrilla Open Access Manifesto" was removed from its website, apparently in response to Swartz's legal troubles. Indeed, it has got some hairy stuff in it, considering the author's current situation:
We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that's out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks.
On the other hand I could see having left this document online, loud and proud. Because, "stuff that's out of copyright," he plainly says. That throws a particular complexion on the matter.
So what does Open Access mean, exactly? We have several good prototypes already available and thriving, such as the Public Library of Science (PLoS), where scientists all over the world have been publishing their papers openly for years, with no fee for access to anyone anywhere on the Internet ever, all published under Creative Commons licenses. Here is a relevant bit from the PLoS website:
In 2003, PLoS launched a nonprofit scientific and medical publishing venture that provides scientists and physicians with high-quality, high-profile journals in which to publish their most important work. Under the open access model, PLoS journals are immediately available online, with no charges for access and no restrictions on subsequent redistribution or use, as long as the author(s) and source are cited, as specified by the Creative Commons Attribution License.
PLoS charges those who wish to publish a fee, usually several thousand dollars, to cover peer review and publishing costs. These fees are ordinarily covered by the researcher's institution.
Keep all that in mind as you read a typical comment recently written by someone who understands bupkes about this thing: one Greg Maxwell, who recently uploaded a 33GB file of JSTOR articles onto The Pirate Bay in protest of the Swartz indictment. (Maxwell says the file contains the whole pre-1923 public domain archive of the Philosophical Transactions of the Royal Society.)
The documents are part of the shared heritage of all mankind, and are rightfully in the public domain, but they are not available freely. Instead the articles are available at $19 each—for one month's viewing, by one person, on one computer. It's a steal. From you.
This is about twenty kinds of not true. JSTOR is paid (not by the public, but by institutions) for a service, not for content. The money that individuals pay for these articles goes not to JSTOR, but to the publisher that is making the material available.
Let's follow this out a bit. There are nearly 19,000 documents in this 33GB download, and anyone can take them off The Pirate Bay—and then what? It will tax an ordinary home computer quite a lot to search just this one file, the archives of a single journal of the 1,400-plus currently distributed by JSTOR; that's the tiniest drop in the bucket. The practical futility of Maxwell's gesture only demonstrates that JSTOR is providing an invaluable service to the public, even with respect to documents in the public domain—one that could be improved upon, maybe, but completely impossible for individuals to duplicate using existing technologies.
But the worst misapprehension in Maxwell's remarks is his total misunderstanding of what public domain really means. Shakespeare is "part of the shared heritage of all mankind," too, but does that mean you can march into a Barnes & Noble and take any copy of Shakespeare that you want out of there for free? No! You have to pay Barnes & Noble and Penguin Classics or whomever for making it available to you in a form you can use, in this case a book. To fail to appreciate this point is to weaken the argument for open access by depriving it of clarity and focus.
Consider Project Gutenberg, where someone has kindly volunteered the work of scanning all of Shakespeare into digital form, and still other volunteers have provided text cleanup and money and server space and IT work so that you can download Shakespeare there for "free", but, well, no, it's not free; this has cost and is costing someone something, just as public money has been allocated by your local government to pay publishers and librarians to maintain public copies of books that you can borrow. In each case there is work to be done by people who most often need to be paid (and deserve to be paid) for their efforts.
Finally, making a profit off of public domain works is allowed; indeed, it's half the point. You're allowed to make a hip-hop rendition of Shakespeare's sonnets and then sell it to make money. The crux of the matter here is balancing the public interest against private interests; individuals should have the right to be compensated for their work, and the public should be free to reuse and remix the products of our shared culture.
Enter The Lawyers
A lot of open access advocates in the academy are pretty steamed at Aaron Swartz. As Meredith Farkas, the Head of Instructional Services at Portland State University, wrote in an email: "I don't think that releasing copyrighted works to the public is the best way to make the case for opening up scholarship or promoting open access," and this appears to be a widely held view. The thing is, we have absolutely NO indication, and no way of knowing, that Swartz meant to do anything of the kind. And just on the face of it, I can't imagine he did.
It seems far more likely that if he meant to distribute any JSTOR articles on a file-sharing site, he would have stripped out any copyrighted material first (1.7 million of the 4.8 million articles he downloaded, according to the indictment). That would be child's play for someone like Swartz to do, and it would certainly have decreased his chances of landing in the soup.
Still, the government's indictment alleges that he intended to distribute the stuff to the public "through one or more file-sharing sites," without offering any details as to why they think he was going to do that. If they hope to prove this allegation based solely on the 2008 Guerrilla Open Access Manifesto, it would seem that they have got an uphill climb.
For one thing, Swartz has been working for years on analyzing huge data sets at Harvard and elsewhere. He has a longstanding professional interest in the study of large data sets. Sure, it's a little bit fishy that he didn't use the network at his home institution in order to access JSTOR. If the allegations in the indictment are true, it would also appear that Swartz took steps to cover his tracks in order to escape detection. I could think of a zillion possible reasons for this with one lobe tied behind my back: Did Swartz want to keep the nature of his work secret from a colleague for some professional reason? Had the Harvard IT department refused to permit him to take that much data down?
As many have pointed out, JSTOR has got its own systems for performing analysis on its data, but so what? No hacker of Swartz's abilities would be likely to need or want the kind of help JSTOR could offer him. That would be like telling Ferran Adrià to go to Whole Foods to get stuff for lunch; dude probably will not be coming back with a precooked pizza.
Swartz is being charged with hacker crimes, not copyright-infringement crimes, because he didn't actually distribute any documents, plus JSTOR didn't even want him prosecuted. These charges are: Wire Fraud, Computer Fraud, Unlawfully Obtaining Information from a Protected Computer, Recklessly Damaging a Protected Computer, Aiding and Abetting, and Criminal Forfeiture, and Being Too Smart for Being Such a Young Guy, and That Seems Dangerous (I made up only the last bit).
By far the best analysis of the underlying legal reasoning of the Swartz indictment so far comes from the blog of Max Kennerly, a Philadelphia trial lawyer. Kennerly, too, finds it bizarre that U.S. Attorney Ortiz should be pressing this matter in the absence of any further beef between MIT, JSTOR and Swartz.
I don’t see what societal interest Carmen Ortiz think she’s vindicating with the Swartz indictment. According to Demand Progress, JSTOR already settled their claims with him. What more needs to be done here? The “criminal violation” here arises not from any social duty — like, you know, our society’s communal prohibition on murder — but rather from Swartz “exceeding the authorization” imposed by JSTOR on its servers. Prosecuting Swartz criminally makes less sense than prosecuting telecommunications companies for violating their consumer agreements, and we all know that’s not going to happen any time soon. [...] The whole case looks like the iPhone prototype saga again: a civil claim that some overly aggressive prosecutor is trying to dress up as a federal crime.
In order to prove the claim of wire fraud, Kennerly says, Ortiz will have to prove that Swartz meant to defraud JSTOR, which really means "defraud out of money."
I asked Kennerly about this. If Swartz really intended to make the JSTOR documents available on a file-sharing site as the indictment claims, thereby potentially preventing publishers from getting their JSTOR fees, is it still technically "defrauding" even if no money were ever to change hands? He replied (and this is some dense stuff, but please bear with me and get on in there, because it's crucial):
That's a good point you raise, and it could potentially complete the circle to show fraudulent intent. Assuming a jury finds, as a factual matter, that Swartz intended to release the documents, the prosecutors will likely argue that it's a "fraud" because Swartz was only allowed onto JSTOR's servers on the condition that he abide by its rules; if his intent was to release the documents to the public, that would break those rules, and so he "defrauded" JSTOR by misrepresenting his intentions when accessing it.
Consider the Skilling v. United States case (PDF) from the Supreme Court last year (yup, Skilling as in the guy from Enron). Scroll to III(A)(1):
Enacted in 1872, the original mail-fraud provision, the predecessor of the modern-day mail- and wire-fraud laws, proscribed, without further elaboration, use of the mails to advance “any scheme or artifice to defraud.” See McNally v. United States, 483 U. S. 350, 356 (1987). In 1909, Congress amended the statute to prohibit, as it does today, “any scheme or artifice to defraud, or for obtaining money or property by means of false or fraudulent pretenses, representations, or promises.” §1341 (emphasis added); see id., at 357–358. Emphasizing Congress’ disjunctive phrasing, the Courts of Appeals, one after the other, interpreted the term “scheme or artifice to defraud” to include deprivations not only of money or property, but also of intangible rights.
[You can see how the reasoning would be bound to differ a lot, depending on whether or not you were talking about materials already in the public domain.]
In an opinion credited with first presenting the intangible-rights theory, Shushan v. United States, 117 F. 2d 110 (1941), the Fifth Circuit reviewed the mail-fraud prosecution of a public official who allegedly accepted bribes from entrepreneurs in exchange for urging city action beneficial to the bribe payers. “It is not true that because the [city] was to make and did make a saving by the operations there could not have been an intent to defraud,” the Court of Appeals maintained. Id., at 119. “A scheme to get a public contract on more favorable terms than would likely be got otherwise by bribing a public official,” the court observed, “would not only be a plan to commit the crime of bribery, but would also be a scheme to defraud the public.” Id., at 115.
The prosecutor would say the copyrighted articles were both "property" and, if that didn't work (and I can see it not working) that the copyrights were an "intangible right," like described above.
If I was Swartz's lawyer, though, I would turn around and say: the absence of any pecuniary gain for Swartz removes it from "fraud." The release of those documents is almost certainly copyright infringement, but that's not the same thing as "fraud." Consider this further part of the Skilling case:
In 1987, this Court, in McNally v. United States, stopped the development of the intangible-rights doctrine in its tracks. McNally involved a state officer who, in selecting Kentucky’s insurance agent, arranged to procure a share of the agent’s commissions via kickbacks paid to companies the official partially controlled. 483 U. S., at 360. The prosecutor did not charge that, “in the absence of the alleged scheme[,] the Commonwealth would have paid a lower premium or secured better insurance.” Ibid. Instead, the prosecutor maintained that the kickback scheme “defraud[ed] the citizens and government of Kentucky of their right to have the Commonwealth’s affairs conducted honestly.” Id., at 353.
We held that the scheme did not qualify as mail fraud. “Rather than constru[ing] the statute in a manner that leaves its outer boundaries ambiguous and involves the Federal Government in setting standards of disclosure and good government for local and state officials,” we read the statute “as limited in scope to the protection of property rights.” Id., at 360. “If Congress desires to go further,” we stated, “it must speak more clearly.” Ibid.
If I was Swartz's lawyer, I'd say we have a McNally issue here. Congress has already spoken on what it means when someone wrongfully distributes someone else's written works: it's a copyright infringement. He didn't deprive anyone of property, he had more copies than they liked. If he intended to distribute them, well, that's an "intent to infringe on copyright," which isn't a crime.
What will the judge do? Beats me. Odds are she or he will send it to a jury to decide on its own what Swartz's intent really was, and, frankly, I can see them acquitting him as lacking the sort of malicious "mens rea" we think of as criminal. At that point the case would be dead. If they convict him, maybe these issues will go to the appellate courts.
Aaron, I Am Your Father
"An indictment is an allegation," Harvard law professor Lawrence Lessig wrote in a recent statement regarding the Swartz case published on the Media Freedom website. "[...] It is one side in a dispute." Lessig, one of the strongest advocates for open access and copyright reform in this country, indeed in the world, has been connected to Swartz for a long time. He brought Swartz on board to design the metadata format for Creative Commons, which Lessig co-founded. Swartz was still a teenager at that point.
It would not be far off to characterize Lessig as Swartz's mentor. From the same statement:
I can’t believe Aaron did this for personal gain. Unlike, say, Wall Street (and what were the penalties they suffered?), this wasn’t behavior designed to make the man rich. Nor, if the allegations are true, was this behavior designed to interfere with any of JSTOR's activity. It wasn’t a denial of service. It wasn’t designed to take any facility down.
What it was is unclear. What the law will say about it is even more unclear. What is not unclear, however, to me at least, is the ethical wrong here. I have endless respect for the genius and insight of this extraordinary kid. I cherish his advice and our friendship. But I am sorry if he indeed crossed this line. It is not a line I believe it right to cross, even if it is a line that needs to be redrawn, by better laws better tuned to the times.
Information activists like Lessig and Carl Malamud have been very vocal in their condemnation of the tendency of academic publishers to keep their materials locked up for the benefit of "elite" institutions like American universities. Lessig gave a rousing April keynote address on this and related topics at CERN, the European Organization for Nuclear Research ("The Architecture of Access to Scientific Knowledge: Just How Badly We Have Messed This Up"). He was deliberately preaching to the choir, here, in a widely visible forum; it's an important address, well worth watching. CERN is at the forefront of the open access movement, having adopted in 2005 a specific open access publications policy to encourage the free dissemination of scientific research and information. Physicists generally have spearheaded open access initiatives around the world with spectacular results, including the arXiv open access system for publishing scientific papers here in the U.S.
Still, there was maybe an injudicious statement or two in this speech. At one point Lessig showed a slide of a tweet made by Malamud: "Jstor is so morally offensive. $20 for a six-page article, unless you happen to work at a fancy school." (Actually no, though, it's not JSTOR that is making this money, it's free if you are in a developing country, and so on.) If the Aaron Swartz case clarifies the position of open access advocates with respect to nonprofit services like JSTOR, that at least will be a good thing.
The conclusion of Lessig's CERN presentation is particularly stirring.
We need to recognize in the academy, I think, an ethical obligation [...] An ethical obligation which is at the core of our mission. Our mission is universal access to knowledge—not American university access to knowledge, but universal access to knowledge in every part of the globe.
We don't need, for our work, exclusivity; and we shouldn't practice, with our work, exclusivity. And we should name those who do, wrong. Those who do are inconsistent with the ethic of our work.
The aims and ideals of Aaron Swartz can, I believe, be laid to some degree at this man's door. That is something I would be very proud of, if I were Lawrence Lessig. Whatever the results of the government's actions against Swartz—and whether or not those actions are ultimately motivated by an instinct toward intellectual property protectionism of the kind demonstrated by the RIAA and others in the U.S.—there can be little doubt that the motives of people like Lawrence Lessig and Aaron Swartz spring from a desire to serve the public good. To that extent we are in their debt, rather than the reverse.