An attempt to describe the lessons learned from "GuttenPlag".
Food for thought for future net-based investigation platforms.
A contribution to the discussion about "investigative crowdsourcing".
first published June 8, 2012 in German
Dr. Karl-Theodor zu Guttenberg had made it to the top of German politics by the age of 40. Coming from an aristocratic family in Bavaria, he was Minister of Defense, the most popular politician in Germany, and many hoped he would be the next Chancellor. But in February 2011, he stumbled and fell, not over a political or private affair but over his dissertation. As it turned out, he had copied large parts of it from many different sources. "Abstrus", absurd, was how Guttenberg described the accusations. But they were true. Two weeks after the newspaper "Süddeutsche Zeitung" broke the news of the plagiarism accusations, and two weeks after volunteers on "GuttenPlag" had begun to unearth more and more cases of plagiarism in his dissertation, Guttenberg stepped down.
It was the birth of investigative crowdsourcing in Germany and, so far, one of the largest crowdsourcing projects worldwide, with more than 1,000 volunteers investigating together. The volunteers documented 1,218 cases of plagiarism on 393 pages; they identified 135 different sources, such as newspaper articles, scientific papers, speeches, and other texts. Sources included the US Congressional Research Service, the Research Service of the German Parliament, and even an undergraduate's essay.
Now, more than a year later, the swarm that made up "GuttenPlag" has vanished without leaving a forwarding address. What remains is the insight that volunteers who cooperate over the Internet to investigate an explosive topic can unleash a tremendous force. It is no longer just journalists, NGOs, or prosecutors who decide which topics deserve closer scrutiny. Everyone can participate. Together with like-minded people, they can use the Internet to search for the truth. Fukushima, the euro bailout scheme, the delayed and over-budget Berlin airport: there have been a number of recent issues that could be investigated using an online platform.
There are other topics, not quite on this scale, that could also be examined through "investigative crowdsourcing", as we call this kind of online cooperation and collaboration.
In order to make it easier for interested people to start a collaborative investigative project, we want to describe the lessons learned from "GuttenPlag". In passing these lessons on, we hope that others can learn from both the mistakes and the successes of "GuttenPlag". Based on these experiences, it is our hope that another swarm may form should one be needed, and that "investigative crowdsourcing" will come to be seen in the German-speaking world as a mainstream tool for a society striving for enlightenment.
Finally, we want to propose some questions for further debate on how such future investigations could be organized. We welcome your comments and ideas!
1. Why investigate with a swarm?
Because a swarm can, in the best case, find more information than individuals working alone, or even a team of journalists. The swarm can achieve more than the sum of its individual contributions.
This was the case with "GuttenPlag": thousands of Internet users working together found more plagiarism than the editorial staffs of large media organizations were able to. Only through this collaborative effort did the true scope of the case become visible. However, "GuttenPlag" was never a competitor to the media. The platform complemented and assisted investigative journalists in their work. This could be the general model for investigative platforms: in situations where professional researchers (editorial staff, NGOs, universities, prosecutors) are unable to probe the depths of a topic, a swarm can provide additional assistance.
One should only undertake such a project if one is willing to be among the hardest workers. While classical crowdsourcing (such as Amazon's commercial service "Mechanical Turk") is often used to delegate drudge work to cheap labor available on the Internet, this does not work for investigative topics. The open questions and the complexity of the subject matter demand interested, accurate, thorough, and competent researchers. An enormous amount of time must be invested in coordinating, motivating, and supporting such a group.
"Investigative crowdsourcing" is more work than traditional crowdsourcing. But for those who dare to engage in such a collaborative, net-based investigation, the rewards will surpass those of traditional methods. A larger group of participants also has more resources for seeing an investigation through to the end. This can enable the group to take a decisive step that would not otherwise be taken.
2. What attracts a swarm?
One cannot expect an investigative swarm to form for every open question. There are three reasons why it worked in the case of "GuttenPlag":
(1) People were willing to invest their free time in "fighting the good fight" for something that was meaningful to them. Quite a number of the "GuttenPlag" activists, many of them from an academic background or themselves graduate students preparing their dissertations, were personally motivated to collect proof, document the academic injustice, and demonstrate that the doctorate needed to be rescinded. In addition, many people wanted to be part of a movement of thousands of online volunteers who were in the process of getting one over on an unsympathetic public figure. This was perhaps not the noblest of intentions, and in some cases it was probably a bit of both. But the participants were united in the drive to counter zu Guttenberg's statements (Translator's note: zu Guttenberg stated that the accusations that his dissertation was plagiarized were "absurd", but that there might be the odd footnote that was amiss, which would be corrected in a second edition) and to put public pressure on him.
(2) "GuttenPlag" worked because the threshold for beginners to participate was very low. A digital copy of the thesis was soon found, and people quickly learned how to look for plagiarism. Many of the plagiarized sources were freely available online. Documenting a plagiarized passage took only a few mouse clicks and a little copy and paste. A few minutes of effort and a bit of luck were all it took to document a new case of plagiarism.
(3) The public interest in this topic was enormous, not least because of the massive media attention. This interest from the media and the general public gave the participants the feeling that they were contributing to something important, which provided further motivation.
In addition to numerous intermediate reports, the contributors were rewarded with an increase in their own personal competence: They learned strategies for searching and how to categorize the plagiarisms that they found.
This can be transferred to other, more complex topics. Investigation platforms could offer their participants a deal: we will learn new skills together, perhaps with the help of experts or with collaboratively produced interactive tutorials. In return, participants contribute their time and apply this new knowledge to shedding light on a particular topic. That way, not only those who are interested in the topic will be active, but also those who wish to learn something new. For example, people who want to investigate a suspected case of financial fraud will join forces with those wanting to learn how to read a balance sheet (and hopefully with those who want to do both).
3. Where is the swarm headed?
"GuttenPlag" was not an uncontrolled, magically self-organizing pile of peers. Just like a flock of birds flying south in the fall, the collaborative process was guided by a common goal. There were clear agreements about the purpose, principles, and working methods of the swarm: We are only documenting; we are not making political demands. Everything is checked twice; we do not draw conclusions that are not supported by the data. Everything is discussed in the group.
In order for these agreements to be strictly adhered to and communicated to new activists, it was necessary to have a clear manifesto, a kind of dogma, stated on the home page of the investigation platform, as well as moderators. Moderators are the backbone of every "investigative crowdsourcing" undertaking. At the same time, they are a rare asset. Only a few people in German-speaking countries have experience in organizing knowledge online, in reaching decisions in online communities, and in resolving conflicts in such teams. Every investigative platform needs as many of them as possible, so that someone is available around the clock during peak times and the project does not turn into a ghost town during lulls. Whether or not a platform is overrun with people at the beginning, the number and quality of its moderators decide whether a crowdsourcing project will succeed or end in chaos.
There were two such moderators at "GuttenPlag" from the beginning: both of the initiators of the platform. One was in Germany, one was in the USA, so that because of the time difference there was always one of them online. Over time there was a core team of about 20 moderators who were regularly active on the platform. Together with approx. 100 other activists they made up a sort of "task force" that determined the direction of the work and initiated individual projects.
Anyone could become a moderator at "GuttenPlag" once they began to sort and look after the platform's content. If one of the moderators saw a volunteer working on the text with dedication, that volunteer would be offered administrator status. This granted additional functions such as protecting pages from vandalism, editing protected pages, or blocking troublemakers.
The moderators of "GuttenPlag" were for the most part tech-savvy people who were editors for "Wikipedia", were active in Internet policy discussions at the Chaos Computer Club, or worked as researchers. For investigative platforms looking into other topics, moderators could well come from very different fields. But it is generally useful if at least some of the moderators have experience with collaborative projects, online or offline.
Without such moderators, "GuttenPlag" would have degenerated into chaos. Particularly in the first days, there were so many people on the platform that it was only possible to get any work done because of the constant stream of volunteers with experience in open content management systems ("wikis") who helped structure the content at "GuttenPlag" and cleaned up after the vandalism.
Since it is easier to keep activists on a platform than to recruit new ones, it falls to the moderators to lead the swarm and to keep friction within the group to a minimum. To lead, moderators need to know which task currently has the highest priority, what the next step is, how to delegate tasks to volunteers, and how to keep people informed of what is happening; they also need to respond promptly to new volunteers asking "how can I help?". To moderate conflicts, moderators have to use de-escalation strategies, and in the extreme case they must have the final word on forcibly removing someone from the swarm.
At "GuttenPlag" there were a number of times when tension in the group created a sort of herd instinct that threatened to spin out of control and split the group, which would have hindered the investigation. Conflicts are normal in such a situation: participants do not see each other, all communication is indirect, and so misunderstandings tend to build up. Moderators who want to be able to influence such situations need to be constantly active in the project in order to be credible with the swarm.
As with many other collaborative processes on the Internet, a chat room was very important for "GuttenPlag". It was a real-time communication channel connecting the volunteers, used alongside the investigative platform. Here conflicts could be resolved out of public view, and moderators and the "task force" could coordinate with each other and discuss their problems "live". At "GuttenPlag" the activists met in the chat room in the evenings at appointed times to decide what direction future work should take. The chat was thus the engine for a number of initiatives on the platform, and the task force was able to show other volunteers how to solve particular problems.
Both "GuttenPlag" and the follow-up platform "VroniPlag" demonstrated that there is a danger in having a separate chat room. A chasm grows between the volunteers who are only active on the platform, and those who are also active in the chat. This can only be bridged by active communication.
4. The methodology
An investigative platform needs a technical basis that can withstand an onslaught of volunteers. This was not the case at the beginning of "GuttenPlag". The project began as an open text document on the "Google Docs" service. It broke down the following day because it could not cope with the load of interested people. As soon as about 100 people were working in the document at the same time, it became flaky: everyone currently editing would be thrown off, the document became painfully slow, or it could not be opened at all.
The swarm moved to a wiki, a website where anyone can add, edit, or delete pages using a browser. The content management system behind the scenes stores information about who changed what and when. This makes it possible to trace where and how errors occurred and to restore a previous version. Some individuals, the moderators, have extended rights so that they can delete pages or protect them from vandals.
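The revision history described above can be illustrated with a minimal sketch. It assumes a hypothetical `WikiPage` class in which every edit is stored as an immutable revision with author and timestamp, and a revert is saved as a new revision instead of erasing history, which is how wiki software commonly handles it. This is not the data model of Wikia or of any real wiki engine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    author: str          # user name or, for anonymous edits, an IP address
    timestamp: datetime  # when the edit was saved
    text: str            # full page text after the edit


@dataclass
class WikiPage:
    """Hypothetical page model: every edit is kept, nothing is overwritten."""
    title: str
    history: list[Revision] = field(default_factory=list)

    def edit(self, author: str, text: str) -> None:
        self.history.append(Revision(author, datetime.now(timezone.utc), text))

    @property
    def text(self) -> str:
        # The current version is simply the latest revision.
        return self.history[-1].text if self.history else ""

    def revert_to(self, index: int, moderator: str) -> None:
        # A revert restores old text by saving it as a new revision,
        # so the vandalism itself stays visible in the history.
        self.edit(moderator, self.history[index].text)

    def changes_by(self, author: str) -> list[Revision]:
        # Answers "who changed what, and when" for a single user.
        return [r for r in self.history if r.author == author]


page = WikiPage("Example page")
page.edit("volunteer_a", "Documented finding on page 12")
page.edit("203.0.113.7", "In the beginning God created the heaven and the earth.")
page.revert_to(0, "moderator_b")
print(page.text)  # Documented finding on page 12
```

Because nothing is ever deleted, a moderator can always see which address inserted the unwanted text, even after it has been reverted.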
"GuttenPlag" chose the California company Wikia as its technology provider. The initiators of "GuttenPlag" felt that it would be able to deal with a large number of readers and editors because it has a very broad technical basis. (In all, over 20,000 different IP addresses edited pages in the wiki. Wikia reported that there had already been more than ten million page views just two weeks after the start of the wiki). The service is free of charge and also permits anonymous volunteers. In addition, the initiators of "GuttenPlag" trusted the company, as it was founded by the initiator of "Wikipedia". Wikia took care of the work and costs involved with providing the technology, and in exchange sold ads to be displayed on the pages.
An additional reason for choosing Wikia was that its servers are physically located in the USA and thus are not susceptible to restraining orders or political pressure [from German authorities]. The decision whether to use a commercial provider for an investigative platform or to run the server yourself needs careful consideration. On one's own server, one has control over the user interface, the connection to external systems, the content, and the users. This might be a good reason for an investigative platform to remain independent of the infrastructure of institutions such as universities or media companies. That way, no one can stop the project or force it in a particular direction. In addition, one avoids alienating potential volunteers who would be wary of trusting a particular company or institution.
An investigative platform should also offer its volunteers technical tools that simplify the research. For example, a program that checks the figures in an income tax return for plausibility using Benford's Law would be useful. The successor to "GuttenPlag", "VroniPlag", has a number of semi-automatic tools that were developed for checking suspicious passages directly with Google or for coloring identical passages in texts.
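Benford's Law states that in many naturally occurring collections of numbers, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1 and fewer than 5% with 9. Fabricated figures often deviate from this pattern. The following is a minimal sketch of such a plausibility check, assuming the figures have already been extracted as numbers; the function names are hypothetical, and the threshold 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom. A deviation is only a reason to look more closely, not proof of manipulation, and small samples are unreliable.

```python
import math
from collections import Counter
from typing import Iterable, Optional

# Expected share of each leading digit d under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_8DF = 15.507


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of x, or None if x is zero."""
    if x == 0:
        return None
    # Scientific notation puts the first significant digit first: "3.140000e+02".
    return int(f"{abs(x):e}"[0])


def benford_chi_square(values: Iterable[float]) -> float:
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)  # missing digits count as 0
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def looks_implausible(values: Iterable[float]) -> bool:
    """True if the leading digits deviate significantly from Benford's Law."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF
```

For example, a list in which every figure starts with 9 is flagged, while the powers of two, which are known to follow Benford's Law closely, pass the check.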
5. Dealing with destructive elements
Whoever starts a research platform must be prepared to deal with people who are attracted to it with the intention of making it fail. "GuttenPlag" had its share of troublemakers who tried (perhaps for political reasons) to interfere with the search for plagiarism. For example, they would post plausible but false "findings" in the lists. But even people who do not subscribe to the goals of the platform need not be a major problem if these three strategies are observed:
(1) Some people find it important to voice their opinions in prominent places on the Internet, for example in comments on news articles or in blogs. An investigative platform should offer a place for comments that is announced on the front page, easy to reach, and usable spontaneously without registration. The forum at "GuttenPlag", however, quickly developed into an unmoderated "cesspool" (as the moderators called it) because no one had the time or capacity to deal with the aggressive and non-constructive comments filling its pages. Still, the easily accessible forum served its purpose for the commentators: they could vent their frustrations there without getting in the way of the investigation itself or disrupting the work. Maybe it would be possible to moderate such a forum and thus turn a potential "cesspool" into a gold mine. Perhaps critics could be invited to see the usefulness of the platform for themselves and be turned into volunteers.
(2) Moderators constantly have to clean up after vandals who delete content from the platform, insert incorrect information, or, as happened at "GuttenPlag", add pages and pages of Bible verses to the lists of potential plagiarism. The moderators have to delete or revert these entries and sometimes have to resort to blocking the users involved. Helpfully, the wiki automatically generates a publicly visible list of all edits, so it can quickly be seen who just changed what.
(3) Some people want to work with the platform but hold different opinions or want to move it in a different direction from the majority of volunteers. Sometimes they act against the decisions of the swarm. Instead of casting them out, it proved effective at "GuttenPlag" for a moderator to engage the person in a discussion via private chat or email and to forge a compromise. A volunteer who has gone a bit too far finds it easier to accept criticism in a private channel. Often these discussions revealed that it had all been a misunderstanding.
In general: the more moderators there are for keeping everything in order, the less destruction can be caused by individual troublemakers. Deletions and blocking of users should only be used in clearly defined situations and only as an ultima ratio. Otherwise the moderators are easy targets for allegations of censorship. Deletions and blockings often serve to escalate conflicts about differences of opinion within the project.
6. Documenting the work
A bloodthirsty mob descending on some poor, defenseless creature out of the depths of the anonymous Internet with the goal of publicly humiliating him or her: this is not the impression a swarm should leave behind, and it should not even hint at such an intention. As long as the investigative platform follows social values and norms, it should still be possible to let volunteers participate without hurdles. Pseudonyms connect with a pre-existing identity on the Internet. Other net-savvy people may already know and trust this pseudonym.
"GuttenPlag" was caught in a crossfire of harsh criticism because of the decision not to insist on the use of real names. In hindsight, this decision was absolutely necessary, as insisting on real names would have frightened off many people. (For example, one volunteer who was named in the media received anonymous threatening letters.) But if anonymity is permitted on a research platform, special care must be taken that all steps can be retraced and that volunteers apply the ethical and practical guidelines of investigative journalism and science: evidence must be collected without prejudging the result; there must be no premature condemnations; research must be done in all possible directions; both incriminating and exculpatory material must be considered; multiple sources must be used, and so on.
For the volunteers at "GuttenPlag", it was clear on the basis of the facts that the doctorate needed to be rescinded. It was considered to be the goal of the platform to deliver the proof of these facts. It was not, however, a goal to express political demands for resignation. (There were discussions about this, and drafts of an open letter were set out, but this would have harmed the self-chosen strict objectivity and would have meant that all volunteers would be forced to share this opinion.)
In the first step, documenting the facts, the platform was very cautious, particularly because speculation could be legally dangerous. Initially the documentation recorded "textual matching", "peculiarities", and "strong indicators". These points were only assessed as plagiarism at a later stage. This second step, the assessment of the research results, needs clear, publicly visible rules so that all volunteers proceed consistently. The rules are often modified as the assessment proceeds. Every assessment needs to follow the scientific principle of peer review, applying a four-eyes principle. Moderators need to check samples and make every step reviewable by the general public. The platform needs to take every possible step to defend its work against criticism. As soon as the work is published the attacks can be massive, so the work needs to be unassailable, and any weaknesses in the arguments need to be stated clearly.
The preliminary results need to be made clear to the general public through evaluative wording such as "such an explanation seems improbable because ..." or "here we see the pattern clearly". Statements like these make it easier for the media to quote the platform's findings.
When an intensive documentation and evaluation phase is over, the platform needs to find a good time to close the case with a final report (instead of letting the project peter out for lack of time and energy, as happened with "GuttenPlag"). Such a report is, on the one hand, an important contribution to the public discussion of the case. On the other hand, it gives the volunteers a sense of closure. The report should document the research and evaluation methods used and attempt to summarize and analyze the findings. As in any scientific paper, the report should include open questions and explain the steps still needed, so that volunteers can judge how much time continuing with the case would take. The platform should not, however, disappear completely, because new findings may emerge at some point in the future, or older findings may be seen in a new light and thus take on a new meaning.
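The two-step process above can be modeled as a small state machine. This is a minimal sketch under stated assumptions: the hypothetical `Finding` class takes its status names from the categories mentioned in the text, and the only rule it enforces is the four-eyes principle, i.e. the person who documented a finding cannot be the one who assesses it as plagiarism. The publicly visible assessment rules and moderator sampling described in the text are not covered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Step 1: documentation only, no judgement yet.
    TEXTUAL_MATCH = "textual matching"
    PECULIARITY = "peculiarity"
    STRONG_INDICATOR = "strong indicator"
    # Step 2: assessed as plagiarism after review by a second person.
    PLAGIARISM = "plagiarism"


@dataclass
class Finding:
    page: int
    source: str
    documented_by: str
    status: Status = Status.TEXTUAL_MATCH
    reviewed_by: Optional[str] = None

    def assess_as_plagiarism(self, reviewer: str) -> None:
        """Four-eyes principle: nobody may confirm their own finding."""
        if reviewer == self.documented_by:
            raise PermissionError("a second person must review this finding")
        self.reviewed_by = reviewer
        self.status = Status.PLAGIARISM
```

Keeping the reviewer's name on each finding also leaves a trail that outsiders can check, which supports the goal of making every step reviewable.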
7. Open communication
The barcode used in "GuttenPlag" was designed to visualize the places in the doctoral thesis of Karl-Theodor zu Guttenberg at which plagiarism had been found. Its purpose was to serve as a kind of progress bar and to inform new volunteers as to which parts of the thesis had already been investigated. It turned out, however, to be a good tool for communicating with the general public, because it was intuitively understandable and made a nice illustration for reporting the case in the media. This experience demonstrated that research platforms need to be able to visualize their work and to publish it under as free a license as possible. A good possibility here is the use of the "Creative Commons Attribution" license. This permits anyone to use the material, as long as the source is named.
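The barcode idea can be sketched in text form. This is a hypothetical reconstruction, not the code behind the "GuttenPlag" graphic: it assumes only a set of page numbers on which findings have been documented, marks each page (or group of pages) as containing a finding or not, and ignores any finer distinctions the original visualization may have made.

```python
def barcode(total_pages: int, flagged_pages: set[int], columns: int = 0) -> str:
    """Render a text 'barcode': '#' marks a column containing at least one
    page with documented plagiarism, '.' a column without findings.

    With columns=0 (the default) every page gets its own column; otherwise
    the pages are split into `columns` contiguous, nearly equal groups.
    """
    columns = columns or total_pages
    if not 0 < columns <= total_pages:
        raise ValueError("columns must be between 1 and total_pages")
    cells = []
    for c in range(columns):
        first = c * total_pages // columns + 1   # first page in this column
        last = (c + 1) * total_pages // columns  # last page, inclusive
        hit = any(p in flagged_pages for p in range(first, last + 1))
        cells.append("#" if hit else ".")
    return "".join(cells)


def coverage(total_pages: int, flagged_pages: set[int]) -> float:
    """Share of pages with at least one documented finding."""
    return len({p for p in flagged_pages if 1 <= p <= total_pages}) / total_pages


print(barcode(10, {1, 3, 10}))             # #.#......#
print(barcode(10, {1, 3, 10}, columns=5))  # ##..#
```

Grouping pages into columns keeps the bar readable for a long thesis, at the cost of hiding how many pages within a group are affected.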
It is also a good idea to keep an open channel to mainstream media, press agencies, and specialized blogs from the very beginning. Collaborators on a research platform need not feel obliged to communicate every finding immediately, but they should communicate openly in order to build trust. Journalists often want personal, human-interest stories. Research platforms can play along with this to a certain degree, but the facts should be the focus of any story. Granting exclusive rights to a story weakens the platform's independence and positions it as a competitor to the other media players. All serious media should be treated equally. (Deadlines need to be respected: national newspapers often print their first edition in the late afternoon of the day before publication, and weekly newspapers and magazines often have a deadline one or two days before publication.)
A press review posted on the platform is handy as a motivator for the activists, but it is also good for external people, to demonstrate the relevance of the work that is done by the volunteers.
Food for thought
On the basis of the lessons learned at "GuttenPlag", we, the authors of this text, discussed how "investigative crowdsourcing" could be developed in the German-speaking world so that collaborative research projects such as "GuttenPlag", "VroniPlag", or "WulffPlag" do not remain isolated cases. In the following we formulate our ideas as food for thought, though perhaps we are completely off track. We hope that you will contribute your own ideas, objections, and experience reports!
First idea: A central investigation platform?
We asked ourselves: Would it make sense to set up a central, permanent research platform open to all projects? One advantage would be that over time this platform would offer a good infrastructure for all sorts of research, available to everyone. In and around the platform, an experienced and well-coordinated team could build up research competence over time. It could become an established first point of contact for project ideas. In addition, a solid community might be easier to mobilize than setting up a new platform for every new case and making it known. What would the disadvantages of such a central platform be? How could misuse be prevented? How could the platform be kept from falling into disuse? Would individual platforms for each case be better?
Second idea: A foundation?
Should "investigative crowdsourcing" be funded by an independent foundation? It could be organized so that the foundation itself does not have a research agenda, so that the individual platforms have a higher respect. It could restrict the funding to be just for technical and procedural help. The foundation could possibly take care of any donations that might arrive in order to set up a server infrastructure or for developing research tools. Or would such a foundation be in complete opposition to the idea of a swarm? What are the dangers involved in such a pooling of interests?
Third idea: Symbiosis with established media?
What position should a research platform take in relation to mainstream media? What about a symbiosis, a relationship in which both sides profit from a cooperation? "GuttenPlag" was both the target of reporting and the source of information for the media at the same time. The media was important for "GuttenPlag" as a multiplier that brought in new people who had just heard about the platform. This could be intensified: Media such as the British daily "The Guardian" or the US-American research organisation "ProPublica" have recognized that outside of their research staff, perhaps even amongst their readers, there are experts for almost every topic imaginable. They are focussing on cooperating with the general public ("Open Journalism").
Is this idea even being discussed in the German-speaking world? Would it work here for journalists to contribute their experience to the success of a research platform and, in return for their cooperation, obtain better research results than they would working alone? Or should research platforms work independently of the media? Is the danger of assimilation perhaps too great? Is there even a single media outlet that could attract a critical mass of volunteers?
Fourth idea: New topics?
What topics would have been interesting for "investigative crowdsourcing" in the past twelve months? Why? What topics are not even possible for a public, collaborative research? What topics would interest you so much, that you would be active in a platform that was researching the topic? What topics are the most important ones that would need to be investigated by a swarm?
What do you think?
This text reflects only the personal opinions of the two authors. It was written on their own initiative, independently of "GuttenPlag" and their respective employers, in their own time and on private computers. All costs incurred, such as travel expenses, were paid out of their own pockets.
The authors wish to thank Kai Biermann from „ZEIT ONLINE“, Johannes Kuhn from „Süddeutsche.de“, Amanda Michel from „The Guardian“ as well as KayH from „GuttenPlag“/„VroniPlag“ for their helpful comments on drafts of this text.
The text "Reflections on a Swarm" by PlagDoc and Martin Kotynek is, in addition to the license that applies to this wiki, available under a CC-BY 3.0 license (Creative Commons Attribution 3.0 Germany).