The Semantic Web, Crowdsourcing and the Future of Open Discourse
A programmer's role in harnessing the wisdom of crowds
Nate Olson interviews Yaron Koren via email May 12-14, 2007
Yaron Koren is a freelance Web programmer based in Brooklyn. Two of his creations that work in tandem are Discourse DB and the Semantic Forms extension for MediaWiki, the same software that powers Wikipedia. Discourse DB is a wiki that organizes "the opinions of the world's journalists and commentators about ongoing political events and issues." It has a complex structure under the hood to map the "semantic" relationships of the data that users enter, but the most innovative feature of the site, the Semantic Forms extension, allows for a more intuitive user interface than most similar applications. Anyone can download it to use with a MediaWiki-based wiki.
Nate Olson: Do you consider "crowdsourcing" to be a distinct phenomenon? If so, how do you see it evolving over the next 1-2 years?
Yaron Koren: It's certainly not a new concept--there have been a lot of companies that have run contests to come up with advertising slogans and the like for a long time, for instance. But the Internet has obviously made it easier to do. I don't personally see much change in the concept over the short term. Companies will try to tap more into crowdsourcing, but they'll discover--if they don't know it already, that is--that it's very hard to control your message if you're not working directly with the people making the content. On the other hand, the Threadless/Cafe Press model, where it's the crowd making the products themselves, seems to be doing quite well.
Q: What motivates people to contribute to a project like Discourse DB?
A: People, in general, want to contribute to projects that already have a lot of contributors and users. That's the Catch-22 of creating a crowd-based site. People don't want to be among the first to contribute to something, because they don't want to put in a lot of effort that will then go to waste if no one else joins in. The second difficulty is that people tend to be users or readers before they become contributors. So the usual motivation for contributing is, you already use the site or software, and you see one or two things that could be improved with just a little bit of work. So you put in that work, and the satisfaction derived from that encourages you to go for bigger improvements.
Q: What incentives can help overcome the Catch-22 of participation? Money? Community praise? Does creating a somewhat formal incentive structure necessarily endanger the integrity of a crowdsourced project?
A: Money can work, obviously, though then people may come to expect that financial reward for every bit of work. It works if what you're crowdsourcing is actually being sold, but as I said, I don't know if that even counts as crowdsourcing. For the non-paid kind, I think the most important thing is just that people feel that their efforts will be used by others and won't go to waste. If you can make that guarantee, I think you have a good shot at succeeding.
Q: Is there really "wisdom in crowds"? If so, what's the clearest example you know of?
A: Well, terms like "wisdom in crowds" will mean ten different things to ten different people. To me, it's the idea that aggregating the opinion of a large group of people will, over time, give more correct opinions than taking any one person's opinion, or polling a small group of experts. That's distinct from "crowdsourcing" or collaborative projects like wikis. To me, the clearest demonstration that aggregation works is just the success of democracy as a system of government, compared to all the others that have been tried. Plenty of systems of government have billed themselves as the rule of an enlightened elite over the uneducated masses, and they've all failed, sometimes spectacularly so.
Q: What has surprised you about Discourse DB?
A: I've been surprised by the kinds of topics people have been interested in. Political commentators like to write about issues like the Iraq War, military tribunals, etc., but the most popular topic on the site, since pretty much the beginning, has been the Sarbanes-Oxley Act. And the second most popular is England's Human Rights Act, which has almost never been written about in the United States. Whether there's some deeper significance to that disconnect, I don't know.
Q: The Semantic Forms extension is simple to understand in practical terms--it lets wiki users enter information via forms instead of "wikitext" markup. But what is its larger significance?
A: It abstracts away all the semantic theory and markup, and lets the users just focus on the data. Most users, when entering data, don't want to do anything more complicated than just filling out a form, for a variety of good reasons. Semantic wikis let a large group of people collaborate on a set of data, in a way that is difficult or impossible to do with conventional databases--but they're based on concepts that are rather obscure. So Semantic Forms allows people to get all the huge benefits of semantic wikis, with hopefully none of the drawbacks of being esoteric.
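As a rough illustration of the idea described above (not the actual Semantic Forms code), the sketch below turns submitted form fields into a MediaWiki template call, so the contributor never types the markup by hand. The function name, the "Opinion item" template, the field names and the sample values are all hypothetical; only the `{{Template|field=value}}` call syntax comes from MediaWiki itself.

```python
def form_to_wikitext(template: str, fields: dict[str, str]) -> str:
    """Render submitted form fields as a MediaWiki template call.

    Hypothetical helper for illustration only; it is not part of
    Semantic Forms or MediaWiki.
    """
    lines = ["{{" + template]
    for name, value in fields.items():
        cleaned = value.strip()
        # Fields the user left blank are simply omitted from the call.
        if cleaned:
            lines.append(f"|{name}={cleaned}")
    lines.append("}}")
    return "\n".join(lines)


# Made-up form submission: the user only fills in boxes.
submitted = {
    "author": "Jane Doe",
    "topic": "Sarbanes-Oxley Act",
    "position": "for",
    "source": "",
}
wikitext = form_to_wikitext("Opinion item", submitted)
print(wikitext)
```

In a semantic wiki, a template like this would typically be where the semantic annotations live, so the person filling in the form supplies only the data.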
Q: How does crowdsourcing relate to the "semantic Web" (broadly understood, small "s")? What role will crowdsourcing play in shaping the semantic Web, and vice versa? Is the practical link between them the wiki format?
A: I think the two are very much tied in with each other, in that data stores that are best created through crowdsourcing are also best implemented through the semantic Web, and semantic wikis in particular. When you want to crowdsource a set of data, what you're really saying is, "This set of data is too big for any one person to enter it all, and too messy, or subjective, or diffuse to be automated." Given that kind of data, I think a semantic wiki is definitely the technology to use, because it lets users modify the data structures, and it makes collaboration much easier. The most famous crowdsourced database right now is probably IMDb; if they were starting the site today I'd think implementing it as a semantic wiki would make much more sense for them.
Q: How do you see the Semantic Forms extension maturing over the next year or so? Do you hope to release other software building-blocks, like SF, that streamline collaborative user interfaces?
A: I think the next big improvements for semantic wikis are not in the data entry but in the querying and visualization of data. The current tools for querying, like SPARQL, are very free-form, and can be used to query any set of semantic data. That flexibility also means that they're very difficult for non-techies to use. But if you pre-define the structure of the data, you should be able to interact with it in ways similar to those of database reporting tools. It's the same concept as Semantic Forms, just for data output instead of input.
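To make the contrast with free-form querying concrete, here is a minimal sketch of what a report over pre-defined fields might look like. It assumes a fixed set of field names known in advance; the records, the field names and the `count_by` helper are invented for illustration and do not correspond to any existing tool or to real Discourse DB data.

```python
from collections import Counter

# Made-up records with a structure that is fixed in advance.
OPINIONS = [
    {"topic": "Sarbanes-Oxley Act", "position": "for"},
    {"topic": "Sarbanes-Oxley Act", "position": "against"},
    {"topic": "Human Rights Act", "position": "for"},
    {"topic": "Sarbanes-Oxley Act", "position": "against"},
]
FIELDS = {"topic", "position"}  # the pre-defined structure


def count_by(rows, group_field, **filters):
    """Count rows per value of group_field, restricted by exact-match filters.

    Because the fields are known up front, a user interface could offer
    them as drop-downs instead of asking for a hand-written query.
    """
    for name in (group_field, *filters):
        if name not in FIELDS:
            raise ValueError(f"unknown field: {name}")
    matching = (r for r in rows if all(r[k] == v for k, v in filters.items()))
    return Counter(r[group_field] for r in matching)


summary = count_by(OPINIONS, "position", topic="Sarbanes-Oxley Act")
print(summary)
```

The trade-off is the one described above: this is far less flexible than a general query language, but every valid question can be built by picking from a short list of known fields.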
5/23/07









