The "crowd computing" revolution


Michael Copeland in the Atlantic: “Software might be eating the world, but Rob Miller, a professor of computer science at MIT, foresees a “crowd computing” revolution that makes workers and machines colleagues rather than competitors….
Miller studies human-computer interaction, specifically a field called crowd computing. A play on the more common term “cloud computing,” crowd computing is software that employs a group of people to do small tasks and solve a problem better than an algorithm or a single expert. Examples of crowd computing include Wikipedia, Amazon’s Mechanical Turk (where workers outsource projects that computers can’t do to an online community) and Facebook’s photo tagging feature.
But just as humans are better than computers at some things, Miller concedes that algorithms have surpassed human capability in several fields. Take a look at libraries, which now have advanced digital databases, eliminating the need for most human reference librarians. There’s also flight search, where algorithms are much better than people at finding the cheapest fare.
That said, more complicated tasks even in those fields can get tricky for a computer.
“For complex flight search, people are still better,” Miller says. A site called Flightfox lets travelers input a complex trip while a group of experts help find the cheapest or most convenient combination of flights. “There are travel agents and frequent flyers in that crowd, people with expertise at working angles of the airfare system that are not covered by the flight searches and may never be covered because they involve so many complex intersecting rules that are very hard to code.”
Social and cultural understanding is another area in which humans will always exceed computers, Miller says. People are constantly inventing new slang, watching the latest viral videos and movies, or partaking in some other cultural phenomena together. That’s something that an algorithm won’t ever be able to catch up to. “There’s always going to be a frontier of human understanding that leads the machines,” he says.
A post-employee economy where every task is automated by a computer is something Miller does not see happening, nor does he want it to happen. Instead, he considers the relationship between human and machine symbiotic. Both machines and humans benefit in crowd computing: “The machine wants to acquire data so it can train and get better. The crowd is improved in many ways, like through pay or education,” Miller says. And finally, the end users “get the benefit of a more accurate and fast answer.”
Miller’s User Interface Design Group at MIT has made several programs illustrating how this symbiosis between user, crowd and machine works. Most recently, the MIT group created Cobi, a tool that taps into an academic community to plan a large-scale conference. The software allows members to identify papers they want presented and what authors are experts in specific fields. A scheduling tool combines the community’s input with an algorithm that finds the best times to meet.
Programs more practical for everyday users include Adrenaline, a camera driven by a crowd, and Soylent, a word processing tool that allows people to do interactive document shortening and proofreading. The Adrenaline camera took a video and then had a crowd on call to very quickly identify the best still in that video, whether it was the best group portrait, mid-air jump, or angle of somebody’s face. Soylent also used users on Mechanical Turk to proofread and shorten text in Microsoft Word. In the process, Miller and his students found that the crowd found errors that neither a single expert proofreader nor the program—with spell and grammar check turned on—could find.
“It shows this is the essential thing that human beings bring that algorithms do not,” Miller said.
That said, you can’t just use any crowd for any task. “It does depend on having appropriate expertise in the crowd. If [the text] had been about computational biology, they might not have caught [the error]. The crowd does have to have skills.” Going forward, Miller thinks that software will increasingly use the power of the crowd. “In the next 10 or 20 years it will be more likely we already have a crowd,” he says. “There will already be these communities and they will have needs, some of which will be satisfied by software and some which will require human help and human attention. I think a lot of these algorithms and system techniques that are being developed by all these startups, who are experimenting with it in their own spaces, are going to be things that we’ll just naturally pick up and use as tools.”

Why Crowdsourcing is the Next Cloud Computing


Alpheus Bingham, co-founder and a member of the board of directors at InnoCentive, in Wired: “But over the course of a decade, what we now call cloud-based or software-as-a-service (SaaS) applications has taken the world by storm and become mainstream. Today, cloud computing is an umbrella term that applies to a wide variety of successful technologies (and business models), from business apps like Salesforce.com, to infrastructure like Amazon Elastic Compute Cloud (Amazon EC2), to consumer apps like Netflix. It took years for all these things to become mainstream, and if the last decade saw the emergence (and eventual dominance) of the cloud over previous technologies and models, this decade will see the same thing with crowdsourcing.
Both an art and a science, crowdsourcing taps into the global experience and wisdom of individuals, teams, communities, and networks to accomplish tasks and work. It doesn’t matter who you are, where you live, or what you do or believe — in fact, the more diversity of thought and perspective, the better. Diversity is king and it’s common for people on the periphery of — or even completely outside of — a discipline or science to end up solving important problems.
The specific nature of the work offers few constraints – from a small business needing a new logo, to the large consumer goods company looking to ideate marketing programs, or to the nonprofit research organization looking to find a biomarker for ALS, the value is clear as well.
To get to the heart of the matter on why crowdsourcing is this decade’s cloud computing, several immediate reasons come to mind:
Crowdsourcing Is Disruptive
Much as cloud computing has created a new guard that in many ways threatens the old guard, so too has crowdsourcing. …
Crowdsourcing Provides On-Demand Talent Capacity
Labor is expensive and good talent is scarce. Think about the cost of adding ten additional researchers to a 100-person R&D team. You’ve increased your research capacity by 10% (more or less), but at a significant cost – and, a significant FIXED cost at that. …
Crowdsourcing Enables Pay-for-Performance.
You pay as you go with cloud computing — gone are the days of massive upfront capital expenditures followed by years of ongoing maintenance and upgrade costs. Crowdsourcing does even better: you pay for solutions, not effort, which predictably sometimes results in failure. In fact, with crowdsourcing, the marketplace bears the cost of failure, not you….
Crowdsourcing “Consumerizes” Innovation
Crowdsourcing can provide a platform for bi-directional communication and collaboration with diverse individuals and groups, whether internal or external to your organization — employees, customers, partners and suppliers. Much as cloud computing has consumerized technology, crowdsourcing has the same potential to consumerize innovation, and more broadly, how we collaborate to bring new ideas, products and services to market.
Crowdsourcing Provides Expert Services and Skills That You Don’t Possess.
One of the early value propositions of cloud-based business apps was that you didn’t need to engage IT to deploy them or Finance to help procure them, thereby allowing general managers and line-of-business heads to do their jobs more fluently and more profitably…”

The small-world effect is a modern phenomenon


New paper by Seth A. Marvel, Travis Martin, Charles R. Doering, David Lusseau, M. E. J. Newman: “The “small-world effect” is the observation that one can find a short chain of acquaintances, often of no more than a handful of individuals, connecting almost any two people on the planet. It is often expressed in the language of networks, where it is equivalent to the statement that most pairs of individuals are connected by a short path through the acquaintance network. Although the small-world effect is well-established empirically for contemporary social networks, we argue here that it is a relatively recent phenomenon, arising only in the last few hundred years: for most of mankind’s tenure on Earth the social world was large, with most pairs of individuals connected by relatively long chains of acquaintances, if at all. Our conclusions are based on observations about the spread of diseases, which travel over contact networks between individuals and whose dynamics can give us clues to the structure of those networks even when direct network measurements are not available. As an example we consider the spread of the Black Death in 14th-century Europe, which is known to have traveled across the continent in well-defined waves of infection over the course of several years. Using established epidemiological models, we show that such wave-like behavior can occur only if contacts between individuals living far apart are exponentially rare. We further show that if long-distance contacts are exponentially rare, then the shortest chain of contacts between distant individuals is on average a long one. The observation of the wave-like spread of a disease like the Black Death thus implies a network without the small-world effect.”
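The network claim in this abstract can be made concrete with a toy example. The sketch below is not the authors' epidemiological model; it is a minimal illustration, under assumed parameters (a ring of 200 people, 20 random long-distance contacts, a fixed random seed), of how a network with only local contacts has long chains between distant individuals, and how a few long-range contacts shorten them. The function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n):
    """Each person i knows only the two nearest neighbours on a ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_shortcuts(graph, count, rng):
    """Return a copy of the graph with `count` random long-range contacts added."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    nodes = list(g)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in g[a]:  # skip pairs that are already acquainted
            g[a].add(b)
            g[b].add(a)
            added += 1
    return g


def mean_path_length(graph):
    """Average shortest-chain length over all reachable pairs (BFS from every node)."""
    total = 0
    pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)  # fixed seed so the run is repeatable
local_only = ring_lattice(200)
with_shortcuts = add_shortcuts(local_only, 20, rng)

print(f"local contacts only: {mean_path_length(local_only):.2f}")
print(f"with 20 shortcuts:   {mean_path_length(with_shortcuts):.2f}")
```

In the local-only ring the average chain grows in proportion to the population size, which is the "large world" the paper attributes to most of human history. Every added shortcut can only shorten chains, never lengthen them.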

Facilitating scientific discovery through crowdsourcing and distributed participation


Antony Williams in EMBnet.journal: “Science has evolved from the isolated individual tinkering in the lab, through the era of the “gentleman scientist” with his or her assistant(s), to group-based then expansive collaboration and now to an opportunity to collaborate with the world. With the advent of the internet the opportunity for crowd-sourced contribution and large-scale collaboration has exploded and, as a result, scientific discovery has been further enabled. The contributions of enormous open data sets, liberal licensing policies and innovative technologies for mining and linking these data has given rise to platforms that are beginning to deliver on the promise of semantic technologies and nanopublications, facilitated by the unprecedented computational resources available today, especially the increasing capabilities of handheld devices. The speaker will provide an overview of his experiences in developing a crowdsourced platform for chemists allowing for data deposition, annotation and validation. The challenges of mapping chemical and pharmacological data, especially in regards to data quality, will be discussed. The promise of distributed participation in data analysis is already in place.”

Smart Machines: IBM's Watson and the Era of Cognitive Computing


New book from Columbia Business School Publishing: “We are crossing a new frontier in the evolution of computing and entering the era of cognitive systems. The victory of IBM’s Watson on the television quiz show Jeopardy! revealed how scientists and engineers at IBM and elsewhere are pushing the boundaries of science and technology to create machines that sense, learn, reason, and interact with people in new ways to provide insight and advice.
In Smart Machines, John E. Kelly III, director of IBM Research, and Steve Hamm, a writer at IBM and a former business and technology journalist, introduce the fascinating world of “cognitive systems” to general audiences and provide a window into the future of computing. Cognitive systems promise to penetrate complexity and assist people and organizations in better decision making. They can help doctors evaluate and treat patients, augment the ways we see, anticipate major weather events, and contribute to smarter urban planning. Kelly and Hamm’s comprehensive perspective describes this technology inside and out and explains how it will help us conquer the harnessing and understanding of “big data,” one of the major computing challenges facing businesses and governments in the coming decades. Absorbing and impassioned, their book will inspire governments, academics, and the global tech industry to work together to power this exciting wave in innovation.”
See also Why cognitive systems?

And Data for All: On the Validity and Usefulness of Open Government Data


Paper presented at the 13th International Conference on Knowledge Management and Knowledge Technologies: “Open Government Data (OGD) stands for a relatively young trend to make data that is collected and maintained by state authorities available for the public. Although various Austrian OGD initiatives have been started in the last few years, less is known about the validity and the usefulness of the data offered. Based on the data-set on Vienna’s stock of trees, we address two questions in this paper. First of all, we examine the quality of the data by validating it according to knowledge from a related discipline. It shows that the data-set we used correlates with findings from meteorology. Then, we explore the usefulness and exploitability of OGD by describing a concrete scenario in which this data-set can be supportive for citizens in their everyday life and by discussing further application areas in which OGD can be beneficial for different stakeholders and even commercially used.”

Choose Your Own Route on Finland's Algorithm-Driven Public Bus


Brian Merchant at Motherboard: “Technology should probably be transforming public transit a lot faster than it is. Yes, apps like Hopstop have made finding stops easier and I’ve started riding the bus in unfamiliar parts of town a bit more often thanks to Google Maps’ route info. But these are relatively small steps, and it’s all limited to making scheduling information more widely available. Where’s the innovation on the other side? Where’s the Uber-like interactivity, the bus that comes to you after a tap on the iPhone?
In Finland, actually. The Kutsuplus is Helsinki’s groundbreaking mass transit hybrid program that lets riders choose their own routes, pay for fares on their phones, and summon their own buses. It’s a pretty interesting concept. With a ten minute lead time, you summon a Kutsuplus bus to a stop using the official app, just as you’d call a livery cab on Uber. Each minibus in the fleet seats at least nine people, and there’s room for baby carriages and bikes.
You can call your own private Kutsuplus, but if you share the ride, you share the costs—it’s about half the price of a cab fare, and a dollar or two more expensive than old school bus transit. You can then pick your own stop, also using the app.
The interesting part is the scheduling, which is entirely automated. If you’re sharing the ride, an algorithm determines the most direct route, and you only get charged as though you were riding solo. You can pay with a Kutsuplus wallet on the app, or, eventually, bill the charge to your phone bill.”

NEW Publication: “Reimagining Governance in Practice: Benchmarking British Columbia’s Citizen Engagement Efforts”


Over the last few years, the Government of British Columbia (BC), Canada has initiated a variety of practices and policies aimed at providing more legitimate and effective governance. Leveraging advances in technology, the BC Government has focused on changing how it engages with its citizens with the goal of optimizing the way it seeks input and develops and implements policy. The efforts are part of a broader trend among a wide variety of democratic governments to re-imagine public service and governance.
At the beginning of 2013, BC’s Ministry of Citizens’ Services and Open Government, now the Ministry of Technology, Innovation and Citizens’ Services, partnered with the GovLab to produce “Reimagining Governance in Practice: Benchmarking British Columbia’s Citizen Engagement Efforts.” The GovLab’s May 2013 report, made public today, makes clear that BC’s current practices to create a more open government, leverage citizen engagement to inform policy decisions, create new innovations, and provide improved public monitoring, though in many cases relatively new, are consistently among the strongest examples at either the provincial or national level.
According to Stefaan Verhulst, Chief of Research at the GovLab: “Our benchmarking study found that British Columbia’s various initiatives and experiments to create a more open and participatory governance culture have made it a leader in how to re-imagine governance. Leadership, along with the elimination of imperatives that may limit further experimentation, will be critical moving forward. And perhaps even more important, as with all initiatives to re-imagine governance worldwide, much more evaluation of what works, and why, will be needed to keep strengthening the value proposition behind the new practices and policies and provide proof-of-concept.”
See also our TheGovLab Blog.

The Value of Personal Data


The Digital Enlightenment Yearbook 2013 is dedicated this year to Personal Data: “The value of personal data has traditionally been understood in ethical terms as a safeguard for personality rights such as human dignity and privacy. However, we have entered an era where personal data are mined, traded and monetized in the process of creating added value – often in terms of free services including efficient search, support for social networking and personalized communications. This volume investigates whether the economic value of personal data can be realized without compromising privacy, fairness and contextual integrity. It brings scholars and scientists from the disciplines of computer science, law and social science together with policymakers, engineers and entrepreneurs with practical experience of implementing personal data management.
The resulting collection will be of interest to anyone concerned about privacy in our digital age, especially those working in the field of personal information management, whether academics, policymakers, or those working in the private sector.”

Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative Democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Thrown into a visualization software like Gephi, it’s easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
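For readers who want a feel for the graph approach described in this excerpt, here is a minimal sketch. It is not Davis's code, which is linked from the original article. It assumes bills are compared by plain word counts over their titles, links two bills when their cosine similarity is 0.75 or above (the cutoff quoted in the excerpt), and groups bills into connected components rather than using the k-means clustering the article mentions. The bill identifiers and titles are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical bill IDs and titles, invented for illustration only.
BILLS = {
    "A-1": "An act relating to limited lines travel insurance",
    "B-2": "An act relating to limited lines travel insurance producers",
    "C-3": "An act concerning unclaimed life insurance benefits",
    "D-4": "Budget bill",
}

THRESHOLD = 0.75  # similarity cutoff quoted in the excerpt


def vectorize(title):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(title.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_edges(bills, threshold=THRESHOLD):
    """Edges of the graph: pairs of bills at or above the threshold."""
    vectors = {bill_id: vectorize(title) for bill_id, title in bills.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


def clusters(bills, edges):
    """Group bills into connected components of the similarity graph."""
    adjacency = {bill_id: set() for bill_id in bills}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(bills):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


edges = similarity_edges(BILLS)
for group in clusters(BILLS, edges):
    print(group)
```

Title-only comparison has the weakness the excerpt describes: generic titles can match even when the underlying legislation differs, so a real analysis would likely want full bill text and a weighting scheme such as TF-IDF.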