Selected Readings on Crowdsourcing Tasks and Peer Production


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing was originally published in 2014.

Technological advances are creating a new paradigm by which institutions and organizations are increasingly outsourcing tasks to an open community, allocating specific needs to a flexible, willing and dispersed workforce. “Microtasking” platforms like Amazon’s Mechanical Turk are a burgeoning source of income for individuals who contribute their time, skills and knowledge on a per-task basis. In parallel, citizen science projects – task-based initiatives in which citizens of any background can help contribute to scientific research – like Galaxy Zoo are demonstrating the ability of lay and expert citizens alike to make small, useful contributions to aid large, complex undertakings. As governing institutions seek to do more with less, looking to the success of citizen science and microtasking initiatives could provide a blueprint for engaging citizens to help accomplish difficult, time-consuming objectives at little cost. Moreover, the incredible success of peer-production projects – best exemplified by Wikipedia – instills optimism regarding the public’s willingness and ability to complete relatively small tasks that feed into a greater whole and benefit the public good. You can learn more about this new wave of “collective intelligence” by following the MIT Center for Collective Intelligence and its annual Collective Intelligence Conference.

Annotated Selected Reading List (in alphabetical order)

Benkler, Yochai. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press, 2006. http://bit.ly/1aaU7Yb.

  • In this book, Benkler “describes how patterns of information, knowledge, and cultural production are changing – and shows that the way information and knowledge are made available can either limit or enlarge the ways people can create and express themselves.”
  • In his discussion on Wikipedia – one of many paradigmatic examples of people collaborating without financial reward – he calls attention to the notable ongoing cooperation taking place among a diversity of individuals. He argues that, “The important point is that Wikipedia requires not only mechanical cooperation among people, but a commitment to a particular style of writing and describing concepts that is far from intuitive or natural to people. It requires self-discipline. It enforces the behavior it requires primarily through appeal to the common enterprise that the participants are engaged in…”

Boudreau, Kevin J., Patrick Gaule, Karim Lakhani, Christoph Riedl, Anita Williams Woolley. “From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 14-060. January 23, 2014. https://bit.ly/2QVmGUu.

  • In this working paper, the authors explore the “conditions necessary for eliciting effort from those affecting the quality of interdependent teamwork” and “consider the role of incentives versus social processes in catalyzing collaboration.”
  • The paper’s findings are based on an experiment involving 260 individuals randomly assigned to 52 teams working toward solutions to a complex problem.
  • The authors found that the level of effort in such collaborative undertakings is sensitive to cash incentives. Collaboration, however, was driven more by the active participation of teammates than by any monetary reward.

Brabham, Daren C. Using Crowdsourcing in Government. Collaborating Across Boundaries Series. IBM Center for The Business of Government, 2013. http://bit.ly/17gzBTA.

  • In this report, Brabham categorizes government crowdsourcing cases into a “four-part, problem-based typology, encouraging government leaders and public administrators to consider these open problem-solving techniques as a way to engage the public and tackle difficult policy and administrative tasks more effectively and efficiently using online communities.”
  • The proposed four-part typology describes the following types of crowdsourcing in government:
    • Knowledge Discovery and Management
    • Distributed Human Intelligence Tasking
    • Broadcast Search
    • Peer-Vetted Creative Production
  • In his discussion on Distributed Human Intelligence Tasking, Brabham argues that Amazon’s Mechanical Turk and other microtasking platforms could be useful in a number of governance scenarios, including:
    • Governments and scholars transcribing historical document scans
    • Public health departments translating health campaign materials into foreign languages to benefit constituents who do not speak the native language
    • Governments translating tax documents, school enrollment and immunization brochures, and other important materials into minority languages
    • Helping governments predict citizens’ behavior, “such as for predicting their use of public transit or other services or for predicting behaviors that could inform public health practitioners and environmental policy makers”

Franzoni, Chiara, and Henry Sauermann. “Crowd Science: The Organization of Scientific Research in Open Collaborative Projects.” Research Policy (August 14, 2013). http://bit.ly/HihFyj.

  • In this paper, the authors explore the concept of crowd science, which they define based on two important features: “participation in a project is open to a wide base of potential contributors, and intermediate inputs such as data or problem solving algorithms are made openly available.” The rationale for their study and conceptual framework is the “growing attention from the scientific community, but also policy makers, funding agencies and managers who seek to evaluate its potential benefits and challenges. Based on the experiences of early crowd science projects, the opportunities are considerable.”
  • Based on the study of a number of crowd science projects – including governance-related initiatives like Patients Like Me – the authors identify a number of potential benefits in the following categories:
    • Knowledge-related benefits
    • Benefits from open participation
    • Benefits from the open disclosure of intermediate inputs
    • Motivational benefits
  • The authors also identify a number of challenges:
    • Organizational challenges
      • Matching projects and people
      • Division of labor and integration of contributions
      • Project leadership
    • Motivational challenges
      • Sustaining contributor involvement
      • Supporting a broader set of motivations
      • Reconciling conflicting motivations

Kittur, Aniket, Ed H. Chi, and Bongwon Suh. “Crowdsourcing User Studies with Mechanical Turk.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 453–456. CHI ’08. New York, NY, USA: ACM, 2008. http://bit.ly/1a3Op48.

  • In this paper, the authors examine “[m]icro-task markets, such as Amazon’s Mechanical Turk, [which] offer a potential paradigm for engaging a large number of users for low time and monetary costs. [They] investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks.”
  • The authors conclude that in addition to providing a means for crowdsourcing small, clearly defined, often non-skill-intensive tasks, “Micro-task markets such as Amazon’s Mechanical Turk are promising platforms for conducting a variety of user study tasks, ranging from surveys to rapid prototyping to quantitative measures. Hundreds of users can be recruited for highly interactive tasks for marginal costs within a timeframe of days or even minutes. However, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.”

Kittur, Aniket, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton. “The Future of Crowd Work.” In 16th ACM Conference on Computer Supported Cooperative Work (CSCW 2013), 2013. http://bit.ly/1c1GJD3.

  • In this paper, the authors discuss paid crowd work, which “offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale.” However, they caution that, “it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework.”
  • The authors argue that a number of key challenges must be met to ensure that crowd work processes evolve and reach their full potential, including:
    • Designing workflows
    • Assigning tasks
    • Supporting hierarchical structure
    • Enabling real-time crowd work
    • Supporting synchronous collaboration
    • Controlling quality

Madison, Michael J. “Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo.” In Convening Cultural Commons, 2013. http://bit.ly/1ih9Xzm.

  • This paper explores a “case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis on the Internet. In the second place, Galaxy Zoo is a highly successful example of peer production, sometimes known as crowdsourcing…In the third place, [it] is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies.”
  • Madison concludes that the success of Galaxy Zoo has not been the result of the “character of its information resources (scientific data) and rules regarding their usage,” but rather, the fact that the “community was guided from the outset by a vision of a specific organizational solution to a specific research problem in astronomy, initiated and governed, over time, by professional astronomers in collaboration with their expanding universe of volunteers.”

Malone, Thomas W., Robert Laubacher and Chrysanthos Dellarocas. “Harnessing Crowds: Mapping the Genome of Collective Intelligence.” MIT Sloan Research Paper. February 3, 2009. https://bit.ly/2SPjxTP.

  • In this article, the authors describe and map the phenomenon of collective intelligence – also referred to as “radical decentralization, crowd-sourcing, wisdom of crowds, peer production, and wikinomics” – which they broadly define as “groups of individuals doing things collectively that seem intelligent.”
  • The article is derived from the authors’ work at MIT’s Center for Collective Intelligence, where they gathered nearly 250 examples of Web-enabled collective intelligence. To map the building blocks or “genes” of collective intelligence, the authors used two pairs of related questions:
    • Who is performing the task? Why are they doing it?
    • What is being accomplished? How is it being done?
  • The authors concede that much work remains to be done “to identify all the different genes for collective intelligence, the conditions under which these genes are useful, and the constraints governing how they can be combined,” but they believe that their framework provides a useful start and gives managers and other institutional decisionmakers looking to take advantage of collective intelligence activities the ability to “systematically consider many possible combinations of answers to questions about Who, Why, What, and How.”
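To make the idea of “systematically consider[ing] many possible combinations” concrete, here is a minimal Python sketch that enumerates every combination of answers to Who, Why, What and How and then filters out pairings that do not fit together. The specific answer values and the compatibility rule are illustrative assumptions, not the authors’ full catalogue of genes.

```python
from itertools import product

# Illustrative answers to each of the four questions. These particular
# values are assumptions for the sketch, not a full catalogue of "genes".
GENE_OPTIONS = {
    "who": ["crowd", "hierarchy"],
    "why": ["money", "love", "glory"],
    "what": ["create", "decide"],
    "how": ["collection", "collaboration", "group decision", "individual decision"],
}

# Hypothetical constraint: some "how" answers only make sense for some "what" answers.
COMPATIBLE_HOW = {
    "create": {"collection", "collaboration"},
    "decide": {"group decision", "individual decision"},
}


def design_space(options):
    """Enumerate every combination of one answer per question."""
    questions = list(options)
    return [dict(zip(questions, combo)) for combo in product(*options.values())]


def is_valid(design):
    """Keep only combinations whose 'how' fits their 'what'."""
    return design["how"] in COMPATIBLE_HOW[design["what"]]


if __name__ == "__main__":
    designs = design_space(GENE_OPTIONS)
    valid = [d for d in designs if is_valid(d)]
    print(len(designs), len(valid))  # 48 24
```

The full cross product grows multiplicatively with each question, which is why the constraints the authors mention matter: here they halve the candidate designs from 48 to 24.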

Mulgan, Geoff. “True Collective Intelligence? A Sketch of a Possible New Field.” Philosophy & Technology 27, no. 1. March 2014. http://bit.ly/1p3YSdd.

  • In this paper, Mulgan explores the concept of collective intelligence, a “much talked about but…very underdeveloped” field.
  • With a particular focus on health knowledge, Mulgan “sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to possible intellectual barriers to progress.”
  • He concludes that the “central message that comes from observing real intelligence is that intelligence has to be for something,” and that “turning this simple insight – the stuff of so many science fiction stories – into new theories, new technologies and new applications looks set to be one of the most exciting prospects of the next few years and may help give shape to a new discipline that helps us to be collectively intelligent about our own collective intelligence.”

Sauermann, Henry and Chiara Franzoni. “Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation.” SSRN Working Papers Series. November 28, 2013. http://bit.ly/1o6YB7f.

  • In this paper, Sauermann and Franzoni explore the issue of interest-based motivation in crowd-based knowledge production – in particular the use of the crowd science platform Zooniverse – by drawing on “research in psychology to discuss important static and dynamic features of interest and deriv[ing] a number of research questions.”
  • The authors find that interest-based motivation is often tied to a “particular object (e.g., task, project, topic)” not based on a “general trait of the person or a general characteristic of the object.” As such, they find that “most members of the installed base of users on the platform do not sign up for multiple projects, and most of those who try out a project do not return.”
  • They conclude that “interest can be a powerful motivator of individuals’ contributions to crowd-based knowledge production…However, both the scope and sustainability of this interest appear to be rather limited for the large majority of contributors…At the same time, some individuals show a strong and more enduring interest to participate both within and across projects, and these contributors are ultimately responsible for much of what crowd science projects are able to accomplish.”

Schmitt-Sands, Catherine E. and Richard J. Smith. “Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk.” SSRN Working Papers Series. January 9, 2014. http://bit.ly/1ugaYja.

  • In this paper, the authors describe an experiment involving the nascent use of Amazon’s Mechanical Turk as a social science research tool. “While researchers have used crowdsourcing to find research subjects or classify texts, [they] used Mechanical Turk to conduct a policy scan of local government websites.”
  • Schmitt-Sands and Smith found that “crowdsourcing worked well for conducting an online policy program and scan.” The microtasked workers were helpful in screening out local governments that either did not have websites or did not have the types of policies and services for which the researchers were looking. However, “if the task is complicated such that it requires ongoing supervision, then crowdsourcing is not the best solution.”

Shirky, Clay. Here Comes Everybody: The Power of Organizing Without Organizations. New York: Penguin Press, 2008. https://bit.ly/2QysNif.

  • In this book, Shirky explores our current era in which, “For the first time in history, the tools for cooperating on a global scale are not solely in the hands of governments or institutions. The spread of the Internet and mobile phones are changing how people come together and get things done.”
  • Discussing Wikipedia’s “spontaneous division of labor,” Shirky argues that “the process is more like creating a coral reef, the sum of millions of individual actions, than creating a car. And the key to creating those individual actions is to hand as much freedom as possible to the average user.”

Silvertown, Jonathan. “A New Dawn for Citizen Science.” Trends in Ecology & Evolution 24, no. 9 (September 2009): 467–471. http://bit.ly/1iha6CR.

  • This article discusses the move from “Science for the people,” a slogan adopted by activists in the 1970s, to “Science by the people,” which is “a more inclusive aim, and is becoming a distinctly 21st century phenomenon.”
  • Silvertown identifies three factors that are responsible for the explosion of activity in citizen science, each of which could be similarly related to the crowdsourcing of skills by governing institutions:
    • “First is the existence of easily available technical tools for disseminating information about products and gathering data from the public.
    • A second factor driving the growth of citizen science is the increasing realisation among professional scientists that the public represent a free source of labour, skills, computational power and even finance.
    • Third, citizen science is likely to benefit from the condition that research funders such as the National Science Foundation in the USA and the Natural Environment Research Council in the UK now impose upon every grantholder to undertake project-related science outreach. This is outreach as a form of public accountability.”

Szkuta, Katarzyna, Roberto Pizzicannella, David Osimo. “Collaborative approaches to public sector innovation: A scoping study.” Telecommunications Policy. 2014. http://bit.ly/1oBg9GY.

  • In this article, the authors explore cases where government collaboratively delivers online public services, with a focus on success factors and “incentives for services providers, citizens as users and public administration.”
  • The authors focus on the following types of collaborative governance projects:
    • Services initiated by government built on government data;
    • Services initiated by government and making use of citizens’ data;
    • Services initiated by civil society built on open government data;
    • Collaborative e-government services; and
    • Services run by civil society and based on citizen data.
  • The cases explored “are all designed in the way that effectively harnesses the citizens’ potential. Services susceptible to collaboration are those that require computing efforts, i.e. many non-complicated tasks (e.g. citizen science projects – Zooniverse) or citizens’ free time in general (e.g. time banks). Those services also profit from unique citizens’ skills and their propensity to share their competencies.”

Can social media make every civil servant an innovator?


Steve Kelman at FCW: “Innovation, particularly in government, can be very hard. Lots of signoffs, lots of naysayers. For many, it’s probably not worth the hassle.
Yet all sorts of examples are surfacing about ways civil servants, non-profits, startups and researchers have thought to use social media — or data mining of government information — to get information that can either help citizens directly or help agencies serve citizens. I want to call attention to examples that I’ve seen just in the past few weeks — partly to recognize the creative people who have come up with these ideas, but partly to make a point about the relationship between these ideas and the general issue of innovation in government. I think that these social media and data-driven experiments are often a much simpler way for civil servants to innovate than many of the changes we typically think of under the heading “innovation in government.” They open the possibility to make innovation in government an activity for the civil service masses.
One example that was reported in The New York Times was about a pilot project at the New York City Department of Health and Mental Hygiene to run rapid keyword searches of 294,000 Yelp restaurant reviews in New York City for phrases such as “vomit” and “diarrhea.” The city is using a software program developed at Columbia University. It has now expanded the monitoring to occur daily, to get quick information on possible problems at specific restaurants or with specific kinds of food.
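A rough Python sketch of the kind of keyword screen described above. It is not the Columbia University program; the (restaurant, review) record format, the extra phrase “food poisoning” and the `min_mentions` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# "vomit" and "diarrhea" come from the article; "food poisoning" is an
# assumed extra phrase added only for illustration.
KEYWORDS = ["vomit", "diarrhea", "food poisoning"]

# Match at a word start so "vomit" also catches "vomiting"; case-insensitive.
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")", re.IGNORECASE)


def flag_restaurants(reviews, min_mentions=1):
    """Count keyword-matching reviews per restaurant, keeping those at or above min_mentions.

    `reviews` is an iterable of (restaurant, review_text) pairs (a hypothetical format).
    """
    counts = Counter(name for name, text in reviews if PATTERN.search(text))
    return {name: n for name, n in counts.items() if n >= min_mentions}


sample = [
    ("Cafe A", "Great pasta, friendly staff."),
    ("Cafe A", "I was vomiting all night after the clams."),
    ("Diner B", "Diarrhea the next day. Never again."),
    ("Cafe A", "Got food poisoning here, avoid."),
]
print(flag_restaurants(sample, min_mentions=2))  # {'Cafe A': 2}
```

Raising `min_mentions` trades sensitivity for fewer false alarms from a single unhappy diner; running the screen daily, as the city now does, simply means re-applying it to each new batch of reviews.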
A second example, reported in Bloomberg Businessweek, involved — perhaps not surprisingly, given the publication — an Israeli startup called Treato that is applying a similar idea to ferreting out adverse drug reactions before they come in through FDA studies and other systems. The founders are cooperating with researchers at Harvard Medical School and FDA officials, among others. Their software looks through Twitter and Facebook, along with a large number of patient forum sites, to pick out, from all the reports of illness, the incidents that may well reflect an unusual presence of adverse drug reactions.
These examples are fascinating in themselves. But one thing that caught my eye about both is that each seems high on the creativity dimension and low on the need-to-overcome-bureaucracy dimension. Both ideas reflect new and improved ways to do what these organizations do anyway, which is gather information to help inform regulatory and health decisions by government. Neither requires any upheaval in an agency’s existing culture, or steps on somebody’s turf in any serious way. Introducing the changes doesn’t require major changes in an agency’s internal procedures. Compared to many innovations in government, these are easy ones to make happen. (They do all need some funds, however.)
What I hope is that the information woven into social media will unlock a new era of innovation inside government. The limits of innovation are much less determined by difficult-to-change bureaucratic processes and can be much more responsive to an individual civil servant’s creativity…”

How NYC Open Data and Reddit Saved New Yorkers Over $55,000 a Year


IQuantNY: “NYC generates an enormous amount of data each year, and for the most part, it stays behind closed doors.  But thanks to the Open Data movement, signed into law by Bloomberg in 2012 and championed over the last several years by Borough President Gale Brewer, along with other council members, we now get to see a small slice of what the city knows. And that slice is growing.
There have been some detractors along the way; a senior attorney for the NYPD said in 2012 during a council hearing that releasing NYPD data in csv format was a problem because they were “concerned with the integrity of the data itself” and because “data could be manipulated by people who want ‘to make a point’ of some sort”.  But our democracy is built on the idea of free speech; we let all the information out and then let reason lead the way.
In some ways, Open Data adds another check and balance into government: its citizens.  I’ve watched the perfect example of this check work itself out over the past month.  You may have caught my post that used parking ticket data to identify the fire hydrant in New York City that was generating the most income for the city in the form of fines: $33,000 a year.  And on the next block, the second most profitable hydrant was generating $24,000 a year.  That’s two consecutive blocks with hydrants generating over $55,000 a year. But there was a problem.  In my post, I laid out why these two parking spots were extremely confusing and basically seemed like a trap; there was a wide “curb extension” between the street and the hydrant, making it appear like the hydrant was not by the street.  Additionally, the DOT had painted parking spots right where you would be fined if you parked.
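The analysis in the post boils down to grouping tickets by location and summing fines. A minimal Python sketch, assuming a hypothetical three-field ticket record and made-up sample rows (the real NYC open-data files use their own schema):

```python
from collections import defaultdict

# Each ticket is (location, violation, fine_in_dollars). The record layout
# and the sample rows below are hypothetical; the real NYC parking-violation
# open data uses its own column names and codes.
tickets = [
    ("Hydrant A", "FIRE HYDRANT", 115),
    ("Hydrant A", "FIRE HYDRANT", 115),
    ("Hydrant A", "FIRE HYDRANT", 115),
    ("Hydrant B", "FIRE HYDRANT", 115),
    ("Hydrant B", "FIRE HYDRANT", 115),
    ("Hydrant A", "EXPIRED METER", 35),
]


def revenue_by_location(rows, violation="FIRE HYDRANT"):
    """Sum fines for one violation type per location, highest total first."""
    totals = defaultdict(int)
    for location, kind, fine in rows:
        if kind == violation:
            totals[location] += fine
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(revenue_by_location(tickets))  # [('Hydrant A', 345), ('Hydrant B', 230)]
```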
Once the data was out there, the hydrant took on a life of its own. First, it rose to the top of the NYC subreddit. That is basically one way that the internet voted that this is in fact “interesting”. And that is how things go from small to big. From there, it travelled to the New York Observer, which was able to get a comment from the DOT. After that, it appeared in the New York Post, the Post story was republished in Gothamist and finally it even went global in the Daily Mail.
I guess the pressure was on the DOT at this point, as each media source reached out for comment, but what struck me was their response to the Observer:

“While DOT has not received any complaints about this location, we will review the roadway markings and make any appropriate alterations”

Why does someone have to complain in order for the DOT to see problems like this?  In fact, the DOT just redesigned every parking sign in New York because some of the old ones were considered confusing.  But if this hydrant was news to them, it implies that they did not utilize the very strongest source of measuring confusion on our streets: NYC parking tickets….”

Why Governments Should Adopt a Digital Engagement Strategy


Lindsay Crudele at StateTech: “Government agencies increasingly value digital engagement as a way to transform a complaint-based relationship into one of positive, proactive constituent empowerment. An engaged community is a stronger one.
Creating a culture of participatory government, as we strive to do in Boston, requires a data-driven infrastructure supported by IT solutions. Data management and analytics solutions translate a huge stream of social media data, drive conversations and creative crowdsourcing, and support transparency.
More than 50 departments across Boston host public conversations using a multichannel, multidisciplinary portfolio of accounts. We integrate these using an enterprise digital engagement management tool that connects and organizes them to break down silos and boost collaboration. Moreover, the technology provides a lens into ways to expedite workflow and improve service delivery.

A Vital Link in Times of Need

Committed and creative daily engagement builds trusting collaboration that, in turn, is vital in an inevitable crisis. As we saw during the tragic events of the 2013 Boston Marathon bombings and recent major weather events, rapid response through digital media clarifies the situation, provides information about safety and manages constituent expectations.
Boston’s enterprise model supports coordinated external communication and organized monitoring, intake and response. This provides a superadmin with access to all accounts for governance and the ability to easily amplify central messaging across a range of cultivated communities. These communities will later serve in recovery efforts.
The conversations must be seeded by a keen, creative and data-driven content strategy. For an agency to determine the correct strategy for the organization and the community it serves, a growing crop of social analytics tools can provide efficient insight into performance factors: type of content, deployment schedule, sentiment, service-based response time and team performance, to name a few. For example, in February, the city of Boston learned that tweets from our mayor with video saw 300 percent higher engagement than those without.
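One arithmetic point is worth spelling out: “300 percent higher” means four times the baseline, not three times. A minimal Python sketch of the comparison such analytics tools make, using hypothetical per-tweet engagement counts rather than Boston’s actual data:

```python
from statistics import mean


def percent_lift(treated, baseline):
    """How much higher (in percent) the mean of `treated` is than the mean of `baseline`."""
    base = mean(baseline)
    return (mean(treated) - base) / base * 100


# Hypothetical engagements per tweet, for illustration only.
with_video = [40, 44, 36]
without_video = [10, 11, 9]

print(percent_lift(with_video, without_video))   # 300.0
print(mean(with_video) / mean(without_video))    # 4.0, i.e. "300 percent higher" = 4x
```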
These insights can inform resource deployment, eliminating guesswork to more directly reach constituents by their preferred methods. Being truly present in a conversation demonstrates care and awareness and builds trust. This increased positivity can be measured through sentiment analysis, including change over time, and should be monitored for fluctuation.
During a major event, engagement managers may see activity reach new peaks in volume. IT solutions can interpret Big Data and bring a large-scale digital conversation back into perspective, identifying public safety alerts and emerging trends, needs and community influencers who can be engaged as amplifying partners.

Running Strong One Year Later

Throughout the 2014 Boston Marathon, we used three monitoring tools to deliver smart alerts to key partners across the organization:
• An engagement management tool organized conversations for account performance and monitoring.
• A brand listening tool scanned for emerging trends across the city and uncovered related conversations.
• A location-based predictive tool identified early alerts to discover potential problems along the marathon route.
With the team and tools in place, policy-based training supports the sustained growth and operation of these conversation channels. A data-driven engagement strategy unearths all of our stories, where we, as public servants and neighbors, build better communities together….”

The Emerging Science of Computational Anthropology


Emerging Technology From the arXiv: “The increasing availability of big data from mobile phones and location-based apps has triggered a revolution in the understanding of human mobility patterns. This data shows the ebb and flow of the daily commute in and out of cities, the pattern of travel around the world and even how disease can spread through cities via their transport systems.
So there is considerable interest in looking more closely at human mobility patterns to see just how well they can be predicted and how these predictions might be used in everything from disease control and city planning to traffic forecasting and location-based advertising.
Today we get an insight into the kind of detail that is possible thanks to the work of Zimo Yang at Microsoft Research in Beijing and a few pals. These guys start with the hypothesis that people who live in a city have a pattern of mobility that is significantly different from those who are merely visiting. By dividing travelers into locals and non-locals, their ability to predict where people are likely to visit dramatically improves.
Zimo and co begin with data from a Chinese location-based social network called Jiepang.com. This is similar to Foursquare in the US. It allows users to record the places they visit and to connect with friends at these locations and to find others with similar interests.
The data points are known as check-ins and the team downloaded more than 1.3 million of them from five big cities in China: Beijing, Shanghai, Nanjing, Chengdu and Hong Kong. They then used 90 per cent of the data to train their algorithms and the remaining 10 per cent to test them. The Jiepang data includes the users’ hometowns so it’s easy to see whether an individual is checking in in their own city or somewhere else.
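The setup described here, labeling each check-in as local or non-local from the user’s hometown and holding out 10 per cent for testing, can be sketched in Python as follows. The record layout is hypothetical, and the article does not say whether the split was random or chronological; a random per-check-in split is assumed.

```python
import random


def label_and_split(checkins, test_fraction=0.1, seed=0):
    """Tag each check-in as local or not, then shuffle and split into train/test.

    Each check-in is a dict with 'user', 'hometown', 'city' and 'venue' keys
    (a hypothetical layout). A check-in counts as local when the user's
    hometown matches the city of the check-in.
    """
    labeled = [dict(c, is_local=(c["hometown"] == c["city"])) for c in checkins]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(labeled)
    n_test = round(len(labeled) * test_fraction)
    return labeled[n_test:], labeled[:n_test]  # (train, test)


checkins = [
    {"user": i, "hometown": "Beijing" if i % 2 else "Shanghai", "city": "Beijing", "venue": f"v{i}"}
    for i in range(20)
]
train, test = label_and_split(checkins)
print(len(train), len(test))  # 18 2
```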
The question that Zimo and co want to answer is the following: given a particular user and their current location, where are they most likely to visit in the near future? In practice, that means analysing the user’s data, such as their hometown and the locations recently visited, and coming up with a list of other locations that they are likely to visit based on the type of people who visited these locations in the past.
Zimo and co used their training dataset to learn the mobility pattern of locals and non-locals and the popularity of the locations they visited. The team then applied this to the test dataset to see whether their algorithm was able to predict where locals and non-locals were likely to visit.
They found that their best results came from analysing the pattern of behaviour of a particular individual and estimating the extent to which this person behaves like a local. That produced a weighting called the indigenization coefficient that the researchers could then use to determine the mobility patterns this person was likely to follow in future.
In fact, Zimo and co say they can spot non-locals in this way without even knowing their home location. “Because non-natives tend to visit popular locations, like the Imperial Palace in Beijing and the Bund in Shanghai, while natives usually check in around their homes and workplaces,” they add.
The team say this approach considerably outperforms the mixed algorithms that use only individual visiting history and location popularity. “To our surprise, a hybrid algorithm weighted by the indigenization coefficients outperforms the mixed algorithm accounting for additional demographical information.”
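The article does not give the formula for the indigenization coefficient, so the following Python sketch substitutes a crude stand-in based on the intuition quoted above: the share of a user’s check-ins that fall outside the city’s most popular venues. That coefficient then weights a blend of the user’s own visiting history and overall location popularity. The venue names, `top_k_popular` and the scoring rule are all assumptions, not the paper’s method.

```python
from collections import Counter


def normalize(counter):
    """Turn raw counts into shares that sum to 1."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def localness(history, popular_venues):
    """Hypothetical proxy for the indigenization coefficient:
    the share of a user's check-ins that fall outside the most popular venues."""
    if not history:
        return 0.0
    return sum(1 for v in history if v not in popular_venues) / len(history)


def recommend(history, all_checkins, top_k_popular=2, n=3):
    """Blend personal history and venue popularity, weighted by localness."""
    counts = Counter(all_checkins)
    popularity = normalize(counts)
    popular = {v for v, _ in counts.most_common(top_k_popular)}
    personal = normalize(Counter(history))
    w = localness(history, popular)
    scores = {
        v: w * personal.get(v, 0.0) + (1 - w) * popularity.get(v, 0.0)
        for v in set(popularity) | set(personal)
    }
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:n], w


city_checkins = (["Imperial Palace"] * 5 + ["Temple of Heaven"] * 3
                 + ["Home Cafe"] + ["Office Canteen"])
print(recommend(["Home Cafe", "Home Cafe", "Office Canteen"], city_checkins))
# (['Home Cafe', 'Office Canteen', 'Imperial Palace'], 1.0)
print(recommend(["Imperial Palace"], city_checkins))
# (['Imperial Palace', 'Temple of Heaven', 'Home Cafe'], 0.0)
```

A user who checks in mostly around home and work gets a coefficient near 1 and is ranked by personal history; a visitor who sticks to landmarks gets a coefficient near 0 and is ranked by popularity, which mirrors the local/non-local contrast the researchers describe.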
It’s easy to imagine how such an algorithm might be useful for businesses that want to target certain types of travelers or local people. But there is a more interesting application too.
Zimo and co say that it is possible to monitor the way an individual’s mobility patterns change over time. So if a person moves to a new city, it should be possible to see how long it takes them to settle in.
One way of measuring this is in their mobility patterns: whether they are more like those of a local or a non-local. “We may be able to estimate whether a non-native person will behave like a native person after a time period and if so, how long in average a person takes to become a native-like one,” say Zimo and co.
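The settling-in question could be framed as a simple time-series test: compute a local-likeness coefficient for each period after someone moves, then find when it first stays above a threshold. The threshold, the number of consecutive periods required, the helper name and the monthly values below are hypothetical choices for illustration.

```python
from typing import List, Optional

def periods_to_settle(coeffs: List[float], threshold: float = 0.7,
                      sustain: int = 2) -> Optional[int]:
    """Index of the first period that begins a run of `sustain` consecutive
    periods at or above `threshold`, or None if no such run occurs."""
    run = 0
    for i, c in enumerate(coeffs):
        run = run + 1 if c >= threshold else 0
        if run == sustain:
            return i - sustain + 1
    return None

# Hypothetical monthly local-likeness values for one newcomer
monthly = [0.2, 0.35, 0.5, 0.72, 0.65, 0.75, 0.8, 0.85]
print(periods_to_settle(monthly))  # 5 (the dip at index 4 resets the run)
```

Averaging this figure over many newcomers, and skipping those who never settle, would give a rough answer to "how long on average" someone takes to behave like a native.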
That could have a fascinating impact on the way anthropologists study migration and the way immigrants become part of a local community. This is computational anthropology, a science that is clearly in its early stages but one that has huge potential for the future.”
Ref: arxiv.org/abs/1405.7769 : Indigenization of Urban Mobility

A brief history of open data


Article by Luke Fretwell in FCW: “In December 2007, 30 open-data pioneers gathered in Sebastopol, Calif., and penned a set of eight open-government data principles that inaugurated a new era of democratic innovation and economic opportunity.
“The objective…was to find a simple way to express values that a bunch of us think are pretty common, and these are values about how the government could make its data available in a way that enables a wider range of people to help make the government function better,” Harvard Law School Professor Larry Lessig said. “That means more transparency in what the government is doing and more opportunity for people to leverage government data to produce insights or other great business models.”
The eight simple principles — that data should be complete, primary, timely, accessible, machine-processable, nondiscriminatory, nonproprietary and license-free — still serve as the foundation for what has become a burgeoning open-data movement.

The benefits of open data for agencies

  • Save time and money when responding to Freedom of Information Act requests.
  • Avoid duplicative internal research.
  • Use complementary datasets held by other agencies.
  • Empower employees to make better-informed, data-driven decisions.
  • Attract positive attention from the public, media and other agencies.
  • Generate revenue and create new jobs in the private sector.

Source: Project Open Data

In the seven years since those principles were released, governments around the world have adopted open-data initiatives and launched platforms that empower researchers, journalists and entrepreneurs to mine this new raw material and tap its potential to uncover new discoveries and opportunities. Open data has drawn civic hacker enthusiasts around the world, fueling hackathons, challenges, apps contests, barcamps and “datapaloozas” focused on issues as varied as health, energy, finance, transportation and municipal innovation.
In the United States, the federal government initiated the beginnings of a wide-scale open-data agenda on President Barack Obama’s first day in office in January 2009, when he issued his memorandum on transparency and open government, which declared that “openness will strengthen our democracy and promote efficiency and effectiveness in government.” The president gave federal agencies three months to provide input into an open-government directive that would eventually outline what each agency planned to do with respect to civic transparency, collaboration and participation, including specific objectives related to releasing data to the public.
In May of that year, Data.gov launched with just 47 datasets and a vision to “increase public access to high-value, machine-readable datasets generated by the executive branch of the federal government.”
When the White House issued the final draft of its federal Open Government Directive later that year, the U.S. open-government data movement got its first tangible marching orders, including a 45-day deadline to open previously unreleased data to the public.
Now five years after its launch, Data.gov boasts more than 100,000 datasets from 227 local, state and federal agencies and organizations….”

Open Data Is Open for Business


Jeffrey Stinson at Stateline: “Last month, web designer Sean Wittmeyer and colleague Wojciech Magda walked away with a $25,000 prize from the state of Colorado for designing an online tool to help businesses decide where to locate in the state.
The tool, called “Beagle Score,” is a widget that can be embedded in online commercial real estate listings. It can rate a location by taxes and incentives, zoning, even the location of possible competitors – all derived from about 30 data sets posted publicly by the state of Colorado and its municipalities.
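The article does not explain how Beagle Score combines its roughly 30 data sets. As a rough illustration only, a location rating of this kind could be a weighted sum of normalized factors. The factor names, weights and 0 to 100 scale below are invented assumptions, not Beagle Score's actual method.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class LocationFactors:
    tax_burden: float          # 0 = lowest taxes in the data set, 1 = highest
    incentives: float          # 0 = none available, 1 = most generous
    zoning_fit: float          # 1 if zoning permits the intended use, else 0
    competitor_density: float  # 0 = no competitors nearby, 1 = most crowded

# Hypothetical weights (they sum to 1); a real tool might let users tune them
WEIGHTS: Dict[str, float] = {"tax": 0.3, "incentives": 0.2, "zoning": 0.3, "competition": 0.2}

def location_score(f: LocationFactors) -> float:
    """Combine factors into a 0-100 score; higher means a more attractive site."""
    raw = (WEIGHTS["tax"] * (1 - f.tax_burden)
           + WEIGHTS["incentives"] * f.incentives
           + WEIGHTS["zoning"] * f.zoning_fit
           + WEIGHTS["competition"] * (1 - f.competitor_density))
    return round(100 * raw, 1)

site = LocationFactors(tax_burden=0.2, incentives=0.5, zoning_fit=1.0, competitor_density=0.4)
print(location_score(site))  # 76.0
```

Factors where "more is worse" (taxes, competitors) are inverted before weighting, so every term pushes the score in the same direction.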
The creation of Beagle Score is an example of how states, cities, counties and the federal government are encouraging entrepreneurs to take raw government data posted on “open data” websites and turn the information into products the public will buy.
“The (Colorado contest) opened up a reason to use the data,” said Wittmeyer, 25, of Fort Collins. “It shows how ‘open data’ can solve a lot of challenges. … And absolutely, we can make it commercially viable. We can expand it to other states, and fairly quickly.”
Open-data advocates, such as President Barack Obama’s former information chief Vivek Kundra, estimate a multibillion-dollar industry can be spawned by taking raw government data files on sectors such as weather, population, energy, housing, commerce or transportation and turning them into products for the public to consume or other industries to pay for.
They can be as simple as mobile phone apps identifying every stop sign you will encounter on a trip to a different town, or as intricate as taking weather and crops data and turning it into insurance policies farmers can buy.

States, Cities Sponsor ‘Hackathons’

At least 39 states and 46 cities and counties have created open-data sites since the federal government, Utah, California and the cities of San Francisco and Washington, D.C., began opening data in 2009, according to the federal site, Data.gov.
Jeanne Holm, the federal government’s Data.gov “evangelist,” said new sites are popping up and new data are being posted almost daily. The city of Los Angeles, for example, opened a portal last week.
In March, Democratic New York Gov. Andrew Cuomo said that in the year since it was launched, his state’s site has grown to some 400 data sets with 50 million records from 45 agencies. Available are everything from horse injuries and deaths at state race tracks to maps of regulated child care centers. The most popular data: top fishing spots in the state.
State and local governments are sponsoring “hackathons,” “data paloozas,” and challenges like Colorado’s, inviting businesspeople, software developers, entrepreneurs or anyone with a laptop and a penchant for manipulating data to take part. Lexington, Kentucky, had a civic hackathon last weekend. The U.S. Transportation Department and members of the Geospatial Transportation Mapping Association had a three-day data palooza that ended Wednesday in Arlington, Virginia.
The goals of the events vary. Some, like Arlington’s transportation event, solicit ideas for how government can present its data more effectively. Others seek ideas for mining it.
Aldona Valicenti, Lexington’s chief information officer, said many cities want advice on how to use the data to make government more responsive to citizens, and to communicate with them on issues ranging from garbage pickups and snow removal to upcoming civic events.
Colorado and Wyoming had a joint hackathon last month sponsored by Google to help solve government problems. Colorado sought apps that might be useful to state emergency personnel in tracking people and moving supplies during floods, blizzards or other natural disasters. Wyoming sought help in making its tax-and-spend data more understandable and usable by its citizens.
Unless there’s some prize money, hackers may not make a buck from events like these, and participate out of fun, curiosity or a sense of public service. But those who create an app that is useful beyond the boundaries of a particular city or state, or one that is commercially valuable to business, can make serious money – just as Beagle Score plans to do. Colorado will hold onto the intellectual property rights to Beagle Score for a year. But Wittmeyer and his partner will be able to profit from extending it to other states.

States Trail in Open Data

Open data is an outgrowth of the e-government movement of the 1990s, in which government computerized more of the data it collected and began making it available on floppy disks.
States often have trailed the federal government or many cities in adjusting to the computer age and in sharing information, said Emily Shaw, national policy manager for the Sunlight Foundation, which promotes transparency in government. The first big push to share came with public accountability, or “checkbook” sites, that show where government gets its revenue and how it spends it.
The goal was to make government more transparent and accountable by offering taxpayers information on how their money was spent.
The Texas Comptroller of Public Accounts site, established in 2007, offers detailed revenue, spending, tax and contracts data. Republican Comptroller Susan Combs’ office said having a one-stop electronic site also has saved taxpayers about $12.3 million in labor, printing, postage and other costs.
Not all states’ checkbook sites are as openly transparent and detailed as Texas, Shaw said. Nor are their open-data sites. “There’s so much variation between the states,” she said.
Many state legislatures are working to set policies for releasing data. Since the start of 2010, according to the National Conference of State Legislatures, nine states have enacted open-data laws, and more legislation is pending. But California, for instance, has been posting open data for five years without legislation setting policies.
Just as states have lagged in getting data out to the public, less of it has been turned into commercial use, said Joel Gurin, senior adviser at the Governance Lab at New York University and author of the book “Open Data Now.”
Gurin leads Open Data 500, which identifies firms that have made products from open government data and turned them into regional or national enterprises. In April, it listed 500. It soon may expand. “We’re finding more and more companies every day,” he said….”

Making cities smarter through citizen engagement


Vaidehi Shah at Eco-Business: “Rapidly progressing information and communications technology (ICT) is giving rise to an almost infinite range of innovations that can be implemented in cities to make them more efficient and better connected. However, in order for technology to yield sustainable solutions, planners must prioritise citizen engagement and strong leadership.
This was the consensus on Tuesday at the World Cities Summit 2014, where representatives from city and national governments, technology firms and private sector organisations gathered in Singapore to discuss strategies and challenges to achieving sustainable cities in the future.
Laura Ipsen, Microsoft corporate vice president for worldwide public sector, identified globalisation, social media, big data, and mobility as the four major technological trends prevailing in cities today, as she spoke at the plenary session with a theme on “The next urban decade: critical challenges and opportunities”.
Despite these increasing trends, she cautioned, “technology does not build infrastructure, but it does help better engage citizens and businesses through public-private partnerships”.
For example, “LoveCleanStreets”, an online tool developed by Microsoft and partners, enables London residents to report infrastructure problems such as damaged roads or signs, shared Ipsen.
“By engaging citizens through this application, cities can fix problems early, before they get worse,” she said.
In Singapore, the ‘MyWaters’ app of PUB, Singapore’s national water agency, is also a key tool for the government to keep citizens up to date on water quality and safety issues in the country, she added.
Even if governments did not actively develop solutions themselves, simply making the immense amounts of data collected by the city open to businesses and citizens could make a big difference to urban liveability, Mark Chandler, director of the San Francisco Mayor’s Office of International Trade and Commerce, pointed out.
Opening up all of the data collected by San Francisco, for instance, yielded 60 free mobile applications that allow residents to access urban solutions related to public transport, parking, and electricity, among others, he explained. This easy and convenient access to infrastructure and amenities, which are a daily necessity, is integral to “a quality of life that keeps the talented workforce in the city,” Chandler said….”

Open Government Data: Helping Parents to find the Best School for their Kids


Radu Cucos at the Open Government Partnership blog: “…This challenge – finding the right school – is probably one of the most important decisions in many parents’ lives. Parents are looking for answers to questions such as which schools are located in safe neighborhoods, which have the highest teacher-to-student ratio, which have the best funding, which have the best premises and which have the highest grade averages.
It is rarely an easy decision, but it is made doubly difficult in the case of migrants. People who have lived in the same place for a long time know, more or less, which are the best educational institutions in their city, town or village. For migrants, the situation is the exact opposite: they have to spend extra time and resources identifying relevant information about schools.
Open Government Data is an effective solution that can ease the lack of accessible information about existing schools in a particular country or location. By adopting an Open Government Data policy in the educational field, governments release data about grades, funding, and student and teacher numbers, along with other data generated over time by schools, colleges, universities and other educational settings.
Developers then use this data to create applications that present the information in easily accessible formats. Three of the best apps I have come across are highlighted below:

  • Discover Your School, developed under the Open Data Initiative of the Province of British Columbia, Canada, is a platform for parents who are interested in finding a school for their kids, learning about the school districts or comparing schools in the same area. The application provides comprehensive information, such as the number of students enrolled in schools each year, class sizes, teaching language, disaster readiness, results of skills assessment, and student and parent satisfaction. Information and data can be viewed in interactive formats, including maps. On top of that, Discover Your School engages parents in policy making and initiatives such as Erase Bullying or the British Columbia Education Plan.
  • The School Portal, developed under the Moldova Open Data Initiative, uses data made public by the Ministry of Education of Moldova to offer comprehensive information about 1529 educational institutions in the Republic of Moldova. Users of the portal can access information about schools’ yearly budgets, budget implementation, expenditures, school ratings, students’ grades, schools’ infrastructure and communications. The School Portal has a tool that allows visitors to compare schools based on different criteria: infrastructure, students’ performance or annual budgets. An additional benefit of the portal is that it serves as a platform for private sector entities that sell school supplies to advertise their products. The School Portal also allows parents to interact online with the Ministry of Education of Moldova or with a psychologist in case they need additional information or have concerns regarding the education of their children.
  • RomaScuola, developed under the umbrella of the Italian Open Data Initiative, allows visitors to obtain valuable information about all schools in the Rome region. Distinguishing it from the two listed above is the ability to compare schools depending on such facets as frequency of teacher absence, internet connectivity, use of IT equipment for teaching, frequency of students’ transfer to other schools and quality of education in accordance with the percentage of issued diplomas.
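The side-by-side comparisons these portals offer boil down to filtering and sorting school records on a chosen criterion. A minimal sketch, assuming hypothetical field names (`district`, `avg_grade`, `teacher_absence_rate`) and made-up records rather than any portal's real data schema:

```python
from typing import Dict, List, Optional

# Hypothetical records; field names and values are illustrative only
schools: List[Dict] = [
    {"name": "School A", "district": "North", "avg_grade": 8.1, "teacher_absence_rate": 0.04},
    {"name": "School B", "district": "North", "avg_grade": 8.7, "teacher_absence_rate": 0.07},
    {"name": "School C", "district": "South", "avg_grade": 9.0, "teacher_absence_rate": 0.03},
]

def compare_schools(records: List[Dict], criterion: str, higher_is_better: bool = True,
                    district: Optional[str] = None) -> List[str]:
    """Return school names ranked best-first on one criterion, optionally within a district."""
    pool = [s for s in records if district is None or s["district"] == district]
    ranked = sorted(pool, key=lambda s: s[criterion], reverse=higher_is_better)
    return [s["name"] for s in ranked]

print(compare_schools(schools, "avg_grade", district="North"))  # ['School B', 'School A']
print(compare_schools(schools, "teacher_absence_rate", higher_is_better=False))
# ['School C', 'School A', 'School B']
```

The `higher_is_better` flag matters because some criteria, such as teacher absence, rank best when lowest.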

Open data on schools has great value not only for parents but also for the educational system in general. If education is considered a product, each country has its own school market. Perfect information about products is one of the main characteristics of competitive markets. From this perspective, giving parents access to information about schools’ characteristics will make the school market more competitive. Educational institutions will have incentives to improve their performance in order to attract more students…”

The Trend towards “Smart Cities”


Chien-Chu Chen in the International Journal of Automation and Smart Technology (AUSMT): “Looking back over the past century, the steady pace of development in many of the world’s cities has resulted in a situation where a high percentage of these cities are now faced with the problem of aging, decrepit urban infrastructure; a considerable number of cities are having to undertake large-scale infrastructure renewal projects. While creating new opportunities in the area of infrastructure, ongoing urbanization is also creating problems, such as excessive consumption of water, electric power and heat energy, environmental pollution, increased greenhouse gas emissions, traffic jams, and the aging of the existing residential housing stock, etc. All of these problems present a challenge to cities’ ability to achieve sustainable development. In response to these issues, the concept of the “smart city” has grown in popularity throughout the world. The aim of smart city initiatives is to make the city a vehicle for “smartification” through the integration of different industries and sectors. As initiatives of this kind move beyond basic automation into the realm of real “smartification,” the smart city concept is beginning to take concrete form….”