big data

We Need a Citizen Maker Movement

Curated on June 25, 2014August 3, 2018 by Stefaan Verhulst

Lorelei Kelly at the Huffington Post: “It was hard to miss the giant mechanical giraffe grazing on the White House lawn last week. For the first time ever, the President organized a Maker Faire–inviting entrepreneurs and inventors from across the USA to celebrate American ingenuity in the service of economic progress.
The maker movement is a California original. Think R2D2 serving margaritas to a jester with an LED news scroll. The #nationofmakers Twitter feed has dozens of examples of collaborative production, of making, sharing and learning.
But since this was the White House, I still had to ask myself, what would the maker movement be if the economy was not the starting point? What if it was about civics? What if makers decided to create a modern, hands-on democracy?
What is democracy anyway but a never ending remix of new prototypes? Last week’s White House Maker Faire heralded a new economic bonanza. This revolution’s poster child is 3-D printing– decentralized fabrication that is customized to meet local needs. On the government front, new design rules for democracy are already happening in communities, where civics and technology have generated a front line of maker cities.
But the distance between California’s tech capacity and DC does seem 3000 miles wide. The NSA’s over collection/surveillance problem and Healthcare.gov’s doomed rollout are part of the same system-wide capacity deficit. How do we close the gap between California’s revolution and our institutions?

In California, disruption is a business plan. In DC, it’s a national security threat.
In California, hackers are artists. In DC, they are often viewed as criminals.
In California, “cyber” is a dystopian science fiction word. In DC, cyber security is in a dozen oversight plans for Congress.

in California, individuals are encouraged to “fail forward.” In DC, risk-aversion is bipartisan.

Scaling big problems with local solutions is a maker specialty. Government policymaking needs this kind of help.
Here’s the issue our nation is facing: The inability of the non-military side of our public institutions to process complex problems. Today, this competence and especially the capacity to solve technical challenges often exist only in the private sector. If something is urgent and can’t be monetized, it becomes a national security problem. Which increasingly means that critical decision making that should be in the civilian remit instead migrates to the military. Look at our foreign policy. Good government is a counter terrorism strategy in Afghanistan. Decades of civilian inaction on climate change means that now Miami is referred to as a battle space in policy conversations.
This rhetoric reflects an understandable but unacceptable disconnect for any democracy.
To make matters more confusing, much of the technology in civics (like list building petitions) is suited for elections, not for governing. It is often antagonistic. The result? policy making looks like campaigning. We need some civic tinkering to generate governing technology that comes with relationships. Specifically, this means technology that includes many voices, but has identifiable channels for expertise that can sort complexity and that is not compromised by financial self-interest.
Today, sorting and filtering information is a huge challenge for participation systems around the world. Information now ranks up there with money and people as a lever of power. On the people front, the loud and often destructive individuals are showing up effectively. On the money front, our public institutions are at risk of becoming purely pay to play (wonks call this “transactional”).
Makers, ask yourselves, how can we turn big data into a political constituency for using real evidence–one that can compete with all the negative noise and money in the system? For starters, technologists out West must stop treating government like it’s a bad signal that can be automated out of existence. We are at a moment where our society requires an engineering mindset to develop modern, tech-savvy rules for democracy. We need civic makers….”

Privacy and Open Government

Curated on June 24, 2014August 3, 2018 by Stefaan Verhulst

Paper by Teresa Scassa in Future Internet: “The public-oriented goals of the open government movement promise increased transparency and accountability of governments, enhanced citizen engagement and participation, improved service delivery, economic development and the stimulation of innovation. In part, these goals are to be achieved by making more and more government information public in reusable formats and under open licences. This paper identifies three broad privacy challenges raised by open government. The first is how to balance privacy with transparency and accountability in the context of “public” personal information. The second challenge flows from the disruption of traditional approaches to privacy based on a collapse of the distinctions between public and private sector actors. The third challenge is that of the potential for open government data—even if anonymized—to contribute to the big data environment in which citizens and their activities are increasingly monitored and profiled.”

LifeLogging: personal big data

Curated on June 24, 2014August 3, 2018 by Stefaan Verhulst

Paper by Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. at Foundations and Trends in Information Retrieval: “We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allows for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most of the lifelogging research has focused predominantly on visual lifelogging in order to capture life details of life activities, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking a information retrieval scientist’s perspective on lifelogging and the quantified self.”

How Crowdsourced Astrophotographs on the Web Are Revolutionizing Astronomy

Curated on June 23, 2014October 31, 2018 by Stefaan Verhulst

Emerging Technology From the arXiv: “Astrophotography is currently undergoing a revolution thanks to the increased availability of high quality digital cameras and the software available to process the pictures after they have been taken.
Since photographs of the night sky are almost always better with long exposures that capture more light, this processing usually involves combining several images of the same part of the sky to produce one with a much longer effective exposure.
That’s all straightforward if you’ve taken the pictures yourself with the same gear under the same circumstances. But astronomers want to do better.
“The astrophotography group on Flickr alone has over 68,000 images,” say Dustin Lang at Carnegie Mellon University in Pittsburgh and a couple of pals. These and other images represent a vast source of untapped data for astronomers.
The problem is that it’s hard to combine images accurately when little is known about how they were taken. Astronomers take great care to use imaging equipment in which the pixels produce a signal that is proportional to the number of photons that hit.
But the same cannot be said of the digital cameras widely used by amateurs. All kinds of processes can end up influencing the final image.
So any algorithm that combines them has to cope with these variations. “We want to do this without having to infer the (possibly highly nonlinear) processing that has been applied to each individual image, each of which has been wrecked in its own loving way by its creator,” say Lang and co.
Now, these guys say they’ve cracked it. They’ve developed a system that automatically combines images from the same part of the sky to increase the effective exposure time of the resulting picture. And they say the combined images can rival those from much professional telescopes.
They’ve tested this approach by downloading images of two well-known astrophysical objects: the NGC 5907 Galaxy and the colliding pair of galaxies—Messier 51a and 51b.
For NGC 5907, they ended up with 4,000 images from Flickr, 1,000 from Bing and 100 from Google. They used an online system called astrometry.net that automatically aligns and registers images of the night sky and then combined the images using their new algorithm, which they call Enhance.
The results are impressive. They say that the combined images of NGC5907 (bottom three images) show some of the same faint features that revealed a single image taken over 11 hours of exposure using a 50 cm telescope (the top left image). All the images reveal the same kind of fine detail such as a faint stellar stream around the galaxy.
The combined image for the M51 galaxies is just as impressive, taking only 40 minutes to produce on a single processor. It reveals extended structures around both galaxies, which astronomers know to be debris from their gravitational interaction as they collide.
Lang and co say these faint features are hugely important because they allow astronomers to measure the age, mass ratios, and orbital configurations of the galaxies involved. Interestingly, many of these faint features are not visible in any of the input images taken from the Web. They emerge only once images have been combined.
One potential problem with algorithms like this is that they need to perform well as the number of images they combine increases. It’s no good if they grind to a halt as soon as a substantial amount of data becomes available.
On this score, Lang and co say astronomers can rest easy. The performance of their new Enhance algorithm scales linearly with the number of images it has to combine. That means it should perform well on large datasets.
The bottom line is that this kind of crowd-sourced astronomy has the potential to make a big impact, given that the resulting images rival those from large telescopes.
And it could also be used for historical images, say Lang and co. The Harvard Plate Archives, for example, contain half a million images dating back to the 1880s. These were all taken using different emulsions, with different exposures and developed using different processes. So the plates all have different responses to light, making them hard to compare.
That’s exactly the problem that Lang and co have solved for digital images on the Web. So it’s not hard to imagine how they could easily combine the data from the Harvard archives as well….”
Ref: arxiv.org/abs/1406.1528 : Towards building a Crowd-Sourced Sky Map

A Big Day for Big Data: The Beginning of Our Data Transformation

Curated on June 19, 2014May 29, 2019 by Stefaan Verhulst

Mark Doms, Under Secretary for Economic Affairs at the US Department of Commerce: “Wednesday, June 18, 2014, was a big day for big data. The Commerce Department participated in the inaugural Open Data Roundtable at the White House, with GovLab at NYU and the White House Office of Science and Technology Policy. The event brought businesses and non-profit organizations that rely on Commerce data together with Commerce Department officials to discuss how to make the data we collect and release easier to find, understand and use. This initiative has significant potential to fuel new businesses; create jobs; and help federal, state and local governments make better decisions.
OpenData 500

Under Secretary Mark Doms presented and participated in the first Open Data Roundtable at the White House, organized by Commerce, GovLab at NYU and the White House Office of Science and Technology Policy
Data innovation is revolutionizing every aspect of our society and government data is playing a major role in the revolution. From the National Oceanic and Atmospheric Administration’s (NOAA’s) climate data to the U.S. Census Bureau’s American Community Survey, the U.S. Patent and Trademark Office (USPTO) patent and trademark records, and National Institute of Standards and Technology (NIST) research, companies, organizations and people are using this information to innovate, grow our economy and better plan for the future.
At this week’s Open Data 500, some key insights I came away with include:

There is a strong desire for data consistency across the Commerce Department, and indeed the federal government.

Data should be catalogued in a common, machine-readable format.

Data should be accessible in bulk, allowing the private sector greater flexibility to harness the information.

The use of a single platform for access to government data would create efficiencies and help coordination across agencies.

Furthermore, business leaders stand ready to help us achieve these goals.
Secretary Pritzker is the first Secretary of Commerce to make data a departmental priority in the Commerce Department’s Strategic Plan, and has branded Commerce as “America’s Data Agency.” In keeping with that mantra, over the next several months, my team at the Economics and Statistics Administration (ESA), which includes the Bureau of Economic Analysis and the U.S. Census Bureau, will be involved in similar forums. We will be engaging our users – businesses, academia, advocacy organizations, and state and local governments – to drive this open data conversation forward.
Today was a big first step in that process. The insight gained will help inform our efforts ahead. Thanks again to the team at GovLab and the White House for their hard work in making it possible!”

The Emerging Power of Big Data

Curated on June 19, 2014July 19, 2019 by Stefaan Verhulst

New America Foundation Report on the Chicago experience of using big data: “Big data is transforming the commercial marketplace but it also has the potential to reshape government affairs and urban development. In a new report from the Emerging Leaders Program at the Chicago Council of Global Affairs, Lincoln S. Ellis, a founding member of the World Economic Roundtable, and other authors from the Emerging Leaders Program, explore how big data can be used by mega-cities to meet the challenges they face in an age of resource constraints to improve the lives of their residents.
Using Chicago as a case study, the report examines how the explosion of data availability enables cities to do more with less—to improve government services, fund much needed transportation, provide better education, and guarantee public safety. And do more with less is what many cities have had to do over the past five years because many cities have had to cut their budgets and reduce the number of public employees in the post-financial crisis economy. It is also what they will need to continue to do in the future.
“Unfortunately, resource constraints are a consistent feature of the post-crisis global landscape,” argues Ellis. “Happily, so too is the renaissance in productivity gains garnered by our ability to leverage technology and information to achieve our most important public purposes in a smarter and more efficient way.”
Click here to view the report as a PDF.”

Big Data, My Data

Curated on June 18, 2014May 29, 2019 by Stefaan Verhulst

Jane Sarasohn-Kahn at iHealthBeat: “The routine operation of modern health care systems produces an abundance of electronically stored data on an ongoing basis,” Sebastian Schneeweis writes in a recent New England Journal of Medicine Perspective.
Is this abundance of data a treasure trove for improving patient care and growing knowledge about effective treatments? Is that data trove a Pandora’s black box that can be mined by obscure third parties to benefit for-profit companies without rewarding those whose data are said to be the new currency of the economy? That is, patients themselves?
In this emerging world of data analytics in health care, there’s Big Data and there’s My Data (“small data”). Who most benefits from the use of My Data may not actually be the consumer.
Big focus on Big Data. Several reports published in the first half of 2014 talk about the promise and perils of Big Data in health care. The Federal Trade Commission’s study, titled “Data Brokers: A Call for Transparency and Accountability,” analyzed the business practices of nine “data brokers,” companies that buy and sell consumers’ personal information from a broad array of sources. Data brokers sell consumers’ information to buyers looking to use those data for marketing, managing financial risk or identifying people. There are health implications in all of these activities, and the use of such data generally is not covered by HIPAA. The report discusses the example of a data segment called “Smoker in Household,” which a company selling a new air filter for the home could use to target-market to an individual who might seek such a product. On the downside, without the consumers’ knowledge, the information could be used by a financial services company to identify the consumer as a bad health insurance risk.
“Big Data and Privacy: A Technological Perspective,” a report from the President’s Office of Science and Technology Policy, considers the growth of Big Data’s role in helping inform new ways to treat diseases and presents two scenarios of the “near future” of health care. The first, on personalized medicine, recognizes that not all patients are alike or respond identically to treatments. Data collected from a large number of similar patients (such as digital images, genomic information and granular responses to clinical trials) can be mined to develop a treatment with an optimal outcome for the patients. In this case, patients may have provided their data based on the promise of anonymity but would like to be informed if a useful treatment has been found. In the second scenario, detecting symptoms via mobile devices, people wishing to detect early signs of Alzheimer’s Disease in themselves use a mobile device connecting to a personal couch in the Internet cloud that supports and records activities of daily living: say, gait when walking, notes on conversations and physical navigation instructions. For both of these scenarios, the authors ask, “Can the information about individuals’ health be sold, without additional consent, to third parties? What if this is a stated condition of use of the app? Should information go to the individual’s personal physicians with their initial consent but not a subsequent confirmation?”
The World Privacy Foundation’s report, titled “The Scoring of America: How Secret Consumer Scores Threaten Your Privacy and Your Future,” describes the growing market for developing indices on consumer behavior, identifying over a dozen health-related scores. Health scores include the Affordable Care Act Individual Health Risk Score, the FICO Medication Adherence Score, various frailty scores, personal health scores (from WebMD and OneHealth, whose default sharing setting is based on the user’s sharing setting with the RunKeeper mobile health app), Medicaid Resource Utilization Group Scores, the SF-36 survey on physical and mental health and complexity scores (such as the Aristotle score for congenital heart surgery). WPF presents a history of consumer scoring beginning with the FICO score for personal creditworthiness and recommends regulatory scrutiny on the new consumer scores for fairness, transparency and accessibility to consumers.
At the same time these three reports went to press, scores of news stories emerged discussing the Big Opportunities Big Data present. The June issue of CFO Magazine published a piece called “Big Data: Where the Money Is.” InformationWeek published “Health Care Dives Into Big Data,” Motley Fool wrote about “Big Data’s Big Future in Health Care” and WIRED called “Cloud Computing, Big Data and Health Care” the “trifecta.”
Well-timed on June 5, the Office of the National Coordinator for Health IT’s Roadmap for Interoperability was detailed in a white paper, titled “Connecting Health and Care for the Nation: A 10-Year Vision to Achieve an Interoperable Health IT Infrastructure.” The document envisions the long view for the U.S. health IT ecosystem enabling people to share and access health information, ensuring quality and safety in care delivery, managing population health, and leveraging Big Data and analytics. Notably, “Building Block #3” in this vision is ensuring privacy and security protections for health information. ONC will “support developers creating health tools for consumers to encourage responsible privacy and security practices and greater transparency about how they use personal health information.” Looking forward, ONC notes the need for “scaling trust across communities.”
Consumer trust: going, going, gone? In the stakeholder community of U.S. consumers, there is declining trust between people and the companies and government agencies with whom people deal. Only 47% of U.S. adults trust companies with whom they regularly do business to keep their personal information secure, according to a June 6 Gallup poll. Furthermore, 37% of people say this trust has decreased in the past year. Who’s most trusted to keep information secure? Banks and credit card companies come in first place, trusted by 39% of people, and health insurance companies come in second, trusted by 26% of people.
Trust is a basic requirement for health engagement. Health researchers need patients to share personal data to drive insights, knowledge and treatments back to the people who need them. PatientsLikeMe, the online social network, launched the Data for Good project to inspire people to share personal health information imploring people to “Donate your data for You. For Others. For Good.” For 10 years, patients have been sharing personal health information on the PatientsLikeMe site, which has developed trusted relationships with more than 250,000 community members…”

Selected Readings on Crowdsourcing Tasks and Peer Production

Curated on June 12, 2014December 13, 2018 by Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing was originally published in 2014.

Technological advances are creating a new paradigm by which institutions and organizations are increasingly outsourcing tasks to an open community, allocating specific needs to a flexible, willing and dispersed workforce. “Microtasking” platforms like Amazon’s Mechanical Turk are a burgeoning source of income for individuals who contribute their time, skills and knowledge on a per-task basis. In parallel, citizen science projects – task-based initiatives in which citizens of any background can help contribute to scientific research – like Galaxy Zoo are demonstrating the ability of lay and expert citizens alike to make small, useful contributions to aid large, complex undertakings. As governing institutions seek to do more with less, looking to the success of citizen science and microtasking initiatives could provide a blueprint for engaging citizens to help accomplish difficult, time-consuming objectives at little cost. Moreover, the incredible success of peer-production projects – best exemplified by Wikipedia – instills optimism regarding the public’s willingness and ability to complete relatively small tasks that feed into a greater whole and benefit the public good. You can learn more about this new wave of “collective intelligence” by following the MIT Center for Collective Intelligence and their annual Collective Intelligence Conference.

Selected Reading List (in alphabetical order)

Yochai Benkler — The Wealth of Networks: How Social Production Transforms Markets and Freedom — a book on the ways commons-based peer-production is transforming modern society.
Daren C. Brabham — Using Crowdsourcing in Government — a report describing the diversity of methods crowdsourcing could be greater utilized by governments, including through the leveraging of micro-tasking platforms.
Kevin J. Boudreau, Patrick Gaule, Karim Lakhani, Christoph Reidl, Anita Williams Woolley – From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers – a working paper exploring the conditions,
including incentives, that affect online collaboration.
Chiara Franzoni and Henry Sauermann — Crowd Science: The Organization of Scientific Research in Open Collaborative Projects — a paper describing the potential advantages of deploying crowd science in a variety of contexts.
Aniket Kittur, Ed H. Chi and Bongwon Suh — Crowdsourcing User Studies with Mechanical Turk — a paper proposing potential benefits beyond simple task completion for microtasking platforms like Mechanical Turk.
Aniket Kittur, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton — The Future of Crowd Work — a paper describing the promise of increased and evolved crowd work’s effects on the global economy.
Michael J. Madison — Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo — an in-depth case study of the Galaxy Zoo containing insights regarding the importance of clear objectives and institutional and/or professional collaboration in citizen science initiatives.
Thomas W. Malone, Robert Laubacher and Chrysanthos Dellarocas – Harnessing Crowds: Mapping the Genome of Collective Intelligence – an article proposing a framework for understanding collective intelligence efforts.
Geoff Mulgan – True Collective Intelligence? A Sketch of a Possible New Field – a paper proposing theoretical building blocks and an experimental and research agenda around the field of collective intelligence.
Henry Sauermann and Chiara Franzoni – Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation – a paper exploring the role of interest-based motivation in collaborative knowledge production.
Catherine E. Schmitt-Sands and Richard J. Smith – Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk – an article describing an experiment using Mechanical Turk to crowdsource public policy research microtasks.
Clay Shirky — Here Comes Everybody: The Power of Organizing Without Organizations — a book exploring the ways largely unstructured collaboration is remaking practically all sectors of modern life.
Jonathan Silvertown — A New Dawn for Citizen Science — a paper examining the diverse factors influencing the emerging paradigm of “science by the people.”
Katarzyna Szkuta, Roberto Pizzicannella, David Osimo – Collaborative approaches to public sector innovation: A scoping study – an article studying success factors and incentives around the collaborative delivery of online public services.

Annotated Selected Reading List (in alphabetical order)

Benkler, Yochai. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press, 2006. http://bit.ly/1aaU7Yb.

In this book, Benkler “describes how patterns of information, knowledge, and cultural production are changing – and shows that the way information and knowledge are made available can either limit or enlarge the ways people can create and express themselves.”

In his discussion on Wikipedia – one of many paradigmatic examples of people collaborating without financial reward – he calls attention to the notable ongoing cooperation taking place among a diversity of individuals. He argues that, “The important point is that Wikipedia requires not only mechanical cooperation among people, but a commitment to a particular style of writing and describing concepts that is far from intuitive or natural to people. It requires self-discipline. It enforces the behavior it requires primarily through appeal to the common enterprise that the participants are engaged in…”

Brabham, Daren C. Using Crowdsourcing in Government. Collaborating Across Boundaries Series. IBM Center for The Business of Government, 2013. http://bit.ly/17gzBTA.

In this report, Brabham categorizes government crowdsourcing cases into a “four-part, problem-based typology, encouraging government leaders and public administrators to consider these open problem-solving techniques as a way to engage the public and tackle difficult policy and administrative tasks more effectively and efficiently using online communities.”

The proposed four-part typology describes the following types of crowdsourcing in government:
- Knowledge Discovery and Management
- Distributed Human Intelligence Tasking
- Broadcast Search
- Peer-Vetted Creative Production

In his discussion on Distributed Human Intelligence Tasking, Brabham argues that Amazon’s Mechanical Turk and other microtasking platforms could be useful in a number of governance scenarios, including:
- Governments and scholars transcribing historical document scans
- Public health departments translating health campaign materials into foreign languages to benefit constituents who do not speak the native language
- Governments translating tax documents, school enrollment and immunization brochures, and other important materials into minority languages
- Helping governments predict citizens’ behavior, “such as for predicting their use of public transit or other services or for predicting behaviors that could inform public health practitioners and environmental policy makers”

Boudreau, Kevin J., Patrick Gaule, Karim Lakhani, Christoph Reidl, Anita Williams Woolley. “From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 14-060. January 23, 2014. https://bit.ly/2QVmGUu.

In this working paper, the authors explore the “conditions necessary for eliciting effort from those affecting the quality of interdependent teamwork” and “consider the the role of incentives versus social processes in catalyzing collaboration.”

The paper’s findings are based on an experiment involving 260 individuals randomly assigned to 52 teams working toward solutions to a complex problem.

The authors determined the level of effort in such collaborative undertakings are sensitive to cash incentives. However, collaboration among teams was driven more by the active participation of teammates, rather than any monetary reward.

Franzoni, Chiara, and Henry Sauermann. “Crowd Science: The Organization of Scientific Research in Open Collaborative Projects.” Research Policy (August 14, 2013). http://bit.ly/HihFyj.

In this paper, the authors explore the concept of crowd science, which they define based on two important features: “participation in a project is open to a wide base of potential contributors, and intermediate inputs such as data or problem solving algorithms are made openly available.” The rationale for their study and conceptual framework is the “growing attention from the scientific community, but also policy makers, funding agencies and managers who seek to evaluate its potential benefits and challenges. Based on the experiences of early crowd science projects, the opportunities are considerable.”

Based on the study of a number of crowd science projects – including governance-related initiatives like Patients Like Me – the authors identify a number of potential benefits in the following categories:
- Knowledge-related benefits
- Benefits from open participation
- Benefits from the open disclosure of intermediate inputs
- Motivational benefits

The authors also identify a number of challenges:
- Organizational challenges
- Matching projects and people
- Division of labor and integration of contributions
- Project leadership
- Motivational challenges
- Sustaining contributor involvement
- Supporting a broader set of motivations
- Reconciling conflicting motivations

Kittur, Aniket, Ed H. Chi, and Bongwon Suh. “Crowdsourcing User Studies with Mechanical Turk.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 453–456. CHI ’08. New York, NY, USA: ACM, 2008. http://bit.ly/1a3Op48.

In this paper, the authors examine “[m]icro-task markets, such as Amazon’s Mechanical Turk, [which] offer a potential paradigm for engaging a large number of users for low time and monetary costs. [They] investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks.”

The authors conclude that in addition to providing a means for crowdsourcing small, clearly defined, often non-skill-intensive tasks, “Micro-task markets such as Amazon’s Mechanical Turk are promising platforms for conducting a variety of user study tasks, ranging from surveys to rapid prototyping to quantitative measures. Hundreds of users can be recruited for highly interactive tasks for marginal costs within a timeframe of days or even minutes. However, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.”

Kittur, Aniket, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton. “The Future of Crowd Work.” In 16th ACM Conference on Computer Supported Cooperative Work (CSCW 2013), 2012. http://bit.ly/1c1GJD3.

In this paper, the authors discuss paid crowd work, which “offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale.” However, they caution that, “it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework.”

The authors argue that seven key challenges must be met to ensure that crowd work processes evolve and reach their full potential:
- Designing workflows
- Assigning tasks
- Supporting hierarchical structure
- Enabling real-time crowd work
- Supporting synchronous collaboration
- Controlling quality

Madison, Michael J. “Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo.” In Convening Cultural Commons, 2013. http://bit.ly/1ih9Xzm.

This paper explores a “case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis on the Internet. In the second place, Galaxy Zoo is a highly successful example of peer production, some times known as crowdsourcing…In the third place, is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies.”

Madison concludes that the success of Galaxy Zoo has not been the result of the “character of its information resources (scientific data) and rules regarding their usage,” but rather, the fact that the “community was guided from the outset by a vision of a specific organizational solution to a specific research problem in astronomy, initiated and governed, over time, by professional astronomers in collaboration with their expanding universe of volunteers.”

Malone, Thomas W., Robert Laubacher and Chrysanthos Dellarocas. “Harnessing Crowds: Mapping the Genome of Collective Intelligence.” MIT Sloan Research Paper. February 3, 2009. https://bit.ly/2SPjxTP.

In this article, the authors describe and map the phenomenon of collective intelligence – also referred to as “radical decentralization, crowd-sourcing, wisdom of crowds, peer production, and wikinomics – which they broadly define as “groups of individuals doing things collectively that seem intelligent.”

The article is derived from the authors’ work at MIT’s Center for Collective Intelligence, where they gathered nearly 250 examples of Web-enabled collective intelligence. To map the building blocks or “genes” of collective intelligence, the authors used two pairs of related questions:
- Who is performing the task? Why are they doing it?
- What is being accomplished? How is it being done?

The authors concede that much work remains to be done “to identify all the different genes for collective intelligence, the conditions under which these genes are useful, and the constraints governing how they can be combined,” but they believe that their framework provides a useful start and gives managers and other institutional decisionmakers looking to take advantage of collective intelligence activities the ability to “systematically consider many possible combinations of answers to questions about Who, Why, What, and How.”

Mulgan, Geoff. “True Collective Intelligence? A Sketch of a Possible New Field.” Philosophy & Technology 27, no. 1. March 2014. http://bit.ly/1p3YSdd.

In this paper, Mulgan explores the concept of a collective intelligence, a “much talked about but…very underdeveloped” field.

With a particular focus on health knowledge, Mulgan “sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to possible intellectual barriers to progress.”

He concludes that the “central message that comes from observing real intelligence is that intelligence has to be for something,” and that “turning this simple insight – the stuff of so many science fiction stories – into new theories, new technologies and new applications looks set to be one of the most exciting prospects of the next few years and may help give shape to a new discipline that helps us to be collectively intelligent about our own collective intelligence.”

Sauermann, Henry and Chiara Franzoni. “Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation.” SSRN Working Papers Series. November 28, 2013. http://bit.ly/1o6YB7f.

In this paper, Sauremann and Franzoni explore the issue of interest-based motivation in crowd-based knowledge production – in particular the use of the crowd science platform Zooniverse – by drawing on “research in psychology to discuss important static and dynamic features of interest and deriv[ing] a number of research questions.”

The authors find that interest-based motivation is often tied to a “particular object (e.g., task, project, topic)” not based on a “general trait of the person or a general characteristic of the object.” As such, they find that “most members of the installed base of users on the platform do not sign up for multiple projects, and most of those who try out a project do not return.”

They conclude that “interest can be a powerful motivator of individuals’ contributions to crowd-based knowledge production…However, both the scope and sustainability of this interest appear to be rather limited for the large majority of contributors…At the same time, some individuals show a strong and more enduring interest to participate both within and across projects, and these contributors are ultimately responsible for much of what crowd science projects are able to accomplish.”

Schmitt-Sands, Catherine E. and Richard J. Smith. “Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk.” SSRN Working Papers Series. January 9, 2014. http://bit.ly/1ugaYja.

In this paper, the authors describe an experiment involving the nascent use of Amazon’s Mechanical Turk as a social science research tool. “While researchers have used crowdsourcing to find research subjects or classify texts, [they] used Mechanical Turk to conduct a policy scan of local government websites.”

Schmitt-Sands and Smith found that “crowdsourcing worked well for conducting an online policy program and scan.” The microtasked workers were helpful in screening out local governments that either did not have websites or did not have the types of policies and services for which the researchers were looking. However, “if the task is complicated such that it requires ongoing supervision, then crowdsourcing is not the best solution.”

Shirky, Clay. Here Comes Everybody: The Power of Organizing Without Organizations. New York: Penguin Press, 2008. https://bit.ly/2QysNif.

In this book, Shirky explores our current era in which, “For the first time in history, the tools for cooperating on a global scale are not solely in the hands of governments or institutions. The spread of the Internet and mobile phones are changing how people come together and get things done.”

Discussing Wikipedia’s “spontaneous division of labor,” Shirky argues that the process is like, “the process is more like creating a coral reef, the sum of millions of individual actions, than creating a car. And the key to creating those individual actions is to hand as much freedom as possible to the average user.”

Silvertown, Jonathan. “A New Dawn for Citizen Science.” Trends in Ecology & Evolution 24, no. 9 (September 2009): 467–471. http://bit.ly/1iha6CR.

This article discusses the move from “Science for the people,” a slogan adopted by activists in the 1970s to “’Science by the people,’ which is “a more inclusive aim, and is becoming a distinctly 21st century phenomenon.”

Silvertown identifies three factors that are responsible for the explosion of activity in citizen science, each of which could be similarly related to the crowdsourcing of skills by governing institutions:
- “First is the existence of easily available technical tools for disseminating information about products and gathering data from the public.
- A second factor driving the growth of citizen science is the increasing realisation among professional scientists that the public represent a free source of labour, skills, computational power and even finance.
- Third, citizen science is likely to benefit from the condition that research funders such as the National Science Foundation in the USA and the Natural Environment Research Council in the UK now impose upon every grantholder to undertake project-related science outreach. This is outreach as a form of public accountability.”

Szkuta, Katarzyna, Roberto Pizzicannella, David Osimo. “Collaborative approaches to public sector innovation: A scoping study.” Telecommunications Policy. 2014. http://bit.ly/1oBg9GY.

In this article, the authors explore cases where government collaboratively delivers online public services, with a focus on success factors and “incentives for services providers, citizens as users and public administration.”

The authors focus on six types of collaborative governance projects:
- Services initiated by government built on government data;
- Services initiated by government and making use of citizens’ data;
- Services initiated by civil society built on open government data;
- Collaborative e-government services; and
- Services run by civil society and based on citizen data.
The cases explored “are all designed in the way that effectively harnesses the citizens’ potential. Services susceptible to collaboration are those that require computing efforts, i.e. many non-complicated tasks (e.g. citizen science projects – Zooniverse) or citizens’ free time in general (e.g. time banks). Those services also profit from unique citizens’ skills and their propensity to share their competencies.”

Why Governments Should Adopt a Digital Engagement Strategy

Curated on June 11, 2014August 15, 2018 by Stefaan Verhulst

Lindsay Crudele at StateTech: “Government agencies increasingly value digital engagement as a way to transform a complaint-based relationship into one of positive, proactive constituent empowerment. An engaged community is a stronger one.
Creating a culture of participatory government, as we strive to do in Boston, requires a data-driven infrastructure supported by IT solutions. Data management and analytics solutions translate a huge stream of social media data, drive conversations and creative crowdsourcing, and support transparency.
More than 50 departments across Boston host public conversations using a multichannel, multidisciplinary portfolio of accounts. We integrate these using an enterprise digital engagement management tool that connects and organizes them to break down silos and boost collaboration. Moreover, the technology provides a lens into ways to expedite workflow and improve service delivery.

A Vital Link in Times of Need

Committed and creative daily engagement builds trusting collaboration that, in turn, is vital in an inevitable crisis. As we saw during the tragic events of the 2013 Boston Marathon bombings and recent major weather events, rapid response through digital media clarifies the situation, provides information about safety and manages constituent expectations.
Boston’s enterprise model supports coordinated external communication and organized monitoring, intake and response. This provides a superadmin with access to all accounts for governance and the ability to easily amplify central messaging across a range of cultivated communities. These communities will later serve in recovery efforts.
The conversations must be seeded by a keen, creative and data-driven content strategy. For an agency to determine the correct strategy for the organization and the community it serves, a growing crop of social analytics tools can provide efficient insight into performance factors: type of content, deployment schedule, sentiment, service-based response time and team performance, to name a few. For example, in February, the city of Boston learned that tweets from our mayor with video saw 300 percent higher engagement than those without.
These insights can inform resource deployment, eliminating guesswork to more directly reach constituents by their preferred methods. Being truly present in a conversation demonstrates care and awareness and builds trust. This increased positivity can be measured through sentiment analysis, including change over time, and should be monitored for fluctuation.
During a major event, engagement managers may see activity reach new peaks in volume. IT solutions can interpret Big Data and bring a large-scale digital conversation back into perspective, identifying public safety alerts and emerging trends, needs and community influencers who can be engaged as amplifying partners.

Running Strong One Year Later

Throughout the 2014 Boston Marathon, we used three monitoring tools to deliver smart alerts to key partners across the organization:
• An engagement management tool organized conversations for account performance and monitoring.
• A brand listening tool scanned for emerging trends across the city and uncovered related conversations.
• A location-based predictive tool identified early alerts to discover potential problems along the marathon route.
With the team and tools in place, policy-based training supports the sustained growth and operation of these conversation channels. A data-driven engagement strategy unearths all of our stories, where we, as public servants and neighbors, build better communities together….”

Critiquing Big Data: Politics, Ethics, Epistemology \| Special Section Introduction	PDF
Kate Crawford, Mary L. Gray, Kate Miltner	10 pgs.

The Big Data Divide	ABSTRACT PDF
Mark Andrejevic	17 pgs.

Metaphors of Big Data	ABSTRACT PDF
Cornelius Puschmann, Jean Burgess	20 pgs.

Advertising, Big Data and the Clearance of the Public Realm: Marketers’ New Approaches to the Content Subsidy	ABSTRACT PDF
Nick Couldry, Joseph Turow	17 pgs.

A Dozen Ways to Get Lost in Translation: Inherent Challenges in Large Scale Data Sets	ABSTRACT PDF
Lawrence Busch	18 pgs.

Topics: big data

Privacy and Open Government

LifeLogging: personal big data

How Crowdsourced Astrophotographs on the Web Are Revolutionizing Astronomy

A Big Day for Big Data: The Beginning of Our Data Transformation

The Emerging Power of Big Data

Big Data, My Data

Selected Readings on Crowdsourcing Tasks and Peer Production

Selected Reading List (in alphabetical order)

Annotated Selected Reading List (in alphabetical order)

79 countries

Streetlight Effect

(stritlaɪt ɪˈfɛkt)

Subscribe to curated findings and actionable knowledge
from The Living Library, delivered to your inbox every Friday

Blockchain and Identity

This One Does Not Go Up To 11: The Quantified Self Movement as an Alternative Big Data Practice	ABSTRACT PDF
Dawn Nafus, Jamie Sherman	11 pgs.

The Theory/Data Thing	ABSTRACT PDF
Geoffrey C. Bowker	5 pgs.