LifeLogging: personal big data


Paper by Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. at Foundations and Trends in Information Retrieval: “We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most of the lifelogging research has focused predominantly on visual lifelogging in order to capture life details of life activities, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking an information retrieval scientist’s perspective on lifelogging and the quantified self.”

Opening Public Transportation Data in Germany


Thesis by Kaufmann, Stefan: “Open data has been recognized as a valuable resource, and public institutions have taken to publishing their data under open licenses, also in Germany. However, German public transit agencies are still reluctant to publish their schedules as open data. Also, two widely used data exchange formats used in German transit planning are proprietary, with no documentation publicly available. Through this work, one of the proprietary formats was reverse-engineered, and a transformation process into the open GTFS schedule format was developed. This process allowed a partnering transit operator to publish their schedule as open data. Also, through a survey taken with German transit authorities and operators, the prevalence of transit data exchange formats, and reservations concerning open transit data were evaluated. The survey brought a series of issues to light which serve as obstacles for opening up transit data. Addressing the issues found through this work, and partnering with open-minded transit authorities to further develop transit data publishing processes can serve as a foundation for wider adoption of publishing open transit data in Germany.”

Giving is a question of time: Response times and contributions to a real world public good


Discussion Paper (University of Heidelberg) by Lohse, Johannes and Goeschl, Timo and Diederich, Johannes: “Recent experimental research has examined whether contributions to public goods can be traced back to intuitive or deliberative decision-making, using response times in public good games in order to identify the specific decision process at work. In light of conflicting results, this paper reports on an analysis of response time data from an online experiment in which over 3400 subjects from the general population decided whether to contribute to a real world public good. The between-subjects evidence confirms a strong positive link between contributing and deliberation and between free-riding and intuition. The average response time of contributors is 40 percent higher than that of free-riders. A within-subject analysis reveals that for a given individual, contributing significantly increases and free-riding significantly decreases the amount of deliberation required.”

Transparency, legitimacy and trust


John Kamensky at Federal Times: “The Open Government movement has captured the imagination of many around the world as a way of increasing transparency, participation, and accountability. In the US, many of the federal, state, and local Open Government initiatives have been demonstrated to achieve positive results for citizens here and abroad. In fact, the White House’s science advisors released a refreshed Open Government plan in early June.
However, a recent study in Sweden says the benefits of transparency may vary, and may have little impact on citizens’ perception of legitimacy and trust in government. This research suggests important lessons on how public managers should approach the design of transparency strategies, and how they work in various conditions.
Jenny de Fine Licht, a scholar at the University of Gothenburg in Sweden, offers a more nuanced view of the influence of transparency in political decision making on public legitimacy and trust, in a paper that appears in the current issue of “Public Administration Review.” Her research challenges the assumption of many in the Open Government movement that greater transparency necessarily leads to greater citizen trust in government.
Her conclusion, based on an experiment involving over 1,000 participants, was that the type and degree of transparency “has different effects in different policy areas.” She found that “transparency is less effective in policy decisions that involve trade-offs related to questions of human life and death or well-being.”

The background

Licht says there are some policy decisions that involve what are called “taboo tradeoffs.” A taboo tradeoff, for example, would be making budget tradeoffs in policy areas such as health care and environmental quality, where human life or well-being is at stake. In cases where more money is an implicit solution, the author notes, “increased transparency in these policy areas might provoke feeling of taboo, and, accordingly, decreased perceived legitimacy.”
Other scholars, such as Harvard’s Jane Mansbridge, contend that “full transparency may not always be the best practice in policy making.” Full transparency in decision-making processes would include, for example, open appropriation committee meetings. Instead, she recommends “transparency in rationale – in procedures, information, reasons, and the facts on which the reasons are based.” That is, provide a full explanation after-the-fact.
Licht tested the hypothesis that full transparency of the decision-making process vs. partial transparency via providing after-the-fact rationales for decisions may create different results, depending on the policy arena involved…
Open Government advocates have generally assumed that full and open transparency is always better. Licht’s conclusion is that “greater transparency” does not necessarily increase citizen legitimacy and trust. Instead, the strategy of encouraging a high degree of transparency requires a more nuanced application in its use. While she cautions about generalizing from her experiment, the potential implications for government decision-makers could be significant.
To date, many of the various Open Government initiatives across the country have assumed a “one size fits all” approach, across the board. Licht’s conclusions, however, help explain why the results of various initiatives have been divergent in terms of citizen acceptance of open decision processes.
Her experiment seems to suggest that citizen engagement is more likely to create a greater citizen sense of legitimacy and trust in areas involving “routine” decisions, such as parks, recreation, and library services. But that “taboo” decisions in policy areas involving tradeoffs of human life, safety, and well-being may not necessarily result in greater trust as a result of the use of full and open transparency of decision-making processes.
While she says that transparency – whether full or partial – is always better than no transparency, her experiment at least shows that policy makers will, at a minimum, know that the end result may not be greater legitimacy and trust. In any case, her research should engender a more nuanced conversation among Open Government advocates at all levels of government. In order to increase citizens’ perceptions of legitimacy and trust in government, it will take more than just advocating for Open Data!”

Poetica


at TechnologyCrunch: “The ability to collaborate on the draft of a document is actually fiendishly tedious online. Many people might be used to Microsoft Word ‘Track Changes’ (ugh) despite the fact it looks awful and takes some getting used to. Nor does Google Docs really create a collaboration experience that mere mortals can get into. Step in Poetica, a brand new startup co-founded by Blaine Cook, formerly Twitter’s founding lead engineer.
Cook has now raised an angel round of funding for the London-based company which is hoping to change how teams create, share and edit work on the web, across any devices and mediums.
Poetica, which opens its doors to new signups today, is a browser-based editor and Chrome extension that portrays a more traditional view of text collaboration – in the same way you might see someone scribble on a piece of paper….
Cook says the goal is to “bring rich collaboration tools based on cutting-edge technology and design to everyone” who wants to communicate online. In other words, they are going for a fairly big play here. And he reckons he can do it from London, over the Valley, where he worked at Twitter: “London has an incredible community of brilliant software engineers and designers, and a growing and supportive investor base.”

Big Data, My Data


Jane Sarasohn-Kahn at iHealthBeat: “The routine operation of modern health care systems produces an abundance of electronically stored data on an ongoing basis,” Sebastian Schneeweis writes in a recent New England Journal of Medicine Perspective.
Is this abundance of data a treasure trove for improving patient care and growing knowledge about effective treatments? Is that data trove a Pandora’s black box that can be mined by obscure third parties to benefit for-profit companies without rewarding those whose data are said to be the new currency of the economy? That is, patients themselves?
In this emerging world of data analytics in health care, there’s Big Data and there’s My Data (“small data”). Who most benefits from the use of My Data may not actually be the consumer.
Big focus on Big Data. Several reports published in the first half of 2014 talk about the promise and perils of Big Data in health care. The Federal Trade Commission’s study, titled “Data Brokers: A Call for Transparency and Accountability,” analyzed the business practices of nine “data brokers,” companies that buy and sell consumers’ personal information from a broad array of sources. Data brokers sell consumers’ information to buyers looking to use those data for marketing, managing financial risk or identifying people. There are health implications in all of these activities, and the use of such data generally is not covered by HIPAA. The report discusses the example of a data segment called “Smoker in Household,” which a company selling a new air filter for the home could use to target-market to an individual who might seek such a product. On the downside, without the consumers’ knowledge, the information could be used by a financial services company to identify the consumer as a bad health insurance risk.
“Big Data and Privacy: A Technological Perspective,” a report from the President’s Office of Science and Technology Policy, considers the growth of Big Data’s role in helping inform new ways to treat diseases and presents two scenarios of the “near future” of health care. The first, on personalized medicine, recognizes that not all patients are alike or respond identically to treatments. Data collected from a large number of similar patients (such as digital images, genomic information and granular responses to clinical trials) can be mined to develop a treatment with an optimal outcome for the patients. In this case, patients may have provided their data based on the promise of anonymity but would like to be informed if a useful treatment has been found. In the second scenario, detecting symptoms via mobile devices, people wishing to detect early signs of Alzheimer’s Disease in themselves use a mobile device connecting to a personal couch in the Internet cloud that supports and records activities of daily living: say, gait when walking, notes on conversations and physical navigation instructions. For both of these scenarios, the authors ask, “Can the information about individuals’ health be sold, without additional consent, to third parties? What if this is a stated condition of use of the app? Should information go to the individual’s personal physicians with their initial consent but not a subsequent confirmation?”
The World Privacy Foundation’s report, titled “The Scoring of America: How Secret Consumer Scores Threaten Your Privacy and Your Future,” describes the growing market for developing indices on consumer behavior, identifying over a dozen health-related scores. Health scores include the Affordable Care Act Individual Health Risk Score, the FICO Medication Adherence Score, various frailty scores, personal health scores (from WebMD and OneHealth, whose default sharing setting is based on the user’s sharing setting with the RunKeeper mobile health app), Medicaid Resource Utilization Group Scores, the SF-36 survey on physical and mental health and complexity scores (such as the Aristotle score for congenital heart surgery). WPF presents a history of consumer scoring beginning with the FICO score for personal creditworthiness and recommends regulatory scrutiny on the new consumer scores for fairness, transparency and accessibility to consumers.
At the same time these three reports went to press, scores of news stories emerged discussing the Big Opportunities Big Data present. The June issue of CFO Magazine published a piece called “Big Data: Where the Money Is.” InformationWeek published “Health Care Dives Into Big Data,” Motley Fool wrote about “Big Data’s Big Future in Health Care” and WIRED called “Cloud Computing, Big Data and Health Care” the “trifecta.”
Well-timed on June 5, the Office of the National Coordinator for Health IT’s Roadmap for Interoperability was detailed in a white paper, titled “Connecting Health and Care for the Nation: A 10-Year Vision to Achieve an Interoperable Health IT Infrastructure.” The document envisions the long view for the U.S. health IT ecosystem enabling people to share and access health information, ensuring quality and safety in care delivery, managing population health, and leveraging Big Data and analytics. Notably, “Building Block #3” in this vision is ensuring privacy and security protections for health information. ONC will “support developers creating health tools for consumers to encourage responsible privacy and security practices and greater transparency about how they use personal health information.” Looking forward, ONC notes the need for “scaling trust across communities.”
Consumer trust: going, going, gone? In the stakeholder community of U.S. consumers, there is declining trust between people and the companies and government agencies with whom people deal. Only 47% of U.S. adults trust companies with whom they regularly do business to keep their personal information secure, according to a June 6 Gallup poll. Furthermore, 37% of people say this trust has decreased in the past year. Who’s most trusted to keep information secure? Banks and credit card companies come in first place, trusted by 39% of people, and health insurance companies come in second, trusted by 26% of people.
Trust is a basic requirement for health engagement. Health researchers need patients to share personal data to drive insights, knowledge and treatments back to the people who need them. PatientsLikeMe, the online social network, launched the Data for Good project to inspire people to share personal health information imploring people to “Donate your data for You. For Others. For Good.” For 10 years, patients have been sharing personal health information on the PatientsLikeMe site, which has developed trusted relationships with more than 250,000 community members…”

Why Statistically Significant Studies Aren’t Necessarily Significant


Michael White in PSMagazine on how modern statistics have made it easier than ever for us to fool ourselves: “Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.
OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”…
That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results, statistically significant findings that on further investigation don’t hold up. Non-reproducible results in science are a growing concern; so do researchers need to change their approach to statistics?
Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”
However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper (PDF), statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.
To see how this might happen, imagine a study designed to test the idea that green jellybeans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jellybeans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jellybeans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”
What is the solution? Part of the answer is to not let measures of statistical significance override our common sense—not our naïve common sense, but our scientifically-informed common sense…”
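
The jelly bean point in the excerpt above can be made concrete with a small simulation. The sketch below is not from White's article: it assumes independent tests of true null hypotheses, the conventional 0.05 threshold, and an illustrative batch of 20 tests, and the function names are hypothetical.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positive_rate(
    num_tests: int,
    alpha: float = 0.05,
    num_studies: int = 20000,
    seed: int = 0,
) -> float:
    """Estimate the same chance by simulating many studies with no real effect."""
    rng = random.Random(seed)
    studies_with_a_hit = 0
    for _ in range(num_studies):
        # Assumption: under a true null, a p-value from a continuous test
        # statistic is uniformly distributed on [0, 1).
        if any(rng.random() < alpha for _ in range(num_tests)):
            studies_with_a_hit += 1
    return studies_with_a_hit / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        analytic = familywise_error_rate(k)
        simulated = simulate_false_positive_rate(k)
        print(f"{k:>2} tests: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

Under these assumptions, the chance of at least one spuriously significant result grows quickly with the number of hypotheses tried, which is why an analysis chosen after looking at the data deserves extra caution.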

Selected Readings on Crowdsourcing Tasks and Peer Production


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing was originally published in 2014.

Technological advances are creating a new paradigm by which institutions and organizations are increasingly outsourcing tasks to an open community, allocating specific needs to a flexible, willing and dispersed workforce. “Microtasking” platforms like Amazon’s Mechanical Turk are a burgeoning source of income for individuals who contribute their time, skills and knowledge on a per-task basis. In parallel, citizen science projects – task-based initiatives in which citizens of any background can help contribute to scientific research – like Galaxy Zoo are demonstrating the ability of lay and expert citizens alike to make small, useful contributions to aid large, complex undertakings. As governing institutions seek to do more with less, looking to the success of citizen science and microtasking initiatives could provide a blueprint for engaging citizens to help accomplish difficult, time-consuming objectives at little cost. Moreover, the incredible success of peer-production projects – best exemplified by Wikipedia – instills optimism regarding the public’s willingness and ability to complete relatively small tasks that feed into a greater whole and benefit the public good. You can learn more about this new wave of “collective intelligence” by following the MIT Center for Collective Intelligence and their annual Collective Intelligence Conference.

Annotated Selected Reading List (in alphabetical order)

Benkler, Yochai. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press, 2006. http://bit.ly/1aaU7Yb.

  • In this book, Benkler “describes how patterns of information, knowledge, and cultural production are changing – and shows that the way information and knowledge are made available can either limit or enlarge the ways people can create and express themselves.”
  • In his discussion on Wikipedia – one of many paradigmatic examples of people collaborating without financial reward – he calls attention to the notable ongoing cooperation taking place among a diversity of individuals. He argues that, “The important point is that Wikipedia requires not only mechanical cooperation among people, but a commitment to a particular style of writing and describing concepts that is far from intuitive or natural to people. It requires self-discipline. It enforces the behavior it requires primarily through appeal to the common enterprise that the participants are engaged in…”

Brabham, Daren C. Using Crowdsourcing in Government. Collaborating Across Boundaries Series. IBM Center for The Business of Government, 2013. http://bit.ly/17gzBTA.

  • In this report, Brabham categorizes government crowdsourcing cases into a “four-part, problem-based typology, encouraging government leaders and public administrators to consider these open problem-solving techniques as a way to engage the public and tackle difficult policy and administrative tasks more effectively and efficiently using online communities.”
  • The proposed four-part typology describes the following types of crowdsourcing in government:
    • Knowledge Discovery and Management
    • Distributed Human Intelligence Tasking
    • Broadcast Search
    • Peer-Vetted Creative Production
  • In his discussion on Distributed Human Intelligence Tasking, Brabham argues that Amazon’s Mechanical Turk and other microtasking platforms could be useful in a number of governance scenarios, including:
    • Governments and scholars transcribing historical document scans
    • Public health departments translating health campaign materials into foreign languages to benefit constituents who do not speak the native language
    • Governments translating tax documents, school enrollment and immunization brochures, and other important materials into minority languages
    • Helping governments predict citizens’ behavior, “such as for predicting their use of public transit or other services or for predicting behaviors that could inform public health practitioners and environmental policy makers”

Boudreau, Kevin J., Patrick Gaule, Karim Lakhani, Christoph Reidl, Anita Williams Woolley. “From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 14-060. January 23, 2014. https://bit.ly/2QVmGUu.

  • In this working paper, the authors explore the “conditions necessary for eliciting effort from those affecting the quality of interdependent teamwork” and “consider the role of incentives versus social processes in catalyzing collaboration.”
  • The paper’s findings are based on an experiment involving 260 individuals randomly assigned to 52 teams working toward solutions to a complex problem.
  • The authors determined that the level of effort in such collaborative undertakings is sensitive to cash incentives. However, collaboration among teams was driven more by the active participation of teammates than by any monetary reward.

Franzoni, Chiara, and Henry Sauermann. “Crowd Science: The Organization of Scientific Research in Open Collaborative Projects.” Research Policy (August 14, 2013). http://bit.ly/HihFyj.

  • In this paper, the authors explore the concept of crowd science, which they define based on two important features: “participation in a project is open to a wide base of potential contributors, and intermediate inputs such as data or problem solving algorithms are made openly available.” The rationale for their study and conceptual framework is the “growing attention from the scientific community, but also policy makers, funding agencies and managers who seek to evaluate its potential benefits and challenges. Based on the experiences of early crowd science projects, the opportunities are considerable.”
  • Based on the study of a number of crowd science projects – including governance-related initiatives like Patients Like Me – the authors identify a number of potential benefits in the following categories:
    • Knowledge-related benefits
    • Benefits from open participation
    • Benefits from the open disclosure of intermediate inputs
    • Motivational benefits
  • The authors also identify a number of challenges:
    • Organizational challenges
      • Matching projects and people
      • Division of labor and integration of contributions
      • Project leadership
    • Motivational challenges
      • Sustaining contributor involvement
      • Supporting a broader set of motivations
      • Reconciling conflicting motivations

Kittur, Aniket, Ed H. Chi, and Bongwon Suh. “Crowdsourcing User Studies with Mechanical Turk.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 453–456. CHI ’08. New York, NY, USA: ACM, 2008. http://bit.ly/1a3Op48.

  • In this paper, the authors examine “[m]icro-task markets, such as Amazon’s Mechanical Turk, [which] offer a potential paradigm for engaging a large number of users for low time and monetary costs. [They] investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks.”
  • The authors conclude that in addition to providing a means for crowdsourcing small, clearly defined, often non-skill-intensive tasks, “Micro-task markets such as Amazon’s Mechanical Turk are promising platforms for conducting a variety of user study tasks, ranging from surveys to rapid prototyping to quantitative measures. Hundreds of users can be recruited for highly interactive tasks for marginal costs within a timeframe of days or even minutes. However, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.”

Kittur, Aniket, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton. “The Future of Crowd Work.” In 16th ACM Conference on Computer Supported Cooperative Work (CSCW 2013), 2012. http://bit.ly/1c1GJD3.

  • In this paper, the authors discuss paid crowd work, which “offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale.” However, they caution that, “it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework.”
  • The authors argue that seven key challenges must be met to ensure that crowd work processes evolve and reach their full potential:
    • Designing workflows
    • Assigning tasks
    • Supporting hierarchical structure
    • Enabling real-time crowd work
    • Supporting synchronous collaboration
    • Controlling quality

Madison, Michael J. “Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo.” In Convening Cultural Commons, 2013. http://bit.ly/1ih9Xzm.

  • This paper explores a “case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis on the Internet. In the second place, Galaxy Zoo is a highly successful example of peer production, some times known as crowdsourcing…In the third place, is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies.”
  • Madison concludes that the success of Galaxy Zoo has not been the result of the “character of its information resources (scientific data) and rules regarding their usage,” but rather, the fact that the “community was guided from the outset by a vision of a specific organizational solution to a specific research problem in astronomy, initiated and governed, over time, by professional astronomers in collaboration with their expanding universe of volunteers.”

Malone, Thomas W., Robert Laubacher and Chrysanthos Dellarocas. “Harnessing Crowds: Mapping the Genome of Collective Intelligence.” MIT Sloan Research Paper. February 3, 2009. https://bit.ly/2SPjxTP.

  • In this article, the authors describe and map the phenomenon of collective intelligence – also referred to as “radical decentralization, crowd-sourcing, wisdom of crowds, peer production, and wikinomics” – which they broadly define as “groups of individuals doing things collectively that seem intelligent.”
  • The article is derived from the authors’ work at MIT’s Center for Collective Intelligence, where they gathered nearly 250 examples of Web-enabled collective intelligence. To map the building blocks or “genes” of collective intelligence, the authors used two pairs of related questions:
    • Who is performing the task? Why are they doing it?
    • What is being accomplished? How is it being done?
  • The authors concede that much work remains to be done “to identify all the different genes for collective intelligence, the conditions under which these genes are useful, and the constraints governing how they can be combined,” but they believe that their framework provides a useful start and gives managers and other institutional decisionmakers looking to take advantage of collective intelligence activities the ability to “systematically consider many possible combinations of answers to questions about Who, Why, What, and How.”

Mulgan, Geoff. “True Collective Intelligence? A Sketch of a Possible New Field.” Philosophy & Technology 27, no. 1. March 2014. http://bit.ly/1p3YSdd.

  • In this paper, Mulgan explores the concept of a collective intelligence, a “much talked about but…very underdeveloped” field.
  • With a particular focus on health knowledge, Mulgan “sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to possible intellectual barriers to progress.”
  • He concludes that the “central message that comes from observing real intelligence is that intelligence has to be for something,” and that “turning this simple insight – the stuff of so many science fiction stories – into new theories, new technologies and new applications looks set to be one of the most exciting prospects of the next few years and may help give shape to a new discipline that helps us to be collectively intelligent about our own collective intelligence.”

Sauermann, Henry and Chiara Franzoni. “Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation.” SSRN Working Papers Series. November 28, 2013. http://bit.ly/1o6YB7f.

  • In this paper, Sauermann and Franzoni explore the issue of interest-based motivation in crowd-based knowledge production – in particular the use of the crowd science platform Zooniverse – by drawing on “research in psychology to discuss important static and dynamic features of interest and deriv[ing] a number of research questions.”
  • The authors find that interest-based motivation is often tied to a “particular object (e.g., task, project, topic)” rather than based on a “general trait of the person or a general characteristic of the object.” As such, they find that “most members of the installed base of users on the platform do not sign up for multiple projects, and most of those who try out a project do not return.”
  • They conclude that “interest can be a powerful motivator of individuals’ contributions to crowd-based knowledge production…However, both the scope and sustainability of this interest appear to be rather limited for the large majority of contributors…At the same time, some individuals show a strong and more enduring interest to participate both within and across projects, and these contributors are ultimately responsible for much of what crowd science projects are able to accomplish.”

Schmitt-Sands, Catherine E. and Richard J. Smith. “Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk.” SSRN Working Papers Series. January 9, 2014. http://bit.ly/1ugaYja.

  • In this paper, the authors describe an experiment involving the nascent use of Amazon’s Mechanical Turk as a social science research tool. “While researchers have used crowdsourcing to find research subjects or classify texts, [they] used Mechanical Turk to conduct a policy scan of local government websites.”
  • Schmitt-Sands and Smith found that “crowdsourcing worked well for conducting an online policy program and scan.” The microtasked workers were helpful in screening out local governments that either did not have websites or did not have the types of policies and services for which the researchers were looking. However, “if the task is complicated such that it requires ongoing supervision, then crowdsourcing is not the best solution.”

Shirky, Clay. Here Comes Everybody: The Power of Organizing Without Organizations. New York: Penguin Press, 2008. https://bit.ly/2QysNif.

  • In this book, Shirky explores our current era in which, “For the first time in history, the tools for cooperating on a global scale are not solely in the hands of governments or institutions. The spread of the Internet and mobile phones are changing how people come together and get things done.”
  • Discussing Wikipedia’s “spontaneous division of labor,” Shirky argues that “the process is more like creating a coral reef, the sum of millions of individual actions, than creating a car. And the key to creating those individual actions is to hand as much freedom as possible to the average user.”

Silvertown, Jonathan. “A New Dawn for Citizen Science.” Trends in Ecology & Evolution 24, no. 9 (September 2009): 467–471. http://bit.ly/1iha6CR.

  • This article discusses the move from “Science for the people,” a slogan adopted by activists in the 1970s, to “Science by the people,” which is “a more inclusive aim, and is becoming a distinctly 21st century phenomenon.”
  • Silvertown identifies three factors that are responsible for the explosion of activity in citizen science, each of which could be similarly related to the crowdsourcing of skills by governing institutions:
    • “First is the existence of easily available technical tools for disseminating information about products and gathering data from the public.
    • A second factor driving the growth of citizen science is the increasing realisation among professional scientists that the public represent a free source of labour, skills, computational power and even finance.
    • Third, citizen science is likely to benefit from the condition that research funders such as the National Science Foundation in the USA and the Natural Environment Research Council in the UK now impose upon every grantholder to undertake project-related science outreach. This is outreach as a form of public accountability.”

Szkuta, Katarzyna, Roberto Pizzicannella, and David Osimo. “Collaborative approaches to public sector innovation: A scoping study.” Telecommunications Policy. 2014. http://bit.ly/1oBg9GY.

  • In this article, the authors explore cases where government collaboratively delivers online public services, with a focus on success factors and “incentives for services providers, citizens as users and public administration.”
  • The authors focus on six types of collaborative governance projects:
    • Services initiated by government built on government data;
    • Services initiated by government and making use of citizens’ data;
    • Services initiated by civil society built on open government data;
    • Collaborative e-government services; and
    • Services run by civil society and based on citizen data.
  • The cases explored “are all designed in the way that effectively harnesses the citizens’ potential. Services susceptible to collaboration are those that require computing efforts, i.e. many non-complicated tasks (e.g. citizen science projects – Zooniverse) or citizens’ free time in general (e.g. time banks). Those services also profit from unique citizens’ skills and their propensity to share their competencies.”

Heteromation and its (dis)contents: The invisible division of labor between humans and machines


Paper by Hamid Ekbia and Bonnie Nardi in First Monday: “The division of labor between humans and computer systems has changed along both technical and human dimensions. Technically, there has been a shift from technologies of automation, the aim of which was to disallow human intervention at nearly all points in the system, to technologies of “heteromation” that push critical tasks to end users as indispensable mediators. As this has happened, the large population of human beings who have been driven out by the first type of technology are drawn back into the computational fold by the second type. Turning artificial intelligence on its head, one technology fills the gap created by the other, but with a vengeance that unsettles established mechanisms of reward, fulfillment, and compensation. In this fashion, replacement of human beings and their irrelevance to technological systems has given way to new “modes of engagement” with remarkable social, economic, and ethical implications. In this paper we provide a historical backdrop for heteromation and explore and explicate some of these displacements through analysis of a number of cases, including Mechanical Turk, the video games FoldIt and League of Legends, and social media.”

Big Data, new epistemologies and paradigm shifts


Paper by Rob Kitchin in the journal Big Data & Society: “This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.”