The Parable of Google Flu: Traps in Big Data Analysis


David Lazer: “…big data last winter had its “Dewey beats Truman” moment, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu–overshooting the peak last winter by 130% (and indeed, it has been systematically overshooting by wide margins for 3 years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We are making available The Parable of Google Flu (WP-Final).pdf, the version we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.]
Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe likely to have increased flu related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job stating the likely flu prevalence today by ignoring GFT altogether and just projecting 3-week-old CDC data to today (better still would have been to combine the two). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
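Lesson 3 compares three ways of estimating today's flu level: carrying forward lagged official data, relying on a search-based signal alone, and combining the two. The following is a minimal sketch of that comparison, not the method used in the Science paper. All numbers are synthetic and chosen purely for illustration, the 1.5x overshoot factor and the 0.5 blend weight are assumptions, and the function names are hypothetical. The output therefore says nothing about real CDC or GFT performance.

```python
from typing import Sequence


def lagged_projection(series: Sequence[float], lag: int) -> list[float]:
    """Estimate each week by carrying forward the value seen `lag` weeks earlier."""
    return [series[t - lag] for t in range(lag, len(series))]


def blend(a: Sequence[float], b: Sequence[float], weight: float) -> list[float]:
    """Weighted average of two estimates; `weight` is the share given to `a`."""
    return [weight * x + (1 - weight) * y for x, y in zip(a, b)]


def mean_abs_error(estimate: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between an estimate and the true values."""
    return sum(abs(e - a) for e, a in zip(estimate, actual)) / len(actual)


# Synthetic weekly flu levels (made up for illustration, not CDC data).
truth = [1.0, 1.2, 1.5, 2.0, 2.6, 3.0, 2.8, 2.2]
# Hypothetical search-based signal that overshoots every week (assumed factor).
search_signal = [1.5 * v for v in truth]

lag = 3  # official data assumed to arrive three weeks late
actual = truth[lag:]
from_lagged = lagged_projection(truth, lag)
from_search = search_signal[lag:]
from_blend = blend(from_lagged, from_search, weight=0.5)

for name, estimate in [
    ("lagged official data", from_lagged),
    ("search signal only", from_search),
    ("blend of both", from_blend),
]:
    print(f"{name}: mean absolute error = {mean_abs_error(estimate, actual):.2f}")
```

In practice the blend weight would be fitted on past seasons rather than fixed by hand, and the ranking of the three estimators depends entirely on the data used.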

Participatory Budgeting Platform


Hollie Gilman: “Stanford’s Social Algorithms Lab (SOAL) has built an interactive Participatory Budgeting Platform that allows users to simulate budgetary decision making on $1 million of public monies. The center brings together economics, computer science, and networking to work on problems and understand the impact of social networking. This project is part of Stanford’s Widescope Project to enable people to make political decisions on the budgets through data driven social networks.
The Participatory Budgeting simulation highlights the fourth annual Participatory Budgeting in Chicago’s 49th ward — the first place to implement PB in the U.S.  This year $1 million, out of $1.3 million in Alderman capital funds, will be allocated through participatory budgeting.
One goal of the platform is to build consensus. The interactive geo-spatial mapping software enables citizens to more intuitively identify projects in a given area.  Importantly, the platform forces users to make tough choices and balance competing priorities in real time.
The platform is an interesting example of a collaborative governance prototype that could be transformative in its ability to engage citizens with easily accessible mapping software.”

New Research Network to Study and Design Innovative Ways of Solving Public Problems


MacArthur Foundation Research Network on Opening Governance formed to gather evidence and develop new designs for governing 

NEW YORK, NY, March 4, 2014 - The Governance Lab (The GovLab) at New York University today announced the formation of a Research Network on Opening Governance, which will seek to develop blueprints for more effective and legitimate democratic institutions to help improve people’s lives.
Convened and organized by the GovLab, the MacArthur Foundation Research Network on Opening Governance is made possible by a three-year grant of $5 million from the John D. and Catherine T. MacArthur Foundation as well as a gift from Google.org, which will allow the Network to tap the latest technological advances to further its work.
Combining empirical research with real-world experiments, the Research Network will study what happens when governments and institutions open themselves to diverse participation, pursue collaborative problem-solving, and seek input and expertise from a range of people. Network members include twelve experts (see below) in computer science, political science, policy informatics, social psychology and philosophy, law, and communications. This core group is supported by an advisory network of academics, technologists, and current and former government officials. Together, they will assess existing innovations in governing and experiment with new practices in how institutions make decisions at the local, national, and international levels.
Support for the Network from Google.org will be used to build technology platforms to solve problems more openly and to run agile, real-world, empirical experiments with institutional partners such as governments and NGOs to discover what can enhance collaboration and decision-making in the public interest.
The Network’s research will be complemented by theoretical writing and compelling storytelling designed to articulate and demonstrate clearly and concretely how governing agencies might work better than they do today. “We want to arm policymakers and practitioners with evidence of what works and what does not,” says Professor Beth Simone Noveck, Network Chair and author of Wiki Government: How Technology Can Make Government Better, Democracy Stronger and Citizens More Powerful, “which is vital to drive innovation, re-establish legitimacy and more effectively target scarce resources to solve today’s problems.”
“From prize-backed challenges to spur creative thinking to the use of expert networks to get the smartest people focused on a problem no matter where they work, this shift from top-down, closed, and professional government to decentralized, open, and smarter governance may be the major social innovation of the 21st century,” says Noveck. “The MacArthur Research Network on Opening Governance is the ideal crucible for helping to transition from closed and centralized to open and collaborative institutions of governance in a way that is scientifically sound and yields new insights to inform future efforts, always with an eye toward real-world impacts.”
MacArthur Foundation President Robert Gallucci added, “Recognizing that we cannot solve today’s challenges with yesterday’s tools, this interdisciplinary group will bring fresh thinking to questions about how our governing institutions operate, and how they can develop better ways to help address seemingly intractable social problems for the common good.”
Members
The MacArthur Research Network on Opening Governance comprises:
Chair: Beth Simone Noveck
Network Coordinator: Andrew Young
Chief of Research: Stefaan Verhulst
Faculty Members:

  • Sir Tim Berners-Lee (Massachusetts Institute of Technology (MIT)/University of Southampton, UK)
  • Deborah Estrin (Cornell Tech/Weill Cornell Medical College)
  • Erik Johnston (Arizona State University)
  • Henry Farrell (George Washington University)
  • Sheena S. Iyengar (Columbia Business School/Jerome A. Chazen Institute of International Business)
  • Karim Lakhani (Harvard Business School)
  • Anita McGahan (University of Toronto)
  • Cosma Shalizi (Carnegie Mellon/Santa Fe Institute)

Institutional Members:

  • Christian Bason and Jesper Christiansen (MindLab, Denmark)
  • Geoff Mulgan (National Endowment for Science Technology and the Arts – NESTA, United Kingdom)
  • Lee Rainie (Pew Research Center)

The Network is eager to hear from and engage with the public as it undertakes its work. Please contact Stefaan Verhulst to share your ideas or identify opportunities to collaborate.

Coordinating the Commons: Diversity & Dynamics in Open Collaborations


Dissertation by Jonathan T. Morgan: “The success of Wikipedia demonstrates that open collaboration can be an effective model for organizing geographically-distributed volunteers to perform complex, sustained work at a massive scale. However, Wikipedia’s history also demonstrates some of the challenges that large, long-term open collaborations face: the core community of Wikipedia editors — the volunteers who contribute most of the encyclopedia’s content and ensure that articles are correct and consistent — has been gradually shrinking since 2007, in part because Wikipedia’s social climate has become increasingly inhospitable for newcomers, female editors, and editors from other underrepresented demographics. Previous research studies of change over time within other work contexts, such as corporations, suggest that incremental processes such as bureaucratic formalization can make organizations more rule-bound and less adaptable — in effect, less open — as they grow and age. There has been little research on how open collaborations like Wikipedia change over time, and on the impact of those changes on the social dynamics of the collaborating community and the way community members prioritize and perform work. Learning from Wikipedia’s successes and failures can help researchers and designers understand how to support open collaborations in other domains — such as Free/Libre Open Source Software, Citizen Science, and Citizen Journalism.

In this dissertation, I examine the role of openness, and the potential antecedents and consequences of formalization, within Wikipedia through an analysis of three distinct but interrelated social structures: community-created rules within the Wikipedia policy environment, coordination work and group dynamics within self-organized open teams called WikiProjects, and the socialization mechanisms that Wikipedia editors use to teach new community members how to participate. To inquire further, I have designed a new editor peer support space, the Wikipedia Teahouse, based on the findings from my empirical studies. The Teahouse is a volunteer-driven project that provides a welcoming and engaging environment in which new editors can learn how to be productive members of the Wikipedia community, with the goal of increasing the number and diversity of newcomers who go on to make substantial contributions to Wikipedia …”

True Collective Intelligence? A Sketch of a Possible New Field


Paper by Geoff Mulgan in Philosophy & Technology: “Collective intelligence is much talked about but remains very underdeveloped as a field. There are small pockets in computer science and psychology and fragments in other fields, ranging from economics to biology. New networks and social media also provide a rich source of emerging evidence. However, there are surprisingly few useable theories, and many of the fashionable claims have not stood up to scrutiny. The field of analysis should be how intelligence is organised at large scale—in organisations, cities, nations and networks. The paper sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to the possible intellectual barriers to progress.”

Predicting Individual Behavior with Social Networks


Article by Sharad Goel and Daniel Goldstein (Microsoft Research): “With the availability of social network data, it has become possible to relate the behavior of individuals to that of their acquaintances on a large scale. Although the similarity of connected individuals is well established, it is unclear whether behavioral predictions based on social data are more accurate than those arising from current marketing practices. We employ a communications network of over 100 million people to forecast highly diverse behaviors, from patronizing an off-line department store to responding to advertising to joining a recreational league. Across all domains, we find that social data are informative in identifying individuals who are most likely to undertake various actions, and moreover, such data improve on both demographic and behavioral models. There are, however, limits to the utility of social data. In particular, when rich transactional data were available, social data did little to improve prediction.”

Trust, Computing, and Society


New book edited by Richard H. R. Harper: “The Internet has altered how people engage with each other in myriad ways, including offering opportunities for people to act distrustfully. This fascinating set of essays explores the question of trust in computing from technical, socio-philosophical, and design perspectives. Why has the identity of the human user been taken for granted in the design of the Internet? What difficulties ensue when it is understood that security systems can never be perfect? What role does trust have in society in general? How is trust to be understood when trying to describe activities as part of a user requirement program? What questions of trust arise in a time when data analytics are meant to offer new insights into user behavior and when users are confronted with different sorts of digital entities? These questions and their answers are of paramount interest to computer scientists, sociologists, philosophers, and designers confronting the problem of trust.

  • Brings together authors from a variety of disciplines
  • Can be adopted in multiple course areas: computer science, philosophy, sociology, anthropology
  • Integrated, multidisciplinary approach to understanding trust as it relates to modern computing”

Table of Contents

1. Introduction and overview Richard Harper
Part I. The Topography of Trust and Computing:
2. The role of trust in cyberspace David Clark
3. The new face of the internet Thomas Karagiannis
4. Trust as a methodological tool in security engineering George Danezis
Part II. Conceptual Points of View:
5. Computing and the search for trust Tom Simpson
6. The worry about trust Olli Lagerspetz
7. The inescapability of trust Bob Anderson and Wes Sharrock
8. Trust in interpersonal interaction and cloud computing Rod Watson
9. Trust, social identity, and computation Charles Ess
Part III. Trust in Design:
10. Design for trusted and trustworthy services M. Angela Sasse and Iacovos Kirlappos
11. Dialogues: trust in design Richard Banks
12. Trusting oneself Richard Harper and William Odom
13. Reflections on trust, computing and society Richard Harper
Bibliography.

The Web at 25 in the U.S.


Paper by Lee Rainie and Susannah Fox from Pew: “The overall verdict: The internet has been a plus for society and an especially good thing for individual users… This report is the first part of a sustained effort through 2014 by the Pew Research Center to mark the 25th anniversary of the creation of the World Wide Web by Sir Tim Berners-Lee. Berners-Lee wrote a paper on March 12, 1989 proposing an “information management” system that became the conceptual and architectural structure for the Web. He eventually released the code for his system—for free—to the world on Christmas Day in 1990. It became a milestone in easing the way for ordinary people to access documents and interact over a network of computers called the internet—a system that linked computers and that had been around for years. The Web became especially appealing after Web browsers were perfected in the early 1990s to facilitate graphical displays of pages on those linked computers.”

The Power to Give


Press Release: “HTC, a global leader in mobile innovation and design, today unveiled HTC Power To Give™, an initiative that aims to create a supercomputer by harnessing the collective processing power of Android smartphones.
Currently in beta, HTC Power To Give aims to galvanize smartphone owners to unlock their unused processing power in order to help answer some of society’s biggest questions. Currently, the fight against cancer, AIDS and Alzheimer’s; the drive to ensure every child has clean water to drink and even the search for extra-terrestrial life are all being tackled by volunteer computing platforms.
Empowering people to use their Android smartphones to offer tangible support for vital fields of research, including medicine, science and ecology, HTC Power To Give has been developed in partnership with Dr. David Anderson of the University of California, Berkeley.  The project will support the world’s largest volunteer computing initiative and tap into the powerful processing capabilities of a global network of smartphones.
Strength in numbers
One million HTC One smartphones, working towards a project via HTC Power To Give, could provide similar processing power to that of one of the world’s 30 supercomputers (one PetaFLOP). This could drastically shorten the research cycles for organizations that would otherwise have to spend years analyzing the same volume of data, potentially bringing forward important discoveries in vital subjects by weeks, months, years or even decades. For example, one of the programs available at launch is IBM’s World Community Grid, which gives anyone an opportunity to advance science by donating their computer, smartphone or tablet’s unused computing power to humanitarian research. To date, the World Community Grid volunteers have contributed almost 900,000 years’ worth of processing time to cutting-edge research.
Limitless future potential
Cher Wang, Chairwoman, HTC commented, “We’ve often used innovation to bring about change in the mobile industry, but this programme takes our vision one step further. With HTC Power To Give, we want to make it possible for anyone to dedicate their unused smartphone processing power to contribute to projects that have the potential to change the world.”
“HTC Power To Give will support the world’s largest volunteer computing initiative, and the impact that this project will have on the world over the years to come is huge. This changes everything,” noted Dr. David Anderson, Inventor of the Shared Computing Initiative BOINC, University of California, Berkeley.
Cher Wang added, “We’ve been discussing the impact that just one million HTC Power To Give-enabled smartphones could make; however, analysts estimate that over 780 million Android phones were shipped in 2013 alone. Imagine the difference we could make to our children’s future if just a fraction of these Android users were able to divert some of their unused processing power to help find answers to the questions that concern us all.”
Opt-in with ease
After downloading the HTC Power To Give app from the Google Play™ store, smartphone owners can select the research programme to which they will divert a proportion of their phone’s processing power. HTC Power To Give will then run while the phone is charging and connected to a WiFi network, enabling people to change the world whilst sitting at their desk or relaxing at home.
The beta version of HTC Power To Give will be available to download from the Google Play store and will initially be compatible with the HTC One family, HTC Butterfly and HTC Butterfly s. HTC plans to make the app more widely available to other Android smartphone owners in the coming six months as the beta trial progresses.”
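The "strength in numbers" figures in the press release can be checked with simple arithmetic. The sketch below uses only the two numbers stated in the release (one million phones for roughly one PetaFLOP, and 780 million Android phones shipped in 2013). The participation fractions are hypothetical, and the calculation assumes every phone sustains the same throughput around the clock. That is an upper bound at best, since the app runs only while a phone is charging and on WiFi.

```python
PETAFLOP = 1e15  # floating-point operations per second

# Figures stated in the press release.
PHONES_FOR_ONE_PETAFLOP = 1_000_000
ANDROID_PHONES_SHIPPED_2013 = 780_000_000


def implied_flops_per_phone(total_flops: float, phones: int) -> float:
    """Per-device throughput implied by an aggregate claim."""
    return total_flops / phones


def aggregate_petaflops(phones: float, flops_per_phone: float) -> float:
    """Combined throughput in PetaFLOPs if every phone contributes fully (an idealization)."""
    return phones * flops_per_phone / PETAFLOP


per_phone = implied_flops_per_phone(PETAFLOP, PHONES_FOR_ONE_PETAFLOP)
print(f"Implied throughput per phone: {per_phone / 1e9:.1f} GFLOPS")

# Hypothetical participation rates, not taken from the release.
for fraction in (0.01, 0.10):
    phones = ANDROID_PHONES_SHIPPED_2013 * fraction
    total = aggregate_petaflops(phones, per_phone)
    print(f"{fraction:.0%} of 2013 shipments: about {total:.1f} PetaFLOPs")
```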

Crowdsourcing voices to study Parkinson’s disease


TedMed: “Mathematician Max Little is launching a project that aims to literally give Parkinson’s disease (PD) patients a voice in their own diagnosis and help them monitor their disease progression.
Patients Voice Analysis (PVA) is an open science project that uses phone-based voice recordings and self-reported symptoms, along with software Little designed, to track disease progression. Little, a TEDMED 2013 speaker and TED Fellow, is partnering with the online community PatientsLikeMe, co-founded by TEDMED 2009 speaker James Heywood, and Sage Bionetworks, a non-profit research organization, to conduct the research.
The new project is an extension of Little’s Parkinson’s Voice Initiative, which used speech analysis algorithms to diagnose Parkinson’s from voice records with the help of 17,000 volunteers. This time, he seeks to not only detect markers of PD, but also to add information reported by patients using PatientsLikeMe’s Parkinson’s Disease Rating Scale (PDRS), a tool that documents patients’ answers to questions that measure treatment effectiveness and disease progression….
As openly shared information, the collected data has potential to help vast numbers of individuals by tapping into collective ingenuity. Little has long argued that for science to progress, researchers need to democratize research and move past jostling for credit. Sage Bionetworks has designed a platform called Synapse to allow data sharing with collaborative version control, an effort led by open data advocate John Wilbanks.
“If you can’t share your data, how can you reproduce your science? One of the big problems we’re facing with this kind of medical research is the data is not open and getting access to it is a nightmare,” Little says.
With the PVA project, “Basically anyone can log on, download the anonymized data and play around with data mining techniques. We don’t really care what people are able to come up with. We just want the most accurate prediction we can get.
“In research, you’re almost always constrained by what you think is the best way to do things. Unless you open it to the community at large, you’ll never know,” he says.”