Big Data needs Big Theory


Geoffrey West, former President of the Santa Fe Institute: “As the world becomes increasingly complex and interconnected, some of our biggest challenges have begun to seem intractable. What should we do about uncertainty in the financial markets? How can we predict energy supply and demand? How will climate change play out? How do we cope with rapid urbanization? Our traditional approaches to these problems are often qualitative and disjointed and lead to unintended consequences. To bring scientific rigor to the challenges of our time, we need to develop a deeper understanding of complexity itself….
The digital revolution is driving much of the increasing complexity and pace of life we are now seeing, but this technology also presents an opportunity. The ubiquity of cell phones and electronic transactions, the increasing use of personal medical probes, and the concept of the electronically wired “smart city” are already providing us with enormous amounts of data. With new computational tools and techniques to digest vast, interrelated databases, researchers and practitioners in science, technology, business and government have begun to bring large-scale simulations and models to bear on questions formerly out of reach of quantitative analysis, such as how cooperation emerges in society, what conditions promote innovation, and how conflicts spread and grow.
The trouble is, we don’t have a unified, conceptual framework for addressing questions of complexity. We don’t know what kind of data we need, nor how much, or what critical questions we should be asking. “Big data” without a “big theory” to go with it loses much of its potency and usefulness, potentially generating new unintended consequences.
When the industrial age focused society’s attention on energy in its many manifestations—steam, chemical, mechanical, and so on—the universal laws of thermodynamics came as a response. We now need to ask if our age can produce universal laws of complexity that integrate energy with information. What are the underlying principles that transcend the extraordinary diversity and historical contingency and interconnectivity of financial markets, populations, ecosystems, war and conflict, pandemics and cancer? An overarching predictive, mathematical framework for complex systems would, in principle, incorporate the dynamics and organization of any complex system in a quantitative, computable framework.
We will probably never make detailed predictions of complex systems, but coarse-grained descriptions that lead to quantitative predictions for essential features are within our grasp. We won’t predict when the next financial crash will occur, but we ought to be able to assign a probability of one occurring in the next few years. The field is in the midst of a broad synthesis of scientific disciplines, helping reverse the trend toward fragmentation and specialization, and is groping toward a more unified, holistic framework for tackling society’s big questions. The future of the human enterprise may well depend on it.”

Peacekeeping 4.0: Harnessing the Potential of Big Data, Social Media, and Cyber Technologies


Chapter by John Karlsrud in “Cyberspace and International Relations: Theory, Prospects and Challenges” (edited by Jan-Frederik Kremer and Benedikt Müller): “Since the Cold War, peacekeeping has evolved from first-generation peacekeeping that focused on monitoring peace agreements, to third-generation multidimensional peacekeeping operations tasked with rebuilding states and their institutions during and after conflict. However, peacekeeping today is lagging behind the changes marking our time. Big Data, including social media, and the many actors in the field may provide peacekeeping and peacebuilding operations with information and tools to enable them to respond better, faster and more effectively, saving lives and building states. These tools are already well known in the areas of humanitarian action, social activism, and development. The United Nations, too, through the Global Pulse initiative, has begun to discover the potential of “Big Data for Development,” which may in time help prevent violent conflict. However, less has been done in the area of peacekeeping. UN member states should push for change so that the world organization and other multilateral actors can get their act together, mounting a fourth generation of peacekeeping operations that can utilize the potential of Big Data, social media and modern technology—“Peacekeeping 4.0.” The chapter details some of the initiatives that can be harnessed and further developed, and offers policy recommendations for member states, the UN Security Council, and UN peacekeeping at UN headquarters and at field levels.”

You Are Your Data


In Slate: “We are becoming data. Every day, our smartphones, browsers, cars, and even refrigerators generate information about our habits. When we click “I agree” on terms of service, we opt in to systems in which we are known only by our data. So we need to be able to understand ourselves as data, too.
To understand what that might mean for the average person in the future, we should look to the Quantified Self community, which is at the frontier of understanding what our role as individuals in a data-driven society might look like. Quantified Self began as a Meetup community sharing personal stories of self-tracking techniques, and is now a catchall adjective to describe the emerging set of apps and sensors available to consumers to facilitate self-tracking, such as the Fitbit or Nike Fuelband. Some of the self-tracking practices of this group come across as extreme (experimenting with the correlation between butter consumption and brain function). But what is a niche interest today could be widely marketed tomorrow—and accordingly, their frustrations may soon be yours…

Instead, I propose that we should have a “right to use” our personal data: I should be able to access and make use of data that refers to me. At best, a right to use would reconcile both my personal interest in the small-scale insights and the firms’ large-scale interests in big data insights from the larger population. These interests are not in conflict with each other.
Of course, to translate this concept into practice, we need to work out matters of both technology and policy.
What data are we asking for? Are we asking for data that individuals have opted into creating, like self-tracking fitness applications? Should we broaden that definition to describe any data that refers to our person, such as behavioral data collected by cookies and gathered by third-party data brokers? These definitions will be hard to pin down.
Also, what kind of data? Just that which we’ve actively opted in to creating, or does it expand to the more hidden, passive, transactional data? Will firms exercise control over the line between where “raw” data becomes processed and therefore proprietary? If we can’t begin to define the data representation of a “step” in an activity tracker, how will we standardize access to that information?
Access to personal data also suffers from a chicken-and-egg problem right now. We don’t see greater consumer demand for this because we don’t yet have robust enough tools to make use of disparate sets of data as individuals, and yet such tools are not gaining traction without proven demand.”
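To make the standardization question raised in the excerpt concrete – what, exactly, is the data representation of a “step”? – here is a hypothetical record format in Python. The field names, units and values are illustrative assumptions, not any vendor’s actual export schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record format for one activity-tracker sample; fields and
# units are illustrative, not any real vendor's export schema.
@dataclass
class StepSample:
    recorded_at: datetime   # start of the measurement window
    step_count: int         # steps detected in the window
    window_seconds: int     # length of the measurement window
    device_id: str          # pseudonymous device identifier
    source: str             # e.g. "wrist_accelerometer", "phone_pedometer"

# Two devices can report the "same" minute of walking very differently:
wrist = StepSample(datetime(2013, 11, 12, 9, 0), 112, 60, "dev-A", "wrist_accelerometer")
phone = StepSample(datetime(2013, 11, 12, 9, 0), 97, 60, "dev-B", "phone_pedometer")
print(wrist.step_count - phone.step_count)  # the gap a standard would have to reconcile
```

Until vendors converge on something like a shared schema, a “right to use” is hard to exercise in practice: the same activity arrives as incompatible records from different services.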

White House Unveils Big Data Projects, Round Two


Information Week: “The White House Office of Science and Technology Policy (OSTP) and Networking and Information Technology R&D program (NITRD) on Tuesday introduced a slew of new big-data collaboration projects aimed at stimulating private-sector interest in federal data. The initiatives, announced at the White House-sponsored “Data to Knowledge to Action” event, are targeted at fields as varied as medical research, geointelligence, economics, and linguistics.
The new projects are a continuation of the Obama Administration’s Big Data Initiative, announced in March 2012, when the first round of big-data projects was presented.
Thomas Kalil, OSTP’s deputy director for technology and innovation, said that “dozens of new partnerships — more than 90 organizations,” are pursuing these new collaborative projects, including many of the best-known American technology, pharmaceutical, and research companies.
Among the initiatives, Amazon Web Services (AWS) and NASA have set up the NASA Earth eXchange, or NEX, a collaborative network to provide space-based data about our planet to researchers in Earth science. AWS will host much of NASA’s Earth-observation data as an AWS Public Data Set, making it possible, for instance, to crowdsource research projects.
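The article only says that AWS will host much of the NEX data as a Public Data Set. As a rough sketch of what anonymous access to such a dataset usually looks like, the snippet below lists a few objects with boto3; the bucket name and prefix are placeholders, not NASA’s actual layout.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned (anonymous) S3 client: AWS Public Data Sets are typically
# readable without credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# "nasa-nex-example" and "landsat/" are placeholders for illustration only.
resp = s3.list_objects_v2(Bucket="nasa-nex-example", Prefix="landsat/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```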
An estimated 4.4 million jobs are being created between now and 2015 to support big-data projects. Employers, educational institutions, and government agencies are working to build the educational infrastructure to provide students with the skills they need to fill those jobs.
To help train new workers, IBM, for instance, has created a new assessment tool that gives university students feedback on their readiness for number-crunching careers in both the public and private sector. Eight universities that have a big data and analytics curriculum — Fordham, George Washington, Illinois Institute of Technology, University of Massachusetts-Boston, Northwestern, Ohio State, Southern Methodist, and the University of Virginia — will receive the assessment tool.
OSTP is organizing an initiative to create a “weather service” for pandemics, Kalil said, a way to use big data to identify and predict pandemics as early as possible in order to plan and prepare for — and hopefully mitigate — their effects.
The National Institutes of Health (NIH), meanwhile, is undertaking its “Big Data to Knowledge” (BD2K) initiative to develop a range of standards, tools, software, and other approaches to make use of massive amounts of data being generated by the health and medical research community….”
See also:
November 12, 2013 – Fact Sheet: Progress by Federal Agencies: Data to Knowledge to Action
November 12, 2013 – Fact Sheet: New Announcements: Data to Knowledge to Action
November 12, 2013 – Press Release: Data to Knowledge to Action Event

What future do you want? Commission invites votes on what Europe could look like in 2050 to help steer future policy and research planning


European Commission – MEMO: “Vice-President Neelie Kroes, responsible for the Digital Agenda, is inviting people to join a voting and ranking process on 11 visions of what the world could look like in 20-40 years. The Commission is seeking views on living and learning, leisure and working in Europe in 2050, to steer long-term policy or research planning.
The visions have been gathered over the past year through the Futurium – an online debate platform that allows policymakers not only to consult citizens but to collaborate and “co-create” with them – and at events throughout Europe. Thousands of thinkers – from high school students to the Erasmus Students Network, from entrepreneurs and internet pioneers to philosophers and university professors – have engaged in a collective inquiry, a means of crowd-sourcing what our future world could look like.
Eleven over-arching themes have been drawn together from more than 200 ideas for the future. From today, everyone is invited to join the debate and offer their ratings and rankings of the various ideas. The results of the feedback will help the European Commission make better decisions about how to fund projects and ideas that both shape the future and get Europe ready for that future….
The Futurium is a foresight project run by DG CONNECT, based on an open-source approach. It develops visions of society, technologies, attitudes and trends in 2040-2050 and uses these, for example, as potential blueprints for future policy choices or EU research and innovation funding priorities.
It is an online platform developed to capture emerging trends and enable interested citizens to co-create compelling visions of the futures that matter to them.

This crowd-sourcing approach provides useful insights on:

  1. vision: where people want to go, and how desirable and likely the visions posted on the platform are;
  2. policy ideas: what should ideally be done to realise the futures; the possible impacts and plausibility of policy ideas;
  3. evidence: scientific and other evidence to support the visions and policy ideas.

….
Connecting policy making to people: in an increasingly connected society, online outreach and engagement is an essential response to the growing demand for participation, helping to capture new ideas and to broaden the legitimacy of the policy-making process (IP/10/1296). The Futurium is an early prototype of a more general policy-making model described in the paper “The Futurium—a Foresight Platform for Evidence-Based and Participatory Policymaking”.

The Futurium was developed to lay the groundwork for future policy proposals which could be considered by the European Parliament and the European Commission under their new mandates as of 2014. But the Futurium’s open, flexible architecture makes it easily adaptable to any policy-making context, where thinking ahead, stakeholder participation and scientific evidence are needed.”

The GovLab Academy: A Community and Platform for Learning and Teaching Governance Innovations


Press Release: “Today the Governance Lab (The GovLab) launches The GovLab Academy at the Open Government Partnership Annual Meeting in London.
Available at www.thegovlabacademy.org, the Academy is a free online community for those wanting to teach and learn how to solve public problems and improve lives using innovations in governance. A partnership between The GovLab at New York University and MIT Media Lab’s Online Learning Initiative, the site launching today offers curated videos, podcasts, readings and activities designed to enable the purpose-driven learner to deepen his or her practical knowledge at his or her own pace.
The GovLab Academy is funded by a grant from the John S. and James L. Knight Foundation. “The GovLab Academy addresses a growing need among policy makers at all levels – city, federal and global – to leverage advances in technology to govern differently,” says Carol Coletta, Vice President of Community and National Initiatives at the Knight Foundation.  “By connecting the latest technological innovations to a community of willing mentors, the Academy has the potential to catalyze more experimentation in a sector that badly needs it.”
Initial topics include using data to improve policymaking and cover the role of big data, urban analytics, smart disclosure and open data in governance. A second track focuses on online engagement and includes practical strategies for using crowdsourcing to solicit ideas, organize distributed work and gather data.  The site features both curated content drawn from a variety of sources and original interviews with innovators from government, civil society, the tech industry, the arts and academia talking about their work around the world implementing innovations in practice, what worked and what didn’t, to improve real people’s lives.
Beth Noveck, Founder and Director of The GovLab, describes its mission: “The Academy is an experiment in peer production where every teacher is a learner and every learner a teacher. Consistent with The GovLab’s commitment to measuring what works, we want to measure our success by the people contributing as well as consuming content. We invite everyone with ideas, stories, insights and practical wisdom to contribute to what we hope will be a thriving and diverse community for social change”.”

Big Data


Special Report on Big Data by Volta – A newsletter on Science, Technology and Society in Europe: “Locating crime spots, or the next outbreak of a contagious disease, Big Data promises benefits for society as well as business. But more means messier. Do policy-makers know how to use this scale of data-driven decision-making in an effective way for their citizens and ensure their privacy? 90% of the world’s data have been created in the last two years. Every minute, more than 100 million new emails are created, 72 hours of new video are uploaded to YouTube and Google processes more than 2 million searches. Nowadays, almost everyone walks around with a small computer in their pocket, uses the internet on a daily basis and shares photos and information with their friends, family and networks. The digital exhaust we leave behind every day contributes to an enormous amount of data produced, and at the same time leaves electronic traces that contain a great deal of personal information….
Until recently, traditional technology and analysis techniques have not been able to handle this quantity and type of data. But recent technological developments have enabled us to collect, store and process data in new ways. There seem to be no limitations, either to the volume of data or to the technology for storing and analyzing them. Big Data can map a driver’s sitting position to identify a car thief, use Google searches to predict outbreaks of the H1N1 flu virus, data-mine Twitter to predict the price of rice, or use mobile phone top-ups to describe unemployment in Asia.
The word ‘data’ means ‘given’ in Latin. It commonly refers to a description of something that can be recorded and analyzed. While there is no clear definition of the concept of ‘Big Data’, it usually refers to the processing of huge amounts and new types of data that have not been possible with traditional tools.

‘The new development is not necessarily that there are so much more data. It’s rather that data is available to us in a new way.’

The notion of Big Data is kind of misleading, argues Robindra Prabhu, a project manager at the Norwegian Board of Technology. “The new development is not necessarily that there are so much more data. It’s rather that data is available to us in a new way. The digitalization of society gives us access to both ‘traditional’, structured data – like the content of a database or register – and unstructured data, for example the content in a text, pictures and videos. Information designed to be read by humans is now also readable by machines. And this development makes a whole new world of  data gathering and analysis available. Big Data is exciting not just because of the amount and variety of data out there, but that we can process data about so much more than before.”
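Prabhu’s point that information designed to be read by humans is now also readable by machines can be illustrated with a minimal, hypothetical sketch: a free-text post from which a program pulls out structured fields. The post and the fields chosen are illustrative assumptions.

```python
import re

# A human-readable status update (made up for illustration)...
post = "Stuck in traffic on Ring 3 near Oslo since 08:15 #commute"

# ...from which a machine can extract structured fields.
record = {
    "times": re.findall(r"\b\d{1,2}:\d{2}\b", post),   # ["08:15"]
    "hashtags": re.findall(r"#(\w+)", post),           # ["commute"]
    "mentions_place": "Oslo" in post,                  # True
}
print(record)
```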

Smart Citizens


FutureEverything: “This publication aims to shift the debate on the future of cities towards the central place of citizens, and of decentralised, open urban infrastructures. It provides a global perspective on how cities can create the policies, structures and tools to engender a more innovative and participatory society. The publication contains a series of 23 short essays representing some of the key voices developing an emerging discourse around Smart Citizens.  Contributors include:

  • Dan Hill, Smart Citizens pioneer and CEO of communications research centre and transdisciplinary studio Fabrica on why Smart Citizens Make Smart Cities.
  • Anthony Townsend, urban planner, forecaster and author of Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, on the tensions between place-making and city-making and on the role of mobile technologies in changing the way that people interact with their surroundings.
  • Paul Maltby, Director of the Government Innovation Group and of Open Data and Transparency in the UK Cabinet Office, on how government can support a smarter society.
  • Aditya Dev Sood, Founder and CEO of the Center for Knowledge Societies, presents polarised hypothetical futures for India in 2025, arguing for the use of technology to bridge gaps in social inequality.
  • Adam Greenfield, New York City-based writer and urbanist, on Recuperating the Smart City.

Editors: Drew Hemment, Anthony Townsend
Download Here.

Google’s flu fail shows the problem with big data


Adam Kucharski in The Conversation: “When people talk about ‘big data’, there is an oft-quoted example: a proposed public health tool called Google Flu Trends. It has become something of a pin-up for the big data movement, but it might not be as effective as many claim.
The idea behind big data is that large amounts of information can help us do things which smaller volumes cannot. Google first outlined the Flu Trends approach in a 2008 paper in the journal Nature. Rather than relying on disease surveillance used by the US Centers for Disease Control and Prevention (CDC) – such as visits to doctors and lab tests – the authors suggested it would be possible to predict epidemics through Google searches. When suffering from flu, many Americans will search for information related to their condition….
Between 2003 and 2008, flu epidemics in the US had been strongly seasonal, appearing each winter. However, in 2009, the first cases (as reported by the CDC) started at Easter. Flu Trends had already made its predictions when the CDC data was published, but it turned out that the Google model didn’t match reality. It had substantially underestimated the size of the initial outbreak.
The problem was that Flu Trends could only measure what people search for; it didn’t analyse why they were searching for those words. By removing human input, and letting the raw data do the work, the model had to make its predictions using only search queries from the previous handful of years. Although those 45 terms matched the regular seasonal outbreaks from 2003–8, they didn’t reflect the pandemic that appeared in 2009.
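The mechanism the article describes – a fixed set of search terms mapped onto CDC influenza-like-illness (ILI) rates using only past seasons – can be sketched in a few lines. This is not Google’s actual model, and the data below are random placeholders; the point is simply to show why a mapping fitted on 2003–8 behaviour can miss a 2009-style shift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data: weekly frequencies of 45 flu-related search terms
# (rows = weeks in 2003-2008, columns = terms) and the CDC ILI rate for the
# same weeks. In the real system these came from Google logs and CDC
# surveillance; here they are placeholders.
X_train = rng.random((260, 45))
y_train = 2.0 * X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(0, 0.1, 260)

# Fit a simple linear map from search-term frequencies to the ILI rate
# (least squares with an intercept column).
A = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Predict the ILI rate for a new week of search activity. If search
# behaviour changes (as it did in 2009), the mapping learned from earlier
# seasons no longer reflects reality -- the failure the article describes.
x_new = rng.random(45)
ili_estimate = np.append(x_new, 1.0) @ coef
print(f"estimated ILI rate: {ili_estimate:.2f}")
```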
Six months after the pandemic started, Google – who now had the benefit of hindsight – updated their model so that it matched the 2009 CDC data. Despite these changes, the updated version of Flu Trends ran into difficulties again last winter, when it overestimated the size of the influenza epidemic in New York State. The incidents in 2009 and 2012 raised the question of how good Flu Trends is at predicting future epidemics, as opposed to merely finding patterns in past data.
In a new analysis, published in the journal PLOS Computational Biology, US researchers report that there are “substantial errors in Google Flu Trends estimates of influenza timing and intensity”. This is based on a comparison of Google Flu Trends predictions and the actual epidemic data at the national, regional and local level between 2003 and 2013.
Even when search behaviour was correlated with influenza cases, the model sometimes misestimated important public health metrics such as peak outbreak size and cumulative cases. The predictions were particularly wide of the mark in 2009 and 2012:
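The metrics mentioned here – peak outbreak size, cumulative cases, peak timing – can be compared in a few lines. The numbers below are made up for illustration, not the paper’s data; they just show the kind of check involved.

```python
import numpy as np

# Made-up weekly ILI estimates for one season: model output vs. CDC data.
gft = np.array([1.2, 2.0, 3.5, 6.0, 9.5, 7.0, 4.0, 2.5])
cdc = np.array([1.0, 1.8, 3.0, 4.8, 6.2, 5.5, 3.5, 2.0])

peak_error = (gft.max() - cdc.max()) / cdc.max()        # relative error in peak size
cumulative_error = (gft.sum() - cdc.sum()) / cdc.sum()  # relative error in cumulative cases
peak_timing_off = int(gft.argmax() - cdc.argmax())      # weeks between predicted and actual peak

print(f"peak size error: {peak_error:+.0%}")
print(f"cumulative error: {cumulative_error:+.0%}")
print(f"peak timing offset: {peak_timing_off} week(s)")
```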

Original and updated Google Flu Trends (GFT) model compared with CDC influenza-like illness (ILI) data. PLOS Computational Biology 9:10

Although they criticised certain aspects of the Flu Trends model, the researchers think that monitoring internet search queries might yet prove valuable, especially if it were linked with other surveillance and prediction methods.
Other researchers have also suggested that further sources of digital data – from Twitter feeds to mobile phone GPS – have the potential to be useful tools for studying epidemics. As well as helping to analyse outbreaks, such methods could allow researchers to analyse human movement and the spread of public health information (or misinformation).
Although much attention has been given to web-based tools, there is another type of big data that is already having a huge impact on disease research. Genome sequencing is enabling researchers to piece together how diseases transmit and where they might come from. Sequence data can even reveal the existence of a new disease variant: earlier this week, researchers announced a new type of dengue fever virus….”

Are We Puppets in a Wired World?


Sue Halpern in The New York Review of Books: “Also not obvious was how the Web would evolve, though its open architecture virtually assured that it would. The original Web, the Web of static homepages, documents laden with “hot links,” and electronic storefronts, segued into Web 2.0, which, by providing the means for people without technical knowledge to easily share information, recast the Internet as a global social forum with sites like Facebook, Twitter, FourSquare, and Instagram.
Once that happened, people began to make aspects of their private lives public, letting others know, for example, when they were shopping at H&M and dining at Olive Garden, letting others know what they thought of the selection at that particular branch of H&M and the waitstaff at that Olive Garden, then modeling their new jeans for all to see and sharing pictures of their antipasti and lobster ravioli—to say nothing of sharing pictures of their girlfriends, babies, and drunken classmates, or chronicling life as a high-paid escort, or worrying about skin lesions or seeking a cure for insomnia or rating professors, and on and on.
The social Web celebrated, rewarded, routinized, and normalized this kind of living out loud, all the while anesthetizing many of its participants. Although they likely knew that these disclosures were funding the new information economy, they didn’t especially care…
The assumption that decisions made by machines that have assessed reams of real-world information are more accurate than those made by people, with their foibles and prejudices, may be correct generally and wrong in the particular; and for those unfortunate souls who might never commit another crime even if the algorithm says they will, there is little recourse. In any case, computers are not “neutral”; algorithms reflect the biases of their creators, which is to say that prediction cedes an awful lot of power to the algorithm creators, who are human after all. Some of the time, too, proprietary algorithms, like the ones used by Google and Twitter and Facebook, are intentionally biased to produce results that benefit the company, not the user, and some of the time algorithms can be gamed. (There is an entire industry devoted to “optimizing” Google searches, for example.)
But the real bias inherent in algorithms is that they are, by nature, reductive. They are intended to sift through complicated, seemingly discrete information and make some sort of sense of it, which is the definition of reductive.”
Books reviewed: