Data-Driven Innovation: Big Data for Growth and Well-Being


“A new OECD report on data-driven innovation finds that countries could be getting much more out of data analytics in terms of economic and social gains if governments did more to encourage investment in “Big Data” and promote data sharing and reuse.

The migration of economic and social activities to the Internet and the advent of the Internet of Things – along with dramatically lower costs of data collection, storage and processing and rising computing power – mean that data analytics is increasingly driving innovation and is potentially an important new source of growth.

The report suggests that countries act to seize these benefits by training more and better data scientists, reducing barriers to cross-border data flows, and encouraging investment in business processes to incorporate data analytics.

Few companies outside of the ICT sector are changing internal procedures to take advantage of data. For example, data gathered by companies’ marketing departments is not always used by other departments to drive decisions and innovation. And in particular, small and medium-sized companies face barriers to the adoption of data-related technologies such as cloud computing, partly because they have difficulty implementing organisational change due to limited resources, including the shortage of skilled personnel.

At the same time, governments will need to anticipate and address the disruptive effects of big data on the economy and overall well-being, as issues as broad as privacy, jobs, intellectual property rights, competition and taxation will be impacted.”

TABLE OF CONTENTS
Preface
Foreword
Executive summary
The phenomenon of data-driven innovation
Mapping the global data ecosystem and its points of control
How data now drive innovation
Drawing value from data as an infrastructure
Building trust for data-driven innovation
Skills and employment in a data-driven economy
Promoting data-driven scientific research
The evolution of health care in a data-rich environment
Cities as hubs for data-driven innovation
Governments leading by example with public sector data

Big Data Privacy Scenarios


E. Bruce, K. Sollins, M. Vernon, and D. Weitzner at D-Space@MIT: “This paper is the first in a series on privacy in Big Data. As an outgrowth of a series of workshops on the topic, the Big Data Privacy Working Group undertook a study of a series of use scenarios to highlight the challenges to privacy that arise in the Big Data arena. This is a report on those scenarios. The deeper question explored by this exercise is what is distinctive about privacy in the context of Big Data. In addition, we discuss an initial list of issues for privacy that derive specifically from the nature of Big Data. These derive from observations across the real world scenarios and use cases explored in this project as well as wider reading and discussions:

* Scale: The sheer size of the datasets leads to challenges in creating, managing and applying privacy policies.

* Diversity: The increased likelihood of more and more diverse participants in Big Data collection, management, and use, leads to differing agendas and objectives. By nature, this is likely to lead to contradictory agendas and objectives.

* Integration: With increased data management technologies (e.g. cloud services, data lakes, and so forth), integration across datasets, with new and often surprising opportunities for cross-product inferences, will also come new information about individuals and their behaviors.

* Impact on secondary participants: Because many pieces of information are reflective of not only the targeted subject, but secondary, often unattended, participants, the inferences and resulting information will increasingly be reflective of other people, not originally considered as the subject of privacy concerns and approaches.

* Need for emergent policies for emergent information: As inferences over merged data sets occur, emergent information or understanding will occur.

Although each unique data set may have existing privacy policies and enforcement mechanisms, it is not clear that it is possible to develop the requisite and appropriate emerged privacy policies and appropriate enforcement of them automatically…(More)”

What we can learn from the failure of Google Flu Trends


David Lazer and Ryan Kennedy at Wired: “….The issue of using big data for the common good is far more general than Google—which deserves credit, after all, for offering the occasional peek at their data. These records exist because of a compact between individual consumers and the corporation. The legalese of that compact is typically obscure (how many people carefully read terms and conditions?), but the essential bargain is that the individual gets some service, and the corporation gets some data.

What is left out of that bargain is the public interest. Corporations and consumers are part of a broader society, and many of these big data archives offer insights that could benefit us all. As Eric Schmidt, CEO of Google, has said, “We must remember that technology remains a tool of humanity.” How can we, and corporate giants, then use these big data archives as a tool to serve humanity?

Google’s sequel to GFT, done right, could serve as a model for collaboration around big data for the public good. Google is making flu-related search data available to the CDC as well as select research groups. A key question going forward will be whether Google works with these groups to improve the methodology underlying GFT. Future versions should, for example, continually update the fit of the data to flu prevalence—otherwise, the value of the data stream will rapidly decay.

This is just an example, however, of the general challenge of how to build models of collaboration amongst industry, government, academics, and general do-gooders to use big data archives to produce insights for the public good. This came to the fore with the struggle (and delay) for finding a way to appropriately share mobile phone data in west Africa during the Ebola epidemic (mobile phone data are likely the best tool for understanding human—and thus Ebola—movement). Companies need to develop efforts to share data for the public good in a fashion that respects individual privacy.

There is not going to be a single solution to this issue, but for starters, we are pushing for a “big data” repository in Boston to allow holders of sensitive big data to share those collections with researchers while keeping them totally secure. The UN has its Global Pulse initiative, setting up collaborative data repositories around the world. Flowminder, based in Sweden, is a nonprofit dedicated to gathering mobile phone data that could help in response to disasters. But these are still small, incipient, and fragile efforts.

The question going forward now is how to build on and strengthen these efforts, while still guarding the privacy of individuals and the proprietary interests of the holders of big data….(More)”
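
The recalibration Lazer and Kennedy call for, continually updating the fit of the search data to flu prevalence, can be illustrated with a rolling refit. The Python below is only a minimal sketch under stated assumptions, not the method used by Google or the CDC: `search_index` and `flu_rate` are hypothetical weekly series, the model is a simple one-predictor least-squares line, and the window length is an arbitrary choice.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # No variation in the predictor: fall back to the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rolling_predictions(search_index, flu_rate, window):
    """Predict each week's flu rate from a line refit on the previous `window` weeks.

    Both inputs are hypothetical weekly series of equal length.
    """
    preds = []
    for t in range(window, len(search_index)):
        a, b = fit_line(search_index[t - window:t], flu_rate[t - window:t])
        preds.append(a + b * search_index[t])
    return preds


if __name__ == "__main__":
    # Invented data: the relationship between searches and flu shifts after week 5.
    searches = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    flu = [2, 4, 6, 8, 10, 19, 22, 25, 28, 31]
    print(rolling_predictions(searches, flu, window=3))
```

In this toy series the link between searches and flu changes partway through. A model fit once on the early weeks would stay wrong indefinitely, while the rolling refit recovers as soon as its window contains only weeks from the new regime.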

Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism


Stefan Baack at Big Data and Society: “This article shows how activists in the open data movement re-articulate notions of democracy, participation, and journalism by applying practices and values from open source culture to the creation and use of data. Focusing on the Open Knowledge Foundation Germany and drawing from a combination of interviews and content analysis, it argues that this process leads activists to develop new rationalities around datafication that can support the agency of datafied publics. Three modulations of open source are identified: First, by regarding data as a prerequisite for generating knowledge, activists transform the sharing of source code to include the sharing of raw data. Sharing raw data should break the interpretative monopoly of governments and would allow people to make their own interpretation of data about public issues. Second, activists connect this idea to an open and flexible form of representative democracy by applying the open source model of participation to political participation. Third, activists acknowledge that intermediaries are necessary to make raw data accessible to the public. This leads them to an interest in transforming journalism to become an intermediary in this sense. At the same time, they try to act as intermediaries themselves and develop civic technologies to put their ideas into practice. The article concludes by suggesting that the practices and ideas of open data activists are relevant because they illustrate the connection between datafication and open source culture and help to understand how datafication might support the agency of publics and actors outside big government and big business….(More)”

Personalising data for development


Wolfgang Fengler and Homi Kharas in the Financial Times: “When world leaders meet this week for the UN’s general assembly to adopt the Sustainable Development Goals (SDGs), they will also call for a “data revolution”. In a world where almost everyone will soon have access to a mobile phone, where satellites will take high-definition pictures of the whole planet every three days, and where inputs from sensors and social media make up two thirds of the world’s new data, the opportunities to leverage this power for poverty reduction and sustainable development are enormous. We are also on the verge of major improvements in government administrative data and data gleaned from the activities of private companies and citizens, in big and small data sets.

But these opportunities are yet to materialize at any scale. In fact, despite the exponential growth in connectivity and the emergence of big data, policy making is rarely based on good data. Almost every report from development institutions starts with a disclaimer highlighting “severe data limitations”. Like castaways on an island, surrounded by water they cannot drink unless the salt is removed, today’s policy makers are in a sea of data that need to be refined and treated (simplified and aggregated) to make them “consumable”.

To make sense of big data, we used to depend on data scientists, computer engineers and mathematicians who would process requests one by one. But today, new programs and analytical solutions are putting big data at anyone’s fingertips. Tomorrow, it won’t be technical experts driving the data revolution but anyone operating a smartphone. Big data will become personal. We will be able to monitor and model social and economic developments faster, more reliably, more cheaply and on a far more granular scale. The data revolution will affect both the harvesting of data through new collection methods, and the processing of data through new aggregation and communication tools.

In practice, this means that data will become more actionable by becoming more personal, more timely and more understandable. Today, producing a poverty assessment and poverty map takes at least a year: it involves hundreds of enumerators, lengthy interviews and laborious data entry. In the future, thanks to hand-held connected devices, data collection and aggregation will happen in just a few weeks. Many more instances come to mind where new and higher-frequency data could generate development breakthroughs: monitoring teacher attendance, stocks and quality of pharmaceuticals, or environmental damage, for example…..

Despite vast opportunities, there are very few examples that have generated sufficient traction and scale to change policy and behaviour and create the feedback loops to further improve data quality. Two tools have personalised the abstract subjects of environmental degradation and demography (see table):

  • Monitoring forest fires. The World Resources Institute has launched Global Forest Watch, which enables users to monitor forest fires in near real time, and overlay relevant spatial information such as property boundaries and ownership data to be developed into a model to anticipate the impact on air quality in affected areas in Indonesia, Singapore and Malaysia.
  • Predicting your own life expectancy. The World Population Program developed a predictive tool – www.population.io – showing each person’s place in the distribution of world population and corresponding statistical life expectancy. In just a few months, this prototype attracted some 2m users who shared their results more than 25,000 times on social media. The traction of the tool resulted from making demography personal and converting an abstract subject matter into a question of individual ranking and life expectancy.

A new Global Partnership for Sustainable Development Data will be launched at the time of the UN General Assembly….(More)”
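
The idea of “showing each person’s place in the distribution of world population” amounts to a percentile rank by age. The Python below is a rough sketch of that calculation, not the method behind population.io: the age bands and counts are invented for illustration, and people are assumed to be spread evenly within each band.

```python
def share_younger(age, bands):
    """Return the fraction of the population younger than `age`.

    `bands` is a list of (start_age, end_age, count) tuples with
    end_age exclusive. Ages are assumed uniform within each band.
    """
    total = sum(count for _, _, count in bands)
    younger = 0.0
    for start, end, count in bands:
        if age >= end:
            # The whole band is younger than the given age.
            younger += count
        elif age > start:
            # Only part of the band is younger: interpolate linearly.
            younger += count * (age - start) / (end - start)
    return younger / total


if __name__ == "__main__":
    # Hypothetical population counts (in millions), not real data.
    bands = [(0, 20, 40), (20, 40, 30), (40, 60, 20), (60, 100, 10)]
    print(share_younger(30, bands))  # prints 0.55
```

With these made-up numbers, a 30-year-old is older than 55 percent of the population, which is the kind of individual ranking the authors credit for the tool's traction.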

Research on digital identity ecosystems


Francesca Bria et al at NESTA/D-CENT: “This report presents a concrete analysis of the latest evolution of the identity ecosystem in the big data context, focusing on the economic and social value of data and identity within the current digital economy. This report also outlines economic, policy, and technical alternatives to develop an identity ecosystem and management of data for the common good that respects citizens’ rights, privacy and data protection.

Key findings

  • This study presents a review of the concept of identity and a map of the key players in the identity industry (such as data brokers and data aggregators), including empirical case studies of identity management in key sectors.
    ….
  • The “datafication” of individuals’ social lives, thoughts and moves is a valuable commodity and constitutes the backbone of the “identity market” within which “data brokers” (collectors, purchasers or sellers) play key different roles in creating the market by offering various services such as fraud, customer relation, predictive analytics, marketing and advertising.
  • Economic, political and technical alternatives for identity to preserve trust, privacy and data ownership in today’s big data environments are formulated. The report looks into access to data, economic strategies to manage data as commons, consent and licensing, tools to control data, and terms of services. It also looks into policy strategies such as privacy and data protection by design and trust and ethical frameworks. Finally, it assesses technical implementations looking at identity and anonymity, cryptographic tools; security; decentralisation and blockchains. It also analyses the future steps needed in order to move into the suggested technical strategies….(More)”

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

Algorithm predicts and prevents train delays two hours in advance


Springwise: “Transport apps such as Ototo make it easier than ever for passengers to stay informed about problems with public transport, but real-time information can only help so much: by the time users find out about a delayed service, it is often too late to take an alternative route. Now Stockholmstag, the company that runs Sweden’s trains, has found a solution in the form of an algorithm called ‘The Commuter Prognosis’, which can predict network delays up to two hours in advance, giving train operators time to issue extra services or provide travelers with adequate warning.

The system was created by mathematician Wilhelm Landerholm. It uses historical data to predict how a small delay, even as little as two minutes, will affect the running of the rest of the network. Often the initial late train causes a ripple effect: subsequent services are delayed to accommodate new platform arrival times, which in turn delays the trains behind them, and so on. But soon, using ‘The Commuter Prognosis’, Stockholmstag train operators will be able to make the necessary adjustments to prevent this. In addition, the information will be relayed to commuters, enabling them to take a different train and therefore reducing overcrowding. The prediction tool is expected to be put into use in Sweden by the end of the year….(More)”
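
The ripple effect described above can be made concrete with a toy model. The excerpt does not describe how ‘The Commuter Prognosis’ works internally, so the Python below is not that algorithm. It is a hedged sketch that assumes trains pass a single shared platform in order and must keep a hypothetical minimum headway (`min_headway`, in minutes) behind the train in front.

```python
def propagate_delay(scheduled, initial_delay, min_headway):
    """Return each train's delay in minutes after the first train runs late.

    `scheduled` lists platform times in minutes, in running order.
    A train cannot use the platform until `min_headway` minutes after
    the previous train actually did (an assumption of this toy model).
    """
    delays = []
    prev_actual = None
    for i, sched in enumerate(scheduled):
        # Only the first train carries the initial delay.
        actual = sched + (initial_delay if i == 0 else 0)
        if prev_actual is not None:
            # Knock-on effect: wait for the train ahead to clear.
            actual = max(actual, prev_actual + min_headway)
        delays.append(actual - sched)
        prev_actual = actual
    return delays


if __name__ == "__main__":
    timetable = [0, 3, 6, 10, 20]  # hypothetical schedule, in minutes
    print(propagate_delay(timetable, initial_delay=2, min_headway=3))
    # prints [2, 2, 2, 1, 0]
```

In this invented timetable a two-minute delay is passed on in full to tightly scheduled followers and only dies out once there is slack between trains, which is why early warning gives operators room to intervene.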

Using Big Data to Understand the Human Condition: The Kavli HUMAN Project


Azmak Okan et al in the Journal “Big Data”: “Until now, most large-scale studies of humans have either focused on very specific domains of inquiry or have relied on between-subjects approaches. While these previous studies have been invaluable for revealing important biological factors in cardiac health or social factors in retirement choices, no single repository contains anything like a complete record of the health, education, genetics, environmental, and lifestyle profiles of a large group of individuals at the within-subject level. This seems critical today because emerging evidence about the dynamic interplay between biology, behavior, and the environment points to a pressing need for just the kind of large-scale, long-term synoptic dataset that does not yet exist at the within-subject level. At the same time that the need for such a dataset is becoming clear, there is also growing evidence that just such a synoptic dataset may now be obtainable, at least at moderate scale, using contemporary big data approaches. To this end, we introduce the Kavli HUMAN Project (KHP), an effort to aggregate data from 2,500 New York City households in all five boroughs (roughly 10,000 individuals) whose biology and behavior will be measured using an unprecedented array of modalities over 20 years. It will also richly measure environmental conditions and events that KHP members experience using a geographic information system database of unparalleled scale, currently under construction in New York. In this manner, KHP will offer both synoptic and granular views of how human health and behavior coevolve over the life cycle and why they evolve differently for different people. In turn, we argue that this will allow for new discovery-based scientific approaches, rooted in big data analytics, to improving the health and quality of human life, particularly in urban contexts….(More)”

(US) Administration Announces New “Smart Cities” Initiative to Help Communities Tackle Local Challenges and Improve City Services


Factsheet from the White House: “Today, the Administration is announcing a new “Smart Cities” Initiative that will invest over $160 million in federal research and leverage more than 25 new technology collaborations to help local communities tackle key challenges such as reducing traffic congestion, fighting crime, fostering economic growth, managing the effects of a changing climate, and improving the delivery of city services. The new initiative is part of this Administration’s overall commitment to target federal resources to meet local needs and support community-led solutions.

Over the past six years, the Administration has pursued a place-based approach to working with communities as they tackle a wide range of challenges, from investing in infrastructure and filling open technology jobs to bolstering community policing. Advances in science and technology have the potential to accelerate these efforts. An emerging community of civic leaders, data scientists, technologists, and companies are joining forces to build “Smart Cities” – communities that are building an infrastructure to continuously improve the collection, aggregation, and use of data to improve the life of their residents – by harnessing the growing data revolution, low-cost sensors, and research collaborations, and doing so securely to protect safety and privacy.

As part of the initiative, the Administration is announcing:

  • More than $35 million in new grants and over $10 million in proposed investments to build a research infrastructure for Smart Cities by the National Science Foundation and National Institute of Standards and Technology.
  • Nearly $70 million in new spending and over $45 million in proposed investments to unlock new solutions in safety, energy, climate preparedness, transportation, health and more, by the Department of Homeland Security, Department of Transportation, Department of Energy, Department of Commerce, and the Environmental Protection Agency.
  • More than 20 cities participating in major new multi-city collaborations that will help city leaders effectively collaborate with universities and industry.

Today, the Administration is also hosting a White House Smart Cities Forum, coinciding with Smart Cities Week hosted by the Smart Cities Council, to highlight new steps and brainstorm additional ways that science and technology can support municipal efforts.

The Administration’s Smart Cities Initiative will begin with a focus on key strategies:

  • Creating test beds for “Internet of Things” applications and developing new multi-sector collaborative models: Technological advancements and the diminishing cost of IT infrastructure have created the potential for an “Internet of Things,” a ubiquitous network of connected devices, smart sensors, and big data analytics. The United States has the opportunity to be a global leader in this field, and cities represent strong potential test beds for development and deployment of Internet of Things applications. Successfully deploying these and other new approaches often depends on new regional collaborations among a diverse array of public and private actors, including industry, academia, and various public entities.
  • Collaborating with the civic tech movement and forging intercity collaborations: There is a growing community of individuals, entrepreneurs, and nonprofits interested in harnessing IT to tackle local problems and work directly with city governments. These efforts can help cities leverage their data to develop new capabilities. Collaborations across communities are likewise indispensable for replicating what works in new places.
  • Leveraging existing Federal activity: From research on sensor networks and cybersecurity to investments in broadband infrastructure and intelligent transportation systems, the Federal government has an existing portfolio of activities that can provide a strong foundation for a Smart Cities effort.
  • Pursuing international collaboration: Fifty-four percent of the world’s population live in urban areas. Continued population growth and urbanization will add 2.5 billion people to the world’s urban population by 2050. The associated climate and resource challenges demand innovative approaches. Products and services associated with this market present a significant export opportunity for the U.S., since almost 90 percent of this increase will occur in Africa and Asia.

Complementing this effort, the President’s Council of Advisors on Science and Technology is examining how a variety of technologies can enhance the future of cities and the quality of life for urban residents. The Networking and Information Technology Research and Development (NITRD) Program is also announcing the release of a new framework to help coordinate Federal agency investments and outside collaborations that will guide foundational research and accelerate the transition into scalable and replicable Smart City approaches. Finally, the Administration’s growing work in this area is reflected in the Science and Technology Priorities Memo, issued by the Office of Management and Budget and Office of Science and Technology Policy in preparation for the President’s 2017 budget proposal, which includes a focus on cyber-physical systems and Smart Cities….(More)”