Harnessing the Data Revolution for Sustainable Development


US State Department Fact Sheet on “U.S. Government Commitments and Collaboration with the Global Partnership for Sustainable Development Data”: “On September 27, 2015, the member states of the United Nations agreed to a set of Sustainable Development Goals (Global Goals) that define a common agenda to achieve inclusive growth, end poverty, and protect the environment by 2030. The Global Goals build on tremendous development gains made over the past decade, particularly in low- and middle-income countries, and set actionable steps with measurable indicators to drive progress. The availability and use of high quality data is essential to measuring and achieving the Global Goals. By harnessing the power of technology, mobilizing new and open data sources, and partnering across sectors, we will achieve these goals faster and make their progress more transparent.

Harnessing the data revolution is a critical enabler of the global goals—not only to monitor progress, but also to inclusively engage stakeholders at all levels – local, regional, national, global—to advance evidence-based policies and programs to reach those who need it most. Data can show us where girls are at greatest risk of violence so we can better prevent it; where forests are being destroyed in real-time so we can protect them; and where HIV/AIDS is enduring so we can focus our efforts and finish the fight. Data can catalyze private investment; build modern and inclusive economies; and support transparent and effective investment of resources for social good…..

The Global Partnership for Sustainable Development Data (Global Data Partnership), launched on the sidelines of the 70th United Nations General Assembly, is mobilizing a range of data producers and users—including governments, companies, civil society, data scientists, and international organizations—to harness the data revolution to achieve and measure the Global Goals. Working together, signatories to the Global Data Partnership will address the barriers to accessing and using development data, delivering outcomes that no single stakeholder can achieve working alone….The United States, through the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR), is joining a consortium of funders to seed this initiative. The U.S. Government has many initiatives that are harnessing the data revolution for impact domestically and internationally. Highlights of our international efforts are found below:

Health and Gender

Country Data Collaboratives for Local Impact – PEPFAR and the Millennium Challenge Corporation (MCC) are partnering to invest $21.8 million in Country Data Collaboratives for Local Impact in sub-Saharan Africa that will use data on HIV/AIDS, global health, gender equality, and economic growth to improve programs and policies. Initially, the Country Data Collaboratives will align with and support the objectives of DREAMS, a PEPFAR, Bill & Melinda Gates Foundation, and Girl Effect partnership to reduce new HIV infections among adolescent girls and young women in high-burden areas.

Measurement and Accountability for Results in Health (MA4Health) Collaborative – USAID is partnering with the World Health Organization, the World Bank, and over 20 other agencies, countries, and civil society organizations to establish the MA4Health Collaborative, a multi-stakeholder partnership focused on reducing fragmentation and better aligning support to country health-system performance and accountability. The Collaborative will provide a vehicle to strengthen country-led health information platforms and accountability systems by improving data and increasing capacity for better decision-making; facilitating greater technical collaboration and joint investments; and developing international standards and tools for better information and accountability. In September 2015, partners agreed to a set of common strategic and operational principles, including a strong focus on 3–4 pathfinder countries where all partners will initially come together to support country-led monitoring and accountability platforms. Global actions will focus on promoting open data, establishing common norms and standards, and monitoring progress on data and accountability for the Global Goals. A more detailed operational plan will be developed through the end of the year, and implementation will start on January 1, 2016.

Data2X: Closing the Gender Gap – Data2X is a platform for partners to work together to identify innovative sources of data, including “big data,” that can provide an evidence base to guide development policy and investment on gender data. As part of its commitment to Data2X—an initiative of the United Nations Foundation, Hewlett Foundation, Clinton Foundation, and Bill & Melinda Gates Foundation—PEPFAR and the Millennium Challenge Corporation (MCC) are working with partners to sponsor an open data challenge to incentivize the use of gender data to improve gender policy and practice….(More)”

See also: Data matters: the Global Partnership for Sustainable Development Data. Speech by UK International Development Secretary Justine Greening at the launch of the Global Partnership for Sustainable Development Data.

Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes…. In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
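To make the differential-privacy idea described above concrete, here is a minimal sketch of the Laplace mechanism applied to a simple count query. The epsilon value and the toy commute data are illustrative assumptions only; this is a sketch of the general technique, not the Census Bureau’s actual OnTheMap implementation.

```python
import random

def dp_count(values, predicate, epsilon=0.5):
    """Differentially private count of records matching `predicate`.

    Laplace mechanism: the true count is perturbed with noise drawn from a
    Laplace distribution of scale 1/epsilon (a count query has sensitivity 1).
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two exponential samples gives a Laplace(0, 1/epsilon) sample.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Toy, fictitious data: how many commuters in a sample travel more than 30 km?
commute_km = [5, 12, 31, 44, 8, 27, 52, 3, 19, 38]
print(dp_count(commute_km, lambda d: d > 30, epsilon=0.5))
```

Because fresh noise is drawn on every query, repeated runs return slightly different answers, which is exactly the kind of distortion that some of the research applications above cannot tolerate.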

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
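The secret-sharing flavour of secure multiparty computation can be illustrated with a short sketch. Assuming three hypothetical data holders who each hold one private number, the additive shares below let the parties compute a joint sum while each party only ever sees values that look random; a real deployment would add secure channels and a vetted cryptographic protocol.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties=3):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three hypothetical data holders, each with a private value (e.g. a local count).
private_values = [120, 75, 310]

# Each holder splits its value and sends one share to every party.
all_shares = [share(v) for v in private_values]

# Each party sums the shares it receives; any single share is uniformly random,
# so no party learns an individual input...
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# ...but combining the partial results reveals only the aggregate.
print(reconstruct(partial_sums))  # 505
```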

Can Open Data Drive Innovative Healthcare?


Will Greene at Huffington Post: “As healthcare systems worldwide become increasingly digitized, medical scientists and health researchers have more data than ever. Yet much valuable health information remains locked in proprietary or hidden databases. A growing number of open data initiatives aim to change this, but it won’t be easy….

To overcome these challenges, a growing array of stakeholders — including healthcare and tech companies, research institutions, NGOs, universities, governments, patient groups, and individuals — are banding together to develop new regulations and guidelines, and generally promote open data in healthcare.

Some of these initiatives focus on improving transparency in clinical trials. Among those pushing for researchers to share more clinical trials data are groups like AllTrials and the Yale Open Data Access (YODA) Project, donor organizations like the Gates Foundation, and biomedical journals like The BMJ. Private healthcare companies, including some that resisted data sharing in the past, are increasingly seeing value in open collaboration as well.

Other initiatives focus on empowering patients to share their own health data. Consumer genomics companies, personal health records providers, disease management apps, online patient communities and other healthcare services give patients greater access to personal health data than ever before. Some also allow consumers to share it with researchers, enroll in clinical trials, or find other ways to leverage it for the benefit of others.

Another group of initiatives seeks to improve the quality and availability of public health data, such as that pertaining to epidemiological trends, health financing, and human behavior.

Governments often play a key role in collecting this kind of data, but some are more open and effective than others. “Open government is about more than a mere commitment to share data,” says Peter Speyer, Chief Data and Technology Officer at the Institute for Health Metrics and Evaluation (IHME), a health research center at the University of Washington. “It’s also about supporting a whole ecosystem for using these data and tapping into creativity and resources that are not available within any single organization.”

Open data may be particularly important in managing infectious disease outbreaks and other public health emergencies. Following the recent Ebola crisis, the World Health Organization issued a statement on the need for rapid data sharing in emergency situations. It laid out guidelines that could help save lives when the next pandemic strikes.

But on its own, open data does not lead to healthcare innovation. “Simply making large amounts of data accessible is good for transparency and trust,” says Craig Lipset, Head of Clinical Innovation at Pfizer, “but it does not inherently improve R&D or health research. We still need important collaborations and partnerships that make full use of these vast data stores.”

Many such collaborations and partnerships are already underway. They may help drive a new era of healthcare innovation…(More)”

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rests in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

Why interdisciplinary research matters


Special issue of Nature: “To solve the grand challenges facing society — energy, water, climate, food, health — scientists and social scientists must work together. But research that transcends conventional academic boundaries is harder to fund, do, review and publish — and those who attempt it struggle for recognition and advancement (see World View, page 291). This special issue examines what governments, funders, journals, universities and academics must do to make interdisciplinary work a joy rather than a curse.

A News Feature on page 308 asks where the modern trend for interdisciplinary research came from — and finds answers in the proliferation of disciplines in the twentieth century, followed by increasingly urgent calls to bridge them. An analysis of publishing data explores which fields and countries are embracing interdisciplinary research the most, and what impact such research has (page 306). On page 313, Rick Rylance, head of Research Councils UK and himself a researcher with one foot in literature and one in neuroscience, explains why interdisciplinarity will be the focus of a 2015–16 report from the Global Research Council. Around the world, government funding agencies want to know what it is, whether they should invest in it, whether they are doing so effectively and, if not, what must change.

How can scientists successfully pursue research outside their comfort zone? Some answers come from Rebekah Brown, director of Monash University’s Monash Sustainability Institute in Melbourne, Australia, and her colleagues. They set out five principles for successful interdisciplinary working that they have distilled from years of encouraging researchers of many stripes to seek sustainability solutions (page 315). Similar ideas help scientists, curators and humanities scholars to work together on a collection that includes clay tablets, papyri, manuscripts and e-mail archives at the John Rylands Research Institute in Manchester, UK, reveals its director, Peter Pormann, on page 318.

Finally, on page 319, Clare Pettitt reassesses the multidisciplinary legacy of Richard Francis Burton — Victorian explorer, ethnographer, linguist and enthusiastic amateur natural scientist who got some things very wrong, but contributed vastly to knowledge of other cultures and continents. Today’s would-be interdisciplinary scientists can draw many lessons from those of the past — and can take our polymathy quiz online at nature.com/inter. (Nature special: Interdisciplinarity)

Addressing Inequality and the ‘Data Divide’


Daniel Castro at the US Chamber of Commerce Foundation: “In the coming years, communities across the nation will increasingly rely on data to improve quality of life for their residents, such as by improving educational outcomes, reducing healthcare costs, and increasing access to financial services. However, these opportunities require that individuals have access to high-quality data about themselves and their communities. Should certain individuals or communities not routinely have data about them collected, distributed, or used, they may suffer social and economic consequences. Just as the digital divide has held back many communities from reaping the benefits of the modern digital era, a looming “data divide” threatens to stall the benefits of data-driven innovation for a wide swathe of America. Given this risk, policymakers should make a concerted effort to combat data poverty.

Data already plays a crucial role in guiding decision making, and it will only become more important over time. In the private sector, businesses use data for everything from predicting inventory demand to responding to customer feedback to determining where to open new stores. For example, an emerging group of financial service providers use non-traditional data sources, such as an individual’s social network, to assess credit risk and make lending decisions. And health insurers and pharmacies are offering discounts to customers who use fitness trackers to monitor and share data about their health. In the public sector, data is at the heart of important efforts like improving patient safety, cutting government waste, and helping children succeed in school. For example, public health officials in states like Indiana and Maryland have turned to data science in an effort to reduce infant mortality rates.

Many of these exciting advancements are made possible by a new generation of technologies that make it easier to collect, share, and disseminate data. In particular, the Internet of Everything is creating a plethora of always-on devices that record and transmit a wealth of information about our world and the people and objects in it. Individuals are using social media to create a rich tapestry of interactions tied to particular times and places. In addition, government investments in critical data systems, such as statewide databases to track healthcare spending and student performance over time, are integral to efforts to harness data for social good….(More)”

The Data Revolution for Sustainable Development


Jeffrey D. Sachs at Project Syndicate: “There is growing recognition that the success of the Sustainable Development Goals (SDGs), which will be adopted on September 25 at a special United Nations summit, will depend on the ability of governments, businesses, and civil society to harness data for decision-making…

One way to improve data collection and use for sustainable development is to create an active link between the provision of services and the collection and processing of data for decision-making. Take health-care services. Every day, in remote villages of developing countries, community health workers help patients fight diseases (such as malaria), get to clinics for checkups, receive vital immunizations, obtain diagnoses (through telemedicine), and access emergency aid for their infants and young children (such as for chronic under-nutrition). But the information from such visits is usually not collected, and even if it is put on paper, it is never used again.

We now have a much smarter way to proceed. Community health workers are increasingly supported by smart-phone applications, which they can use to log patient information at each visit. That information can go directly onto public-health dashboards, which health managers can use to spot disease outbreaks, failures in supply chains, or the need to bolster technical staff. Such systems can provide a real-time log of vital events, including births and deaths, and even use so-called verbal autopsies to help identify causes of death. And, as part of electronic medical records, the information can be used at future visits to the doctor or to remind patients of the need for follow-up visits or medical interventions….
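As a loose sketch of the visit-logging and dashboard flow described above, and assuming a hypothetical record format, the example below aggregates visit records and flags locations with an unusually high number of cases of a given diagnosis. The field names, place names, and threshold are illustrative only, not the schema of any particular system.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Visit:
    """One record logged by a community health worker (illustrative schema)."""
    village: str
    diagnosis: str
    date: str  # ISO date string

def flag_outbreaks(visits, diagnosis, threshold=5):
    """Count cases of `diagnosis` per village and flag villages at or above a threshold."""
    counts = Counter(v.village for v in visits if v.diagnosis == diagnosis)
    return {village: n for village, n in counts.items() if n >= threshold}

# Toy records standing in for data synced from health workers' phones.
visits = [Visit("Village A", "malaria", "2015-09-01")] * 6 + \
         [Visit("Village B", "malaria", "2015-09-02")] * 2

print(flag_outbreaks(visits, "malaria"))  # {'Village A': 6}
```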
Fortunately, the information and communications technology revolution and the spread of broadband coverage nearly everywhere can quickly make such time lags a thing of the past. As indicated in the report A World that Counts: Mobilizing the Data Revolution for Sustainable Development, we must modernize the practices used by statistical offices and other public agencies, while tapping into new sources of data in a thoughtful and creative way that complements traditional approaches.

Through more effective use of smart data – collected during service delivery, economic transactions, and remote sensing – the fight against extreme poverty will be bolstered; the global energy system will be made much more efficient and less polluting; and vital services such as health and education will be made far more effective and accessible.

With this breakthrough in sight, several governments, including that of the United States, as well as businesses and other partners, have announced plans to launch a new “Global Partnership for Sustainable Development Data” at the UN this month. The new partnership aims to strengthen data collection and monitoring efforts by raising more funds, encouraging knowledge-sharing, addressing key barriers to access and use of data, and identifying new big-data strategies to upgrade the world’s statistical systems.

The UN Sustainable Development Solutions Network will support the new Global Partnership by creating a new Thematic Network on Data for Sustainable Development, which will bring together leading data scientists, thinkers, and academics from across multiple sectors and disciplines to form a center of data excellence….(More)”

Using Big Data to Understand the Human Condition: The Kavli HUMAN Project


Azmak Okan et al in the Journal “Big Data”: “Until now, most large-scale studies of humans have either focused on very specific domains of inquiry or have relied on between-subjects approaches. While these previous studies have been invaluable for revealing important biological factors in cardiac health or social factors in retirement choices, no single repository contains anything like a complete record of the health, education, genetics, environmental, and lifestyle profiles of a large group of individuals at the within-subject level. This seems critical today because emerging evidence about the dynamic interplay between biology, behavior, and the environment point to a pressing need for just the kind of large-scale, long-term synoptic dataset that does not yet exist at the within-subject level. At the same time that the need for such a dataset is becoming clear, there is also growing evidence that just such a synoptic dataset may now be obtainable—at least at moderate scale—using contemporary big data approaches. To this end, we introduce the Kavli HUMAN Project (KHP), an effort to aggregate data from 2,500 New York City households in all five boroughs (roughly 10,000 individuals) whose biology and behavior will be measured using an unprecedented array of modalities over 20 years. It will also richly measure environmental conditions and events that KHP members experience using a geographic information system database of unparalleled scale, currently under construction in New York. In this manner, KHP will offer both synoptic and granular views of how human health and behavior coevolve over the life cycle and why they evolve differently for different people. In turn, we argue that this will allow for new discovery-based scientific approaches, rooted in big data analytics, to improving the health and quality of human life, particularly in urban contexts….(More)”

Open data is not just for startups


Mike Altendorf at CIO: “…Surely open data is just for start-ups, market research companies and people that want to save the world? Well there are two reasons why I wanted to dedicate a bit of time to the subject of open data. First, one of the major barriers to internal innovation that I hear about all the time is the inability to use internal data to inform that innovation. This is usually because data is deemed too sensitive, too complex, too siloed or too difficult to make usable. Leaving aside the issues that any of those problems are going to cause for the organisation more generally, it is easy to see how this can create a problem. So why not use someone else’s data?

The point of creating internal labs and innovation centres is to explore the art of the possible. I quite agree that insight from your own data is a good place to start but it isn’t the only place. You could also argue that by using your own data you are restricting your thinking because you are only looking at information that already relates to your business. If the point of a lab is to explore ideas for supporting the business then you may be better off looking outwards at what is happening in the world around you rather than inwards into the constrained world of the industry you already inhabit….

The fact is there are vast amounts of freely available data sets that can be made to work for you if you can just apply the creativity and technical smarts to them.

My second point is less about open data than about opening up data. Organisations collect information on their business operations, customers and suppliers all the time. The smart ones know how to use it to build competitive advantage but the really smart ones also know that there is significant extra value to be gained from sharing that data with the customer or supplier that it relates to. The customer or supplier can then use it to make informed decisions themselves. Some organisations have been doing this for a while. Customers of First Direct have been able to analyse their own spending patterns for years (although the data has been somewhat limited). The benefit to the customer is that they can make informed decisions based on actual data about their past behaviours and so adapt their spending habits accordingly (or put their head firmly in the sand and carry on as before in my case!). The benefit to the bank is that they are able to suggest ideas for how to improve a customer’s financial health alongside the data. Others have looked at how they can help customers by sharing (anonymised) information about what people with similar lifestyles/needs are doing/buying so customers can learn from each other. Trials have shown that customers welcomed the insight….(More)”

Flutrack.org: Open-source and linked data for epidemiology


Chorianopoulos K and Talvis K at Health Informatics Journal: “Epidemiology has made advances, thanks to the availability of real-time surveillance data and by leveraging the geographic analysis of incidents. There are many health information systems that visualize the symptoms of influenza-like illness on a digital map, which is suitable for end-users, but it does not afford further processing and analysis. Existing systems have emphasized the collection, analysis, and visualization of surveillance data, but they have neglected a modular and interoperable design that integrates high-resolution geo-location with real-time data. As a remedy, we have built an open-source project and we have been operating an open service that detects flu-related symptoms and shares the data in real-time with anyone who wants to build upon this system. An analysis of a small number of precisely geo-located status updates (e.g. Twitter) correlates closely with the Google Flu Trends and the Centers for Disease Control and Prevention flu-positive reports. We suggest that public health information systems should embrace an open-source approach and offer linked data, in order to facilitate the development of an ecosystem of applications and services, and in order to be transparent to the general public interest…(More)” See also http://www.flutrack.org/
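A rough sketch of the general approach the abstract describes, assuming a simple keyword filter over geo-tagged status updates, is given below; the keyword list and record format are illustrative assumptions and do not reflect Flutrack’s actual code, API, or data model.

```python
# Illustrative sketch only: scan geo-tagged status updates for
# influenza-like-illness keywords and keep the matching coordinates.
SYMPTOM_KEYWORDS = {"flu", "fever", "sore throat", "cough", "headache"}

def flu_related(status_text):
    """Return True if a status update mentions any flu-like symptom keyword."""
    text = status_text.lower()
    return any(keyword in text for keyword in SYMPTOM_KEYWORDS)

# Each record: (text, latitude, longitude), e.g. taken from a streaming API.
updates = [
    ("Stuck at home with a fever and a terrible cough", 40.71, -74.00),
    ("Beautiful day for a run along the river", 51.50, -0.12),
]

# Points that could be plotted on a map or republished as linked open data.
flu_points = [(lat, lon) for text, lat, lon in updates if flu_related(text)]
print(flu_points)  # [(40.71, -74.0)]
```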