French digital rights bill published in ‘open democracy’ first


France24: “A proposed law on the Internet and digital rights in France has been opened to public consultation before it is debated in parliament in an ‘unprecedented’ exercise in ‘open democracy’.

The text of the “Digital Republic” bill was published online on Saturday and is open to suggestions for amendments by French citizens until October 17.

It can be found on the “Digital Republic” web page, and is even available in English.

“We are opening a new page in the history of our democracy,” Prime Minister Manuel Valls said at a press conference as the consultation was launched. “This is the first time in France, or indeed in any European country, that a proposed law has been opened to citizens in this way.”

“And it won’t be the last time,” he said, adding that the move was an attempt to redress a “growing distrust of politics”.

Participants will be able to give their opinions and make suggestions for changes to the text of the bill.

Suggestions that get the highest number of public votes will be guaranteed an official response before the bill is presented to parliament.

Freedoms and fairness

In its original and unedited form, the text of the bill leans heavily towards online freedoms and greater transparency in government.

An “Open Data” policy would make official documents and public sector research available online, while a “Net Neutrality” clause would prevent Internet services such as Netflix or YouTube from paying for faster connection speeds at the expense of everyone else.

For personal freedoms, the law would give citizens the right to recover emails, files and other data such as pictures stored on “cloud” services….(More)”

Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.
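To make the risk concrete, a linkage attack of this kind can be sketched in a few lines of Python. Everything below is invented for illustration (the names, fields and join keys are not those of the 2013 study): joining “anonymised” records to a public data set on shared quasi-identifiers is often enough to restore identities.

```python
import pandas as pd

# "Anonymised" study records: names removed, quasi-identifiers retained.
# All values are invented for illustration.
study = pd.DataFrame({
    "surname":    ["SMITH", "NGUYEN", "OKAFOR"],   # e.g. inferred from genetic markers
    "birth_year": [1962, 1975, 1983],
    "state":      ["UT", "CA", "NY"],
    "trait":      ["trait A", "trait B", "trait C"],
})

# Public genealogy records, which carry full names.
genealogy = pd.DataFrame({
    "full_name":  ["Alice Smith", "Binh Nguyen", "Chidi Okafor"],
    "surname":    ["SMITH", "NGUYEN", "OKAFOR"],
    "birth_year": [1962, 1975, 1983],
    "state":      ["UT", "CA", "NY"],
})

# Cross-referencing on the shared quasi-identifiers re-attaches identities.
reidentified = study.merge(genealogy, on=["surname", "birth_year", "state"])
print(reidentified[["full_name", "trait"]])
```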

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to refute the alleged link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.


….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
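For intuition, here is a minimal Python sketch of the Laplace mechanism, the textbook construction behind differential privacy. This is a toy illustration with an invented count, not the Census Bureau’s actual OnTheMap implementation:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng=None) -> float:
    """Differentially private counting query via the Laplace mechanism.

    A count changes by at most 1 when any one person is added or
    removed, so its sensitivity is 1 and Laplace noise of scale
    1/epsilon suffices for epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative: commuters counted in one hypothetical census block.
true_count = 412
for eps in (0.1, 0.5, 1.0):
    print(f"epsilon={eps}: reported count = {dp_count(true_count, eps):.1f}")
```

Smaller values of epsilon mean more noise: the analyst trades accuracy for a stronger privacy guarantee.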

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
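One classic building block of secure multiparty computation is additive secret sharing. In the Python sketch below (invented numbers; production systems are far more elaborate), three data holders learn a joint total without any of them revealing its own input:

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three hospitals each hold a private patient count (invented numbers).
private_inputs = [1200, 950, 430]

# Each holder splits its value and distributes one share to each party.
all_shares = [share(v) for v in private_inputs]

# Party i adds up the i-th share of every input; it never sees a raw value.
partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(3)]

# Combining the partial sums reveals only the total, as the quote describes.
total = sum(partial_sums) % PRIME
print(total)  # -> 2580; no party ever saw another's individual input
```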

Personalising data for development


Wolfgang Fengler and Homi Kharas in the Financial Times: “When world leaders meet this week for the UN’s general assembly to adopt the Sustainable Development Goals (SDGs), they will also call for a “data revolution”. In a world where almost everyone will soon have access to a mobile phone, where satellites will take high-definition pictures of the whole planet every three days, and where inputs from sensors and social media make up two thirds of the world’s new data, the opportunities to leverage this power for poverty reduction and sustainable development are enormous. We are also on the verge of major improvements in government administrative data and data gleaned from the activities of private companies and citizens, in big and small data sets.

But these opportunities have yet to materialize at any scale. In fact, despite the exponential growth in connectivity and the emergence of big data, policy making is rarely based on good data. Almost every report from development institutions starts with a disclaimer highlighting “severe data limitations”. Like castaways on an island, surrounded by water they cannot drink unless the salt is removed, today’s policy makers are in a sea of data that need to be refined and treated (simplified and aggregated) to make them “consumable”.

To make sense of big data, we used to depend on data scientists, computer engineers and mathematicians who would process requests one by one. But today, new programs and analytical solutions are putting big data at anyone’s fingertips. Tomorrow, it won’t be technical experts driving the data revolution but anyone operating a smartphone. Big data will become personal. We will be able to monitor and model social and economic developments faster, more reliably, more cheaply and on a far more granular scale. The data revolution will affect both the harvesting of data through new collection methods, and the processing of data through new aggregation and communication tools.

In practice, this means that data will become more actionable by becoming more personal, more timely and more understandable. Today, producing a poverty assessment and poverty map takes at least a year: it involves hundreds of enumerators, lengthy interviews and laborious data entry. In the future, thanks to hand-held connected devices, data collection and aggregation will happen in just a few weeks. Many more instances come to mind where new and higher-frequency data could generate development breakthroughs: monitoring teacher attendance, stocks and quality of pharmaceuticals, or environmental damage, for example…..

Despite vast opportunities, there are very few examples that have generated sufficient traction and scale to change policy and behaviour and create the feedback loops that further improve data quality. Two tools have personalised the abstract subjects of environmental degradation and demography:

  • Monitoring forest fires. The World Resources Institute has launched Global Forest Watch, which enables users to monitor forest fires in near real time and to overlay relevant spatial information, such as property boundaries and ownership data, that can be developed into a model anticipating the impact on air quality in affected areas of Indonesia, Singapore and Malaysia.
  • Predicting your own life expectancy. The World Population Program developed a predictive tool – www.population.io – showing each person’s place in the distribution of world population and corresponding statistical life expectancy. In just a few months, this prototype attracted some 2m users who shared their results more than 25,000 times on social media. The traction of the tool resulted from making demography personal and converting an abstract subject matter into a question of individual ranking and life expectancy.

A new Global Partnership for Sustainable Development Data will be launched at the time of the UN General Assembly….(More)”

Can Open Data Drive Innovative Healthcare?


Will Greene at Huffington Post: “As healthcare systems worldwide become increasingly digitized, medical scientists and health researchers have more data than ever. Yet much valuable health information remains locked in proprietary or hidden databases. A growing number of open data initiatives aim to change this, but it won’t be easy….

To overcome these challenges, a growing array of stakeholders — including healthcare and tech companies, research institutions, NGOs, universities, governments, patient groups, and individuals — are banding together to develop new regulations and guidelines, and generally promote open data in healthcare.

Some of these initiatives focus on improving transparency in clinical trials. Among those pushing for researchers to share more clinical trials data are groups like AllTrials and the Yale Open Data Access (YODA) Project, donor organizations like the Gates Foundation, and biomedical journals like The BMJ. Private healthcare companies, including some that resisted data sharing in the past, are increasingly seeing value in open collaboration as well.

Other initiatives focus on empowering patients to share their own health data. Consumer genomics companies, personal health records providers, disease management apps, online patient communities and other healthcare services give patients greater access to personal health data than ever before. Some also allow consumers to share it with researchers, enroll in clinical trials, or find other ways to leverage it for the benefit of others.

Another group of initiatives seeks to improve the quality and availability of public health data, such as that pertaining to epidemiological trends, health financing, and human behavior.

Governments often play a key role in collecting this kind of data, but some are more open and effective than others. “Open government is about more than a mere commitment to share data,” says Peter Speyer, Chief Data and Technology Officer at the Institute for Health Metrics and Evaluation (IHME), a health research center at the University of Washington. “It’s also about supporting a whole ecosystem for using these data and tapping into creativity and resources that are not available within any single organization.”

Open data may be particularly important in managing infectious disease outbreaks and other public health emergencies. Following the recent Ebola crisis, the World Health Organization issued a statement on the need for rapid data sharing in emergency situations. It laid out guidelines that could help save lives when the next pandemic strikes.

But on its own, open data does not lead to healthcare innovation. “Simply making large amounts of data accessible is good for transparency and trust,” says Craig Lipset, Head of Clinical Innovation at Pfizer, “but it does not inherently improve R&D or health research. We still need important collaborations and partnerships that make full use of these vast data stores.”

Many such collaborations and partnerships are already underway. They may help drive a new era of healthcare innovation….(More)”

Ethical, Safe, and Effective Digital Data Use in Civil Society


Blog by Lucy Bernholz, Rob Reich, Emma Saunders-Hastings, and Emma Leeds Armstrong: “How do we use digital data ethically, safely, and effectively in civil society? We have developed three early principles for consideration:

  • Default to person-centered consent.
  • Prioritize privacy and minimum viable data collection.
  • Plan from the beginning to open (share) your work.

This post provides a synthesis of a one-day workshop that informed these principles. It concludes with links to draft guidelines you can use to inform partnerships between data consultants/volunteers and nonprofit organizations….(More)

These three values — consent, minimum viable data collection, and open sharing — comprise a basic framework for ethical, safe, and effective use of digital data by civil society organizations. They should be integrated into partnerships with data intermediaries and, perhaps, into general data practices in civil society.

We developed two tools to guide conversations between data volunteers and/or consultants and nonprofits. These are downloadable below. Please use them, share them, improve them, and share them again….

  1. Checklist for NGOs and external data consultants
  2. Guidelines for NGOs and external data consultants (More)”

Smoke Signals: Open data & analytics for preventing fire deaths


Enigma: “Today we are launching Smoke Signals, an open source civic analytics tool that helps local communities determine which city blocks are at the highest risk of not having a smoke alarm.

Each year in the United States, 1 million fires kill or injure 25,000 people. Of the country’s more than 130 million housing units, 4.5 million lack smoke detectors, placing their inhabitants at substantial risk. Driving that number down is the single most important factor in saving lives put at risk by fire.

Organizations like the Red Cross are investing a lot of resources to buy and install smoke alarms in people’s homes. But a big challenge remains: in a city of millions, what doors should you knock on first when conducting an outreach effort?

We began working on the problem of targeting the blocks at highest risk of not having a smoke alarm with the City of New Orleans last spring. (You can read about this work here.) Over the past few months, with collaboration from the Red Cross and DataKind, we’ve built out a generalized model and a set of tools to offer the same analytics potential to 178 American cities, all in a way that is simple to use and sensitive to how on-the-ground operations are organized.

We believe that Smoke Signals is more a collection of tools and collaborations than it is a slick piece of software that can somehow act as a panacea to the problem of fire fatalities. Core to its purpose and mission are a set of commitments:

  • an ongoing collaboration with the Red Cross wherein our smoke alarm work informs their on-the-ground outreach
  • a collaboration with DataKind to continue applying volunteer work to the improvement of the underlying models and data that drive the risk analysis
  • a working relationship with major American cities to help integrate our prediction models into their outreach programs

and tools:

  • a downloadable CSV for 178 American municipalities that associates city streets with risk scores
  • an interactive map for an immediate bird’s eye assessment of at-risk city blocks
  • an API endpoint to which users can upload a CSV of local fire incidents in order to improve scores for their area (a sketch of such an upload follows this list)
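As an illustration, an upload to such an endpoint might look like the Python sketch below. The URL, field names and parameters here are hypothetical; the project’s documentation defines the real API contract.

```python
import requests

# Hypothetical endpoint and parameters -- consult the Smoke Signals
# documentation for the actual API contract.
API_URL = "https://example.com/smoke-signals/incidents"

# Post a CSV of local fire incidents as a multipart file upload.
with open("local_fire_incidents.csv", "rb") as f:
    response = requests.post(
        API_URL,
        files={"file": ("local_fire_incidents.csv", f, "text/csv")},
        data={"city": "New Orleans"},  # assumed parameter, for illustration
    )

response.raise_for_status()
print(response.json())  # e.g. updated risk scores for the uploaded area
```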

We believe this is an important contribution to public safety and the better delivery of government services. However, we also consider it a work in progress, a demonstration of how civic analytic solutions can be shared and generalized across the country. We are open sourcing all of the components that went into it and invite anyone with an interest in making it better to get involved….(More)”

Research on digital identity ecosystems


Francesca Bria et al at NESTA/D-CENT: “This report presents a concrete analysis of the latest evolution of the identity ecosystem in the big data context, focusing on the economic and social value of data and identity within the current digital economy. This report also outlines economic, policy, and technical alternatives to develop an identity ecosystem and management of data for the common good that respects citizens’ rights, privacy and data protection.

Key findings

  • This study presents a review of the concept of identity and a map of the key players in the identity industry (such as data brokers and data aggregators), including empirical case studies of identity management in key sectors.
    ….
  • The “datafication” of individuals’ social lives, thoughts and moves is a valuable commodity and constitutes the backbone of the “identity market”, within which “data brokers” (collectors, purchasers or sellers) play different key roles in creating the market by offering services such as fraud detection, customer relationship management, predictive analytics, marketing and advertising.
  • The report formulates economic, political and technical alternatives for identity that preserve trust, privacy and data ownership in today’s big data environments. It looks into access to data, economic strategies to manage data as a commons, consent and licensing, tools to control data, and terms of service. It also examines policy strategies such as privacy and data protection by design, and trust and ethical frameworks. Finally, it assesses technical implementations, looking at identity and anonymity, cryptographic tools, security, and decentralisation and blockchains, and analyses the steps needed to move towards the suggested technical strategies….(More)”

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for the public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta-research”; it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

Openness an Essential Building Block for Inclusive Societies


 (Mexico) in the Huffington Post: “The international community faces a complex environment that requires transforming the way we govern. In that sense, 2015 marks a historic milestone, as 193 Member States of the United Nations will come together to agree on the adoption of the 2030 Agenda. With the definition of the 17 Sustainable Development Goals (SDGs), we will set an ambitious course toward a better and more inclusive world for the next 15 years.

The SDGs will be established just as governments deal with new and more daunting challenges, which require increased collaboration with multiple stakeholders to deliver innovative solutions. For that reason, cutting-edge technologies, fueled by vast amounts of data, provide an efficient platform to foster a global transformation and consolidate more responsive, collaborative and open governments.

Goal 16 seeks to promote just, peaceful and inclusive societies by ensuring access to public information, strengthening the rule of law, as well as building stronger and more accountable institutions. By doing so, we will contribute to successfully achieve the rest of the 2030 Agenda objectives.

During the 70th United Nations General Assembly, the 11 countries of the Steering Committee of the Open Government Partnership (OGP), along with civil-society leaders, will gather to acknowledge Goal 16 as a common target through a Joint Declaration: Open Government for the Implementation of the 2030 Agenda for Sustainable Development. As the Global Summit of OGP convenes this year in Mexico City, on October 28th and 29th, my government will call on all 65 members to subscribe to this fundamental declaration.

The SDGs will be reached only through trustworthy, effective and inclusive institutions. This is why Mexico, as current chair of the OGP, has committed to promote citizen participation, innovative policies, transparency and accountability.

Furthermore, we have worked with a global community of key players to develop the international Open Data Charter (ODC), which sets the founding principles for a greater coherence and increased use of open data across the world. We seek to recognize the value of having timely, comprehensive, accessible, and comparable data to improve governance and citizen engagement, as well as to foster inclusive development and innovation….(More)”

Addressing Inequality and the ‘Data Divide’


Daniel Castro at the US Chamber of Commerce Foundation: “In the coming years, communities across the nation will increasingly rely on data to improve quality of life for their residents, such as by improving educational outcomes, reducing healthcare costs, and increasing access to financial services. However, these opportunities require that individuals have access to high-quality data about themselves and their communities. Should certain individuals or communities not routinely have data about them collected, distributed, or used, they may suffer social and economic consequences. Just as the digital divide has held back many communities from reaping the benefits of the modern digital era, a looming “data divide” threatens to stall the benefits of data-driven innovation for a wide swathe of America. Given this risk, policymakers should make a concerted effort to combat data poverty.

Data already plays a crucial role in guiding decision making, and it will only become more important over time. In the private sector, businesses use data for everything from predicting inventory demand to responding to customer feedback to determining where to open new stores. For example, an emerging group of financial service providers use non-traditional data sources, such as an individual’s social network, to assess credit risk and make lending decisions. And health insurers and pharmacies are offering discounts to customers who use fitness trackers to monitor and share data about their health. In the public sector, data is at the heart of important efforts like improving patient safety, cutting government waste, and helping children succeed in school. For example, public health officials in states like Indiana and Maryland have turned to data science in an effort to reduce infant mortality rates.

Many of these exciting advancements are made possible by a new generation of technologies that make it easier to collect, share, and disseminate data. In particular, the Internet of Everything is creating a plethora of always-on devices that record and transmit a wealth of information about our world and the people and objects in it. Individuals are using social media to create a rich tapestry of interactions tied to particular times and places. In addition, government investments in critical data systems, such as statewide databases to track healthcare spending and student performance over time, are integral to efforts to harness data for social good….(More)”