Viscous Open Data: The Roles of Intermediaries in an Open Data Ecosystem


François van Schalkwyk, Michelle Willmers & Maurice McNaughton in Journal: “Information Technology for Development”: “Open data have the potential to improve the governance of universities as public institutions. In addition, open data are likely to increase the quality, efficacy and efficiency of the research and analysis of higher education systems by providing a shared empirical base for critical interrogation and reinterpretation. Drawing on research conducted by the Emerging Impacts of Open Data in Developing Countries project, and using an ecosystems approach, this research paper considers the supply, demand and use of open data as well as the roles of intermediaries in the governance of South African public higher education. It shows that government’s higher education database is a closed and isolated data source in the data ecosystem; and that the open data that are made available by government are inaccessible and rarely used. In contrast, government data made available by data intermediaries in the ecosystem are being used by key stakeholders. Intermediaries are found to play several important roles in the ecosystem: (i) they increase the accessibility and utility of data; (ii) they may assume the role of a “keystone species” in a data ecosystem; and (iii) they have the potential to democratize the impacts and use of open data. The article concludes that despite poor data provision by government, the public university governance open data ecosystem has evolved because intermediaries in the ecosystem have reduced the viscosity of government data. Further increasing the fluidity of government open data will improve access and ensure the sustainability of open data supply in the ecosystem….(More)”

US Administration Celebrates Five-Year Anniversary of Challenge.gov


White House Fact Sheet: “Today, the Administration is celebrating the five-year anniversary of Challenge.gov, a historic effort by the Federal Government to collaborate with members of the public through incentive prizes to address our most pressing local, national, and global challenges. True to the spirit of the President’s charge from his first day in office, Federal agencies have collaborated with more than 200,000 citizen solvers—entrepreneurs, citizen scientists, students, and more—in more than 440 challenges, on topics ranging from accelerating the deployment of solar energy, to combating breast cancer, to increasing resilience after Hurricane Sandy.

Highlighting continued momentum from the President’s call to harness the ingenuity of the American people, the Administration is announcing:

  • Nine new challenges from Federal agencies, ranging from commercializing NASA technology, to helping students navigate their education and career options, to protecting marine habitats.
  • Expanding support for use of challenges and prizes, including new mentoring support from the General Services Administration (GSA) for interested agencies and a new $244 million innovation platform opened by the U.S. Agency for International Development (USAID) with over 70 partners.

In addition, multiple non-governmental institutions are announcing 14 new challenges, ranging from improving cancer screenings, to developing better technologies to detect, remove, and recover excess nitrogen and phosphorus from water, to increasing the resilience of island communities….

Expanding the Capability for Prize Designers to find one another

The GovLab and MacArthur Foundation Research Network on Opening Governance will launch an expert network for prizes and challenges. The Governance Lab (GovLab) and MacArthur Foundation Research Network on Opening Governance will develop and launch the Network of Innovators (NoI) expert networking platform. NoI will make easily searchable the know-how of innovators on topics ranging from developing prize-backed challenges, opening up data, and use of crowdsourcing for public good. Platform users will answer questions about their skills and experiences, creating a profile that enables them to be matched to those with complementary knowledge to enable mutual support and learning. A beta version for user testing within the Federal prize community will launch in early October, with a full launch at the end of October. NoI will be open to civil servants around the world…(More)”

Web design plays a role in how much we reveal online


European Commission: “A JRC study, “Nudges to Privacy Behaviour: Exploring an Alternative Approach to Privacy Notices“, used behavioural sciences to look at how individuals react to different types of privacy notices. Specifically, the authors analysed users’ reactions to modified choice architecture (i.e. the environment in which decisions take place) of web interfaces.

Two types of privacy behaviour were measured: passive disclosure, when people unwittingly disclose personal information, and direct disclosure, when people make an active choice to reveal personal information. After testing different designs with over 3 000 users from the UK, Italy, Germany and Poland, results show web interface affects decisions on disclosing personal information. The study also explored differences related to country of origin, gender, education level and age.

A depiction of a person’s face on the website led people to reveal more personal information. Also, this design choice and the visualisation of the user’s IP or browsing history had an impact on people’s awareness of a privacy notice. If confirmed, these features are particularly relevant for habitual and instinctive online behaviour.

With regard to education, users who had attended (though not necessarily graduated from) college felt significantly less observed or monitored and more comfortable answering questions than those who never went to college. This result challenges the assumption that the better educated are more aware of information tracking practices. Further investigation, perhaps of a qualitative nature, could help dig deeper into this issue. On the other hand, people with a lower level of education were more likely to reveal personal information unwittingly. This behaviour appeared to be due to the fact that non-college attendees were simply less aware that some online behaviour revealed personal information about themselves.

Strong differences between countries were noticed, indicating a relation between cultures and information disclosure. Even though participants in Italy revealed the most personal information in passive disclosure, in direct disclosure they revealed less than in other countries. Approximately 75% of participants in Italy chose to answer positively to at least one stigmatised question, compared to 81% in Poland, 83% in Germany and 92% in the UK.

Approximately 73% of women answered ‘never’ to the questions asking whether they had ever engaged in socially stigmatised behaviour, compared to 27% of males. This large difference could be due to the nature of the questions (e.g. about alcohol consumption, which might be more acceptable for males). It could also suggest women feel under greater social scrutiny or are simply more cautious when disclosing personal information.

These results could offer valuable insights to inform European policy decisions, despite the fact that the study has targeted a sample of users in four countries in an experimental setting. Major web service providers are likely to have extensive amounts of data on how slight changes to their services’ privacy controls affect users’ privacy behaviour. The authors of the study suggest that collaboration between web providers and policy-makers can lead to recommendations for web interface design that allow for conscientious disclosure of privacy information….(More)”

Five principles for applying data science for social good


Jake Porway at O’Reilly: “….Every week, a data or technology company declares that it wants to “do good” and there are countless workshops hosted by major foundations musing on what “big data can do for society.” Add to that a growing number of data-for-good programs from Data Science for Social Good’s fantastic summer program to Bayes Impact’s data science fellowships to DrivenData’s data-science-for-good competitions, and you can see how quickly this idea of “data for good” is growing.

Yes, it’s an exciting time to be exploring the ways new datasets, new techniques, and new scientists could be deployed to “make the world a better place.” We’ve already seen deep learning applied to ocean health, satellite imagery used to estimate poverty levels, and cellphone data used to elucidate Nairobi’s hidden public transportation routes. And yet, for all this excitement about the potential of this “data for good movement,” we are still desperately far from creating lasting impact. Many efforts will not only fall short of lasting impact — they will make no change at all….

So how can these well-intentioned efforts reach their full potential for real impact? Embracing the following five principles can drastically accelerate a world in which we truly use data to serve humanity.

1. “Statistics” is so much more than “percentages”

We must convey what constitutes data, what it can be used for, and why it’s valuable.

There was a packed house for the March 2015 release of the No Ceilings Full Participation Report. Hillary Clinton, Melinda Gates, and Chelsea Clinton stood on stage and lauded the report, the culmination of a year-long effort to aggregate and analyze new and existing global data, as the biggest, most comprehensive data collection effort about women and gender ever attempted. One of the most trumpeted parts of the effort was the release of the data in an open and easily accessible way.

I ran home and excitedly pulled up the data from the No Ceilings GitHub, giddy to use it for our DataKind projects. As I downloaded each file, my heart sank. The 6MB size of the entire global dataset told me what I would find inside before I even opened the first file. Like a familiar ache, the first row of the spreadsheet said it all: “USA, 2009, 84.4%.”

What I’d encountered was a common situation when it comes to data in the social sector: the prevalence of inert, aggregate data. ….

2. Finding problems can be harder than finding solutions

We must scale the process of problem discovery through deeper collaboration between the problem holders, the data holders, and the skills holders.

In the immortal words of Henry Ford, “If I’d asked people what they wanted, they would have said a faster horse.” Right now, the field of data science is in a similar position. Framing data solutions for organizations that don’t realize how much is now possible can be a frustrating search for faster horses. If data cleaning is 80% of the hard work in data science, then problem discovery makes up nearly the remaining 20% when doing data science for good.

The plague here is one of education. …

3. Communication is more important than technology

We must foster environments in which people can speak openly, honestly, and without judgment. We must be constantly curious about each other.

At the conclusion of one of our recent DataKind events, one of our partner nonprofit organizations lined up to hear the results from their volunteer team of data scientists. Everyone was all smiles — the nonprofit leaders had loved the project experience, the data scientists were excited with their results. The presentations began. “We used Amazon RedShift to store the data, which allowed us to quickly build a multinomial regression. The p-value of 0.002 shows …” Eyes glazed over. The nonprofit leaders furrowed their brows in telegraphed concentration. The jargon was standing in the way of understanding the true utility of the project’s findings. It was clear that, like so many other well-intentioned efforts, the project was at risk of gathering dust on a shelf if the team of volunteers couldn’t help the organization understand what they had learned and how it could be integrated into the organization’s ongoing work…..

4. We need diverse viewpoints

To tackle sector-wide challenges, we need a range of voices involved.

One of the most challenging aspects to making change at the sector level is the range of diverse viewpoints necessary to understand a problem in its entirety. In the business world, profit, revenue, or output can be valid metrics of success. Rarely, if ever, are metrics for social change so cleanly defined….

Challenging this paradigm requires diverse, or “collective impact,” approaches to problem solving. The idea has been around for a while (h/t Chris Diehl), but has not yet been widely implemented due to the challenges in successful collective impact. Moreover, while there are many diverse collectives committed to social change, few have the voice of expert data scientists involved. DataKind is piloting a collective impact model called DataKind Labs, that seeks to bring together diverse problem holders, data holders, and data science experts to co-create solutions that can be applied across an entire sector-wide challenge. We just launched our first project with Microsoft to increase traffic safety and are hopeful that this effort will demonstrate how vital a role data science can play in a collective impact approach.

5. We must design for people

Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.

So many of the data projects making headlines — a new app for finding public services, a new probabilistic model for predicting weather patterns for subsistence farmers, a visualization of government spending — are great and interesting accomplishments, but don’t seem to have an end user in mind. The current approach appears to be “get the tech geeks to hack on this problem, and we’ll have cool new solutions!” I’ve opined that, though there are many benefits to hackathons, you can’t just hack your way to social change….(More)”

Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential, which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies, Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
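As a rough illustration of the differential privacy idea described above, the following Python sketch adds Laplace noise to a counting query. It is a minimal textbook version of the Laplace mechanism, not the method used by OnTheMap or any agency; the function names, the `epsilon` value and the commuter records are all hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy predicate.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up commuter records, purely for illustration.
    commuters = [{"distance_km": d} for d in (3, 12, 25, 41, 8, 30, 17)]
    rng = random.Random()
    noisy = private_count(commuters, lambda r: r["distance_km"] > 20, 0.5, rng)
    print(f"Noisy count of long commutes: {noisy:.2f}")
```

Each call returns a slightly different answer, so no single response pins down whether a particular person is in the data. That added noise is also the cost the passage points to for research that needs exact individual records.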

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
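One simple building block behind secure multiparty computation is additive secret sharing, sketched below in Python. This is a single-process simulation under stated assumptions, not a real networked protocol and not the specific approach discussed in the article: the modulus `PRIME`, the function names and the example values are hypothetical, and the sketch assumes honest parties and non-negative integers whose total is smaller than `PRIME`.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (an assumption of this sketch)


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split value into n random-looking shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate parties jointly computing a total without pooling raw values."""
    n = len(private_values)
    # Party i splits its own value and hands share j to party j.
    distributed = [make_shares(value, n) for value in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Combining the published partial sums yields the true total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Three hypothetical data holders, each with a private count.
    print(secure_sum([120, 45, 310]))
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about another holder's value; only the final aggregate is revealed, which matches the passage's point that only the results of the analysis are shared.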

Routledge International Handbook of Ignorance Studies


Book edited by Matthias Gross and Linsey McGoey: “Once treated as the absence of knowledge, ignorance today has become a highly influential topic in its own right, commanding growing attention across the natural and social sciences where a wide range of scholars have begun to explore the social life and political issues involved in the distribution and strategic use of not knowing. The field is growing fast and this handbook reflects this interdisciplinary field of study by drawing contributions from economics, sociology, history, philosophy, cultural studies, anthropology, feminist studies, and related fields in order to serve as a seminal guide to the political, legal and social uses of ignorance in social and political life….(More)”

The Data Revolution for Sustainable Development


Jeffrey D. Sachs at Project Syndicate: “There is growing recognition that the success of the Sustainable Development Goals (SDGs), which will be adopted on September 25 at a special United Nations summit, will depend on the ability of governments, businesses, and civil society to harness data for decision-making…

One way to improve data collection and use for sustainable development is to create an active link between the provision of services and the collection and processing of data for decision-making. Take health-care services. Every day, in remote villages of developing countries, community health workers help patients fight diseases (such as malaria), get to clinics for checkups, receive vital immunizations, obtain diagnoses (through telemedicine), and access emergency aid for their infants and young children (such as for chronic under-nutrition). But the information from such visits is usually not collected, and even if it is put on paper, it is never used again.

We now have a much smarter way to proceed. Community health workers are increasingly supported by smart-phone applications, which they can use to log patient information at each visit. That information can go directly onto public-health dashboards, which health managers can use to spot disease outbreaks, failures in supply chains, or the need to bolster technical staff. Such systems can provide a real-time log of vital events, including births and deaths, and even use so-called verbal autopsies to help identify causes of death. And, as part of electronic medical records, the information can be used at future visits to the doctor or to remind patients of the need for follow-up visits or medical interventions….

Fortunately, the information and communications technology revolution and the spread of broadband coverage nearly everywhere can quickly make such time lags a thing of the past. As indicated in the report A World that Counts: Mobilizing the Data Revolution for Sustainable Development, we must modernize the practices used by statistical offices and other public agencies, while tapping into new sources of data in a thoughtful and creative way that complements traditional approaches.

Through more effective use of smart data – collected during service delivery, economic transactions, and remote sensing – the fight against extreme poverty will be bolstered; the global energy system will be made much more efficient and less polluting; and vital services such as health and education will be made far more effective and accessible.

With this breakthrough in sight, several governments, including that of the United States, as well as businesses and other partners, have announced plans to launch a new “Global Partnership for Sustainable Development Data” at the UN this month. The new partnership aims to strengthen data collection and monitoring efforts by raising more funds, encouraging knowledge-sharing, addressing key barriers to access and use of data, and identifying new big-data strategies to upgrade the world’s statistical systems.

The UN Sustainable Development Solutions Network will support the new Global Partnership by creating a new Thematic Network on Data for Sustainable Development, which will bring together leading data scientists, thinkers, and academics from across multiple sectors and disciplines to form a center of data excellence….(More)”

Using Big Data to Understand the Human Condition: The Kavli HUMAN Project


Azmak Okan et al in the Journal “Big Data”: “Until now, most large-scale studies of humans have either focused on very specific domains of inquiry or have relied on between-subjects approaches. While these previous studies have been invaluable for revealing important biological factors in cardiac health or social factors in retirement choices, no single repository contains anything like a complete record of the health, education, genetics, environmental, and lifestyle profiles of a large group of individuals at the within-subject level. This seems critical today because emerging evidence about the dynamic interplay between biology, behavior, and the environment points to a pressing need for just the kind of large-scale, long-term synoptic dataset that does not yet exist at the within-subject level. At the same time that the need for such a dataset is becoming clear, there is also growing evidence that just such a synoptic dataset may now be obtainable—at least at moderate scale—using contemporary big data approaches. To this end, we introduce the Kavli HUMAN Project (KHP), an effort to aggregate data from 2,500 New York City households in all five boroughs (roughly 10,000 individuals) whose biology and behavior will be measured using an unprecedented array of modalities over 20 years. It will also richly measure environmental conditions and events that KHP members experience using a geographic information system database of unparalleled scale, currently under construction in New York. In this manner, KHP will offer both synoptic and granular views of how human health and behavior coevolve over the life cycle and why they evolve differently for different people. In turn, we argue that this will allow for new discovery-based scientific approaches, rooted in big data analytics, to improving the health and quality of human life, particularly in urban contexts….(More)”

Syria refugees tap in to legal advice by text


Hannah Kuchler in the Financial Times: “Syrian refugees can now access free legal advice by text message after a Palestinian start-up launched a service in Turkey, which it hopes to expand to reach refugees across Europe.

Refugees fleeing the conflict in Syria can receive legal guidance via their mobile phones on everything from whether they have the right to work to education services available for their children, after Souktel, a small start-up, partnered with the American Bar Association.

The 30-person start-up employs both former humanitarian workers from Oxfam and USAID, who understand the problems faced by refugees, and software engineers who tackle the challenge of sorting, tagging and translating enquiries which are then sent to a team of Turkish lawyers.

Jacob Korenblum, president and chief executive of Souktel, said more than 10,000 individuals have used the service since it launched less than three weeks ago, with lawyers busy answering a steady stream of questions.

“Given the strength and rapid interest in this service and the uptake since its launch, we want to scale into Greece and other European countries to meet the same need,” he said. “This is very much becoming a pan-European problem at the very least.”…

The American Bar Association approached Souktel and asked them to build a service that could offer remote legal support and uses funds from international donors to pay the company….

Smartphones — or even basic mobile phones — have fast become one of the easiest ways of communicating for the poor or dispossessed. Even when basic infrastructure has failed, people are able to access information and connect with relatives abroad via their devices.

Mr Korenblum, a Canadian former aid worker, helped found Souktel after he saw young people in Palestine relying on their mobile devices when working there 10 years ago. The company has built similar services on behalf of humanitarian organisations working in other areas — including the UK’s department for international development in Gaza, Iraq and Somalia, among other places…(More)”

Civic engagement platform brings the town meeting online


Springwise: “Citizens may have the ability to express enthusiasm or disgust for government policies online, but these opinions are only as valuable as the ears they reach. We recently saw Balancing Act offer citizens the ability to view and play around with their city’s budget, providing governments with a better understanding of the wants and needs of their constituents. Now, CitizenLab is another civic engagement platform, which is bringing the town meeting into the digital age — providing a space for citizens to communicate with their government, and for governments to ‘citizensource’ opinions on their policies.

To begin, participants visit the platform and enter their city. This will take them to a collection of ‘labs’ — categories such as education, health and public spaces. They can then post new ideas, join existing conversations and upvote interesting topics. Local governments can then use the platform as a resource to discover the priorities of their citizens. They can respond directly to discussions and consult public opinion on important issues. Governments can also acknowledge the most vital issues raised by taking them to city council for discussion. The platform is designed to host positive ideas, rather than raise issues.

Website: www.citizenlab.co