Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever conducted to refute the supposed link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

 

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential, which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies, Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
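The mechanism is easy to sketch. For a counting query, adding or removing one person changes the answer by at most 1, so adding Laplace noise with scale 1/ε to the true count satisfies ε-differential privacy. A minimal illustration in Python (the Census Bureau’s actual OnTheMap machinery is far more elaborate; the records and query here are invented):

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Return a differentially private count of records matching predicate.

    A counting query has sensitivity 1 (one person changes the count
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, 1/epsilon) noise via inverse-CDF sampling.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Hypothetical commute records; the true answer to the query is 4.
commuters = [{"distance_km": d} for d in [3, 12, 25, 7, 40, 2, 18]]
noisy = dp_count(commuters, lambda r: r["distance_km"] > 10, epsilon=1.0)
print(round(noisy, 2))  # close to 4, but perturbed enough to hide any one person
```

Smaller ε means more noise and stronger privacy; analysts trade accuracy for protection by choosing it.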

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
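A toy version of the idea is additive secret sharing: each data holder splits its private number into random shares that sum to the true value modulo a large prime, so the parties can jointly compute a total without any of them seeing another’s input. A sketch (the hospitals and counts are hypothetical, and real protocols add authentication and support far richer computations):

```python
import random

P = 2**61 - 1  # a large prime modulus

def share(secret, n_parties):
    """Split an integer into n additive shares summing to it mod P.

    Any subset of fewer than n shares is uniformly random and so
    reveals nothing about the secret.
    """
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Three hospitals each hold a private patient count they cannot disclose.
private_counts = [1200, 850, 2300]
n = len(private_counts)

# Each hospital splits its count and sends one share to each party.
all_shares = [share(v, n) for v in private_counts]

# Each party adds up the shares it received; each partial sum alone
# still looks like random noise.
partial_sums = [sum(col) % P for col in zip(*all_shares)]

# Only combining all the partial sums reveals the total -- never the inputs.
total = sum(partial_sums) % P
print(total)  # prints 4350, the sum of the private counts
```

Only the final result is disclosed, matching the article’s description that “only the results of the analyses are shared.”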

Algorithm predicts and prevents train delays two hours in advance


Springwise: “Transport apps such as Ototo make it easier than ever for passengers to stay informed about problems with public transport, but real-time information can only help so much — by the time users find out about a delayed service, it is often too late to take an alternative route. Now, Stockholmstag — the company that runs Sweden’s trains — has found a solution in the form of an algorithm called ‘The Commuter Prognosis’, which can predict network delays up to two hours in advance, giving train operators time to issue extra services or provide travelers with adequate warning.
The system was created by mathematician Wilhelm Landerholm. It uses historical data to predict how a small delay, even as little as two minutes, will affect the running of the rest of the network. Often the initial late train causes a ripple effect, with subsequent services delayed to accommodate new platform arrival times, which in turn affect the trains behind them, and so on. But soon, using ‘The Commuter Prognosis’, Stockholmstag train operators will be able to make the necessary adjustments to prevent this. In addition, the information will be relayed to commuters, enabling them to take a different train and thereby reducing overcrowding. The prediction tool is expected to be put into use in Sweden by the end of the year….(More)”
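The article doesn’t describe Landerholm’s model, but the ripple effect itself is easy to simulate: if trains must keep a minimum headway at a platform, a late train pushes back its followers until the slack in the timetable absorbs the delay. A toy model, with the headway and timetable values invented for illustration:

```python
MIN_HEADWAY = 3  # assumed minimum minutes between arrivals at one platform

def propagate(scheduled_arrivals, initial_delays):
    """Return actual arrival times when each late train forces the
    train behind it to wait out the minimum headway."""
    actual = []
    for sched, delay in zip(scheduled_arrivals, initial_delays):
        earliest = sched + delay
        if actual:
            # Cannot arrive sooner than one headway after the train ahead.
            earliest = max(earliest, actual[-1] + MIN_HEADWAY)
        actual.append(earliest)
    return actual

# Six trains scheduled four minutes apart; the first is five minutes late.
schedule = [0, 4, 8, 12, 16, 20]
actual = propagate(schedule, [5, 0, 0, 0, 0, 0])
knock_on = [a - s for a, s in zip(actual, schedule)]
print(knock_on)  # prints [5, 4, 3, 2, 1, 0] -- the delay ripples down and decays
```

Running such a model forward over the live timetable is one way an operator could see, hours ahead, which services a two-minute hiccup will eventually touch.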

Revolution Delayed: The Impact of Open Data on the Fight against Corruption


Report by RiSSC – Research Centre on Security and Crime (Italy): “In recent years, the demand for Open Data picked up steam among stakeholders seeking to increase transparency and accountability in the public sector. Governments are supporting Open Data supply to achieve social and economic benefits, a return on investment, and political consensus.

While it is self-evident that Open Data contributes to greater transparency — as it makes data more available and easier to use by the public and governments — its impact on fighting corruption largely depends on the ability to analyse it and develop initiatives that trigger both social accountability mechanisms and government responsiveness against illicit or inappropriate behaviours.

To date, the Open Data revolution against corruption is delayed. The impact of Open Data on the prevention and repression of corruption, and on the development of anti-corruption tools, appears to be limited, and the return on investment is not yet forthcoming. Evidence remains anecdotal, and a better understanding of the mechanisms and dynamics of using Open Data against corruption is needed.

The overall objective of this exploratory study is to provide evidence on the results achieved by Open Data, and recommendations to the European Commission and Member States’ authorities for implementing effective anti-corruption strategies based on transparency and openness, to unlock the potential impact of the “Open Data revolution” against corruption.

The project has explored the legal framework and the status of implementation of Open Data policies in four EU countries – Italy, the United Kingdom, Spain, and Austria. The TACOD project has searched for evidence of Open Data’s role in law enforcement cooperation, anti-corruption initiatives, public campaigns, and investigative journalism against corruption.

RiSSC – Research Centre on Security and Crime (Italy), the University of Oxford and the University of Nottingham (United Kingdom), Transparency International (Italy and United Kingdom), the Institute for Conflict Resolution (Austria), and Blomeyer&Sanz (Spain) carried out the research between January 2014 and February 2015, under an agreement with the European Commission – DG Migration and Home Affairs. The project was coordinated by RiSSC, with the support of a European Working Group of Experts, chaired by Prof. Richard Rose, and an external evaluator, Mr. Andrea Menapace, and it has benefited from the contribution of many experts, activists, and representatives of institutions in the four countries….(More)

The Website That Visualizes Human Activity in Cities Across the World


Emerging Technology From the arXiv: “The data from mobile phones is revolutionizing our understanding of human activity. In recent years, it has revealed commuting patterns in major cities, wealth distribution in African countries, and even reproductive strategies in western societies. That has provided unprecedented insight for economists, sociologists, and city planners among others.

But this kind of advanced research is just a first step in a much broader trend. Phone data is set to become a standard resource that almost anyone can use to study and watch humanity continuously, much as they can now watch the weather unfold anywhere on the planet almost in real time.

But one thing is holding them back—the lack of powerful computational tools that can gather, crunch, and present the data in meaningful ways.

Today, that looks set to change thanks to the work of Dániel Kondor and a few pals at the SENSEable City Laboratory, part of MIT, and at Ericsson, a company that produces network infrastructure technologies. These guys have unveiled a powerful online tool that uses mobile phone data to visualize human activity in cities all over the world.

This new tool, called ManyCities, allows anybody to study human activity in various cities with unprecedented detail. But the key is that it organizes and presents the data in intuitive ways that quickly reveal trends and special events….

ManyCities then presents the data in three simple ways. The first shows how phone usage varies over time, revealing clear daily and weekly patterns as well as longer term trends. For example, ManyCities clearly shows a steady, long-term increase in data traffic, the effect of holidays, and how usage patterns change dramatically during important events like the Wimbledon tennis championship in London.

ManyCities also allows users to drill down into the data to compare patterns in different neighborhoods or in different cities. It shows, for example, that text message activity peaks in the morning in Hong Kong, in the evening in New York, and at midday in London….Kondor and co have made it available at www.ManyCities.org for anybody to try.

This kind of tool is clearly evolving into a real-time analytics platform. It’s not hard to imagine how people could use it to plan events such as conferences, sporting contests, or concerts, or to plan emergency city infrastructure. One day people may even tune in to a “smartphone forecast” to find out if their phone will work when the big game kicks off that evening.

Ref: arxiv.org/abs/1509.00459 : Visualizing Signatures Of Human Activity In Cities Across The Globe”

Can the crowd deliver more open government?


  at GovernmentNews: “…Crowdsourcing and policy making was the subject of a lecture by visiting academic Dr Tanja Aitamurto at Victoria’s Swinburne University of Technology earlier this month. Dr Aitamurto wrote “Crowdsourcing for Democracy: New Era in Policy-Making” and led the design and implementation of the Finnish Experiment, a pioneering case study in crowdsourcing policy making.

She spoke about how Scandinavian countries have used crowdsourcing to “tap into the collective intelligence of a large and diverse crowd” in an “open ended knowledge information search process” in an open call for anybody to participate online and complete a task.

It has already been used widely and effectively by companies such as Procter & Gamble, which offer financial rewards for solutions to their R&D problems.

The Finnish government recently used crowdsourcing when reforming the country’s Traffic Act, following a rash of complaints about it to the Minister of the Environment. The Act, which regulates issues such as off-road traffic, is an emotive issue in Finland, where snowmobiles are used six months of the year and many people live in remote areas.

The idea was for people to submit problems and solutions online, covering areas such as safety, noise, environmental protection, the rights of snowmobile owners and landowners’ rights. Everyone could see what was written and could comment on it.

Dr Aitamurto said crowdsourcing had four stages:

• The problem mapping space, where people were asked to outline the issues that needed solving
• An appeal for solutions
• An expert panel evaluated the submissions against four criteria: effectiveness, cost efficiency, ease of implementation and fairness. The crowd also had the chance to evaluate and rank solutions online
• The findings were then handed over to the government for the law writing process

Dr Aitamurto said active participation seemed to create a strong sense of empowerment for those involved.

She said some people reported that it was the first time in their lives they felt they were really participating in democracy and influencing decision making in society. They said it felt much more real than voting in an election, which felt alien and remote.

“Participation becomes a channel for advocacy, not just for self-interest but a channel to hear what others are saying and then also to make yourself heard. People expected a compromise at the end,” Dr Aitamurto said.

Being able to participate online was ideal for people who lived remotely and turned crowdsourcing into a democratic innovation which brought citizens closer to policy and decision making between elections.

Other benefits included reaching out to tap into new pools of knowledge, rather than relying on a small group of homogenous experts to solve the problem.

“When we use crowdsourcing we actually extend our knowledge search to multiple, hundreds of thousands of distant neighbourhoods online and that can be the power of crowdsourcing: to find solutions and information that we wouldn’t find otherwise. We find also unexpected information because it’s a self-selecting crowd … people that we might not have in our networks already,” Dr Aitamurto said.

The process can increase transparency as people interact on online platforms and the government keeps feedback loops going.

Dr Aitamurto is also at pains to highlight what crowdsourcing is not and cannot be, because participants are self-selecting and not statistically representative.

“The crowd doesn’t make decisions, it provides information. It’s not a method or tool for direct democracy and it’s not a public opinion poll either”.

Crowdsourcing has fed into policy in other countries too: for example, during Iceland’s constitutional reform, and in the United States, where the Federal Emergency Management Agency overhauled its strategy after a string of natural disasters.

The Australian government has been getting in on the act, using the cloud-based software Citizen Space to gain input on a huge range of topics. While much of this is technically consultation, rather than feeding into actual policy design, it is certainly a step towards more open government.

British company Delib, which is behind the software, bills it as “managing, publicising and archiving all of your organisation’s consultation activity”.

One council that has used Citizen Space is Wyong Shire on the NSW Central Coast. The council has used the consultation hub to elicit ratepayers’ views on a number of topics, including a special rate variation, community precinct forums, strategic plans and planning decisions.

One of Citizen Space’s most valuable features is the section ‘we asked, you said, we did’….(More)”

A new journal wants to publish your research ideas


at ScienceInsider: “Do you have a great idea for a study that you want to share with the world? A new journal will gladly publish it. Research Ideas and Outcomes (RIO) will also publish papers on your methods, workflows, data, reports, and software—in short, “all outputs of the research cycle.” RIO, an open-access (OA) journal, was officially launched today and will start accepting submissions in November.

“We’re interested in making the full process of science open,” says RIO founding editor Ross Mounce, a researcher at the Natural History Museum in London. Many good research proposals fall by the wayside because funding agencies have limited budgets, Mounce says; RIO is a way to give them another chance. Mounce hopes that funders will use the journal to spot interesting new projects.

Publishing proposals can also help create links between research teams, Mounce says. “Let’s say you’re going to Madagascar for 6 months to sample turtle DNA,” he suggests. “If you can let other researchers know ahead of time, you can agree to do things together.”

RIO’s idea to publish research proposals is “exactly what we need if we really want to have open science,” says Iryna Kuchma, the OA program manager at the nonprofit organization Electronic Information for Libraries in Rome. Pensoft, the publishing company behind RIO, is a “strong open-access publishing venue” that has proven its worth with more than a dozen journals in the biodiversity field, Kuchma says.

The big question is, of course: Will researchers want to share promising ideas, at the risk that rivals run with them?…(More)”

The Silo Effect – The Peril of Expertise and the Promise of Breaking Down Barriers


Book by Gillian Tett: “From award-winning columnist and journalist Gillian Tett comes a brilliant examination of how our tendency to create functional departments—silos—hinders our work…and how some people and organizations can break those silos down to unleash innovation.

One of the characteristics of industrial age enterprises is that they are organized around functional departments. This organizational structure results in both limited information and restricted thinking. The Silo Effect asks these basic questions: why do humans working in modern institutions collectively act in ways that sometimes seem stupid? Why do normally clever people fail to see risks and opportunities that later seem blindingly obvious? Why, as psychologist Daniel Kahneman put it, are we sometimes so “blind to our own blindness”?

Gillian Tett, journalist and senior editor for the Financial Times, answers these questions by plumbing her background as an anthropologist and her experience reporting on the financial crisis in 2008. In The Silo Effect, she shares eight different tales of the silo syndrome, spanning Bloomberg’s City Hall in New York, the Bank of England in London, Cleveland Clinic hospital in Ohio, UBS bank in Switzerland, Facebook in San Francisco, Sony in Tokyo, the BlueMountain hedge fund, and the Chicago police. Some of these narratives illustrate how foolishly people can behave when they are mastered by silos. Others, however, show how institutions and individuals can master their silos instead. These are stories of failure and success.

From ideas about how to organize office spaces to how to lead teams of people with disparate expertise, Tett lays bare the silo effect and explains how the ways people organize themselves, interact with each other, and imagine the world can take hold of an organization and lead it from institutional blindness to 20/20 vision. – (More)”

Open data can unravel the complex dealings of multinationals


 in The Guardian: “…Just like we have complementary currencies to address shortcomings in national monetary systems, we now need to encourage an alternative accounting sector to address shortcomings in global accounting systems.

So what might this look like? We already are seeing the genesis of this in the corporate open data sector. OpenCorporates in London has been a pioneer in this field, creating a global unique identifier system to make it easier to map corporations. Groups like OpenOil in Berlin are now using the OpenCorporates classification system to map companies like BP. Under the tagline “Imagine an open oil industry”, they have also begun mapping ground-level contract and concession data, and are currently building tools to allow the public to model the economics of particular mines and oil fields. This could prove useful in situations where doubt is cast on the value of particular assets controlled by public companies in politically fragile states.

 OpenOil’s objective is not just corporate transparency. Merely disclosing information does not advance understanding. OpenOil’s real objective is to make reputable sources of information on oil companies usable to the general public. In the case of BP, company data is already deposited in repositories like Companies House, but in unusable, jumbled and jargon-filled pdf formats. OpenOil seeks to take such transparency, and turn it into meaningful transparency.

According to OpenOil’s Anton Rühling, a variety of parties have started to use their information. “During the recent conflicts in Yemen we had a sudden spike in downloads of our Yemeni oil contract information. We traced this to UAE, where a lot of financial lawyers and investors are based. They were clearly wanting to see how the contracts could be affected.” Their BP map even raised interest from senior BP officials. “We were contacted by finance executives who were eager to discuss the results.”

Open mapping

Another pillar of the alternative accounting sector that is emerging are supply chain mapping systems. The supply chain largely remains a mystery. In standard corporate accounts suppliers appear as mere expenses. No information is given about where the suppliers are based and what their standards are. In the absence of corporate management volunteering that information, Sourcemap has created an open platform for people to create supply chain maps themselves. Progressively-minded companies – such as Fairphone – have now begun to volunteer supply chain information on the platform.

One industry forum that is actively pondering alternative accounting is ICAEW’s AuditFutures programme. They recently teamed up with the Royal College of Art’s service design programme to build design thinking into accounting practice. AuditFutures’ Martin Martinoff wants accountants to perceive themselves as creative innovators for the public interest. “Imagine getting 10,000 auditors online together to develop an open crowdsourced audit platform.”…(More)

Local open data ecosystems – a prototype map


Ed Parkes and Gail Dawes at Nesta: “It is increasingly recognised that some of the most important open data is published by local authorities (LAs) – data which matters to us, like bin collection days, planning applications and even where your local public toilet is. Given the likely move towards greater decentralisation, firstly through devolution to cities, the publication of local open data could arguably become more important over the next couple of years. In addition, as of 1st April, there is a new transparency code for local government requiring local authorities to publish further information on everything from spending to local land assets. To pre-empt this likely renewed focus on local open data we have begun to develop a prototype map to highlight the UK’s local open data ecosystem.

Already there is some great practice in the publication of open data at a local level – such as Leeds Data Mill, London Datastore, and Open Data Sheffield. This regional activity is also characterised not just by high quality data publication, but also by pulling together through hackdays, challenges and meetups a community interested in the power of open data. This creates an ecosystem of publishers and re-users at a local level. Some of the best practice in relation to developing such an ecosystem was recognised by the last government in the announcement of a group of Local Authority Open Data Champions. Some of these were also recipients of the funding for projects from both the Cabinet Office and through the Open Data User Group.

Outside of this best practice it isn’t always easy to understand how developed smaller, less urban open data agendas are. Other than looking at each council’s website, or increasingly at the data portals that forward-thinking councils are providing, there is a surprisingly large number of places where local authorities can make their open data available. The most well known of these is the Openly Local project, but at the time of writing this now seems to be retired. Perhaps the best catalogue of local authority data is on Data.gov.uk itself. This has 1,449 datasets published by LAs across 200 different organisations. Following that, there is the Open Data Communities website, which hosts links to LA linked datasets. Using data from the latter, Steve Peters has developed the local data dashboard (which was itself based on the UK Local Government Open Data resource map from Owen Boswarva). In addition, local authorities can also register their open data in the LGA’s Open Data Inventory Service and take it through the ODI’s data certification process.

Prototype map of local open data eco-systems

To try to highlight patterns in local authority open data publication we decided to make a map of activity around the country (although in the first instance we’ve focused on England)….(More)

This Is What Controversies Look Like in the Twittersphere


Emerging Technology From the arXiv: “A new way of analyzing disagreement on social media reveals that arguments in the Twittersphere look like fireworks.

Many a controversy has raged on social media platforms such as Twitter. Some last for weeks or months; others blow themselves out in an afternoon. And yet most go unnoticed by most people. That would change if there were a reliable way of spotting controversies in the Twitterstream in real time.

That could happen thanks to the work of Kiran Garimella and pals at Aalto University in Finland. These guys have found a way to spot the characteristics of a controversy in a collection of tweets and distinguish this from a noncontroversial conversation.

Various researchers have studied controversies on Twitter, but these efforts have all focused on preidentified arguments, whereas Garimella and co want to spot them in the first place. Their key idea is that the structure of a conversation that involves controversy is different from that of one that is benign.

And they think this structure can be spotted by studying various properties of the conversation, such as the network of connections between those involved in a topic; the structure of endorsements — who agrees with whom; and the sentiment of the discussion, whether it is positive or negative.

They test this idea by first studying ten conversations associated with hashtags that are known to be controversial and ten that are known to be benign. Garimella and co map out the structure of these discussions by looking at the networks of retweets, follows, keywords and combinations of these….(More)
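Garimella and co’s published measures are built on graph partitioning and random walks, but the underlying intuition can be sketched with a much cruder statistic: in a controversial topic, endorsements (retweets) rarely cross between the two camps. The users, edges and camp labels below are invented for illustration:

```python
def cross_edge_fraction(edges, side):
    """Fraction of retweet edges that cross between the two camps.

    A low fraction means two groups endorsing only themselves --
    the structural fingerprint of a controversy.
    """
    cross = sum(1 for u, v in edges if side[u] != side[v])
    return cross / len(edges)

# Hypothetical users pre-assigned to two camps (0 and 1).
side = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# (retweeter, author) pairs: a polarized topic vs. a benign one.
polarized = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f")]
benign = [("a", "d"), ("b", "e"), ("c", "f"),
          ("a", "b"), ("d", "e"), ("b", "f")]

print(cross_edge_fraction(polarized, side))  # prints 0.0 -- two echo chambers
print(cross_edge_fraction(benign, side))     # around 0.67 -- heavy mixing
```

A real detector would first have to infer the camps from the graph itself, which is exactly the harder problem the Aalto group’s methods tackle.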

More: arxiv.org/abs/1507.05224 : Quantifying Controversy in Social Media