Open Data Barometer (second edition)


The second edition of the Open Data Barometer: “A global movement to make government ‘open by default’ picked up steam in 2013 when the G8 leaders signed an Open Data Charter – promising to make public sector data openly available, without charge and in re-usable formats. In 2014 the G20, the world’s largest industrial economies, followed up by pledging to advance open data as a tool against corruption, and the UN recognized the need for a “Data Revolution” to achieve global development goals.
However, this second edition of the Open Data Barometer shows that there is still a long way to go to put the power of data in the hands of citizens. Core data on how governments are spending our money and how public services are performing remains inaccessible or paywalled in most countries. Information critical to fight corruption and promote fair competition, such as company registers, public sector contracts, and land titles, is even harder to get. In most countries, proactive disclosure of government data is not mandated in law or policy as part of a wider right to information, and privacy protections are weak or uncertain.
Our research suggests some of the key steps needed to ensure the “Data Revolution” will lead to a genuine revolution in the transparency and performance of governments:

  • High-level political commitment to proactive disclosure of public sector data, particularly the data most critical to accountability
  • Sustained investment in supporting and training a broad cross-section of civil society and entrepreneurs to understand and use data effectively
  • Contextualizing open data tools and approaches to local needs, for example by making data visually accessible in countries with lower literacy levels
  • Support for city-level open data initiatives as a complement to national-level programmes
  • Legal reform to ensure that guarantees of the right to information and the right to privacy underpin open data initiatives

Over the next six months, world leaders have several opportunities to agree on these steps, starting with the United Nations’ high-level conference on the data revolution in Africa in March, Canada’s global International Open Data Conference in May and the G7 summit in Germany this June. It is crucial that these gatherings result in concrete actions to address the political and resource barriers that threaten to stall open data efforts….(More)”.

Donated Personal Data Could Aid Lifestyle Researchers


Anya Skatova and James Goulding at Scientific American: “In the future it will be possible to donate our personal data to charitable causes. All sorts of data is recorded about us as we go about our daily lives—what we buy, where we go, who we call on the phone and our use of the internet. The time is approaching when we could liberate that data in support of good causes. Given many people already donate precious resources such as money or even blood for the benefit of society at large, this step might not be far away.
How could donated data help our society? Data is a rich source of people’s habits—shopping data from loyalty cards, for example, can reflect our diet. If people donate their personal data for research, its analysis could improve everything from our understanding of the dietary precursors to diabetes to the impact of lifestyle on heart disease.
But there are vital issues around the collection and use of personal data that must be addressed. Donation rests on trust: would people give their data away knowing that researchers will examine it, even if anonymously? Would they want others scrutinising their diet, or their shopping habits? Would people feel their privacy was being invaded, even if they had chosen to donate to help medical research?
Who would donate data to research?
Our recent research has found that around 60% of people are willing to donate their data for uses that will benefit the public. In some ways this is not surprising. As previous research demonstrated, people help others and take part in various pro-social activities. People voluntarily give to benefit society at large: they donate money to charities, or run marathons to raise money without knowing exactly who will benefit; they give blood, bone marrow, or even organs. They often do so out of concern for the welfare of others, or in other cases for more selfish reasons, such as enhancing their reputation, professional benefit, or just to feel good about themselves….
Donating data is certainly different from donating money or blood—there is very little obvious cost to us when donating our data. Unlike blood or money, data is something for which most of us have no use, nor has it any real monetary value to those of us that generate it, but it becomes valuable when combined with the data of others.
Currently companies leverage personal data to make money because it provides them with sophisticated understanding of consumer behaviour, from which they in turn can profit. But shouldn’t our data benefit us too?…(More)”

Computer-based personality judgments are more accurate than those made by humans


Paper by Wu Youyou, Michal Kosinski and David Stillwell at PNAS (Proceedings of the National Academy of Sciences): “Judging others’ personalities is an essential skill in successful social living, as personality is a key driver behind people’s interactions, behaviors, and emotions. Although accurate personality judgments stem from social-cognitive skills, developments in machine learning show that computer models can also make valid judgments. This study compares the accuracy of human and computer-based personality judgments, using a sample of 86,220 volunteers who completed a 100-item personality questionnaire. We show that (i) computer predictions based on a generic digital footprint (Facebook Likes) are more accurate (r = 0.56) than those made by the participants’ Facebook friends using a personality questionnaire (r = 0.49); (ii) computer models show higher interjudge agreement; and (iii) computer personality judgments have higher external validity when predicting life outcomes such as substance use, political attitudes, and physical health; for some outcomes, they even outperform the self-rated personality scores. Computers outpacing humans in personality judgment presents significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy…(More)”.
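The accuracy figures the abstract quotes (r = 0.56 vs. r = 0.49) are Pearson correlations between a judge’s ratings and participants’ self-ratings. A minimal sketch of that comparison, using made-up scores rather than the study’s data:

```python
# Sketch: judgment "accuracy" as Pearson correlation with self-reported
# trait scores, as in Youyou et al. All values below are illustrative.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Self-rated extraversion for five hypothetical participants, plus a
# computer model's and a friend's judgments of the same five people.
self_rated = [3.2, 4.1, 2.5, 4.8, 3.0]
computer   = [3.0, 4.3, 2.7, 4.5, 3.2]
friend     = [3.5, 3.2, 3.0, 4.0, 2.4]

r_computer = pearson(self_rated, computer)
r_friend   = pearson(self_rated, friend)
print(r_computer > r_friend)  # here the model tracks self-ratings more closely
```

The study’s comparison is the same in spirit: whichever judge’s ratings correlate more strongly with the self-reports is deemed the more accurate judge.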

Driving Solutions To Build Smarter Cities


Uber Blogpost: “Since day one, Uber’s mission has been to improve city life by connecting people with safe, reliable, hassle-free rides through the use of technology. As we have grown, so has our ability to share information that can serve a greater good. By sharing data with municipal partners we can help cities become more liveable, resilient, and innovative.
Today, Boston joins Uber in a first-of-its-kind partnership to help expand the city’s capability to solve problems by leveraging data provided by Uber. The data will provide new insights to help manage urban growth, relieve traffic congestion, expand public transportation, and reduce greenhouse gas emissions….
Uber is committed to sharing data, compiled in a manner that protects the privacy of riders and drivers, that can help cities target solutions for their unique challenges. This initiative presents a new standard for the future development of our cities – in communities big or small we can bridge data and policy to build sophisticated solutions for a stronger society. For this effort, we will deliver anonymized trip-level data by ZIP Code Tabulation Area (ZCTA), the U.S. Census Bureau’s geographical representation of ZIP codes….
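The ZCTA-level aggregation described above can be sketched as follows. The trip records, field names, and ZCTA codes here are invented for illustration; a real pipeline would also geocode raw pickup and dropoff coordinates to ZCTAs first:

```python
# Sketch: rolling individual trips up to ZCTA-level flow counts, so that
# published figures describe areas rather than identifiable riders.
from collections import Counter

trips = [
    {"pickup_zcta": "02110", "dropoff_zcta": "02139"},
    {"pickup_zcta": "02110", "dropoff_zcta": "02115"},
    {"pickup_zcta": "02139", "dropoff_zcta": "02110"},
    {"pickup_zcta": "02110", "dropoff_zcta": "02139"},
]

# Count flows between ZCTA pairs; rider identities never appear in the output.
flows = Counter((t["pickup_zcta"], t["dropoff_zcta"]) for t in trips)

# One common additional safeguard: suppress rare flows whose small counts
# could single out individuals (a k-anonymity-style threshold).
MIN_COUNT = 2
published = {pair: n for pair, n in flows.items() if n >= MIN_COUNT}
print(published)  # -> {('02110', '02139'): 2}
```

The minimum-count suppression step is a generic privacy technique, not something the Uber announcement specifies.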

How Can This Data Help Cities?

To date, most cities have not had access to granular data describing the flows and trends of private traffic. The data provided by Uber will help policymakers and city planners develop a more detailed understanding of where people in the city need to go and how to improve traffic flows and congestion to get them there, with data-driven decisions about:

  • Vision Zero-related passenger safety policies
  • Traffic planning
  • Congestion reduction
  • Flow of residents across the City
  • Impact of events, disasters and other activities on City transportation
  • Identification of zoning changes and needs
  • Creation or reduction of parking
  • Facilitation of additional transportation solutions for marquee City initiatives



This data can be utilized to help cities achieve their transportation and planning goals without compromising personal privacy. By helping cities understand the way their residents move, we can work together to make our communities stronger. Smart Cities can benefit from smart data and we will champion municipal efforts devoted to achieving data-driven urban growth, mobility and safety for communities….(More)”.

Transparency isn’t what keeps government from working


In the Washington Post: “In 2014, a number of big thinkers made the surprising claim that government openness and transparency are to blame for today’s gridlock. They have it backward: Not only is there no relationship between openness and dysfunction, but more secrecy can only add to that dysfunction.

As transparency advocates, we never take openness for granted. The latest example of the dangers of secrecy was the “cromnibus” bill, with its surprise lifting of campaign finance limits for political parties to an astonishing $3 million per couple per cycle, and its suddenly revealed watering down of Dodd-Frank’s derivatives safeguards. And in parallel to the controversy over the release of the CIA’s torture report, that agency proposed to delete e-mail from nearly all employees and contractors, destroying potential documentary evidence of wrongdoing. Openness doesn’t happen without a struggle….

Academics, such as Francis Fukuyama, make the case that politicians need privacy and discretion — back-door channels — to get the business of government done. “The obvious solution to this problem would be to roll back some of the would-be democratizing reforms, but no one dares suggest that what the country needs is a bit less participation and transparency,” writes Fukuyama in his newest book. At a time when voter participation is as low as during World War II, it seems strange to call for less participation and democracy. And more secrecy in Congress isn’t going to suddenly create dealmaking. The 2011 congressional “supercommittee” tasked with developing a $1.5 trillion deficit reduction deal operated almost entirely in secret. The problem wasn’t transparency or openness. Instead, as the committee’s Republican co-chairman, Jeb Hensarling, stated, the real problem was “two dramatically competing visions of the role [of] government.” These are the real issues, not openness….
We are not transparency absolutists. Not everything government and Congress do should occur in a fishbowl; that said, there is already plenty of room today for private deliberations. The problem isn’t transparency. It is that the political landscape punishes those who try to work together. And if various accountability measures create procedural challenges, let’s fix them. When it comes to holding government accountable, it is in the nation’s best interest to allow the media, nonprofit groups and the public full access to decision-making.”

Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency


at Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.

This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….

So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of way.



While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average” may actually be detrimental to those in the minority group….
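Hardt’s name-classification example can be made concrete with a tiny synthetic experiment. Everything below (the data, the threshold rule, the group sizes) is invented for illustration; the point is only that a rule tuned for average accuracy can fail a minority group whose pattern is reversed:

```python
# (name_length, is_real) pairs. In the majority culture, real names are
# short; in the minority culture, real names are long.
majority = [(4, True), (5, True), (6, True), (11, False), (12, False),
            (4, True), (5, True), (6, True), (11, False)] * 10  # 90 examples
minority = [(12, True), (13, True), (5, False), (4, False),
            (12, True), (6, False), (13, True), (11, True), (5, False)]  # 9

training = majority + minority

def accuracy(rule, data):
    return sum(rule(length) == label for length, label in data) / len(data)

# Pick the length threshold ("real if shorter than t") that maximizes
# accuracy on the pooled data -- i.e. "good performance on average".
best_t = max(range(1, 15), key=lambda t: accuracy(lambda n: n < t, training))
rule = lambda n: n < best_t

print(accuracy(rule, majority))  # 1.0 -- the rule fits the majority perfectly
print(accuracy(rule, minority))  # 0.0 -- the minority's pattern is inverted
```

Overall accuracy is over 90%, which looks excellent on paper, yet every single minority example is misclassified.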

As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”

Geneticists Begin Tests of an Internet for DNA


Antonio Regalado in MIT Technology Review: “A coalition of geneticists and computer programmers calling itself the Global Alliance for Genomics and Health is developing protocols for exchanging DNA information across the Internet. The researchers hope their work could be as important to medical science as HTTP, the protocol created by Tim Berners-Lee in 1989, was to the Web.
One of the group’s first demonstration projects is a simple search engine that combs through the DNA letters of thousands of human genomes stored at nine locations, including Google’s server farms and the University of Leicester, in the U.K. According to the group, which includes key players in the Human Genome Project, the search engine is the start of a kind of Internet of DNA that may eventually link millions of genomes together.
The technologies being developed are application program interfaces, or APIs, that let different gene databases communicate. Pooling information could speed discoveries about what genes do and help doctors diagnose rare birth defects by matching children with suspected gene mutations to others who are known to have them.
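The matching idea above can be sketched as a federated “does any database hold this variant?” lookup. The server names, data, and response shape below are hypothetical in-memory stand-ins, not the actual GA4GH API schemas:

```python
# Each "server" maps (chromosome, position, alternate base) to a count of
# genomes carrying that variant. Real deployments would be remote services
# queried over HTTP; these dicts stand in for them.
SERVERS = {
    "lab-a": {("17", 41245466, "T"): 3},
    "lab-b": {("17", 41245466, "T"): 1, ("7", 117199644, "G"): 2},
    "lab-c": {},
}

def query_variant(chrom, pos, alt):
    """Ask every server whether it holds the variant; return hits per server."""
    key = (chrom, pos, alt)
    return {name: db[key] for name, db in SERVERS.items() if key in db}

hits = query_variant("17", 41245466, "T")
print(hits)  # -> {'lab-a': 3, 'lab-b': 1}
```

A shared query interface like this is what lets a doctor with one child’s suspected mutation discover matching cases held by laboratories they have never contacted directly.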
The alliance was conceived two years ago at a meeting in New York of 50 scientists who were concerned that genome data was trapped in private databases, tied down by legal consent agreements with patients, limited by privacy rules, or jealously controlled by scientists to further their own scientific work. It styles itself after the World Wide Web Consortium, or W3C, a body that oversees standards for the Web.
“It’s creating the Internet language to exchange genetic information,” says David Haussler, scientific director of the genome institute at the University of California, Santa Cruz, who is one of the group’s leaders.
The group began releasing software this year. Its hope—as yet largely unrealized—is that any scientist will be able to ask questions about genome data possessed by other laboratories, without running afoul of technical barriers or privacy rules….(More)”

The Free 'Big Data' Sources Everyone Should Know


Bernard Marr at Linkedin Pulse: “…The moves by companies and governments to put large amounts of information into the public domain have made large volumes of data accessible to everyone….here’s my rundown of some of the best free big data sources available today.

Data.gov

The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime. To check it out, click here.

US Census Bureau

A wealth of information on the lives of US citizens covering population data, geographic data and education. To check it out, click here.

European Union Open Data Portal

As the above, but based on data from European Union institutions. To check it out, click here.

Data.gov.uk

Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950. To check it out, click here.

The CIA World Factbook

Information on history, population, economy, government, infrastructure and military of 267 countries. To check it out, click here.

Healthdata.gov

125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics. To check it out, click here.

NHS Health and Social Care Information Centre

Health data sets from the UK National Health Service. To check it out, click here.

Amazon Web Services public datasets

Huge resource of public data, including the 1000 Genomes Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth. To check it out, click here.

Facebook Graph

Although much of the information on users’ Facebook profiles is private, a lot isn’t – Facebook provides the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work). To check it out, click here.

Gapminder

Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world. To check it out, click here.

Google Trends

Statistics on search volume (as a proportion of total search) for any given term, since 2004. To check it out, click here.

Google Finance

40 years’ worth of stock market data, updated in real time. To check it out, click here.

Google Books Ngrams

Search and analyze the full text of any of the millions of books digitised as part of the Google Books project. To check it out, click here.

National Climatic Data Center

Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data. To check it out, click here.

DBPedia

Wikipedia comprises millions of pieces of data, structured and unstructured, on every subject under the sun. DBPedia is an ambitious project to catalogue this data and create a public, freely distributable database allowing anyone to analyze it. To check it out, click here.

Topsy

Free, comprehensive social media data is hard to come by – after all, their data is what generates profits for the big players (Facebook, Twitter etc.), so they don’t want to give it away. However, Topsy provides a searchable database of public tweets going back to 2006, as well as several tools to analyze the conversations. To check it out, click here.

Likebutton

Mines Facebook’s public data – globally and from your own network – to give an overview of what people “Like” at the moment. To check it out, click here.

New York Times

Searchable, indexed archive of news articles going back to 1851. To check it out, click here.

Freebase

A community-compiled database of structured data about people, places and things, with over 45 million entries. To check it out, click here.

Million Song Data Set

Metadata on over a million songs and pieces of music. Part of Amazon Web Services. To check it out, click here.”
See also Bernard Marr‘s blog at Big Data Guru

Pricey privacy: Framing the economy of information in the digital age


Paper by Federica Fornaciari in FirstMonday: “As new information technologies become ubiquitous, individuals are often prompted to rethink disclosure. Available media narratives may influence one’s understanding of the benefits and costs related to sharing personal information. This study, guided by frame theory, undertakes a Critical Discourse Analysis (CDA) of media discourse developed to discuss the privacy concerns related to the corporate collection and trade of personal information. The aim is to investigate the frames — the central organizing ideas — used in the media to discuss such an important aspect of the economics of personal data. The CDA explored 130 articles published in the New York Times between 2000 and 2012. Findings reveal that the articles utilized four frames: confusion and lack of transparency, justification and private interests, law and self-regulation, and commodification of information. Articles used episodic framing, often discussing specific instances of infringements rather than broader thematic accounts. Media coverage tended to frame personal information as a commodity that may be traded, rather than as a fundamental value.”

Digital Sociology


New book by Deborah Lupton: “We now live in a digital society. New digital technologies have had a profound influence on everyday life, social relations, government, commerce, the economy and the production and dissemination of knowledge. People’s movements in space, their purchasing habits and their online communication with others are now monitored in detail by digital technologies. We are increasingly becoming digital data subjects, whether we like it or not, and whether we choose this or not.
The sub-discipline of digital sociology provides a means by which the impact, development and use of these technologies and their incorporation into social worlds, social institutions and concepts of selfhood and embodiment may be investigated, analysed and understood. This book introduces a range of interesting social, cultural and political dimensions of digital society and discusses some of the important debates occurring in research and scholarship on these aspects. It covers the new knowledge economy and big data, reconceptualising research in the digital era, the digitisation of higher education, the diversity of digital use, digital politics and citizen digital engagement, the politics of surveillance, privacy issues, the contribution of digital devices to embodiment and concepts of selfhood and many other topics.”