Report by Rwitwika Bhattacharya and Mohitkumar Daga: “The importance of data in informing the policy-making process is being increasingly realized across the world. With India facing significant developmental challenges, the use of data offers an important opportunity to improve the quality of public services. However, the lack of formal structures to internalize data-informed decision-making impedes the path to robust policy formation. This paper seeks to highlight these challenges through a case study of data dashboard implementation in the state of Andhra Pradesh. The study suggests capacity building, improved data collection and engagement of non-governmental players as measures to address these issues….(More)”
Twitter, UN Global Pulse announce data partnership
Press Release: “Twitter and UN Global Pulse today announced a partnership that will provide the United Nations with access to Twitter’s data tools to support efforts to achieve the Sustainable Development Goals, which were adopted by world leaders last year.
Every day, people around the world send hundreds of millions of Tweets in dozens of languages. This public data contains real-time information on many issues including the cost of food, availability of jobs, access to health care, quality of education, and reports of natural disasters. This partnership will allow the development and humanitarian agencies of the UN to turn these social conversations into actionable information to aid communities around the globe.
“The Sustainable Development Goals are first and foremost about people, and Twitter’s unique data stream can help us truly take a real-time pulse on priorities and concerns — particularly in regions where social media use is common — to strengthen decision-making. Strong public-private partnerships like this show the vast potential of big data to serve the public good,” said Robert Kirkpatrick, Director of UN Global Pulse.
“We are incredibly proud to partner with the UN in support of the Sustainable Development Goals,” said Chris Moody, Twitter’s VP of Data Services. “Twitter data provides a live window into the public conversations that communities around the world are having, and we believe that the increased potential for research and innovation through this partnership will further the UN’s efforts to reach the Sustainable Development Goals.”
Organizations and businesses around the world currently use Twitter data in many meaningful ways, and this unique data source enables them to leverage public information at scale to better inform their policies and decisions. These partnerships enable innovative uses of Twitter data, while protecting the privacy and safety of Twitter users.
UN Global Pulse’s new collaboration with Twitter builds on existing R&D that has shown the power of social media for social impact, like measuring the impact of public health campaigns, tracking reports of rising food prices, or prioritizing needs after natural disasters….(More)”
Beware of the gaps in Big Data
Edd Gent at E&T: “When the municipal authority in charge of Boston, Massachusetts, was looking for a smarter way to find which roads it needed to repair, it hit on the idea of crowdsourcing the data. The authority released a mobile app called Street Bump in 2011 that employed an elegantly simple idea: use a smartphone’s accelerometer to detect jolts as cars go over potholes and look up the location using the Global Positioning System. But the approach ran into a pothole of its own. The system reported a disproportionate number of potholes in wealthier neighbourhoods. It turned out it was oversampling the younger, more affluent citizens who were digitally clued up enough to download and use the app in the first place. The city reacted quickly, but the incident shows how easy it is to develop a system that can handle large quantities of data but which, through its own design, is still unlikely to have enough data to work as planned.
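The detection idea is simple enough to sketch in a few lines of Python. This is a minimal illustration, not Street Bump’s actual code; the data layout and jolt threshold below are assumptions:

```python
# A minimal sketch of the Street Bump idea (not the app's real pipeline):
# flag a pothole candidate when the vertical accelerometer reading jumps
# well beyond gravity, and record the GPS fix at that moment.

G = 9.81            # gravity, m/s^2
JOLT_THRESHOLD = 6  # m/s^2 above baseline; an assumed tuning value

def pothole_candidates(readings):
    """readings: iterable of (vertical_accel_m_s2, lat, lon) tuples."""
    for accel, lat, lon in readings:
        if abs(accel - G) > JOLT_THRESHOLD:
            yield (lat, lon)

# Example: one smooth reading and one jolt.
samples = [(9.9, 42.360, -71.058), (19.2, 42.361, -71.060)]
print(list(pothole_candidates(samples)))  # -> [(42.361, -71.06)]
```

Nothing in that detection logic is skewed; the bias crept in through who was carrying the phones that fed it data.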
As we entrust more of our lives to big data analytics, automation problems like this could become increasingly common, with their errors difficult to spot after the fact. Systems that ‘feel like they work’ are where the trouble starts.
Harvard University professor Gary King, who is also founder of social media analytics company Crimson Hexagon, recalls a project that used social media to predict unemployment. The model was built by correlating US unemployment figures with the frequency with which people used words like ‘jobs’, ‘unemployment’ and ‘classifieds’. A sudden spike convinced researchers they had predicted a big rise in joblessness, but it turned out Steve Jobs had died and their model was simply picking up posts with his name. “This was an example of really bad analytics and it’s even worse because it’s the kind of thing that feels like it should work and does work a little bit,” says King.
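A toy sketch shows how such a model fails. The posts below are invented; this is not King’s actual model, just the keyword-rate mechanism it relied on:

```python
# A toy illustration of the keyword-frequency failure mode: a name
# collision inflates the 'jobs' count without any change in unemployment.

posts_normal = ["looking for jobs in boston", "new classifieds up today"]
posts_oct_2011 = ["rip steve jobs", "steve jobs changed the world",
                  "looking for jobs in boston"]

def keyword_rate(posts, keyword="jobs"):
    """Fraction of posts containing the keyword as a whole word."""
    hits = sum(keyword in post.split() for post in posts)
    return hits / len(posts)

print(keyword_rate(posts_normal))    # 0.5 -- the baseline signal
print(keyword_rate(posts_oct_2011))  # 1.0 -- a spike driven by a name
```

The jump in the keyword rate looks exactly like a labour-market signal until the posts themselves are read.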
Big data can shed light on areas with historic information deficits, and systems that seem to automatically highlight the best course of action can be seductive for executives and officials. “In the vacuum of no decision, any decision is attractive,” says Jim Adler, head of data at Toyota Research Institute in Palo Alto. “Policymakers will say, ‘there’s a decision here, let’s take it’, without really looking at what led to it. Was the data trustworthy, clean?”…(More)”
Data Love: The Seduction and Betrayal of Digital Technologies
Book by Roberto Simanowski: “Intelligence services, government administrations, businesses, and a growing majority of the population are hooked on the idea that big data can reveal patterns and correlations in everyday life. Initiated by software engineers and carried out through algorithms, the mining of big data has sparked a silent revolution. But algorithmic analysis and data mining are not simply byproducts of media development or the logical consequences of computation. They are the radicalization of the Enlightenment’s quest for knowledge and progress. Data Love argues that the “cold civil war” of big data is taking place not among citizens or between the citizen and government but within each of us.
Roberto Simanowski elaborates on the changes data love has brought to the human condition while exploring the entanglements of those who—out of stinginess, convenience, ignorance, narcissism, or passion—contribute to the amassing of ever more data about their lives, leading to the statistical evaluation and individual profiling of their selves. Writing from a philosophical standpoint, Simanowski illustrates the social implications of technological development and retrieves the concepts, events, and cultural artifacts of past centuries to help decode the programming of our present….(More)”
National Transit Map Seeks to Close the Transit Data Gap
Ben Miller at GovTech: “In bringing together the first ever map illustrating the nation’s transit system, the U.S. Department of Transportation isn’t just making data more accessible — it’s also aiming to modernize data collection and dissemination for many of the country’s transit agencies.
With more than 10,000 routes and 98,000 stops represented, the National Transit Map is already enormous. But Dan Morgan, chief data officer of the department, says it’s not enough. When measuring vehicles operated in maximum service — a metric illustrating peak service at a transit agency — the National Transit Map captures only about half of all transit in the U.S.
“Not all of these transit agencies have this data available,” Morgan said, “so this is an ongoing project to really close the transit data gap.”
Which is why, in the process of building out the map, the DOT is working with transit agencies to make their data available.
On the whole, transit data is easier to collect and process than a lot of transportation data because many agencies have adopted a standard called General Transit Feed Specification (GTFS) that applies to schedule-related data. That’s what made the National Transit Map an easy candidate for completion, Morgan said.
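Part of GTFS’s appeal is that a feed is simply a zip archive of plain CSV text files, so consuming one needs nothing beyond the standard library. A minimal sketch, where gtfs_feed.zip is a placeholder for any agency’s published feed:

```python
# Read the stops table from a GTFS feed. A GTFS feed is a zip of CSV
# files; stops.txt carries the stop identifiers, names and coordinates.
import csv
import io
import zipfile

with zipfile.ZipFile("gtfs_feed.zip") as feed:  # placeholder path
    with feed.open("stops.txt") as f:
        # utf-8-sig tolerates the byte-order mark some feeds include.
        stops = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8-sig")))

for stop in stops[:3]:
    print(stop["stop_id"], stop["stop_name"], stop["stop_lat"], stop["stop_lon"])
```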
But as popular as GTFS has become, many agencies — especially smaller ones — haven’t been able to use it. The tools to convert to GTFS come with a learning curve.
“It’s really a matter of priority and availability of resources,” he said.
Bringing those agencies into the mainstream is important to achieving the goals of the map, in which Morgan sees an opportunity for a level of clarity that has never existed before.
That’s because transit has long suffered from difficulty in seeing its own history. Transit officials can describe their systems as they exist, but looking at how they got there is trickier.
“There’s no archive,” Morgan said, “there’s no picture of how transit changes over time.”
And that’s a problem for assessing what works and what doesn’t, for understanding why the system operates the way it does and how it responds to changes. …(More)”
Recent Developments in Open Data Policy
Presentation by Paul Uhlir: “Several international organizations have issued policy statements on open data in the past two years. This presentation provides an overview of those statements and their relevance to developing countries.
International Statements on Open Data Policy
Open data policies have gained much greater international support in recent years. Policy statements from just the 2014–2016 period that endorse and promote openness of research data derived from public funding include: the African Data Consensus (UNECA 2014); the CODATA Nairobi Principles for Data Sharing for Science and Development in Developing Countries (PASTD 2014); the Hague Declaration on Knowledge Discovery in the Digital Age (LIBER 2014); the Policy Guidelines for Open Access and Data Dissemination and Preservation (RECODE 2015); and the Accord on Open Data in a Big Data World (Science International 2015). This presentation summarizes the principal guidelines of these policy statements.
The Relevance of Open Data from Publicly Funded Research for Development
There are many reasons that publicly funded research data should be made as freely and openly available as possible; some are noted here, although many other benefits are possible. For research, open data helps close the gap with more economically developed countries, makes researchers more visible on the web, enhances their potential for collaboration, and links them globally. For education, open data helps students learn how to do data science and to manage data better. From a socioeconomic standpoint, open data policies have been shown to enhance economic opportunities and to enable citizens to improve their lives in myriad ways. Such policies are also more ethical: they allow access to those who have no means to pay, and they avoid charging for the data twice—once through the taxes that created it and again at the user level. Finally, access to factual data can improve governance, leading to better decision making by policymakers, improved oversight by constituents, and digital repatriation of objects held by former colonial powers.
Some of these benefits are cited directly in the policy statements themselves, while others are developed more fully in other documents (Bailey Mathae and Uhlir 2012; Uhlir 2015). Of course, not all publicly funded data and information can be made available, and there are appropriate reasons—such as the protection of national security, personal privacy, commercial concerns, and confidentiality of all kinds—that make withholding them legal and ethical. However, the default rule should be one of openness, balanced against any legitimate reason not to make the data public….(More)”
Law in the Future
Paper by Benjamin Alarie, Anthony Niblett and Albert Yoon: “The set of tasks and activities in which humans are strictly superior to computers is becoming vanishingly small. Machines today are not only performing mechanical or manual tasks once performed by humans, they are also performing thinking tasks, where it was long believed that human judgment was indispensable. From self-driving cars to self-flying planes, and from robots performing surgery on a pig to artificially intelligent personal assistants, so much of what was once unimaginable is now reality. But this is just the beginning of the big data and artificial intelligence revolution. Technology continues to improve at an exponential rate. How will the big data and artificial intelligence revolutions affect law? We hypothesize that the growth of big data, artificial intelligence, and machine learning will have important effects that will fundamentally change the way law is made, learned, followed, and practiced. It will have an impact on all facets of the law, from the production of micro-directives to the way citizens learn of their legal obligations. These changes will present significant challenges to human lawmakers, judges, and lawyers. While we do not attempt to address all these challenges, we offer a short and positive preview of the future of law: a world of self-driving law, of legal singularity, and of the democratization of the law…(More)”
Data Driven Governments: Creating Value Through Open Government Data
Big Data and Public Policy: Can It Succeed Where E-Participation Has Failed?
Jonathan Bright and Helen Margetts at Policy & Society: “This editorial introduces a special issue resulting from a panel on Internet and policy organized by the Oxford Internet Institute (University of Oxford) at the 2015 International Conference on Public Policy (ICPP) held in Milan. Two main themes emerged from the panel: the challenges of high cost and low participation which many e-participation initiatives have faced; and the potential Big Data seems to hold for remedying these problems. This introduction briefly presents these themes and links them to the papers in the issue. It argues that Big Data can fix some of the problems typically encountered by e-participation initiatives: it can offer a solution to the problem of low turnout, one that is moreover accessible to government bodies with low levels of financial resources. However, the use of Big Data in this way is also a radically different approach to the problem of involving citizens in policymaking, and the editorial concludes by reflecting on the significance of this for the policymaking process….(More)”
“Big Data Europe” addresses societal challenges with data technologies
Press Release: “Across society, from health to agriculture and transport, from energy to climate change and security, practitioners in every discipline recognise the potential of the enormous amounts of data being created every day. The challenge is to capture, manage and process that information to derive meaningful results and make a difference to people’s lives. The Big Data Europe project has just released the first public version of its open source platform designed to do just that. In 7 pilot studies, it is helping to solve societal challenges by putting cutting edge technology in the hands of experts in fields other than IT.
Although many crucial big data technologies are freely available as open source software, they are often difficult for non-experts to integrate and deploy. Big Data Europe solves that problem by providing a package that can readily be installed locally or at any scale in a cloud infrastructure by a systems administrator, and configured via a simple user interface. Tools like Apache Hadoop, Apache Spark, Apache Flink and many others can be instantiated easily….
The tools included in the platform were selected after a process of requirements-gathering across the seven societal challenges identified by the European Commission (Health, Food, Energy, Transport, Climate, Social Sciences and Security). Tasks like message passing are handled by Kafka and Flume, storage by Hive and Cassandra, and publishing through GeoTriples. The platform uses the Docker system to make it easy to add new tools and, again, for them to operate at a scale limited only by the computing infrastructure….
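As an illustration of what ‘instantiated easily’ means in practice, here is a hedged sketch using the third-party docker Python SDK to bring up a single component on a machine with a running Docker daemon. The image tag and port mapping are illustrative assumptions, not instructions from the project’s documentation:

```python
# A sketch of programmatic deployment via the Docker engine; assumes the
# `docker` Python SDK (pip install docker) and a local Docker daemon.
import docker

client = docker.from_env()

# Pull and start one component container in the background. The image
# tag is illustrative of the project's bde2020 Docker Hub organization,
# not a verified release.
master = client.containers.run(
    "bde2020/spark-master:latest",
    name="spark-master",
    ports={"8080/tcp": 8080},  # expose the Spark master web UI
    detach=True,
)
print(master.name, master.status)
```

The platform wraps this kind of operation behind its user interface; the sketch only shows the Docker mechanics underneath.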
See also the installation instructions, Getting Started and video.”