Creating Value through Open Data


Press Release: “Capgemini Consulting, the global strategy and transformation consulting arm of the Capgemini Group, today published two new reports on the state of play of Open Data in Europe, to mark the launch of the European Open Data Portal. The first report addresses “Open Data Maturity in Europe 2015: Insights into the European state of play” and the second focuses on “Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources.” The countries covered by these assessments include the EU28 countries plus Iceland, Liechtenstein, Norway, and Switzerland – commonly referred to as the EU28+ countries. The reports were requested by the European Commission within the framework of the Connecting Europe Facility program, supporting the deployment of European Open Data infrastructure.

Open Data refers to the information collected, produced or paid for by public bodies that can be freely used, modified and shared by anyone. For the period 2016-2020, the direct market size for Open Data is estimated at EUR 325 billion for Europe. Capgemini’s study “Creating Value through Open Data” illustrates how Open Data can create economic value in multiple ways, ranging from increased market transactions and job creation from producing services and products based on Open Data to cost savings and efficiency gains. For instance, effective use of Open Data could help save 629 million hours of unnecessary waiting time on the roads in the EU; and help reduce energy consumption by 16%. The accumulated cost savings for public administrations making use of Open Data across the EU28+ in 2020 are predicted to equal 1.7 bn EUR. Reaping these benefits requires reaching a high level of Open Data maturity.

In order to address the accessibility and the value of Open Data across European countries, the European Union has launched the Beta version of the European Data Portal. The Portal addresses the whole Data Value Chain, from data publishing to data re-use. Over 240,000 data sets from 34 European countries are referenced on the Portal. It offers seamless access to public data across Europe, with over 13 content categories to categorize data, ranging from health or education to transport or even science and justice. Anyone, whether citizens, businesses, journalists or administrations, can search, access and re-use the full data collection. A wide range of data is available, from crime records in Helsinki, labor mobility in the Netherlands and forestry maps in France to the impact of digitization in Poland…. The study, “Open Data Maturity in Europe 2015: Insights into the European state of play”, uses two key indicators: Open Data Readiness and Portal Maturity. These indicators cover both the maturity of national policies supporting Open Data as well as an assessment of the features made available on national data portals. The study shows that the EU28+ have completed just 44% of the journey towards achieving full Open Data Maturity and there are large discrepancies across countries. A third of European countries (32%), recognized globally, are leading the way with solid policies, licensing norms, good portal traffic and many local initiatives and events to promote Open Data and its re-use….(More)”

Decoding the Future for National Security


George I. Seffers at Signal: “U.S. intelligence agencies are in the business of predicting the future, but no one has systematically evaluated the accuracy of those predictions—until now. The intelligence community’s cutting-edge research and development agency uses a handful of predictive analytics programs to measure and improve the ability to forecast major events, including political upheavals, disease outbreaks, insider threats and cyber attacks.

The Office for Anticipating Surprise at the Intelligence Advanced Research Projects Activity (IARPA) is a place where crystal balls come in the form of software, tournaments and throngs of people. The office sponsors eight programs designed to improve predictive analytics, which uses a variety of data to forecast events. The programs all focus on incidents outside of the United States, and the information is anonymized to protect privacy. The programs are in different stages, some having recently ended as others are preparing to award contracts.

But they all have one more thing in common: They use tournaments to advance the state of the predictive analytic arts. “We decided to run a series of forecasting tournaments in which people from around the world generate forecasts about, now, thousands of real-world events,” says Jason Matheny, IARPA’s new director. “All of our programs on predictive analytics do use this tournament style of funding and evaluating research.” The Open Source Indicators program used a crowdsourcing technique in which people across the globe offered their predictions on such events as political uprisings, disease outbreaks and elections.

The data analyzed included social media trends, Web search queries and even cancelled dinner reservations—an indication that people are sick. “The methods applied to this were all automated. They used machine learning to comb through billions of pieces of data to look for that signal, that leading indicator, that an event was about to happen,” Matheny explains. “And they made amazing progress. They were able to predict disease outbreaks weeks earlier than traditional reporting.” The recently completed Aggregative Contingent Estimation (ACE) program also used a crowdsourcing competition in which people predicted events, including whether weapons would be tested, treaties would be signed or armed conflict would break out along certain borders. Volunteers were asked to provide information about their own background and what sources they used. IARPA also tested participants’ cognitive reasoning abilities. Volunteers provided their forecasts every day, and IARPA personnel kept score. Interestingly, they discovered the “deep domain” experts were not the best at predicting events. Instead, people with a certain style of thinking came out the winners. “They read a lot, not just from one source, but from multiple sources that come from different viewpoints. They have different sources of data, and they revise their judgments when presented with new information. They don’t stick to their guns,” Matheny reveals. …

The ACE research also contributed to a recently released book, Superforecasting: The Art and Science of Prediction, according to the IARPA director. The book was co-authored, along with Dan Gardner, by Philip Tetlock, the Annenberg University professor of psychology and management at the University of Pennsylvania who also served as a principal investigator for the ACE program. Like ACE, the Crowdsourcing Evidence, Argumentation, Thinking and Evaluation program uses the forecasting tournament format, but it also requires participants to explain and defend their reasoning. The initiative aims to improve analytic thinking by combining structured reasoning techniques with crowdsourcing.

Meanwhile, the Foresight and Understanding from Scientific Exposition (FUSE) program forecasts science and technology breakthroughs….(More)”

Tech and Innovation to Re-engage Civic Life


Hollie Russon Gilman at the Stanford Social Innovation Review: “Sometimes even the best-intentioned policymakers overlook the power of people. And even the best-intentioned discussions on social impact and leveraging big data for the social sector can obscure the power of every-day people in their communities.

But time and time again, I’ve seen the transformative power of civic engagement when initiatives are structured well. For example, the other year I witnessed a high school student walk into a school auditorium one evening during Boston’s first-ever youth-driven participatory budgeting project. Participatory budgeting gives residents a structured opportunity to work together to identify neighborhood priorities, work in tandem with government officials to draft viable projects, and prioritize projects to fund. Elected officials in turn pledge to implement these projects and are held accountable to their constituents. Initially intrigued by an experiment in democracy (and maybe the free pizza), this student remained engaged over several months, because she met new members of her community; got to interact with elected officials; and felt like she was working on a concrete objective that could have a tangible, positive impact on her neighborhood.

For many of the young participants, ages 12-25, being part of a participatory budgeting initiative is the first time they are involved in civic life. Many were excited that the City of Boston, in collaboration with the nonprofit Participatory Budgeting Project, empowered young people with the opportunity to allocate $1 million in public funds. Through participating, young people gain invaluable civic skills, and sometimes even a passion that can fuel other engagements in civic and communal life.

This is just one example of a broader civic and social innovation trend. Across the globe, people are working together with their communities to solve seemingly intractable problems, but as diverse as those efforts are, there are also commonalities. Well-structured civic engagement creates the space and provides the tools for people to exert agency over policies. When citizens have concrete objectives, access to necessary technology (whether it’s postcards, trucks, or open data portals), and an eye toward outcomes, social change happens.

Using Technology to Distribute Expertise

Technology is allowing citizens around the world to participate in solving local, national, and global problems. When it comes to large, public bureaucracies, expertise is largely top-down and concentrated. Leveraging technology creates opportunities for people to work together in new ways to solve public problems. One way is through civic crowdfunding platforms like Citizinvestor.com, which cities can use to develop public sector projects for citizen support; several cities in Rhode Island, Oregon, and Philadelphia have successfully pooled citizen resources to fund new public works. Another way is through citizen science. Old Weather, a crowdsourcing project from the National Archives and Zooniverse, enrolls people to transcribe old British ship logs to identify climate change patterns. Platforms like these allow anyone to devote a small amount of time or resources toward a broader public good. And because they have a degree of transparency, people can see the progress and impact of their efforts. ….(More)”

Political Turbulence: How Social Media Shape Collective Action


Book by Helen Margetts, Peter John, Scott Hale, & Taha Yasseri: “As people spend increasing proportions of their daily lives using social media, such as Twitter and Facebook, they are being invited to support myriad political causes by sharing, liking, endorsing, or downloading. Chain reactions caused by these tiny acts of participation form a growing part of collective action today, from neighborhood campaigns to global political movements. Political Turbulence reveals that, in fact, most attempts at collective action online do not succeed, but some give rise to huge mobilizations—even revolutions.

Drawing on large-scale data generated from the Internet and real-world events, this book shows how mobilizations that succeed are unpredictable, unstable, and often unsustainable. To better understand this unruly new force in the political world, the authors use experiments that test how social media influence citizens deciding whether or not to participate. They show how different personality types react to social influences and identify which types of people are willing to participate at an early stage in a mobilization when there are few supporters or signals of viability. The authors argue that pluralism is the model of democracy that is emerging in the social media age—not the ordered, organized vision of early pluralists, but a chaotic, turbulent form of politics.

This book demonstrates how data science and experimentation with social data can provide a methodological toolkit for understanding, shaping, and perhaps even predicting the outcomes of this democratic turbulence….(More)”

Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life


Paper by Edward L. Glaeser et al: “New, “big” data sources allow measurement of city characteristics and outcome variables at higher frequencies and finer geographic scales than ever before. However, big data will not solve large urban social science questions on its own. Big data has the most value for the study of cities when it allows measurement of the previously opaque, or when it can be coupled with exogenous shocks to people or place. We describe a number of new urban data sources and illustrate how they can be used to improve the study and function of cities. We first show how Google Street View images can be used to predict income in New York City, suggesting that similar image data can be used to map wealth and poverty in previously unmeasured areas of the developing world. We then discuss how survey techniques can be improved to better measure willingness to pay for urban amenities. Finally, we explain how Internet data is being used to improve the quality of city services….(More)”

Tackling quality concerns around (volunteered) big data


University of Twente: “… Improvements in online information communication and mobile location-aware technologies have led to a dramatic increase in the amount of volunteered geographic information (VGI) in recent years. The collection of volunteered data on geographic phenomena has a rich history worldwide. For example, the Christmas Bird Count has studied the impacts of climate change on spatial distribution and population trends of selected bird species in North America since 1900. Nowadays, several citizen observatories collect information about our environment. This information is complementary or, in some cases, essential to tackle a wide range of geographic problems.

Despite the wide applicability and acceptability of VGI in science, many studies argue that the quality of the observations remains a concern. Data collected by volunteers does not often follow scientific principles of sampling design, and levels of expertise vary among volunteers. This makes it hard for scientists to integrate VGI in their research.

Low-quality, inconsistent observations can bias analysis and modelling results because they are not representative of the variable studied, or because they decrease the ratio of signal to noise. Hence, the identification of inconsistent observations clearly benefits VGI-based applications and provides more robust datasets to the scientific community.

In their paper the researchers describe a novel automated workflow to identify inconsistencies in VGI. “Leveraging a digital control mechanism means we can give value to the millions of observations collected by volunteers” and “it allows a new kind of science where citizens can directly contribute to the analysis of global challenges like climate change” say Hamed Mehdipoor and Dr. Raul Zurita-Milla, who work at the Geo-Information Processing department of ITC….

While some inconsistent observations may reflect real, unusual events, the researchers demonstrated that these observations also bias the trends (advancement rates), in this case of the date of lilac flowering onset. This shows that identifying inconsistent observations is a pre-requisite for studying and interpreting the impact of climate change on the timing of life cycle events….(More)”

How Big Data is Helping to Tackle Climate Change


Bernard Marr at DataInformed: “Climate scientists have been gathering a great deal of data for a long time, but analytics technology’s catching up is comparatively recent. Now that cloud, distributed storage, and massive amounts of processing power are affordable for almost everyone, those data sets are being put to use. On top of that, the growing number of Internet of Things devices we are carrying around are adding to the amount of data we are collecting. And the rise of social media means more and more people are reporting environmental data and uploading photos and videos of their environment, which also can be analyzed for clues.

Perhaps one of the most ambitious projects that employ big data to study the environment is Microsoft’s Madingley, which is being developed with the intention of creating a simulation of all life on Earth. The project already provides a working simulation of the global carbon cycle, and it is hoped that, eventually, everything from deforestation to animal migration, pollution, and overfishing will be modeled in a real-time “virtual biosphere.” Just a few years ago, the idea of a simulation of the entire planet’s ecosphere would have seemed like ridiculous, pie-in-the-sky thinking. But today it’s something into which one of the world’s biggest companies is pouring serious money. Microsoft is doing this because it believes that analytical technology has finally caught up with the ability to collect and store data.

Another data giant that is developing tools to facilitate analysis of climate and ecological data is EMC. Working with scientists at Acadia National Park in Maine, the company has developed platforms to pull in crowd-sourced data from citizen science portals such as eBird and iNaturalist. This allows park administrators to monitor the impact of climate change on wildlife populations as well as to plan and implement conservation strategies.

Last year, the United Nations, under its Global Pulse data analytics initiative, launched the Big Data Climate Challenge, a competition aimed at promoting innovative data-driven climate change projects. Among the first to receive recognition under the program is Global Forest Watch, which combines satellite imagery, crowd-sourced witness accounts, and public datasets to track deforestation around the world, which is believed to be a leading man-made cause of climate change. The project has been promoted as a way for ethical businesses to ensure that their supply chain is not complicit in deforestation.

Other initiatives are targeted at a more personal level, for example by analyzing transit routes that could be used for individual journeys, using Google Maps, and making recommendations based on carbon emissions for each route.

The idea of “smart cities” is central to the concept of the Internet of Things – the idea that everyday objects and tools are becoming increasingly connected, interactive, and intelligent, and capable of communicating with each other independently of humans. Many of the ideas put forward by smart-city pioneers are grounded in climate awareness, such as reducing carbon dioxide emissions and energy waste across urban areas. Smart metering allows utility companies to increase or restrict the flow of electricity, gas, or water to reduce waste and ensure adequate supply at peak periods. Public transport can be efficiently planned to avoid wasted journeys and provide a reliable service that will encourage citizens to leave their cars at home.

These examples raise an important point: It’s apparent that data – big or small – can tell us if, how, and why climate change is happening. But, of course, this is only really valuable to us if it also can tell us what we can do about it. Some projects, such as Weathersafe, which helps coffee growers adapt to changing weather patterns and soil conditions, are designed to help humans deal with climate change. Others are designed to tackle the problem at the root, by highlighting the factors that cause it in the first place and showing us how we can change our behavior to minimize damage….(More)”

Public Participation Organizations and Open Policy


Paper by Helen Pallett at Science Communication: “This article builds on work in Science and Technology Studies and cognate disciplines concerning the institutionalization of public engagement and participation practices. It describes and analyses ethnographic qualitative research into one “organization of participation,” the UK government–funded Sciencewise program. Sciencewise’s interactions with broader political developments are explored, including the emergence of “open policy” as a key policy object in the UK context. The article considers what the new imaginary of openness means for institutionalized forms of public participation in science policymaking, asking whether this is illustrative of a “constitutional moment” in relations between society and science policymaking….(More)”

Looking for Open Data from a different country? Try the European Data portal


Wendy Carrara in DAE blog: “The Open Data movement is reaching all countries in Europe. Data Portals give you access to re-usable government information. But have you ever tried to find Open Data from another country whose language you do not speak? Or have you tried to see whether data from one country exist also in a similar way in another? The European Data Portal that we just launched can help you….

The main work stream of the European Data Portal project is the development of a new pan-European open data infrastructure. Its goal is to be a gateway offering access to data published by administrations in countries across Europe, from the EU and beyond.
The portal is launched during the European Data Forum in Luxembourg.

Additionally, we will support public administrations in publishing more data as open data and have targeted actions to stimulate re-use. By taking a look at the data released by other countries and made available on the European Data Portal, governments can also be inspired to publish new data sets they had not thought about in the first place.

The re-use of Open Data will further boost the economy. The benefits of Open Data are diverse and range from improved performance of public administrations and economic growth in the private sector to wider social welfare. The economic study conducted by the European Data Portal team estimates that between 2016 and 2020, the market size of Open Data is expected to increase by 36.9% to a value of 75.7 bn EUR in 2020.

For data to be re-used, it has to be accessible

Currently, the portal includes over 240,000 datasets from 34 European countries. Information about the data available is structured into thirteen different categories ranging from agriculture to transport, including science, justice, health and so on. This enables you to quickly browse through categories and feel inspired by the data made accessible….(More)”

The promise and perils of predictive policing based on big data


H. V. Jagadish in the Conversation: “Police departments, like everyone else, would like to be more effective while spending less. Given the tremendous attention to big data in recent years, and the value it has provided in fields ranging from astronomy to medicine, it should be no surprise that police departments are using data analysis to inform deployment of scarce resources. Enter the era of what is called “predictive policing.”

Some form of predictive policing is likely now in force in a city near you. Memphis was an early adopter. Cities from Minneapolis to Miami have embraced predictive policing. Time magazine named predictive policing (with particular reference to the city of Santa Cruz) one of the 50 best inventions of 2011. New York City Police Commissioner William Bratton recently said that predictive policing is “the wave of the future.”

The term “predictive policing” suggests that the police can anticipate a crime and be there to stop it before it happens and/or apprehend the culprits right away. As the Los Angeles Times points out, it depends on “sophisticated computer analysis of information about previous crimes, to predict where and when crimes will occur.”

At a very basic level, it’s easy for anyone to read a crime map and identify neighborhoods with higher crime rates. It’s also easy to recognize that burglars tend to target businesses at night, when they are unoccupied, and to target homes during the day, when residents are away at work. The challenge is to take a combination of dozens of such factors to determine where crimes are more likely to happen and who is more likely to commit them. Predictive policing algorithms are getting increasingly good at such analysis. Indeed, such was the premise of the movie Minority Report, in which the police can arrest and convict murderers before they commit their crime.

Predicting a crime with certainty is something that science fiction can have a field day with. But as a data scientist, I can assure you that in reality we can come nowhere close to certainty, even with advanced technology. To begin with, predictions can be only as good as the input data, and quite often these input data have errors.

But even with perfect, error-free input data and unbiased processing, ultimately what the algorithms are determining are correlations. Even if we have perfect knowledge of your troubled childhood, your socializing with gang members, your lack of steady employment, your wacko posts on social media and your recent gun purchases, all that the best algorithm can do is to say it is likely, but not certain, that you will commit a violent crime. After all, to believe such predictions as guaranteed is to deny free will….

What data can do is give us probabilities, rather than certainty. Good data coupled with good analysis can give us very good estimates of probability. If you sum probabilities over many instances, you can usually get a robust estimate of the total.

For example, data analysis can provide a probability that a particular house will be broken into on a particular day based on historical records for similar houses in that neighborhood on similar days. An insurance company may add this up over all days in a year to decide how much to charge for insuring that house….(More)”
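
The point about summing probabilities can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the article: the function names, the flat daily probability of 0.0002, the average loss of 5,000 and the 20% loading factor are all hypothetical assumptions. It adds up per-day break-in probabilities to get the expected number of break-ins in a year, then turns that into a rough premium.

```python
import math
from typing import Sequence


def expected_break_ins(daily_probabilities: Sequence[float]) -> float:
    """Sum per-day break-in probabilities into an expected yearly count.

    By linearity of expectation, the expected number of break-ins is the
    sum of the daily probabilities, even though no single day is certain.
    """
    for p in daily_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # math.fsum limits floating-point rounding error when adding many small values.
    return math.fsum(daily_probabilities)


def annual_premium(
    daily_probabilities: Sequence[float],
    average_loss: float,
    loading: float = 0.2,
) -> float:
    """Rough premium: expected yearly loss plus a proportional loading.

    The average loss per break-in and the loading factor are hypothetical
    inputs chosen for illustration, not figures from the article.
    """
    expected_loss = expected_break_ins(daily_probabilities) * average_loss
    return expected_loss * (1.0 + loading)


if __name__ == "__main__":
    # Hypothetical house: the same small break-in probability every day of the year.
    daily = [0.0002] * 365
    print(f"expected break-ins per year: {expected_break_ins(daily):.4f}")
    print(f"illustrative premium: {annual_premium(daily, average_loss=5000.0):.2f}")
```

With these made-up inputs the expected count is about 0.073 break-ins per year: no individual day can be predicted, yet the yearly total is a usable estimate. A real insurer would vary the daily probabilities by season, weekday and neighborhood rather than use one flat value.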