A rationale for data governance as an approach to tackle recurrent drawbacks in open data portals


Conference paper by Juan Ribeiro Reis et al.: “Citizens and developers are gaining broad access to public data sources, made available in open data portals. These machine-readable datasets enable the creation of applications that help the population in several ways, giving them the opportunity to actively participate in governance processes, such as decision-making and policy-making.

While the number of open data portals grows over the years, researchers have been able to identify recurrent problems with the data they provide, such as lack of data standards, difficulty in data access and poor understandability. Such issues hinder the effective use of the data. Several works in the literature propose different approaches to mitigating these issues, based on novel or well-known data management techniques.

However, there is a lack of general frameworks for tackling these problems. Data governance, on the other hand, has been applied in large companies to manage data problems, ensuring that data meets business needs and becomes an organizational asset. In this paper, we first highlight the main drawbacks pointed out in the literature for government open data portals. We then show how data governance can tackle many of the issues identified…(More)”.

The economic value of data: discussion paper


HM Treasury (UK): “Technological change has radically increased both the volume of data in the economy, and our ability to process it. This change presents an opportunity to transform our economy and society for the better.

Data-driven innovation holds the keys to addressing some of the most significant challenges confronting modern Britain, whether that is tackling congestion and improving air quality in our cities, developing ground-breaking diagnosis systems to support our NHS, or making our businesses more productive.

The UK’s strengths in cutting-edge research and the intangible economy make it well-placed to be a world leader, and estimates suggest that data-driven technologies will contribute over £60 billion per year to the UK economy by 2020. Recent events have raised public questions and concerns about the way that data, and particularly personal data, can be collected, processed, and shared with third party organisations.

These are concerns that this government takes seriously. The Data Protection Act 2018 updates the UK’s world-leading data protection framework to make it fit for the future, giving individuals strong new rights over how their data is used. Alongside maintaining a secure, trusted data environment, the government has an important role to play in laying the foundations for a flourishing data-driven economy.

This means pursuing policies that improve the flow of data through our economy, and ensure that those companies who want to innovate have appropriate access to high-quality and well-maintained data.

This discussion paper describes the economic opportunity presented by data-driven innovation, and highlights some of the key challenges that government will need to address, such as: providing clarity around ownership and control of data; maintaining a strong, trusted data protection framework; making effective use of public sector data; driving interoperability and standards; and enabling safe, legal and appropriate data sharing.

Over the last few years, the government has taken significant steps to strengthen the UK’s position as a world leader in data-driven innovation, including by agreeing the Artificial Intelligence Sector Deal, establishing the Geospatial Commission, and making substantial investments in digital skills. The government will build on those strong foundations over the coming months, including by commissioning an Expert Panel on Competition in Digital Markets. This Expert Panel will support the government’s wider review of competition law by considering how competition policy can better enable innovation and support consumers in the digital economy.

There are still big questions to be answered. This document marks the beginning of a wider set of conversations that government will be holding over the coming year, as we develop a new National Data Strategy….(More)”.

Reclaiming the Smart City: Personal Data, Trust and the New Commons


Report by Theo Bass, Emma Sutherland and Tom Symons: “Cities are becoming a major focal point in the personal data economy. In city governments, there is a clamour for data-informed approaches to everything from waste management and public transport through to policing and emergency response.

This is a triumph for advocates of the better use of data in how we run cities. After years of making the case, there is now a general acceptance that social, economic and environmental pressures can be better responded to by harnessing data.

But as that argument is won, a fresh debate is bubbling up under the surface of the glossy prospectus of the smart city: who decides what we do with all this data, and how do we ensure that its generation and use do not result in discrimination, exclusion and the erosion of privacy for citizens?

This report brings together a range of case studies featuring cities which have pioneered innovative practices and policies around the responsible use of data about people. Our methods combined desk research and over 20 interviews with city administrators in a number of cities across the world.

Recommendations

Based on our case studies, we also compile a range of lessons that policymakers can use to build an alternative version of the smart city – one which promotes ethical data collection practices and responsible innovation with new technologies:

  1. Build consensus around clear ethical principles, and translate them into practical policies.
  2. Train public sector staff in how to assess the benefits and risks of smart technologies.
  3. Look outside the council for expertise and partnerships, including with other city governments.
  4. Find and articulate the benefits of privacy and digital ethics to multiple stakeholders.
  5. Become a test-bed for new services that give people more privacy and control.
  6. Make time and resources available for genuine public engagement on the use of surveillance technologies.
  7. Build digital literacy and make complex or opaque systems more understandable and accountable.
  8. Find opportunities to involve citizens in the process of data collection and analysis from start to finish….(More)”.

To Better Predict Traffic, Look to the Electric Grid


Linda Poon at CityLab: “The way we consume power after midnight can reveal how bad the morning rush hour will be….

Commuters check Google Maps for traffic updates the same way they check the weather app for rain predictions. And for good reason: By pooling information from millions of drivers already on the road, Google can paint an impressively accurate real-time portrait of congestion. Meanwhile, historical numbers can roughly predict when your morning commute may be particularly bad.

But “the information we extract from traffic data has been exhausted,” said Zhen (Sean) Qian, who directs the Mobility Data Analytics Center at Carnegie Mellon University. He thinks that to more accurately predict how gridlock varies from day to day, there’s a whole other set of data that cities haven’t mined yet: electricity use.

“Essentially we all use the urban system—the electricity, water, the sewage system and gas—and when people use them and how heavily they do is correlated to the way they use the transportation system,” he said. How we use electricity at night, it turns out, can reveal when we leave for work the next day. “So we might be able to get new information that helps explain travel time one or two hours in advance by having a better understanding of human activity.”

In a recent study in the journal Transportation Research Part C, Qian and his student Pinchao Zhang used 2014 data to demonstrate how electricity usage patterns can predict when peak congestion begins on various segments of a major highway in Austin, Texas—the 14th most congested city in the U.S. They crunched 79 days’ worth of electricity usage data for 322 households (stripped of all private information, including location), feeding it into a machine learning algorithm that then categorized the households into 10 groups according to the time and amount of electricity use between midnight and 6 a.m. By extrapolating the most critical traffic-related information about each group for each day, the model then predicted what the commute might look like that morning.
When compared with 2014 traffic data, they found that 8 out of the 10 patterns had an impact on highway traffic. Households that show a spike of electricity use from midnight to 2 a.m., for example, may be night owls who sleep in, leave late, and likely won’t contribute to the early morning congestion. In contrast, households that report low electricity use from midnight to 5 a.m., followed by a rise after 5:30 a.m., could be early risers who will be on the road during rush hour. If the researchers’ model detects more households falling into the former group, it might predict that peak congestion will start closer to, say, 7:45 a.m. rather than the usual 7:30….(More)”.
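
To make the approach concrete, here is a minimal sketch of the general recipe—cluster households on their overnight usage profiles, then regress each day’s cluster shares against the morning’s congestion onset. This is not the authors’ pipeline: the data is randomly generated, and the feature construction and model are assumptions.

```python
# A minimal sketch, not the study's actual pipeline: cluster households by
# overnight electricity use, then map daily cluster shares to congestion onset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Placeholder data: 322 households x 12 half-hour readings (midnight-6 a.m.).
usage = rng.random((322, 12))

# Step 1: categorize households into 10 groups by usage pattern, as the study did.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(usage)

# Step 2: per day, the share of households in each group becomes a feature vector.
shares = np.bincount(labels, minlength=10) / len(labels)

# Step 3 (assumed): stack one row of shares per day and regress against the
# observed onset of peak congestion (here, minutes after 7:00 a.m.).
X = np.vstack([shares + rng.normal(0, 0.01, 10) for _ in range(79)])  # 79 days
y = rng.normal(30, 10, 79)  # placeholder onset times
model = LinearRegression().fit(X, y)
print(f"Predicted onset: {model.predict(shares.reshape(1, -1))[0]:.0f} min after 7:00")
```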

What’s Wrong with Public Policy Education


Francis Fukuyama at the American Interest: “Most programs train students to become capable policy analysts, but with no understanding of how to implement those policies in the real world…Public policy education is ripe for an overhaul…

Public policy education in most American universities today reflects a broader problem in the social sciences, which is the dominance of economics. Most programs center on teaching students a battery of quantitative methods that are useful in policy analysis: applied econometrics, cost-benefit analysis, decision analysis, and, most recently, use of randomized experiments for program evaluation. Many schools build their curricula around these methods rather than the substantive areas of policy such as health, education, defense, criminal justice, or foreign policy. Students come out of these programs qualified to be policy analysts: They know how to gather data, analyze it rigorously, and evaluate the effectiveness of different public policy interventions. Historically, this approach started with the Rand Graduate School in the 1970s (which has subsequently undergone a major re-thinking of its approach).

There is no question that these skills are valuable and should be part of a public policy education.  The world has undergone a revolution in recent decades in terms of the role of evidence-based policy analysis, where policymakers can rely not just on anecdotes and seat-of-the-pants assessments, but statistically valid inferences that intervention X is likely to result in outcome Y, or that the millions of dollars spent on policy Z has actually had no measurable impact. Evidence-based policymaking is particularly necessary in the age of Donald Trump, amid the broad denigration of inconvenient facts that do not suit politicians’ prior preferences.

But being skilled in policy analysis is woefully inadequate to bring about policy change in the real world. Policy analysis will tell you what the optimal policy should be, but it does not tell you how to achieve that outcome.

The world is littered with optimal policies that don’t have a snowball’s chance in hell of being adopted. Take for example a carbon tax, which a wide range of economists and policy analysts will tell you is the most efficient way to abate carbon emissions, reduce fossil fuel dependence, and achieve a host of other desired objectives. A carbon tax has been a nonstarter for years due to the protestations of a range of interest groups, from oil and chemical companies to truckers and cabbies and ordinary drivers who do not want to pay more for the gas they use to commute to work, or as inputs to their industrial processes. Implementing a carbon tax would require a complex strategy bringing together a coalition of groups that are willing to support it, figuring out how to neutralize the die-hard opponents, and convincing those on the fence that the policy would be a good, or at least a tolerable, thing. How to organize such a coalition, how to communicate a winning message, and how to manage the politics on a state and federal level would all be part of a necessary implementation strategy.

It is entirely possible that an analysis of the implementation strategy, rather than analysis of the underlying policy, will tell you that the goal is unachievable absent an external shock, which might then mean changing the scope of the policy, rethinking its objectives, or even deciding that you are pursuing the wrong objective.

Public policy education that sought to produce change-makers rather than policy analysts would therefore have to be different.  It would continue to teach policy analysis, but the latter would be a small component embedded in a broader set of skills.

The first set of skills would involve problem definition. A change-maker needs to query stakeholders about what they see as the policy problem, understand the local history, culture, and political system, and define a problem that is sufficiently narrow in scope that it can plausibly be solved.

At times reformers start with a favored solution without defining the right problem. A student I know spent a summer working at an NGO in India advocating use of electric cars in the interest of carbon abatement. It turns out, however, that India’s reliance on coal for marginal electricity generation means that more carbon would be put in the air if the country were to switch to electric vehicles, not less, so the group was actually contributing to the problem they were trying to solve….

The second set of skills concerns solutions development. This is where traditional policy analysis comes in: It is important to generate data, come up with a theory of change, and posit plausible options by which reformers can solve the problem they have set for themselves. This is where some ideas from product design, like rapid prototyping and testing, may be relevant.

The third and perhaps most important set of skills has to do with implementation. This begins necessarily with stakeholder analysis: that is, mapping of actors who are concerned with the particular policy problem, either as supporters of a solution, or opponents who want to maintain the status quo. From an analysis of the power and interests of the different stakeholders, one can begin to build coalitions of proponents, and think about strategies for expanding the coalition and neutralizing those who are opposed.  A reformer needs to think about where resources can be obtained, and, very critically, how to communicate one’s goals to the stakeholder audiences involved. Finally comes testing and evaluation—in the expectation that there will be a continuous and rapid iterative process by which solutions are tried, evaluated, and modified. Randomized experiments have become the gold standard for program evaluation in recent years, but their cost and length of time to completion are often the enemies of rapid iteration and experimentation….(More) (see also http://canvas.govlabacademy.org/).

From Code to Cure


David J. Craig at Columbia Magazine: “Armed with enormous amounts of clinical data, teams of computer scientists, statisticians, and physicians are rewriting the rules of medical research….

The deluge is upon us.

We are living in the age of big data, and with every link we click, every message we send, and every movement we make, we generate torrents of information.

In the past two years, the world has produced more than 90 percent of all the digital data that has ever been created. New technologies churn out an estimated 2.5 quintillion bytes per day. Data pours in from social media and cell phones, weather satellites and space telescopes, digital cameras and video feeds, medical records and library collections. Technologies monitor the number of steps we walk each day, the structural integrity of dams and bridges, and the barely perceptible tremors that indicate a person is developing Parkinson’s disease. These are the building blocks of our knowledge economy.

This tsunami of information is also providing opportunities to study the world in entirely new ways. Nowhere is this more evident than in medicine. Today, breakthroughs are being made not just in labs but on laptops, as biomedical researchers trained in mathematics, computer science, and statistics use powerful new analytic tools to glean insights from enormous data sets and help doctors prevent, treat, and cure disease.

“The medical field is going through a major period of transformation, and many of the changes are driven by information technology,” says George Hripcsak ’85PS,’00PH, a physician who chairs the Department of Biomedical Informatics at Columbia University Irving Medical Center (CUIMC). “Diagnostic techniques like genomic screening and high-resolution imaging are generating more raw data than we’ve ever handled before. At the same time, researchers are increasingly looking outside the confines of their own laboratories and clinics for data, because they recognize that by analyzing the huge streams of digital information now available online they can make discoveries that were never possible before.” …

Consider, for example, what the young computer scientist Nicholas Tatonetti has been able to accomplish in recent years by mining an FDA database of prescription-drug side effects. The archive, which contains millions of reports of adverse drug reactions that physicians have observed in their patients, is continuously monitored by government scientists whose job it is to spot problems and pull drugs off the market if necessary. And yet by drilling down into the database with his own analytic tools, Tatonetti has found evidence that dozens of commonly prescribed drugs may interact in dangerous ways that have previously gone unnoticed. Among his most alarming findings: the antibiotic ceftriaxone, when taken with the heartburn medication lansoprazole, can trigger a type of heart arrhythmia called QT prolongation, which is known to cause otherwise healthy people to suddenly drop dead…(More)”
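
For a sense of how mining such an archive can work, here is a hedged sketch of one standard pharmacovigilance screen, the proportional reporting ratio (PRR). It is not Tatonetti’s actual method, and the report counts below are invented for illustration.

```python
# Proportional reporting ratio (PRR): how much more often is an event reported
# for an exposure (here, a drug pair) than for all other reports? A common
# first-pass screen in pharmacovigilance; the counts below are invented.
def prr(a: int, b: int, c: int, d: int) -> float:
    """a: exposure and event   b: exposure, no event
       c: event, no exposure   d: neither"""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: reports with both ceftriaxone and lansoprazole vs. QT prolongation.
signal = prr(a=12, b=488, c=3_000, d=996_500)
print(f"PRR = {signal:.1f}")  # values well above ~2, with enough cases, flag a signal
```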

Our misguided love affair with techno-politics


The Economist: “What might happen if technology, which smothers us with its bounty as consumers, made the same inroads into politics? Might data-driven recommendations suggest “policies we may like” just as Amazon recommends books? Would we swipe right to pick candidates in elections or answers in referendums? Could businesses expand into every cranny of political and social life, replete with ® and ™ at each turn? What would this mean for political discourse and individual freedom?

This dystopian yet all-too-imaginable world has been conjured up by Giuseppe Porcaro in his novel “Disco Sour”. The story takes place in the near future, after a terrible war and breakdown of nations, when the (fictional) illegitimate son of Roman Polanski creates an app called Plebiscitum that works like Tinder for politics.

Mr Porcaro—who comes armed with a doctorate in political geography—uses the plot to consider questions of politics in the networked age. The Economist’s Open Future initiative asked him to reply to five questions in around 100 words each. An excerpt from the book appears thereafter.

*     *     *

The Economist: In your novel, an entrepreneur attempts to replace elections with an app that asks people to vote on individual policies. Is that science fiction or prediction? And were you influenced by Italy’s Five Star Movement?

Giuseppe Porcaro: The idea of imagining a Tinder-style app replacing elections came up because I see connections between the evolution of dating habits and 21st-century politics. A new sort of “tinderpolitics” kicks in when instant gratification substitutes for substantive participation. Think about tweet trolling, for example.

Italy’s Five Star Movement was certainly another inspiration, as it has been a pioneer in using an online platform to successfully create a sort of new political mass movement. Another one was an Australian political party called Flux. They aim to replace the world’s elected legislatures with a new system known as issue-based direct democracy.

The Economist: Is it too cynical to suggest that a more direct relationship between citizens and policymaking would lead to a more reactionary political landscape? Or does the ideal of liberal democracy depend on an ideal citizenry that simply doesn’t exist?  

Mr Porcaro: It would be cynical to put the blame on citizens for getting too close to influencing decision-making. That would go against the very essence of the “liberal democracy ideal”. However, I am critical of the pervasive idea that technology can provide quick fixes to bridge the gap between citizens and the government. By applying computational thinking to democracy—extreme individualisation and instant participation—we forget that democracy is not simply the result of an election or the mathematical sum of individual votes. Citizens risk entering a vicious circle in which reactionary politics gain ground more easily.

The Economist: Modern representative democracy was in some ways a response to the industrial revolution. If AI and automation radically alter the world we live in, will we have to update the way democracy works too—and if so, how? 

Mr Porcaro: Democracy has already morphed several times. The 19th century’s liberal democracy was shaken by universal suffrage, and adapted to the Fordist mode of production with the mass party. May 1968 challenged that model. Today, the massive availability of data and the increasing power of decision-making algorithms will change political institutions.

The policy “production” process might be utterly redesigned. Data collected by devices we use on a daily basis (such as vehicles, domestic appliances and wearable sensors) will provide evidence about the drivers of personal voting choices, or the accountability of government decisions. …(More)

Big Data Is Getting Bigger. So Are the Privacy and Ethical Questions.


Goldie Blumenstyk at The Chronicle of Higher Education: “…The next step in using “big data” for student success is upon us. It’s a little cool. And also kind of creepy.

This new approach goes beyond the tactics now used by hundreds of colleges, which depend on data collected from sources like classroom teaching platforms and student-information systems. It not only makes a technological leap; it also raises issues around ethics and privacy.

Here’s how it works: Whenever you log on to a wireless network with your cellphone or computer, you leave a digital footprint. Move from one building to another while staying on the same network, and that network knows how long you stayed and where you went. That data is collected continuously and automatically from the network’s various nodes.
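
In code, turning those footprints into measurements can be as simple as grouping sightings and taking time spans. The sketch below assumes a hypothetical log of (device, building, timestamp) rows; real network-controller logs vary by vendor, and repeat visits would need session splitting.

```python
# A minimal sketch: derive dwell time per building from Wi-Fi sighting logs.
# The log format is hypothetical; repeat visits would need session splitting.
from collections import defaultdict
from datetime import datetime

log = [
    ("device-a", "library", "2018-09-10T09:02:00"),
    ("device-a", "library", "2018-09-10T10:47:00"),
    ("device-a", "gym",     "2018-09-10T11:05:00"),
    ("device-a", "gym",     "2018-09-10T11:58:00"),
]

sightings = defaultdict(list)
for device, building, ts in log:
    sightings[(device, building)].append(datetime.fromisoformat(ts))

# Dwell time: span between first and last sighting on that building's nodes.
for (device, building), times in sightings.items():
    print(f"{device} spent {max(times) - min(times)} in {building}")
```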

Now, with the help of a company called Degree Analytics, a few colleges are beginning to use location data collected from students’ cellphones and laptops as they move around campus. Some colleges are using it to improve the kind of advice they might send to students, like a text-message reminder to go to class if they’ve been absent.

Others see it as a tool for making decisions on how to use their facilities. St. Edward’s University, in Austin, Tex., used the data to better understand how students were using its computer-equipped spaces. It found that a renovated lounge, with relatively few computers but with Wi-Fi access and several comfy couches, was one of the most popular such sites on campus. Now the university knows it may not need to buy as many computers as it once thought.

As Gary Garofalo, a co-founder and chief revenue officer of Degree Analytics, told me, “the network data has very intriguing advantages” over the forms of data that colleges now collect.

Some of those advantages are obvious: If you’ve got automatic information on every person walking around with a cellphone, your dataset is more complete than if you need to extract it from a learning-management system or from the swipe-card readers some colleges use to track students’ activities. Many colleges now collect such data to determine students’ engagement with their coursework and campus activities.

Of course, the 24-7 reporting of the data is also what makes this approach seem kind of creepy….

I’m not the first to ask questions like this. A couple of years ago, a group of educators organized by Martin Kurzweil of Ithaka S+R and Mitchell Stevens of Stanford University issued a series of guidelines for colleges and companies to consider as they began to embrace data analytics. Among other principles, the guidelines highlighted the importance of being transparent about how the information is used, and ensuring that institutions’ leaders really understand what companies are doing with the data they collect. Experts at New America weighed in too.

I asked Kurzweil what he makes of the use of Wi-Fi information. Location tracking tends toward the “dicey” side of the spectrum, he says, though perhaps not as far out as using students’ social-media habits, health information, or what they check out from the library. The fundamental question, he says, is “how are they managing it?”…

So is this the future? Benz, at least, certainly hopes so. Inspired by the Wi-Fi-based StudentLife research project at Dartmouth College and the experiences Purdue University is having with students’ use of its Forecast app, he’s in talks now with a research university about a project that would generate other insights that might be gleaned from students’ Wi-Fi-usage patterns….(More)

Open Data Use Case: Using data to improve public health


Chris Willsher at ODX: “Studies have shown that a large majority of Canadians spend too much time in sedentary activities. According to the Health Status of Canadians report in 2016, only 2 out of 10 Canadian adults met the Canadian Physical Activity Guidelines. Increasing physical activity and healthy lifestyle behaviours can reduce the risk of chronic illnesses, which can decrease pressures on our health care system. And data can play a role in improving public health.

We are already seeing examples of a push to augment the role of data, with programs recently being launched at home and abroad. Canada and the US established an initiative in the spring of 2017 called the Healthy Behaviour Data Challenge. The goal of the initiative is to open up new methods for generating and using data to monitor health, specifically in the areas of physical activity, sleep, sedentary behaviour, or nutrition. The challenge recently wrapped up with winners being announced in late April 2018. Programs such as this provide incentive to the private sector to explore data’s role in measuring healthy lifestyles and raise awareness of the importance of finding new solutions.

In the UK, Sport England and the Open Data Institute (ODI) have collaborated to create the OpenActive initiative. It has set out to encourage both government and private sector entities to unlock data around physical activities so that others can utilize this information to ease the process of engaging in an active lifestyle. The goal is to “make it as easy to find and book a badminton court as it is to book a hotel room.” As of last fall, OpenActive counted more than 76,000 activities across 1,000 locations from their partner organizations. They have also developed a standard for activity data to ensure consistency among data sources, which eases the ability for developers to work with the data. Again, this initiative serves as a mechanism for open data to help address public health issues.
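
For developers, consuming such a feed can be straightforward. The sketch below assumes the paged RPDE structure that OpenActive specifies—pages of items, each with an id and a state, plus a “next” URL—though the endpoint itself is hypothetical.

```python
# A sketch of harvesting an OpenActive-style (RPDE) paged feed: follow each
# page's "next" URL, keeping the latest state of every opportunity, until a
# page comes back empty. The endpoint URL is hypothetical.
import requests

def harvest(url: str) -> dict:
    opportunities = {}
    while True:
        page = requests.get(url, timeout=30).json()
        if not page.get("items"):
            break  # an empty page means we have caught up
        for item in page["items"]:
            if item["state"] == "deleted":
                opportunities.pop(item["id"], None)
            else:
                opportunities[item["id"]] = item["data"]
        url = page["next"]
    return opportunities

sessions = harvest("https://example.org/api/rpde/sessions")  # hypothetical endpoint
print(f"{len(sessions)} activities harvested")
```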

In Canada, we are seeing more open datasets that could be utilized to devise new solutions for generating higher rates of physical activity. A lot of useful information is available at the municipal level that can provide specifics around local infrastructure. Plus, there is data at the provincial and federal level that can provide higher-level insights useful to developing methods for promoting healthier lifestyles.

Information about cycling infrastructure seems to be relatively widespread among municipalities with a robust open data platform. As an example, the City of Toronto publishes map data of bicycle routes around the city. This information could be utilized in a way that helps citizens find the best bike route between two points. In addition, the city also publishes data on indoor, outdoor, and post-and-ring bicycle parking facilities that can identify where to securely lock your bike. Exploring data from proprietary sources, such as Strava, could further enhance an application by layering on popular cycling routes or allowing users to integrate their personal information. And algorithms could allow for the inclusion of data on comparable driving times, projected health benefits, or savings on automotive maintenance.
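
A minimal sketch of that routing idea: chain the published route segments into a graph and ask for a shortest path. It assumes the routes arrive as GeoJSON LineStrings; the file name and coordinates are hypothetical, and a real application would snap start and end points to the nearest vertices.

```python
# A sketch, assuming bikeway data as GeoJSON LineStrings: build a graph from
# segment vertices and find a shortest path along the bike network.
import json
import math
import networkx as nx

def dist(p, q):
    # Rough planar distance; adequate for short city segments in a sketch.
    return math.hypot(p[0] - q[0], p[1] - q[1])

graph = nx.Graph()
with open("bikeways.geojson") as f:  # hypothetical export of the city's route map
    for feature in json.load(f)["features"]:
        coords = feature["geometry"]["coordinates"]
        for a, b in zip(coords, coords[1:]):
            graph.add_edge(tuple(a), tuple(b), weight=dist(a, b))

# Endpoints must be graph vertices; real code would snap arbitrary points first.
route = nx.shortest_path(graph, source=(-79.3871, 43.6426),
                         target=(-79.4000, 43.6532), weight="weight")
print(f"{len(route)} vertices on the suggested route")
```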

The City of Calgary publishes data on park sports surfaces and recreation facilities that could potentially be incorporated into sports league applications. This would make it easier to display locations for upcoming games or to arrange pick-up games. Knowing where there are fields nearby that may be available for a last-minute soccer game could be useful in encouraging use of the facilities and generating more physical activity. Again, other data sources, such as weather, could be integrated with this information to provide a planning tool for organizing these activities….(More)”.

Predicting Public Interest Issue Campaign Participation on Social Media


Jungyun Won, Linda Hon, and Ah Ram Lee in the Journal of Public Interest Communication: “This study investigates what motivates people to participate in a social media campaign in the context of animal protection issues.

Structural equation modeling (SEM) tested a proposed research model with survey data from 326 respondents.

Situational awareness, participation benefits, and social ties influence were positive predictors of social media campaign participation intentions. Situational awareness also partially mediates the relationship between participation benefits and participation intentions as well as strong ties influence and participation intentions.
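
For readers who want to try a comparable analysis, here is a hedged sketch using the semopy package. The variable names are placeholders for the survey constructs, and the specification is illustrative rather than the authors’ exact model.

```python
# An illustrative path model, not the authors' exact specification: situational
# awareness partially mediates the effects of benefits and ties on intentions.
import pandas as pd
import semopy

spec = """
awareness ~ benefits + strong_ties
intention ~ awareness + benefits + strong_ties + weak_ties
"""

data = pd.read_csv("survey.csv")  # hypothetical file, one column per construct score
model = semopy.Model(spec)
model.fit(data)
print(model.inspect())  # path estimates, standard errors, p-values
```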

When designing social media campaigns, public interest communicators should raise situational awareness and emphasize participation benefits. Messages shared through social networks, especially via strong ties, also may be more effective than those posted only on official websites or social networking sites (SNSs)….(More)”.