To turn the open data revolution from idea to reality, we need more evidence


Stefaan Verhulst at apolitical: “The idea that we are living in a data age — one characterised by unprecedented amounts of information with unprecedented potential — has become mainstream. We regularly read “data is the new oil,” or “data is the most valuable commodity in the global economy.”

Doubtlessly, there is truth in these statements. But a major, often unacknowledged problem is how much data remains inaccessible, hidden in siloes and behind walls.

For close to a decade, the technology and public interest community has pushed the idea of open data. At its core, open data represents a new paradigm of information and information access.

Rooted in notions of an information commons — developed by scholars like Nobel Prize winner Elinor Ostrom — and borrowing from the language of open source, open data begins from the premise that data collected from the public, often using public funds or publicly funded infrastructure, should also belong to the public — or at least, be made broadly accessible to those pursuing public-interest goals.

The open data movement has reached significant milestones in its short history. An ever-increasing number of governments across both developed and developing economies have released large datasets for the public’s benefit….

Similarly, a growing number of private companies have established “Data Collaboratives,” leveraging their data — with various degrees of limitations — to serve the public interest.

Despite such initiatives, many open data projects (and data collaboratives) remain fledgling. The field has trouble scaling projects beyond initial pilots. In addition, many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain sceptical of open data’s value. Such limitations need to be overcome if open data and its benefits are to spread. We need hard evidence of its impact.

Ironically, the field is held back by an absence of good data on open data — that is, a lack of reliable empirical evidence that could guide new initiatives.

At the GovLab, a do-tank at New York University, we study the impact of open data. One of our overarching conclusions is that we need a far more solid evidence base to move open data from being a good idea to reality.

What do we know? Several initiatives undertaken at the GovLab offer insight. Our ODImpact website now includes more than 35 detailed case studies of open government data projects. These examples provide powerful evidence not only that open data can work but also about how it works….

We have also launched an Open Data Periodic Table to better understand what conditions predispose an open data project toward success or failure. For example, having a clear problem definition, as well as the capacity and culture to carry out open data projects, is vital. Successful projects also build cross-sector partnerships around open data and its potential uses, establish practices to assess and mitigate risks, and have transparent and responsive governance structures….(More)”.

The New York City Business Atlas: Leveling the Playing Field for Small Businesses with Open Data


Chapter by Stefaan Verhulst and Andrew Young in Smarter New York City: How City Agencies Innovate. Edited by André Corrêa d’Almeida: “While retail entrepreneurs, particularly those operating in the small-business space, are experts in their respective trades, they often lack access to high-quality information about social, environmental, and economic conditions in the neighborhoods where they operate or are considering operating.

The New York City Business Atlas, conceived by the Mayor’s Office of Data Analytics (MODA) and the Department of Small Business Services, is designed to alleviate that information gap by providing a public web-based tool that gives small businesses access to high-quality data to help them decide where to establish a new business or expand an existing one. The tool brings together a diversity of data, including business-filing data from the Department of Consumer Affairs, sales-tax data from the Department of Finance, demographic data from the census, and traffic data from Placemeter, a New York City startup focusing on real-time traffic information.
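
The sketch below gives a rough sense of the kind of data integration such a tool performs, joining a few datasets on a shared neighborhood key. The dataset names, columns, and figures are invented for illustration; this is not MODA’s actual pipeline.

```python
# A minimal, hypothetical sketch of joining open datasets by neighborhood,
# in the spirit of the Business Atlas. All names and values are invented.
import pandas as pd

filings = pd.DataFrame({
    "neighborhood": ["Astoria", "Harlem", "Bushwick"],
    "new_business_filings": [120, 95, 150],
})
census = pd.DataFrame({
    "neighborhood": ["Astoria", "Harlem", "Bushwick"],
    "median_household_income": [62000, 49000, 55000],
})
traffic = pd.DataFrame({
    "neighborhood": ["Astoria", "Harlem", "Bushwick"],
    "avg_daily_foot_traffic": [8300, 9100, 7600],
})

# Join everything on the shared neighborhood key to build one profile per area.
atlas = filings.merge(census, on="neighborhood").merge(traffic, on="neighborhood")
print(atlas.sort_values("avg_daily_foot_traffic", ascending=False))
```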

The initial iteration of the Business Atlas made useful and previously inaccessible data available to small-business owners and entrepreneurs in an innovative manner. After a few years, however, it became clear that the tool was not experiencing the level of use or creating the level of demonstrable impact anticipated. Rather than continuing down the same path or abandoning the effort entirely, MODA pivoted to a new approach, moving from the Business Atlas as a single information-providing tool to the Business Atlas as a suite of capabilities aimed at bolstering New York’s small-business community.

Through problem- and user-centered efforts, the Business Atlas is now making important insights available to stakeholders who can put it to meaningful use—from how long it takes to open a restaurant in the city to which areas are most in need of education and outreach to improve their code compliance. This chapter considers the open data environment from which the Business Atlas was launched, details the initial version of the Business Atlas and the lessons it generated, and describes the pivot to this new approach….(More)”.

Causal mechanisms and institutionalisation of open government data in Kenya


Paper by Paul W. Mungai: “Open data—including open government data (OGD)—has become a topic of prominence during the last decade. However, most governments have not realised the desired value streams or outcomes from OGD. The Kenya Open Data Initiative (KODI), a Government of Kenya initiative, is no exception with some moments of success but also sustainability struggles. Therefore, the focus for this paper is to understand the causal mechanisms that either enable or constrain institutionalisation of OGD initiatives. Critical realism is ideally suited as a paradigm to identify such mechanisms, but guides to its operationalisation are few. This study uses the operational approach of Bygstad, Munkvold & Volkoff’s six‐step framework, a hybrid approach that melds concepts from existing critical realism models with the idea of affordances. The findings suggest that data demand and supply mechanisms are critical in institutionalising KODI and that, underpinning basic data‐related affordances, are mechanisms engaging with institutional capacity, formal policy, and political support. It is the absence of such elements in the Kenya case which explains why it has experienced significant delays…(More)”.

Is Mass Surveillance the Future of Conservation?


Mallory Picket at Slate: “The high seas are probably the most lawless place left on Earth. They’re a portal back in time to the way the world looked for most of our history: fierce and open competition for resources and contested territories. Pirating continues to be a way to make a living.

It’s not a complete free-for-all—most countries require registration of fishing vessels and enforce environmental protocols. Cooperative agreements between countries oversee fisheries in international waters. But the best data available suggests that around 20 percent of the global seafood catch is illegal. This is an environmental hazard because unregistered boats evade regulations meant to protect marine life. And it’s an economic problem for fishermen who can’t compete with boats that don’t pay for licenses or follow the (often expensive) regulations. In many developing countries, local fishermen are outfished by foreign vessels coming into their territory and stealing their stock….

But Henri Weimerskirch, a French ecologist, has a cheap, low-impact way to monitor thousands of square miles a day in real time: He’s getting birds to do it (a project first reported by Hakai). Specifically, albatrosses, which have a 10-foot wingspan and can fly around the world in 46 days. The birds naturally congregate around fishing boats, hoping for an easy meal, so Weimerskirch is equipping them with GPS loggers that also have radar detection to pick up the ship’s radar (and make sure it is a ship, not an island) and a transmitter to send that data to authorities in real time. If it works, this should help in two ways: It will provide some information on the extent of the unofficial fishing operation in the area, and because the loggers will transmit their information in real time, the data will be used to notify French navy ships in the area to check out suspicious boats.
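
The snippet below is a simplified sketch, not the project’s actual software, of how a bird-relayed radar contact might be cross-checked against vessels that are broadcasting their position: a contact with no nearby broadcasting vessel could be flagged for follow-up by patrol ships. All coordinates and vessel names are hypothetical.

```python
# Hypothetical cross-check of a bird-borne radar detection against vessels
# that are legally broadcasting their position. Illustrative only.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Position reports from vessels that are broadcasting (e.g., via a transponder system).
broadcasting_vessels = [
    {"name": "FV Austral", "lat": -46.42, "lon": 51.90},
    {"name": "FV Kerguelen", "lat": -47.10, "lon": 52.55},
]

# A radar contact reported in real time by a tagged bird's logger.
detection = {"lat": -46.95, "lon": 53.40}

# If no broadcasting vessel is within a plausible radius, flag the contact
# as a possible "dark" vessel worth a closer look.
MATCH_RADIUS_KM = 30
matches = [v for v in broadcasting_vessels
           if haversine_km(detection["lat"], detection["lon"], v["lat"], v["lon"]) <= MATCH_RADIUS_KM]
print("matched vessel(s):", [v["name"] for v in matches] or "none, possible dark vessel")
```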

His team is getting ready to deploy about 80 birds in the south Indian Ocean this November. The loggers attached around the birds’ legs are about the shape and size of a Snickers. The south Indian Ocean is a shared fishing zone, and nine countries, including France (courtesy of several small islands it claims ownership of, a vestige of colonialism), manage it together. But there are big problems with illegal fishing in the area, especially of the Patagonian toothfish (better known to consumers as Chilean seabass)….(More)”

The UK’s Gender Pay Gap Open Data Law Has Flaws, But Is A Positive Step Forward


Article by Michael McLaughlin: “Last year, the United Kingdom enacted a new regulation requiring companies to report information about their gender pay gap—a measure of the difference in average pay between men and women. The new rules are a good example of how open data can drive social change. However, the regulations have produced some misleading statistics, highlighting the importance of carefully crafting reporting requirements to ensure that they produce useful data.

In the UK, nearly 11,000 companies have filed gender pay gap reports, which include the difference between the mean and median hourly pay rates for men and women as well as the difference in bonuses. And the initial data reveals several interesting findings. Median pay for men is 11.8 percent higher than for women, on average, and nearly 87 percent of companies pay men more than women on average. In addition, over 1,000 firms had a median pay gap greater than 30 percent. The sectors with the highest pay gaps—construction, finance, and insurance—each pay men at least 20 percent more than women. A major reason for the gap is a lack of women in senior positions—UK women actually make more than men between the ages of 22 and 29. The total pay gap is also a result of more women holding part-time jobs.

However, as detractors note, the UK’s data can be misleading. For example, the data overstates the pay gap on bonuses because it does not adjust these figures for hours worked. More women work part-time than men, so it makes sense that women would receive less in bonus pay when they work less. The data also understates the pay gap because it excludes the high compensation of partners in organizations such as law firms, a group that includes few women. And it is important to note that—by definition—the pay gap data does not compare the wages of men and women working the same jobs, so the data says nothing about whether women receive equal pay for equal work.
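
A toy calculation makes the hours-worked point concrete: if bonuses scale with hours and more women work part-time, the raw bonus gap looks large even when the bonus rate per hour is identical for men and women. All figures below are invented for illustration.

```python
# A toy illustration (invented figures): four full-time men versus two
# full-time and two part-time women, all paid the same bonus rate per hour.
men_hours = [37.5, 37.5, 37.5, 37.5]
women_hours = [37.5, 37.5, 20.0, 20.0]
bonus_rate_per_hour = 2.0  # identical for everyone

def mean(xs):
    return sum(xs) / len(xs)

men_bonus = [h * bonus_rate_per_hour for h in men_hours]
women_bonus = [h * bonus_rate_per_hour for h in women_hours]

# Raw gap, as reported: compares average bonuses without adjusting for hours.
raw_gap = (mean(men_bonus) - mean(women_bonus)) / mean(men_bonus)

# Hours-adjusted gap: compares bonus earned per hour actually worked.
men_rate = mean(men_bonus) / mean(men_hours)
women_rate = mean(women_bonus) / mean(women_hours)
adjusted_gap = (men_rate - women_rate) / men_rate

print(f"raw bonus gap: {raw_gap:.1%}")                  # about 23%, despite equal hourly rates
print(f"hours-adjusted bonus gap: {adjusted_gap:.1%}")  # 0.0%
```

Adjusting bonus figures for hours worked, as the critics suggest, would remove this particular distortion while leaving genuine pay differences visible.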

Still, publication of the data has sparked an important national conversation. Google searches in the UK for the phrase “gender pay gap” reached a 12-month high the week the regulations began enforcement, and major news sites like the Financial Times have provided significant coverage of the issue by analyzing the reported data. While it is too soon to tell if the law will change employer behavior, such as businesses hiring more female executives, or employee behavior, such as women leaving companies or fields that pay less, countries with similar reporting requirements, such as Belgium, have seen the pay gap narrow following implementation of their rules.

Requiring companies to report this data to the government may be the only way to obtain gender pay gap data, because evidence suggests that the private sector will not produce this data on its own. Only 300 UK organizations joined a voluntary government program to report their gender pay gap in 2011, and as few as 11 actually published the data. Crowdsourced efforts, where women voluntarily report their pay, have also suffered from incomplete data. And even complete data does not illuminate variables such as why women may work in a field that pays less….(More)”.

Long Term Info-structure


Long Now Foundation Seminar by Juan Benet: “We live in a spectacular time,”…”We’re a century into our computing phase transition. The latest stages have created astonishing powers for individuals, groups, and our species as a whole. We are also faced with accumulating dangers — the capabilities to end the whole humanity experiment are growing and are ever more accessible. In light of the promethean fire that is computing, we must prevent bad outcomes and lock in good ones to build robust foundations for our knowledge, and a safe future. There is much we can do in the short-term to secure the long-term.”

“I come from the front lines of computing platform design to share a number of new super-powers at our disposal, some old challenges that are now soluble, and some new open problems. In this next decade, we’ll need to leverage peer-to-peer networks, crypto-economics, blockchains, Open Source, Open Services, decentralization, incentive-structure engineering, and so much more to ensure short-term safety and the long-term flourishing of humanity.”

Juan Benet is the inventor of the InterPlanetary File System (IPFS)—a new protocol which uses content-addressing to make the web faster, safer, and more open—and the creator of Filecoin, a cryptocurrency-incentivized storage market….(More + Video)”
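
For readers unfamiliar with content-addressing, the minimal sketch below shows the core idea: data is stored and retrieved by a hash of its own contents, so whoever fetches it can verify they got exactly what they asked for. This illustrates the concept only; IPFS’s actual identifiers and hashing details differ.

```python
# A minimal sketch of content-addressing: the "address" of a piece of data
# is derived from the data itself. Conceptual illustration only.
import hashlib

store = {}

def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    store[address] = data
    return address

def get(address: str) -> bytes:
    data = store[address]
    # Integrity check falls out for free: re-hash and compare to the address.
    assert hashlib.sha256(data).hexdigest() == address
    return data

addr = put(b"hello, permanent web")
print(addr, get(addr))
```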

A rationale for data governance as an approach to tackle recurrent drawbacks in open data portals


Conference paper by Juan Ribeiro Reis et al: “Citizens and developers are gaining broad access to public data sources, made available in open data portals. These machine-readable datasets enable the creation of applications that help the population in several ways, giving them the opportunity to actively participate in governance processes, such as decision-making and policy-making.

While the number of open data portals grows over the years, researchers have identified recurrent problems with the data they provide, such as a lack of data standards, difficulty of data access, and poor understandability. Such issues make the effective use of data difficult. Several works in the literature propose different approaches to mitigate these issues, based on novel or well-known data management techniques.

However, there is a lack of general frameworks for tackling these problems. On the other hand, data governance has been applied in large companies to manage data problems, ensuring that data meet business needs and become organizational assets. In this paper, we first highlight the main drawbacks pointed out in the literature for government open data portals. We then discuss how data governance can tackle many of the issues identified…(More)”.

Open Data Use Case: Using data to improve public health


Chris Willsher at ODX: “Studies have shown that a large majority of Canadians spend too much time in sedentary activities. According to the Health Status of Canadians report in 2016, only 2 out of 10 Canadian adults met the Canadian Physical Activity Guidelines. Increasing physical activity and healthy lifestyle behaviours can reduce the risk of chronic illnesses, which can decrease pressures on our health care system. And data can play a role in improving public health.

We are already seeing examples of a push to augment the role of data, with programs recently being launched at home and abroad. Canada and the US established an initiative in the spring of 2017 called the Healthy Behaviour Data Challenge. The goal of the initiative is to open up new methods for generating and using data to monitor health, specifically in the areas of physical activity, sleep, sedentary behaviour, or nutrition. The challenge recently wrapped up with winners being announced in late April 2018. Programs such as this provide incentive to the private sector to explore data’s role in measuring healthy lifestyles and raise awareness of the importance of finding new solutions.

In the UK, Sport England and the Open Data Institute (ODI) have collaborated to create the OpenActive initiative. It has set out to encourage both government and private sector entities to unlock data around physical activities so that others can utilize this information to ease the process of engaging in an active lifestyle. The goal is to “make it as easy to find and book a badminton court as it is to book a hotel room.” As of last fall, OpenActive counted more than 76,000 activities across 1,000 locations from their partner organizations. They have also developed a standard for activity data to ensure consistency among data sources, which eases the ability for developers to work with the data. Again, this initiative serves as a mechanism for open data to help address public health issues.
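
The value of a shared data standard is easy to see in miniature: if every provider publishes sessions with the same fields, one small function can search across all of them. The field names in the sketch below are hypothetical, not the actual OpenActive specification.

```python
# A simplified, hypothetical "standardized activity feed" aggregated from
# several providers. Field names are invented for illustration.
sessions = [
    {"activity": "badminton", "location": "Hackney Leisure Centre",
     "start": "2018-06-12T18:00", "spaces": 4},
    {"activity": "yoga", "location": "Leeds Community Hall",
     "start": "2018-06-12T19:00", "spaces": 0},
    {"activity": "badminton", "location": "Bristol Sports Hub",
     "start": "2018-06-13T20:00", "spaces": 2},
]

def find_bookable(activity: str):
    """Return sessions of the requested activity that still have free spaces."""
    return [s for s in sessions if s["activity"] == activity and s["spaces"] > 0]

for s in find_bookable("badminton"):
    print(s["location"], s["start"], f"{s['spaces']} spaces left")
```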

In Canada, we are seeing more open datasets that could be utilized to devise new solutions for generating higher rates of physical activity. A lot of useful information is available at the municipal level that can provide specifics around local infrastructure. Plus, there is data at the provincial and federal level that can provide higher-level insights useful to developing methods for promoting healthier lifestyles.

Information about cycling infrastructure seems to be relatively widespread among municipalities with a robust open data platform. As an example, the City of Toronto publishes map data of bicycle routes around the city. This information could be utilized to help citizens find the best bike route between two points. In addition, the city also publishes data on indoor, outdoor, and post-and-ring bicycle parking facilities that can identify where to securely lock your bike. Exploring data from proprietary sources, such as Strava, could further enhance an application by layering on popular cycling routes or allowing users to integrate their personal information. And algorithms could allow for the inclusion of data on comparable driving times, projected health benefits, or savings on automotive maintenance.
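
As a rough sketch of that routing idea, published route segments can be modelled as a weighted graph and searched with a standard shortest-path algorithm. The intersections and distances below are toy data, not an actual City of Toronto dataset or application.

```python
# Toy bicycle-route graph and Dijkstra's algorithm for the shortest route.
# Intersections and distances are hypothetical.
import heapq

# Each edge is (neighbouring node, segment length in km).
bike_network = {
    "Union Station": [("Harbourfront", 1.2), ("Queen & Spadina", 2.0)],
    "Harbourfront": [("Union Station", 1.2), ("Queen & Spadina", 2.5)],
    "Queen & Spadina": [("Union Station", 2.0), ("Harbourfront", 2.5), ("Trinity Bellwoods", 1.1)],
    "Trinity Bellwoods": [("Queen & Spadina", 1.1)],
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_km, list_of_nodes) or (inf, [])."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, length in graph.get(node, []):
            if neighbour not in seen:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []

print(shortest_route(bike_network, "Union Station", "Trinity Bellwoods"))
```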

The City of Calgary publishes data on park sports surfaces and recreation facilities that could potentially be incorporated into sports league applications. This would make it easier to display locations for upcoming games or to arrange pick-up games. Knowing where there are fields nearby that may be available for a last-minute soccer game could be useful in encouraging use of the facilities and generating more physical activity. Again, other data sources, such as weather, could be integrated with this information to provide a planning tool for organizing these activities….(More)”.

Identifying Healthcare Fraud with Open Data


Paper by Xuan Zhang et al: “Health care fraud is a serious problem that impacts every patient and consumer. This fraudulent behavior causes excessive financial losses every year and significant patient harm. Healthcare fraud includes health insurance fraud, fraudulent billing of insurers for services not provided, exaggeration of medical services, and more. Identifying healthcare fraud thus becomes an urgent task to avoid the abuse and waste of public funds. Existing methods in this research field usually use classified data from governments, which greatly compromises the generalizability and scope of application. This paper introduces a methodology to use publicly available data sources to identify potentially fraudulent behavior among physicians. The research involved pairing multiple datasets, selecting useful features, comparing classification models, and analyzing useful predictors. Our performance evaluation results clearly demonstrate the efficacy of the proposed method….(More)”.
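
The sketch below illustrates the general shape of such a pipeline (engineer per-physician features from paired public datasets, then compare classifiers) using synthetic data; it is not the authors’ code, features, or results.

```python
# A generic, synthetic sketch of a fraud-detection modelling pipeline:
# build features per physician, then compare classifiers by cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-physician features, e.g. claims per patient, average billed
# amount, share of high-cost procedure codes (all synthetic here).
X = rng.normal(size=(n, 3))
# Synthetic labels loosely tied to the first two features, standing in for
# "flagged as potentially fraudulent".
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.2).astype(int)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.2f}")
```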

Information Asymmetries, Blockchain Technologies, and Social Change


Reflections by Stefaan Verhulst on the potential (and challenges) of Distributed Ledgers for “Market for Lemons” Conditions: “We live in a data age, and it has become common to extol the transformative power of data and information. It is now conventional to assume that many of our most pressing public problems—everything from climate change to terrorism to mass migration—are amenable to a “data fix.”

The truth, though, is a little more complicated. While there is no doubt that data—when analyzed and used responsibly—holds tremendous potential, many factors affect whether, and to what extent, that potential will ultimately be fulfilled.

Our ability to address complex public problems using data depends vitally on how our respective data ecosystems are designed (as well as on ongoing questions of representation in, power over, and stewardship of these ecosystems).

The flaws in our data ecosystem that prevent us from addressing problems, and that may also be responsible for many societal failures and inequalities, result from the fact that:

  • some actors have better access to data than others;
  • data is of poor quality (or even “fake”); contains implicit bias; and/or is not validated and thus not trusted;
  • only easily accessible data are shared and integrated (“open washing”) while important data remain carefully hidden or without resources for relevant research and analysis; and more generally that
  • even in an era of big and open data, information too often remains stove-piped, siloed, and generally difficult to access.

Several observers have pointed to the relationship between these information asymmetries and, for example, corruption, financial exclusion, global pandemics, forced mass migration, human rights abuses, and electoral fraud.

Consider the transaction costs, power inequities and other obstacles that result from such information asymmetries, namely:

–     At the individual level: too often someone who is trying to open a bank account (or sign up for new cell phone service) is unable to provide all the requisite information, such as credit history, proof of address or other confirmatory and trusted attributes of identity. As such, information asymmetries are in effect limiting this individual’s access to financial and communications services.

–     At the corporate level, a vast body of literature in economics has shown how uncertainty over the quality and trustworthiness of data can impose transaction costs, limit the development of markets for goods and services, or shut them down altogether. This is the well-known “market for lemons” problem, made famous in a 1970 paper of the same name by George Akerlof (a toy simulation of this dynamic appears after this list).

–     At the societal or governance level, information asymmetries don’t just affect the efficiency of markets or social inequality. They can also incentivize unwanted behaviors that cause substantial public harm. Tyrants and corrupt politicians thrive on limiting their citizens’ access to information (e.g., information related to bank accounts, investment patterns or disbursement of public funds). Likewise, criminals operate and succeed in the information-scarce corners of the underground economy.
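
The toy simulation below illustrates Akerlof’s unraveling dynamic: when buyers cannot observe quality, they offer only the expected value, the best sellers withdraw, and quality in the market collapses. All numbers are illustrative.

```python
# A toy "market for lemons" simulation: buyers value any car 1.2x more than
# its seller does, so every trade would be beneficial under full information.
# Because buyers cannot observe quality, they pay only the average expected
# value, and the market unravels. Figures are illustrative only.
import random

random.seed(1)
sellers = [random.uniform(0, 100) for _ in range(1000)]  # true quality = seller's value
BUYER_PREMIUM = 1.2

market = sellers[:]
for round_ in range(10):
    if not market:
        break
    avg_quality = sum(market) / len(market)
    offer = BUYER_PREMIUM * avg_quality      # buyers pay the expected value only
    # Sellers whose car is worth more to them than the offer leave the market.
    market = [q for q in market if q <= offer]
    print(f"round {round_}: offer={offer:.1f}, cars remaining={len(market)}")
```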

Blockchain Technologies and Information Asymmetries

This is where blockchain comes in. At their core, blockchain technologies are a new type of disclosure mechanism that has the potential to address some of the information asymmetries listed above. There are many types of blockchain technologies, and while I use the blanket term ‘blockchain’ below for simplicity’s sake, the nuances between different types of blockchain technologies can greatly impact the character and likelihood of success of a given initiative.

By leveraging a shared and verified database of ledgers stored in a distributed manner, blockchain seeks to redesign information ecosystems in a more transparent, immutable, and trusted manner. Solving information asymmetries may be the real potential of blockchain, and this—much more than the current hype over virtual currencies—is the real reason to assess its potential….(More)”.
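
To make the “immutable and trusted” claim concrete, the minimal sketch below shows the tamper-evidence property of a hash-chained ledger: each entry commits to the previous one, so altering any past record is detectable. Real blockchain systems add replication, consensus, and incentive mechanisms on top of this basic idea.

```python
# A minimal hash-chained ledger illustrating tamper evidence. Conceptual
# sketch only; not a production blockchain.
import hashlib, json

def entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(ledger: list, record: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    entry = {"record": record, "prev_hash": prev}
    entry["hash"] = entry_hash({"record": record, "prev_hash": prev})
    ledger.append(entry)

def verify(ledger: list) -> bool:
    for i, entry in enumerate(ledger):
        expected_prev = ledger[i - 1]["hash"] if i else "0" * 64
        ok_link = entry["prev_hash"] == expected_prev
        ok_hash = entry["hash"] == entry_hash({"record": entry["record"],
                                               "prev_hash": entry["prev_hash"]})
        if not (ok_link and ok_hash):
            return False
    return True

ledger = []
append(ledger, {"grant": "water project", "amount": 50000})
append(ledger, {"grant": "school meals", "amount": 20000})
print(verify(ledger))                     # True
ledger[0]["record"]["amount"] = 500000    # tamper with history
print(verify(ledger))                     # False: the chain no longer checks out
```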