Assessing the Returns on Investment in Data Openness and Transparency


Paper by Megumi Kubota and Albert Zeufack: “This paper investigates the potential benefits for a country from investing in data transparency. The paper shows that increased data transparency can bring substantive returns in lower costs of external borrowing.

This result is obtained by estimating the impact of public data transparency on sovereign spreads conditional on the country’s level of institutional quality and public and external debt. While improving data transparency alone reduces the external borrowing costs for a country, the return is much higher when combined with stronger institutional quality and lower public and external debt. Similarly, the returns on investing in data transparency are higher when a country’s integration into the global economy deepens, as captured by trade and financial openness.

Estimation of an instrumental variable regression shows that Sub-Saharan African countries could have saved up to 14.5 basis points in sovereign bond spreads and decreased their external debt burden by US$405.4 million (0.02 percent of gross domestic product) in 2018, if their average level of data transparency had been that of a country in the top quartile of the upper-middle-income country category. At the country level, Angola could have reduced its external debt burden by around US$73.6 million….(More)”.
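The summary does not give the paper's specification or instruments. As a generic illustration of what an instrumental variable regression does, here is a minimal two-stage least squares (2SLS) sketch in Python with NumPy on synthetic data. The variable names, the instrument `z`, and the data-generating process are all hypothetical, not the authors' model: an unobserved confounder `u` biases plain OLS, while 2SLS recovers the true effect.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, z, controls=None):
    """Minimal 2SLS point estimate (no standard errors).

    y        -- outcome, e.g. sovereign spread in basis points (hypothetical)
    x_endog  -- endogenous regressor, e.g. a data-transparency score (hypothetical)
    z        -- instrument(s): correlated with x_endog, not with the error term
    controls -- optional exogenous covariates, shape (n, m)
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    exog = np.ones((n, 1))
    if controls is not None:
        exog = np.column_stack([exog, controls])
    Z = np.column_stack([exog, np.asarray(z, dtype=float).reshape(n, -1)])

    # Stage 1: project the endogenous regressor onto instruments + controls.
    gamma, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    x_hat = Z @ gamma

    # Stage 2: regress the outcome on the fitted values + controls.
    X2 = np.column_stack([exog, x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta  # last entry: IV estimate of the effect of x_endog on y

# Synthetic demo: u is an unobserved confounder that biases plain OLS.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
transparency = 0.8 * z + 0.5 * u + rng.normal(size=n)
spread = 300 - 10 * transparency + 20 * u + rng.normal(size=n)

beta = two_stage_least_squares(spread, transparency, z)
ols_slope = np.polyfit(transparency, spread, 1)[0]
print(f"2SLS: {beta[-1]:.2f}  OLS: {ols_slope:.2f}  (true effect: -10)")
```

In this toy setup the 2SLS estimate should land near -10, while OLS is pulled toward zero by the confounder (its probability limit here is roughly -4.7). Real applications also need corrected standard errors, which the naive second-stage regression does not provide.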

Barriers to Working With National Health Service England’s Open Data


Paper by Ben Goldacre and Seb Bacon: “Open data is information made freely available to third parties in structured formats without restrictive licensing conditions, permitting commercial and noncommercial organizations to innovate. In the context of National Health Service (NHS) data, this is intended to improve patient outcomes and efficiency. EBM DataLab is a research group with a focus on online tools which turn our research findings into actionable monthly outputs. We regularly import and process more than 15 different NHS open datasets to deliver OpenPrescribing.net, one of the highest-impact use cases for NHS England’s open data, with over 15,000 unique users each month. In this paper, we have described the many breaches of best practices around NHS open data that we have encountered. Examples include datasets that repeatedly change location without warning or forwarding; datasets that are needlessly behind a “CAPTCHA” and so cannot be automatically downloaded; longitudinal datasets that change their structure without warning or documentation; near-duplicate datasets with unexplained differences; datasets that are impossible to locate, and thus may or may not exist; poor or absent documentation; and withholding of data for dubious reasons. We propose new open ways of working that will support better analytics for all users of the NHS. These include better curation, better documentation, and systems for better dialogue with technical teams….(More)”.
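One of the problems listed, longitudinal datasets whose structure changes without warning, can at least be detected automatically on the consumer side. A minimal sketch, assuming each release is a CSV file with a header row; the column names in the example are hypothetical and not those of any specific NHS dataset.

```python
import csv
import io

def compare_headers(old_csv: str, new_csv: str) -> dict:
    """Compare the header rows of two releases of the same CSV dataset."""
    old_cols = next(csv.reader(io.StringIO(old_csv)))
    new_cols = next(csv.reader(io.StringIO(new_csv)))
    # Columns present in both releases, in each release's own order.
    shared_old = [c for c in old_cols if c in new_cols]
    shared_new = [c for c in new_cols if c in old_cols]
    return {
        "added": [c for c in new_cols if c not in old_cols],
        "removed": [c for c in old_cols if c not in new_cols],
        "reordered": shared_old != shared_new,
    }

# Hypothetical column names for two consecutive monthly releases.
january = "PRACTICE,BNF_CODE,ITEMS\nA81001,0101010G0,12\n"
february = "PRACTICE_CODE,BNF_CODE,ITEMS,NIC\nA81001,0101010G0,9,4.20\n"
report = compare_headers(january, february)
print(report)
# {'added': ['PRACTICE_CODE', 'NIC'], 'removed': ['PRACTICE'], 'reordered': False}
```

An import pipeline could refuse to load a release, or alert a maintainer, whenever the report shows added or removed columns or a reordering, instead of silently misreading a renamed column.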

Reuse of open data in Quebec: from economic development to government transparency


Paper by Christian Boudreau: “Based on the history of open data in Quebec, this article discusses the reuse of these data by various actors within society, with the aim of securing desired economic, administrative and democratic benefits. Drawing on an analysis of government measures and community practices in the field of data reuse, the study shows that the benefits of open data appear to be inconclusive in terms of economic growth. On the other hand, their benefits seem promising from the point of view of government transparency in that they allow various civil society actors to monitor the integrity and performance of government activities. In the age of digital data and networks, the state must be seen not only as a platform conducive to innovation, but also as a rich field of study that is closely monitored by various actors driven by political and social goals….

Although the economic benefits of open data have been inconclusive so far, governments, at least in Quebec, must not stop investing in opening up their data. In terms of transparency, the results of the study suggest that the benefits of open data are sufficiently promising to continue releasing government data, if only to support the evaluation and planning activities of public programmes and services….(More)”.

How digital sleuths unravelled the mystery of Iran’s plane crash


Chris Stokel-Walker at Wired: “The video shows a faint glow in the distance, zig-zagging like a piece of paper caught in an underdraft, slowly meandering towards the horizon. Then there’s a bright flash and the trees in the foreground are thrown into shadow as Ukraine International Airlines flight PS752 hits the ground early on the morning of January 8, killing all 176 people on board.

At first, it seemed like an accident – engine failure was fingered as the cause – until the first video surfaced showing the plane seemingly on fire as it weaved to the ground. United States officials started to investigate, and a more complicated picture emerged. It appeared that the plane had been hit by a missile, an account corroborated by a second video that appears to show the moment the missile ploughs into the Boeing 737-800. While military and intelligence officials at governments around the world were conducting their inquiries in secret, a team of investigators was using open-source intelligence (OSINT) techniques to piece together the puzzle of flight PS752.

It’s not unusual nowadays for OSINT to lead the way in decoding key news events. When Sergei Skripal was poisoned, Bellingcat, an open-source intelligence website, tracked and identified his would-be assassins as they traipsed across London and Salisbury. They delved into military records to blow the cover of the agents sent to kill him. And in the days after the Ukraine International Airlines plane crashed into the ground outside Tehran, Bellingcat and The New York Times blew a hole in the supposition that an engine failure had brought the aircraft down. The pressure – and the weight of public evidence – compelled Iranian officials to admit overnight on January 10 that the country had shot down the plane “in error”.

So how do they do it? “You can think of OSINT as a puzzle. To get the complete picture, you need to find the missing pieces and put everything together,” says Loránd Bodó, an OSINT analyst at Tech Against Terrorism, a campaign group. The team at Bellingcat and other open-source investigators pore over publicly available material. Thanks to our propensity to reach for our cameraphones at the sight of any newsworthy incident, video and photos are often available, posted to social media in the immediate aftermath of events. (The person who shot and uploaded the second video in this incident, of the missile appearing to hit the Boeing plane, was a perfect example: they grabbed their phone after they heard “some sort of shot fired”.) “Open source investigations essentially involve the collection, preservation, verification, and analysis of evidence that is available in the public domain to build a picture of what happened,” says Yvonne McDermott Rees, a lecturer at Swansea University….(More)”.

Open Science, Open Data, and Open Scholarship: European Policies to Make Science Fit for the Twenty-First Century


Paper by Jean-Claude Burgelman et al: “Open science will make science more efficient, reliable, and responsive to societal challenges. The European Commission has sought to advance open science policy from its inception in a holistic and integrated way, covering all aspects of the research cycle from scientific discovery and review to sharing knowledge, publishing, and outreach. We present the steps taken with a forward-looking perspective on the challenges lying ahead, in particular the necessary change of the rewards and incentives system for researchers (for which various actors are co-responsible and which goes beyond the mandate of the European Commission). Finally, we discuss the role of artificial intelligence (AI) within an open science perspective….(More)”.

Open data for electricity modeling: Legal aspects


Paper by Lion Hirth: “Power system modeling is data intensive. In Europe, electricity system data is often available from sources such as statistical offices or system operators. However, it is often unclear if these data can be legally used for modeling, and in particular if such use infringes intellectual property rights. This article reviews the legal status of power system data, both as a guide for data users and for data publishers.

It is based on interpretation of the law, a review of the secondary literature, an analysis of the licenses used by major data distributors, expert interviews, and a series of workshops. A core finding is that in many cases the legality of current practices is doubtful: in fact, it seems likely that modelers infringe intellectual property rights quite regularly. This is true for industry analysts as well as academic researchers. A straightforward solution is open data – the idea that data can be freely used, modified, and shared by anyone for any purpose. To be open, it is not sufficient for data to be accessible free of cost; they must also come with an open data license, the most common types of which are also reviewed in this paper….(More)”.

What are hidden data treasuries and how can they help development outcomes?


Blogpost by Damien Jacques et al: “Cashew nuts in Burkina Faso can be seen growing from space. Such is the power of satellite technology, it’s now possible to observe the changing colors of fields as crops slowly ripen.

This matters because it can be used as an early warning of crop failure and food crisis – giving governments and aid agencies more time to organize a response.

Our team built an exhaustive crop type and yield estimation map in Burkina Faso, using artificial intelligence and satellite images from the European Space Agency. 

But building the map would not have been possible without a data set that GIZ, the German government’s international development agency, had collected for one purpose on the ground some years before – and never looked at again.

At Dalberg, we call this a “hidden data treasury” and it has huge potential to be used for good. 

Unlocking data potential

In the records of the GIZ Data Lab, the GPS coordinates and crop yield measurements of just a few hundred cashew fields were sitting dormant.

They’d been collected in 2015 to assess the impact of a program to train farmers. But through the power of machine learning, that data set has been given a new purpose.

Using Dalberg Data Insights’ AIDA platform, our team trained algorithms to analyze satellite images for cashew crops, track the crops’ color as they ripen, and from there, estimate yields for the area covered by the data.

From this, it’s now possible to predict crop failures for thousands of fields.
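The blogpost does not describe the AIDA models themselves. As a hypothetical illustration of the general idea (turning a field's changing color in satellite imagery into a yield estimate calibrated on a few hundred ground measurements), the sketch below uses NDVI, a standard greenness index computed from red and near-infrared reflectance, and fits a one-variable linear model of yield on each field's peak NDVI. The index choice, the linear model, and all numbers are assumptions, not Dalberg's method.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + 1e-9)  # tiny epsilon avoids 0/0 on dark pixels

def fit_yield_model(ndvi_series, yields):
    """Fit yield = a + b * peak NDVI on surveyed fields (hypothetical model).

    ndvi_series -- array of shape (fields, dates): one NDVI time series per field
    yields      -- measured yield per field, e.g. from a ground survey
    """
    peak = np.max(ndvi_series, axis=1)
    b, a = np.polyfit(peak, yields, 1)  # polyfit returns the slope first
    return a, b

def predict_yield(ndvi_series, a, b):
    """Apply the fitted model to fields that were never surveyed."""
    return a + b * np.max(ndvi_series, axis=1)

# Made-up training data: three surveyed fields observed on three dates.
surveyed = np.array([[0.2, 0.5, 0.4],
                     [0.3, 0.7, 0.6],
                     [0.1, 0.3, 0.2]])
measured = np.array([700.0, 900.0, 500.0])  # illustrative yields, kg/ha
a, b = fit_yield_model(surveyed, measured)

unsurveyed = np.array([[0.2, 0.6, 0.5]])
print(predict_yield(unsurveyed, a, b))  # about 800 for this toy data
```

A real model would use many more dates, bands, and fields, but the structure is the same: a small labeled ground survey calibrates a model that is then applied to every field visible from orbit.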

We believe this “recycling” of old data, when paired with artificial intelligence, can help to bridge the data gaps in low-income countries and meet the UN’s Sustainable Development Goals….(More)”.

The Politics of Open Government Data: Understanding Organizational Responses to Pressure for More Transparency


Paper by Erna Ruijer et al: “This article contributes to the growing body of literature within public management on open government data by taking a political perspective. We argue that open government data are a strategic resource of organizations and therefore organizations are not likely to share them. We develop an analytical framework for studying the politics of open government data, based on theories of strategic responses to institutional processes, government transparency, and open government data. The framework shows that there can be different organizational strategic responses to open data—varying from conformity to active resistance—and that different institutional antecedents influence these responses. The value of the framework is explored in two cases: a province in the Netherlands and a municipality in France. The cases provide insights into why governments might release datasets in certain policy domains but not in others, thereby producing “strategically opaque transparency.” The article concludes that the politics of open government data framework helps us understand open data practices in relation to broader institutional pressures that influence government transparency….(More)”.

Responsible Operations: Data Science, Machine Learning, and AI in Libraries


OCLC Research Position Paper by Thomas Padilla: “Despite greater awareness, significant gaps persist between concept and operationalization in libraries at the level of workflows (managing bias in probabilistic description), policies (community engagement vis-à-vis the development of machine-actionable collections), positions (developing staff who can utilize, develop, critique, and/or promote services influenced by data science, machine learning, and AI), collections (development of “gold standard” training data), and infrastructure (development of systems that make use of these technologies and methods). Shifting from awareness to operationalization will require holistic organizational commitment to responsible operations. The viability of responsible operations depends on organizational incentives and protections that promote constructive dissent…(More)”.

Rosie the Robot: Social accountability one tweet at a time


Blogpost by Yasodara Cordova and Eduardo Vicente Goncalves: “Every month in Brazil, the government team in charge of processing reimbursement expenses incurred by congresspeople receives more than 20,000 claims. This is a manually intensive process that is prone to error and susceptible to corruption. Under Brazilian law, this information is available to the public, making it possible to check the accuracy of this data with further scrutiny. But it’s hard to sift through so many transactions. Fortunately, Rosie, a robot built to analyze the expenses of the country’s congress members, is helping out.

Rosie was born from Operação Serenata de Amor, a flagship project we helped create with other civic hackers. We suspected that data provided by members of Congress, especially regarding work-related reimbursements, might not always be accurate. There were clear, straightforward reimbursement regulations, but we wondered how easily individuals could maneuver around them. 

Furthermore, we believed that transparency portals and the public data weren’t realizing their full potential for accountability. Citizens struggled to understand public sector jargon and make sense of the extensive volume of data. We thought data science could help make better sense of the open data provided by the Brazilian government.

Using agile methods, specifically Domain-Driven Design, a flexible and adaptive process framework for solving complex problems, our group started studying the regulations and converting them into software code. We did this by reverse-engineering the legal documents: understanding the reimbursement rules and brainstorming ways to circumvent them. Next, we thought about the traces this circumvention would leave in the databases and developed a way to identify these traces using the existing data. The public expenses database included the images of the receipts used to claim reimbursements, and we could see evidence of expenses, such as alcohol, which weren’t allowed to be paid with public money. We named our creation Rosie.

This method of researching the regulations and then translating them into software in an agile way is called Domain-Driven Design. Used for complex systems, this approach analyzes the data and the sector as an ecosystem, and then uses observations and rapid prototyping to generate and test an evolving model. This is how Rosie works. Rosie sifts through the reported data and flags specific expenses made by representatives as “suspicious.” One example is a set of purchases indicating that a member of Congress was in two locations at the same time on the same day.
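The two-places-at-once check can be sketched in a few lines. This is a simplified, hypothetical version that compares receipts only by calendar day and city; the record layout (`Receipt`, `deputy`, `day`, `city`) is invented for illustration and is not the schema of the Brazilian reimbursement data or Rosie’s actual code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Receipt:
    # Hypothetical, minimal receipt record.
    deputy: str
    day: date
    city: str

def flag_two_places_same_day(receipts):
    """Return (deputy, day, cities) for every deputy with receipts in 2+ cities on one day."""
    cities = defaultdict(set)
    for r in receipts:
        cities[(r.deputy, r.day)].add(r.city)
    return [
        (deputy, day, sorted(cs))
        for (deputy, day), cs in sorted(cities.items())
        if len(cs) > 1
    ]

receipts = [
    Receipt("Deputy A", date(2017, 5, 2), "Brasília"),
    Receipt("Deputy A", date(2017, 5, 2), "Recife"),
    Receipt("Deputy B", date(2017, 5, 2), "Brasília"),
]
print(flag_two_places_same_day(receipts))
# [('Deputy A', datetime.date(2017, 5, 2), ['Brasília', 'Recife'])]
```

Because a member of Congress can legitimately travel between cities within a day, a hit from a rule like this is only a lead, which fits Rosie’s practice of labeling expenses “suspicious” and inviting a justification rather than declaring wrongdoing.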

After finding a suspicious transaction, Rosie automatically tweets the results to both citizens and members of Congress. She invites citizens to corroborate or dismiss the suspicions, while also inviting members of Congress to justify themselves.

Rosie isn’t working alone. Beyond translating the law into computer code, the group also created new interfaces to help citizens check up on Rosie’s suspicions. The same information that was scattered across official government websites was brought together in a more intuitive, indexed and machine-readable platform. This platform is called Jarbas – its name was inspired by the AI system that controls Tony Stark’s mansion in Iron Man, J.A.R.V.I.S. (which has origins in the human “Jarbas”) – and it is a website and API (application programming interface) that helps citizens more easily navigate and browse data from different sources. Together, Rosie and Jarbas help citizens use and interpret the data to decide whether there was a misuse of public funds. So far, Rosie has tweeted 967 times. She is particularly good at detecting overpriced meals. According to open research conducted by the group, members of Congress have reduced their spending on meals by about ten percent since her introduction….(More)”.
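How Rosie decides that a meal is overpriced is not described here. One simple, hypothetical way to flag unusually expensive meal receipts is a robust outlier rule: mark any amount above the median plus k times the median absolute deviation (MAD). The threshold `k = 3` and the amounts below are illustrative assumptions, not Rosie’s actual classifier.

```python
from statistics import median

def flag_overpriced(amounts, k=3.0):
    """Return indices of amounts above median + k * MAD.

    Median and MAD are used instead of mean and standard deviation so that
    a few extreme receipts cannot inflate the threshold they are tested against.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    threshold = med + k * mad
    return [i for i, a in enumerate(amounts) if a > threshold]

meals = [40.0, 45.0, 50.0, 42.0, 48.0, 300.0]  # illustrative meal amounts in BRL
print(flag_overpriced(meals))  # [5]
```

In practice the comparison would more sensibly be made within a peer group, such as meals in the same city or at the same establishment, rather than across all receipts at once.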