What are hidden data treasuries and how can they help development outcomes?


Blogpost by Damien Jacques et al: “Cashew nuts in Burkina Faso can be seen growing from space. Such is the power of satellite technology, it’s now possible to observe the changing colors of fields as crops slowly ripen.

This matters because it can be used as an early warning of crop failure and food crisis – giving governments and aid agencies more time to organize a response.

Our team built an exhaustive crop type and yield estimation map in Burkina Faso, using artificial intelligence and satellite images from the European Space Agency. 

But building the map would not have been possible without a data set that GIZ, the German government’s international development agency, had collected for one purpose on the ground some years before – and never looked at again.

At Dalberg, we call this a “hidden data treasury” and it has huge potential to be used for good. 

Unlocking data potential

In the records of the GIZ Data Lab, the GPS coordinates and crop yield measurements of just a few hundred cashew fields were sitting dormant.

They’d been collected in 2015 to assess the impact of a program to train farmers. But through the power of machine learning, that data set has been given a new purpose.

Using Dalberg Data Insights’ AIDA platform, our team trained algorithms to analyze satellite images for cashew crops, track the crops’ color as they ripen, and from there, estimate yields for the area covered by the data.

From this, it’s now possible to predict crop failures for thousands of fields.

We believe this “recycling” of old data, when paired with artificial intelligence, can help to bridge the data gaps in low-income countries and meet the UN’s Sustainable Development Goals….(More)”.

The Politics of Open Government Data: Understanding Organizational Responses to Pressure for More Transparency


Paper by Erna Ruijer et al: “This article contributes to the growing body of literature within public management on open government data by taking a political perspective. We argue that open government data are a strategic resource of organizations and therefore organizations are not likely to share it. We develop an analytical framework for studying the politics of open government data, based on theories of strategic responses to institutional processes, government transparency, and open government data. The framework shows that there can be different organizational strategic responses to open data—varying from conformity to active resistance—and that different institutional antecedents influence these responses. The value of the framework is explored in two cases: a province in the Netherlands and a municipality in France. The cases provide insights into why governments might release datasets in certain policy domains but not in others thereby producing “strategically opaque transparency.” The article concludes that the politics of open government data framework helps us understand open data practices in relation to broader institutional pressures that influence government transparency….(More)”.

Responsible Operations: Data Science, Machine Learning, and AI in Libraries


OCLC Research Position Paper by Thomas Padilla: “Despite greater awareness, significant gaps persist between concept and operationalization in libraries at the level of workflows (managing bias in probabilistic description), policies (community engagement vis-à-vis the development of machine-actionable collections), positions (developing staff who can utilize, develop, critique, and/or promote services influenced by data science, machine learning, and AI), collections (development of “gold standard” training data), and infrastructure (development of systems that make use of these technologies and methods). Shifting from awareness to operationalization will require holistic organizational commitment to responsible operations. The viability of responsible operations depends on organizational incentives and protections that promote constructive dissent…(More)”.

Rosie the Robot: Social accountability one tweet at a time


Blogpost by Yasodara Cordova and Eduardo Vicente Goncalvese: “Every month in Brazil, the government team in charge of processing reimbursement expenses incurred by congresspeople receives more than 20,000 claims. This is a manually intensive process that is prone to error and susceptible to corruption. Under Brazilian law, this information is available to the public, making it possible to check the accuracy of this data with further scrutiny. But it’s hard to sift through so many transactions. Fortunately, Rosie, a robot built to analyze the expenses of the country’s congress members, is helping out.

Rosie was born from Operação Serenata de Amor, a flagship project we helped create with other civic hackers. We suspected that data provided by members of Congress, especially regarding work-related reimbursements, might not always be accurate. There were clear, straightforward reimbursement regulations, but we wondered how easily individuals could maneuver around them. 

Furthermore, we believed that transparency portals and the public data weren’t realizing their full potential for accountability. Citizens struggled to understand public sector jargon and make sense of the extensive volume of data. We thought data science could help make better sense of the open data provided by the Brazilian government.

Using agile methods, specifically Domain-Driven Design, a flexible and adaptive process framework for solving complex problems, our group started studying the regulations and converting them into software code. We did this by reverse-engineering the legal documents–understanding the reimbursement rules and brainstorming ways to circumvent them. Next, we thought about the traces this circumvention would leave in the databases and developed a way to identify these traces using the existing data. The public expenses database included the images of the receipts used to claim reimbursements, and we could see evidence of expenses, such as alcohol, which weren’t allowed to be paid with public money. We named our creation Rosie.

This method of researching the regulations to then translate them into software in an agile way is called Domain-Driven Design. Used for complex systems, this useful approach analyzes the data and the sector as an ecosystem, and then uses observations and rapid prototyping to generate and test an evolving model. This is how Rosie works. Rosie sifts through the reported data and flags specific expenses made by representatives as “suspicious.” An example could be purchases that indicate the Congress member was in two locations on the same day and time.
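The "two locations on the same day" check described above can be illustrated with a minimal sketch. This is not Rosie's actual code: the record fields (`member`, `day`, `city`) and the function name are hypothetical, and the sketch compares whole days only, ignoring the time of day.

```python
from collections import defaultdict
from datetime import date


def flag_same_day_conflicts(expenses):
    """Return (member, day, cities) wherever one member has receipts
    from more than one city on the same day.

    Hypothetical record layout: dicts with "member", "day", "city".
    """
    cities_seen = defaultdict(set)
    for expense in expenses:
        cities_seen[(expense["member"], expense["day"])].add(expense["city"])
    return sorted(
        (member, day, sorted(cities))
        for (member, day), cities in cities_seen.items()
        if len(cities) > 1
    )


# Invented sample data, for illustration only.
sample_expenses = [
    {"member": "A", "day": date(2019, 1, 5), "city": "Brasília"},
    {"member": "A", "day": date(2019, 1, 5), "city": "São Paulo"},
    {"member": "B", "day": date(2019, 1, 5), "city": "Brasília"},
    {"member": "B", "day": date(2019, 1, 6), "city": "Recife"},
]

flagged = flag_same_day_conflicts(sample_expenses)
for member, day, cities in flagged:
    print(f"Suspicious: {member} on {day} in {', '.join(cities)}")
```

A flag of this kind is only a prompt for human review; as the post goes on to explain, citizens and congress members are invited to confirm or dismiss each suspicion.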

After finding a suspicious transaction, Rosie then automatically tweets the results to both citizens and congress members. She invites citizens to corroborate or dismiss the suspicions, while also inviting congress members to justify themselves.

Rosie isn’t working alone. Beyond translating the law into computer code, the group also created new interfaces to help citizens check up on Rosie’s suspicions. The same information that was spread across different places on official government websites was put together in a more intuitive, indexed and machine-readable platform. This platform is called Jarbas – its name was inspired by the AI system that controls Tony Stark’s mansion in Iron Man, J.A.R.V.I.S. (which has origins in the human “Jarbas”) – and it is a website and API (application programming interface) that helps citizens more easily navigate and browse data from different sources. Together, Rosie and Jarbas help citizens use and interpret the data to decide whether there was a misuse of public funds. So far, Rosie has tweeted 967 times. She is particularly good at detecting overpriced meals. According to open research conducted by the group, since her introduction, members of Congress have reduced spending on meals by about ten percent….(More)”.

The Challenges of Sharing Data in an Era of Politicized Science


Editorial by Howard Bauchner in JAMA: “The goal of making science more transparent—sharing data, posting results on trial registries, use of preprint servers, and open access publishing—may enhance scientific discovery and improve individual and population health, but it also comes with substantial challenges in an era of politicized science, enhanced skepticism, and the ubiquitous world of social media. The recent announcement by the Trump administration of plans to proceed with an updated version of the proposed rule “Strengthening Transparency in Regulatory Science,” stipulating that all underlying data from studies that underpin public health regulations from the US Environmental Protection Agency (EPA) must be made publicly available so that those data can be independently validated, epitomizes some of these challenges. According to EPA Administrator Andrew Wheeler: “Good science is science that can be replicated and independently validated, science that can hold up to scrutiny. That is why we’re moving forward to ensure that the science supporting agency decisions is transparent and available for evaluation by the public and stakeholders.”

Virtually every time JAMA publishes an article on the effects of pollution or climate change on health, the journal immediately receives demands from critics to retract the article for various reasons. Some individuals and groups simply do not believe that pollution or climate change affects human health. Research on climate change, and the effects of climate change on the health of the planet and human beings, if made available to anyone for reanalysis could be manipulated to find a different outcome than initially reported. In an age of skepticism about many issues, including science, with the ability to use social media to disseminate unfounded and at times potentially harmful ideas, it is challenging to balance the potential benefits of sharing data with the harms that could be done by reanalysis.

Can the experience of sharing data derived from randomized clinical trials (RCTs)—either as mandated by some funders and journals or as supported by individual investigators—serve as examples as a way to safeguard “truth” in science….

Although the sharing of data may have numerous benefits, it also comes with substantial challenges particularly in highly contentious and politicized areas, such as the effects of climate change and pollution on health, in which the public dialogue appears to be based on as much fiction as fact. The sharing of data, whether mandated by funders, including foundations and government, or volunteered by scientists who believe in the principle of data transparency, is a complicated issue in the evolving world of science, analysis, skepticism, and communication. Above all, the scientific process—including original research and reanalysis of shared data—must prevail, and the inherent search for evidence, facts, and truth must not be compromised by special interests, coercive influences, or politicized perspectives. There are no simple answers, just words of caution and concern….(More)”.

The Impact of Open Data on Public Procurement


Paper by Raphael Duguay, Thomas Rauter and Delphine Samuels: “We examine how the increased accessibility of public purchasing data affects competition, prices, contract allocations, and contract performance in government procurement. The European Union recently made its already public but difficult-to-access information about the process and outcomes of procurement awards available for bulk download in a user-friendly format.

Comparing government contracts above EU publication thresholds with contracts that are not, we find that increasing the public accessibility of procurement data raises the likelihood of having competitive bidding processes, increases the number of bids per contract, and facilitates market entry by new vendors. Following the open data initiative, procurement prices decrease and EU government agencies are more likely to award contracts to the lowest bidder.

However, the increased competition comes at a cost ─ firms execute government contracts with more delays and ex-post price renegotiations. These effects are stronger for new vendors, complex procurement projects, and contracts awarded solely based on price. Overall, our results suggest that open procurement data facilitates competition and lowers ex-ante procurement prices but does not necessarily increase allocative efficiency in government contracting….(More)”.

The Trace


About: “The Trace is an independent, nonpartisan, nonprofit newsroom dedicated to shining a light on America’s gun violence crisis….

Every year in our country, a firearm is used in nearly 500,000 crimes, resulting in the deaths and injuries of more than 110,000 people. Shootings devastate families and communities and drain billions of dollars from local, state, and federal governments. Meanwhile, the problem of gun violence has been compounded by another: the shortage of knowledge about the issue…

Data and records are shielded from public view—or don’t exist. Gun-lobby backed restrictions on federal gun violence research deprive policymakers and public health experts of potentially life-saving facts. Other laws limit the information that law enforcement agencies can share on illegal guns and curb litigation that could allow scrutiny of industry practices….

We make the problem clear. In partnership with Slate, we built an eye-opening, interactive map plotting the locations of nearly 40,000 incidents of gun violence nationwide. The feature received millions of pageviews and generated extensive local coverage and social media conversation. “So many shootings and deaths, so close to my home,” wrote one reader. “And I hadn’t even heard about most of them.”…(More)”.

Benefits of Open Data in Public Health


Paper by P. Huston, VL. Edge and E. Bernier: “Open Data is part of a broad global movement that is not only advancing science and scientific communication but also transforming modern society and how decisions are made. What began with a call for Open Science and the rise of online journals has extended to Open Data, based on the premise that if reports on data are open, then the generated or supporting data should be open as well. There have been a number of advances in Open Data over the last decade, spearheaded largely by governments. A real benefit of Open Data is not simply that single databases can be used more widely; it is that these data can also be leveraged, shared and combined with other data. Open Data facilitates scientific collaboration, enriches research and advances analytical capacity to inform decisions. In the human and environmental health realms, for example, the ability to access and combine diverse data can advance early signal detection, improve analysis and evaluation, inform program and policy development, increase capacity for public participation, enable transparency and improve accountability. However, challenges remain. Enormous resources are needed to make the technological shift to open and interoperable databases accessible with common protocols and terminology. Amongst data generators and users, this shift also involves a cultural change: from regarding databases as restricted intellectual property, to considering data as a common good. There is a need to address legal and ethical considerations in making this shift. Finally, along with efforts to modify infrastructure and address the cultural, legal and ethical issues, it is important to share the information equitably and effectively. While there is great potential of the open, timely, equitable and straightforward sharing of data, fully realizing the myriad of benefits of Open Data will depend on how effectively these challenges are addressed….(More)”.

The Public-Data Opportunity: Why Governments Should Share More


Press Release: “The Lisbon Council launches The Public-Data Opportunity: Why Governments Should Share More, a new discussion paper that looks at the state of play for public-sector data sharing – and calls for better protocols and procedures to deliver data-driven service to all Europeans. The paper analyses the importance of data-sharing between European Union public agencies, identifies the barriers and proposes seven policy recommendations that will help lift them. It builds on the research conducted by the “Understanding Value Co-Creation in Public Services for Transforming European Public Administrations” project, or Co-VAL, a 12-partner research consortium, co-funded by the European Union. It was launched at The 2019 Digital Government Conference convened by the Presidency of the European Council of Finland in Helsinki….(More)”

Restricting data’s use: A spectrum of concerns in need of flexible approaches


Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.