Barriers to Working With National Health Service England’s Open Data
Paper by Ben Goldacre and Seb Bacon: “Open data is information made freely available to third parties in structured formats without restrictive licensing conditions, permitting commercial and noncommercial organizations to innovate. In the context of National Health Service (NHS) data, this is intended to improve patient outcomes and efficiency. EBM DataLab is a research group with a focus on online tools which turn our research findings into actionable monthly outputs. We regularly import and process more than 15 different NHS open datasets to deliver OpenPrescribing.net, one of the highest-impact use cases for NHS England’s open data, with over 15,000 unique users each month. In this paper, we have described the many breaches of best practices around NHS open data that we have encountered. Examples include datasets that repeatedly change location without warning or forwarding; datasets that are needlessly behind a “CAPTCHA” and so cannot be automatically downloaded; longitudinal datasets that change their structure without warning or documentation; near-duplicate datasets with unexplained differences; datasets that are impossible to locate, and thus may or may not exist; poor or absent documentation; and withholding of data for dubious reasons. We propose new open ways of working that will support better analytics for all users of the NHS. These include better curation, better documentation, and systems for better dialogue with technical teams….(More)”.
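The paper is descriptive rather than technical, but the operational cost of the breaches it lists is easy to make concrete. Below is a minimal sketch, with a hypothetical URL and column names (not from the paper), of the kind of defensive import step that unannounced location and schema changes force on a monthly pipeline like OpenPrescribing’s:

```python
import csv
import io
import urllib.request

# Hypothetical endpoint and schema: real NHS datasets, as the paper
# documents, can move or change structure without warning.
URL = "https://example.nhs.uk/open-data/prescribing/latest.csv"
EXPECTED_COLUMNS = ["practice_code", "bnf_code", "items", "actual_cost"]

def fetch_and_validate(url=URL):
    """Download a monthly CSV and fail loudly if it moved or its schema drifted."""
    # A dataset placed behind a CAPTCHA would make this call fail outright,
    # which is one of the breaches the paper describes.
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    header = next(csv.reader(io.StringIO(text)))
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"schema changed: missing columns {missing}")
    return text
```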
Reuse of open data in Quebec: from economic development to government transparency
Paper by Christian Boudreau: “Based on the history of open data in Quebec, this article discusses the reuse of these data by various actors within society, with the aim of securing desired economic, administrative and democratic benefits. Drawing on an analysis of government measures and community practices in the field of data reuse, the study shows that the benefits of open data appear to be inconclusive in terms of economic growth. On the other hand, their benefits seem promising from the point of view of government transparency in that they allow various civil society actors to monitor the integrity and performance of government activities. In the age of digital data and networks, the state must be seen not only as a platform conducive to innovation, but also as a rich field of study that is closely monitored by various actors driven by political and social goals….
Although the economic benefits of open data have been inconclusive so far, governments, at least in Quebec, must not stop investing in opening up their data. In terms of transparency, the results of the study suggest that the benefits of open data are sufficiently promising to continue releasing government data, if only to support the evaluation and planning activities of public programmes and services….(More)”.
How digital sleuths unravelled the mystery of Iran’s plane crash
Chris Stokel-Walker at Wired: “The video shows a faint glow in the distance, zig-zagging like a piece of paper caught in an updraft, slowly meandering towards the horizon. Then there’s a bright flash and the trees in the foreground are thrown into shadow as Ukraine International Airlines flight PS752 hits the ground early on the morning of January 8, killing all 176 people on board.
At first, it seemed like an accident – engine failure was fingered as the cause – until the first video surfaced showing the plane seemingly on fire as it weaved to the ground. United States officials started to investigate, and a more complicated picture emerged. It appeared that the plane had been hit by a missile, a theory corroborated by a second video that appears to show the moment the missile ploughs into the Boeing 737-800. While military and intelligence officials at governments around the world were conducting their inquiries in secret, a team of investigators was using open-source intelligence (OSINT) techniques to piece together the puzzle of flight PS752.
It’s not unusual nowadays for OSINT to lead the way in decoding key news events. When Sergei Skripal was poisoned, Bellingcat, an open-source intelligence website, tracked and identified his would-be assassins as they traipsed across London and Salisbury. They delved into military records to blow the cover of agents sent to kill. And in the days after the Ukraine International Airlines plane crashed into the ground outside Tehran, Bellingcat and The New York Times blew a hole in the supposition that the aircraft had been downed by engine failure. The pressure – and the weight of public evidence – compelled Iranian officials to admit overnight on January 10 that the country had shot down the plane “in error”.
So how do they do it? “You can think of OSINT as a puzzle. To get the complete picture, you need to find the missing pieces and put everything together,” says Loránd Bodó, an OSINT analyst at Tech Against Terrorism, a campaign group. The team at Bellingcat and other open-source investigators pore over publicly available material. Thanks to our propensity to reach for our cameraphones at the sight of any newsworthy incident, video and photos are often available, posted to social media in the immediate aftermath of events. (The person who shot and uploaded the second video in this incident, showing the missile appearing to hit the Boeing plane, was a perfect example: they grabbed their phone after they heard “some sort of shot fired”.) “Open source investigations essentially involve the collection, preservation, verification, and analysis of evidence that is available in the public domain to build a picture of what happened,” says Yvonne McDermott Rees, a lecturer at Swansea University….(More)”.
Open Science, Open Data, and Open Scholarship: European Policies to Make Science Fit for the Twenty-First Century
Paper by Jean-Claude Burgelman et al: “Open science will make science more efficient, reliable, and responsive to societal challenges. The European Commission has sought to advance open science policy from its inception in a holistic and integrated way, covering all aspects of the research cycle from scientific discovery and review to sharing knowledge, publishing, and outreach. We present the steps taken with a forward-looking perspective on the challenges lying ahead, in particular the necessary change of the rewards and incentives system for researchers (for which various actors are co-responsible and which goes beyond the mandate of the European Commission). Finally, we discuss the role of artificial intelligence (AI) within an open science perspective….(More)”.
Open data for electricity modeling: Legal aspects
Paper by Lion Hirth: “Power system modeling is data intensive. In Europe, electricity system data is often available from sources such as statistical offices or system operators. However, it is often unclear if these data can be legally used for modeling, and in particular if such use infringes intellectual property rights. This article reviews the legal status of power system data, both as a guide for data users and for data publishers.
It is based on interpretation of the law, a review of the secondary literature, an analysis of the licenses used by major data distributors, expert interviews, and a series of workshops. A core finding is that in many cases the legality of current practices is doubtful: in fact, it seems likely that modelers infringe intellectual property rights quite regularly. This is true not only for industry analysts but also for academic researchers. A straightforward solution is open data – the idea that data can be freely used, modified, and shared by anyone for any purpose. To be open, it is not sufficient for data to be accessible free of cost, it must also come with an open data license, the most common types of which are also reviewed in this paper….(More)”.
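That last distinction is one a data user can check mechanically. The sketch below is an illustration of the principle, not legal advice: it assumes a dataset described by a Frictionless Data datapackage.json file and verifies that the declared license appears on a whitelist of open licenses (the whitelist here is illustrative, not exhaustive):

```python
import json

# Licenses conformant with the Open Definition and commonly used for data
# (an illustrative whitelist, not an authoritative one).
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "ODbL-1.0", "ODC-BY-1.0", "PDDL-1.0"}

def is_openly_licensed(datapackage_path):
    """True if a Frictionless datapackage.json declares only open licenses."""
    with open(datapackage_path) as f:
        meta = json.load(f)
    licenses = meta.get("licenses", [])
    # Free-of-cost access with no declared license is not open data:
    # without a license, reuse rights remain legally unclear.
    if not licenses:
        return False
    return all(lic.get("name") in OPEN_LICENSES for lic in licenses)
```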
What are hidden data treasuries and how can they help development outcomes?
Blogpost by Damien Jacques et al: “Cashew nuts in Burkina Faso can be seen growing from space. Such is the power of satellite technology that it’s now possible to observe the changing colors of fields as crops slowly ripen.
This matters because it can be used as an early warning of crop failure and food crisis – giving governments and aid agencies more time to organize a response.
Our team built an exhaustive crop type and yield estimation map in Burkina Faso, using artificial intelligence and satellite images from the European Space Agency.
But building the map would not have been possible without a data set that GIZ, the German government’s international development agency, had collected for one purpose on the ground some years before – and never looked at again.
At Dalberg, we call this a “hidden data treasury” and it has huge potential to be used for good.
Unlocking data potential
In the records of the GIZ Data Lab, the GPS coordinates and crop yield measurements of just a few hundred cashew fields were sitting dormant.
They’d been collected in 2015 to assess the impact of a program to train farmers. But through the power of machine learning, that data set has been given a new purpose.
Using Dalberg Data Insights’ AIDA platform, our team trained algorithms to analyze satellite images for cashew crops, track the crops’ color as they ripen, and from there, estimate yields for the area covered by the data.
From this, it’s now possible to predict crop failures for thousands of fields.
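The AIDA pipeline itself is not public, but the general approach can be sketched in a few lines. The sketch below is a hypothetical, minimal version of such a yield model; every name, shape, and value is illustrative rather than drawn from the project. It maps a per-field “greenness” (NDVI) time series to measured yields, then applies the fitted model to unsurveyed fields:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs: per-field NDVI time series derived from satellite
# imagery, plus ground-truth cashew yields (kg/ha) from a field survey
# like GIZ's 2015 data set. Shapes and values are purely illustrative.
rng = np.random.default_rng(0)
ndvi_series = rng.random((300, 24))        # 300 surveyed fields, 24 dates
observed_yield = rng.random(300) * 900.0   # measured yields, kg/ha

# Learn the mapping from a season's greenness curve to yield
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(ndvi_series, observed_yield)

# Apply the model to thousands of unsurveyed fields; unusually low
# predictions can act as an early warning of crop failure.
new_fields = rng.random((10_000, 24))
predicted = model.predict(new_fields)
at_risk = int((predicted < np.percentile(predicted, 5)).sum())
print(f"{at_risk} fields fall in the lowest 5% of predicted yield")
```

This is also what makes the “hidden data treasury” framing concrete: the expensive part is the few hundred ground-truth rows, which the dormant GIZ survey supplied for free.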
We believe this “recycling” of old data, when paired with artificial intelligence, can help to bridge the data gaps in low-income countries and meet the UN’s Sustainable Development Goals….(More)”.
The Politics of Open Government Data: Understanding Organizational Responses to Pressure for More Transparency
Paper by Erna Ruijer et al: “This article contributes to the growing body of literature within public management on open government data by taking a political perspective. We argue that open government data are a strategic resource of organizations and therefore organizations are not likely to share it. We develop an analytical framework for studying the politics of open government data, based on theories of strategic responses to institutional processes, government transparency, and open government data. The framework shows that there can be different organizational strategic responses to open data—varying from conformity to active resistance—and that different institutional antecedents influence these responses. The value of the framework is explored in two cases: a province in the Netherlands and a municipality in France. The cases provide insights into why governments might release datasets in certain policy domains but not in others thereby producing “strategically opaque transparency.” The article concludes that the politics of open government data framework helps us understand open data practices in relation to broader institutional pressures that influence government transparency….(More)”.
Responsible Operations: Data Science, Machine Learning, and AI in Libraries
OCLC Research Position Paper by Thomas Padilla: “Despite greater awareness, significant gaps persist between concept and operationalization in libraries at the level of workflows (managing bias in probabilistic description), policies (community engagement vis-à-vis the development of machine-actionable collections), positions (developing staff who can utilize, develop, critique, and/or promote services influenced by data science, machine learning, and AI), collections (development of “gold standard” training data), and infrastructure (development of systems that make use of these technologies and methods). Shifting from awareness to operationalization will require holistic organizational commitment to responsible operations. The viability of responsible operations depends on organizational incentives and protections that promote constructive dissent…(More)”.
Rosie the Robot: Social accountability one tweet at a time
Blogpost by Yasodara Cordova and Eduardo Vicente Goncalvese: “Every month in Brazil, the government team in charge of processing reimbursement expenses incurred by congresspeople receives more than 20,000 claims. This is a manually intensive process that is prone to error and susceptible to corruption. Under Brazilian law, this information is available to the public, making it possible for anyone to scrutinize the data for accuracy. But it’s hard to sift through so many transactions. Fortunately, Rosie, a robot built to analyze the expenses of the country’s congress members, is helping out.
Rosie was born from Operação Serenata de Amor, a flagship project we helped create with other civic hackers. We suspected that data provided by members of Congress, especially regarding work-related reimbursements, might not always be accurate. There were clear, straightforward reimbursement regulations, but we wondered how easily individuals could maneuver around them.
Furthermore, we believed that transparency portals and the public data weren’t realizing their full potential for accountability. Citizens struggled to understand public sector jargon and make sense of the extensive volume of data. We thought data science could help make better sense of the open data provided by the Brazilian government.
Using agile methods, specifically Domain-Driven Design, a flexible and adaptive process framework for solving complex problems, our group started studying the regulations and converting them into software code. We did this by reverse-engineering the legal documents: understanding the reimbursement rules and brainstorming ways to circumvent them. Next, we thought about the traces this circumvention would leave in the databases and developed a way to identify those traces in the existing data. The public expenses database included the images of the receipts used to claim reimbursements, and in them we could see evidence of expenses, such as alcohol, which weren’t allowed to be paid for with public money. We named our creation Rosie.
Used for complex systems, Domain-Driven Design analyzes the data and the sector as an ecosystem, and then uses observation and rapid prototyping to generate and test an evolving model. This is how Rosie works: she sifts through the reported data and flags specific expenses made by representatives as “suspicious.” An example could be purchases that indicate the Congress member was in two locations on the same day and time.
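The Serenata de Amor codebase is open source; the minimal sketch below is not taken from it, but it illustrates the general shape of such a rule, using hypothetical field names: flag any member of Congress whose same-day claims place them in two different cities.

```python
from collections import defaultdict

def flag_location_conflicts(expenses):
    """Return claims that place one congress member in two or more cities
    on the same day. Each claim is a dict with (hypothetical) keys
    'member_id', 'date', and 'city'."""
    cities_seen = defaultdict(set)
    for claim in expenses:
        cities_seen[(claim["member_id"], claim["date"])].add(claim["city"])
    return [c for c in expenses
            if len(cities_seen[(c["member_id"], c["date"])]) > 1]

claims = [
    {"member_id": 42, "date": "2016-03-01", "city": "Brasília"},
    {"member_id": 42, "date": "2016-03-01", "city": "São Paulo"},
    {"member_id": 42, "date": "2016-03-02", "city": "Brasília"},
]
for suspicious in flag_location_conflicts(claims):
    print("suspicious:", suspicious)  # the two 2016-03-01 claims are flagged
```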
After finding a suspicious transaction, Rosie then automatically tweets the results to both citizens and congress members. She invites citizens to corroborate or dismiss the suspicions, while also inviting congress members to justify themselves.
Rosie isn’t working alone. Beyond translating the law into computer code, the group also created new interfaces to help citizens check up on Rosie’s suspicions. The same information that was spread across different official government websites was put together in a more intuitive, indexed and machine-readable platform. This platform is called Jarbas – its name was inspired by the AI system that controls Tony Stark’s mansion in Iron Man, J.A.R.V.I.S. (which has origins in the human “Jarbas”) – and it is a website and API (application programming interface) that helps citizens more easily navigate and browse data from different sources. Together, Rosie and Jarbas help citizens use and interpret the data to decide whether there was a misuse of public funds. So far, Rosie has tweeted 967 times. She is particularly good at detecting overpriced meals. According to open research by the group, since her introduction, members of Congress have reduced spending on meals by about ten percent….(More)”.
The Challenges of Sharing Data in an Era of Politicized Science
Editorial by Howard Bauchner in JAMA: “The goal of making science more transparent—sharing data, posting results on trial registries, use of preprint servers, and open access publishing—may enhance scientific discovery and improve individual and population health, but it also comes with substantial challenges in an era of politicized science, enhanced skepticism, and the ubiquitous world of social media. The recent announcement by the Trump administration of plans to proceed with an updated version of the proposed rule “Strengthening Transparency in Regulatory Science,” stipulating that all underlying data from studies that underpin public health regulations from the US Environmental Protection Agency (EPA) must be made publicly available so that those data can be independently validated, epitomizes some of these challenges. According to EPA Administrator Andrew Wheeler: “Good science is science that can be replicated and independently validated, science that can hold up to scrutiny. That is why we’re moving forward to ensure that the science supporting agency decisions is transparent and available for evaluation by the public and stakeholders.”
Virtually every time JAMA publishes an article on the effects of pollution or climate change on health, the journal immediately receives demands from critics to retract the article for various reasons. Some individuals and groups simply do not believe that pollution or climate change affects human health. Research on climate change, and on the effects of climate change on the health of the planet and human beings, could, if made available to anyone for reanalysis, be manipulated to find a different outcome than the one initially reported. In an age of skepticism about many issues, including science, and with the ability to use social media to disseminate unfounded and at times potentially harmful ideas, it is challenging to balance the potential benefits of sharing data against the harms that could be done by reanalysis.
Can the experience of sharing data derived from randomized clinical trials (RCTs)—either as mandated by some funders and journals or as supported by individual investigators—serve as an example of how to safeguard “truth” in science?…
Although the sharing of data may have numerous benefits, it also comes with substantial challenges particularly in highly contentious and politicized areas, such as the effects of climate change and pollution on health, in which the public dialogue appears to be based on as much fiction as fact. The sharing of data, whether mandated by funders, including foundations and government, or volunteered by scientists who believe in the principle of data transparency, is a complicated issue in the evolving world of science, analysis, skepticism, and communication. Above all, the scientific process—including original research and reanalysis of shared data—must prevail, and the inherent search for evidence, facts, and truth must not be compromised by special interests, coercive influences, or politicized perspectives. There are no simple answers, just words of caution and concern….(More)”.