Report by the Research on Research Institute: “In this working paper, we describe how to map research funding landscapes in order to support research funders in setting priorities. Based on data on scientific publications, a funding landscape highlights the research fields that are supported by different funders. The funding landscape described here has been created using data from the Dimensions database. It is presented using a freely available web-based tool that provides an interactive visualization of the landscape. We demonstrate the use of the tool through a case study in which we analyze funding of mental health research…(More)”.
Ethical guidelines issued by engineers’ organization fail to gain traction
Blogpost by Nicolas Kayser-Bril: “In early 2016, the Institute of Electrical and Electronics Engineers, a professional association known as IEEE, launched a “global initiative to advance ethics in technology.” After almost three years of work and multiple rounds of exchange with experts on the topic, last April it released the first edition of Ethically Aligned Design, a 300-page treatise on the ethics of automated systems.
The general principles issued in the report focus on transparency, human rights and accountability, among other topics. As such, they are not very different from the 83 other ethical guidelines that researchers from the Health Ethics and Policy Lab of the Swiss Federal Institute of Technology in Zurich reviewed in an article published in Nature Machine Intelligence in September. However, one key aspect makes IEEE different from other think-tanks. With over 420,000 members, it is the world’s largest engineers’ association with roots reaching deep into Silicon Valley. Vint Cerf, one of Google’s Vice Presidents, is an IEEE “life fellow.”
Because the purpose of the IEEE principles is to serve as a “key reference for the work of technologists”, and because many technologists contributed to their conception, we wanted to know how three technology companies, Facebook, Google and Twitter, were planning to implement them.
Transparency and accountability
Principle number 5, for instance, requires that the basis of a particular automated decision be “discoverable”. On Facebook and Instagram, the reasons why a particular item is shown on a user’s feed are anything but discoverable. Facebook’s “Why You’re Seeing This Post” feature explains that “many factors” are involved in the decision to show a specific item. The help page designed to clarify the matter fails to do so: many sentences there use opaque wording (users are told that “some things influence ranking”, for instance) and the basis of the decisions governing their newsfeeds is impossible to find.
Principle number 6 states that any autonomous system shall “provide an unambiguous rationale for all decisions made.” Google’s advertising systems do not provide an unambiguous rationale when explaining why a particular advert was shown to a user. A click on “Why This Ad” states that an “ad may be based on general factors … [and] information collected by the publisher” (our emphasis). Such vagueness is antithetical to the requirement for explicitness.
AlgorithmWatch sent detailed letters (which you can read below this article) with these examples and more, asking Google, Facebook and Twitter how they planned to implement the IEEE guidelines. This was in June. After a great many emails, phone calls and personal meetings, only Twitter answered. Google gave a vague comment and Facebook promised an answer which never came…(More)”
The weather data gap: How can mobile technology make smallholder farmers climate resilient?
Rishi Raithatha at GSMA: “In the new GSMA AgriTech report, Mobile Technology for Climate Resilience: The role of mobile operators in bridging the data gap, we explore how mobile network operators (MNOs) can play a bigger role in developing and delivering services to strengthen the climate resilience of smallholder farmers. By harnessing their own assets and data, MNOs can improve a broad suite of weather products that are especially relevant for farming communities. These include a variety of weather forecasts (daily, weekly, sub-seasonal and seasonal) and nowcasts, as real-time monitoring and one- to two-hour predictions are often used for Early Warning Systems (EWS) to prevent weather-related disasters. MNOs can also help strengthen the value proposition of other climate products, such as weather index insurance and decision agriculture.
Why do we need more weather data?
Agriculture is highly dependent on regional climates, especially in developing countries where farming is largely rain-fed. Smallholder farmers, who are responsible for the bulk of agricultural production in developing countries, are particularly vulnerable to changing weather patterns – especially given their reliance on natural resources and exclusion from social protection schemes. However, the use of climate adaptation approaches, such as localised weather forecasts and weather index insurance, can enhance smallholder farmers’ ability to withstand the risks posed by climate change and maintain agricultural productivity.
Ground-level measurements are an essential component of climate resilience products; the creation of weather forecasts and nowcasts starts with the analysis of ground, spatial and aerial observations. This involves the use of algorithms, weather models and current and historical observational weather data. Observational instruments, such as radar, weather stations and satellites, are necessary for measuring ground-level weather. However, National Hydrological and Meteorological Services (NHMSs) in developing countries often lack the capacity to generate accurate ground-level measurements beyond a few areas, resulting in gaps in local weather data.
While satellite data offer better resolution than before, and are more affordable and available to NHMSs, there is a need to complement this data with ground-level measurements. This is especially true in tropical and sub-tropical regions where most smallholder farmers live, and where variable local weather patterns can lead to skewed averages from satellite data….(More).”
Secure Shouldn’t Mean Secret: A Call for Public Policy Schools to Share, Support, and Teach Data Stewardship
Paper by Maggie Reeves and Robert McMillan: “The public has long benefitted from researchers using individual-level administrative data (microdata) to answer questions on a gamut of issues related to the efficiency, effectiveness, and causality of programs and policies. However, these benefits have not been pervasive because few researchers have had access to microdata, and their tools, security practices, and technology have rarely been shared. With a clear push to expand access to microdata for purposes of rigorous analysis (Abraham et al., 2017; ADRF Network Working Group Participants, 2018), public policy schools must grapple with imperfect options and decide how to support secure data facilities for their faculty and students. They also must take the lead to educate students as data stewards who can navigate the challenges of microdata access for public policy research.
This white paper outlines the essential components of any secure facility, the pros and cons of four types of secure microdata facilities used for public policy research, the benefits of sharing tools and resources, and the importance of training. It closes with a call on public policy schools to include data stewardship as part of the standard curriculum…(More)”.
Urban Slums in a Datafying Milieu: Challenges for Data-Driven Research Practice
Paper by Bijal Brahmbhatt et al: “With the ongoing trend of urban datafication and growing use of data/evidence to shape developmental initiatives by state as well as non-state actors, this exploratory case study engages with the complex and often contested domains of data use. This study uses on-the-ground experience of working with informal settlements in Indian cities to examine how information value chains work in practice and the contours of their power to intervene in building an agenda of social justice into governance regimes. Using illustrative examples from ongoing action-oriented projects of Mahila Housing Trust in India such as the Energy Audit Project, Slum Mapping Exercise and women-led climate resilience building under the Global Resilience Partnership, it raises questions about challenges of making effective linkages between data, knowledge and action in and for slum communities in the global South by focussing on two issues.
First, it reveals dilemmas of achieving data accuracy when working with slum communities in developing cities where populations are dynamically changing, and where digitisation and the use of ICT have limited operational currency. The second issue focuses on data ownership. It foregrounds the need for complementary inputs and the heavy requirement for support systems in informal settlements in order to translate data-driven knowledge into actionable forms. Absence of these will blunt the edge of data-driven community participation in local politics. Through these intersecting streams, the study attempts to address how entanglements between southern urbanism, datafication, governance and social justice diversify the discourse on data justice. It highlights existing hurdles and structural hierarchies within a data-heavy developmental register emergent across multiple cities in the global South, where data-driven governmental regimes interact with convoluted urban forms and realities….(More)”.
Algorithmic Impact Assessments under the GDPR: Producing Multi-layered Explanations
Paper by Margot E. Kaminski and Gianclaudio Malgieri: “Policy-makers, scholars, and commentators are increasingly concerned with the risks of using profiling algorithms and automated decision-making. The EU’s General Data Protection Regulation (GDPR) has tried to address these concerns through an array of regulatory tools. As one of us has argued, the GDPR combines individual rights with systemic governance, towards algorithmic accountability. The individual tools are largely geared towards individual “legibility”: making the decision-making system understandable to an individual invoking her rights. The systemic governance tools, instead, focus on bringing expertise and oversight into the system as a whole, and rely on the tactics of “collaborative governance,” that is, use public-private partnerships towards these goals. How these two approaches to transparency and accountability interact remains a largely unexplored question, with much of the legal literature focusing instead on whether there is an individual right to explanation.
The GDPR contains an array of systemic accountability tools. Of these tools, impact assessments (Art. 35) have recently received particular attention on both sides of the Atlantic, as a means of implementing algorithmic accountability at early stages of design, development, and training. The aim of this paper is to address how a Data Protection Impact Assessment (DPIA) links the two faces of the GDPR’s approach to algorithmic accountability: individual rights and systemic collaborative governance. We address the relationship between DPIAs and individual transparency rights. We propose, too, that impact assessments link the GDPR’s two methods of governing algorithmic decision-making by both providing systemic governance and serving as an important “suitable safeguard” (Art. 22) of individual rights….(More)”.
Data Fiduciary in Order to Alleviate Principal-Agent Problems in the Artificial Big Data Age
Paper by Julia M. Puaschunder: “The classic principal-agent problem in political science and economics describes agency dilemmas or problems when one person, the agent, is put in a situation to make decisions on behalf of another entity, the principal. A dilemma occurs in situations when individual profit maximization or principal and agent are pitted against each other. This so-called moral hazard is nowadays emerging in the artificial big data age, when big data reaping entities have to act on behalf of agents, who provide their data with trust in the principal’s integrity and responsible big data conduct. Yet to this day, no data fiduciary has been clearly described and established to protect the agent from misuse of data. This article introduces the agent’s predicament between utility derived from information sharing and dignity in privacy as well as hyper-hyperbolic discounting fallibilities to not clearly foresee what consequences information sharing can have over time and in groups. The principal’s predicament between secrecy and selling big data insights or using big data for manipulative purposes will be outlined. Finally, the article draws a clear distinction between manipulation and nudging in relation to the potential social class division of those who nudge and those who are nudged…(More)”.
The Urban Institute Data Catalog
Data@Urban: “We believe that data make the biggest impact when they are accessible to everyone.
Today, we are excited to announce the public launch of the Urban Institute Data Catalog, a place to discover, learn about, and download open data provided by Urban Institute researchers and data scientists. You can find data that reflect the breadth of Urban’s expertise — health, education, the workforce, nonprofits, local government finances, and so much more.
Built using open source technology, the catalog holds valuable data and metadata that Urban Institute staff have created, enhanced, cleaned, or otherwise added value to as part of our work. And it will provide, for the first time, a central, searchable resource to find many of Urban’s published open data assets.
We hope that researchers, data analysts, civic tech actors, application developers, and many others will use this tool to enhance their work, save time, and generate insights that elevate the policy debate. As Urban produces data for research, analysis, and data visualization, and as new data are released, we will continue to update the catalog.
We’re thrilled to put the power of data in your hands to better understand and respond to many critical issues facing us locally and nationally. If you have comments about the tool or the data it contains, or if you would like to share examples of how you are using these data, please feel free to contact us at [email protected].
Here are some current highlights of the Urban Data Catalog — both the data and research products we’ve built using the data — as of this writing:
– LODES data: The Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics (LODES) from the US Census Bureau provide detailed information on workers and jobs by census block. We have summarized these large, dispersed data into a set of census tract and census place datasets to make them easier to use. For more information, read our earlier Data@Urban blog post.
– Medicaid opioid data: Our Medicaid Spending and Prescriptions for the Treatment of Opioid Use Disorder and Opioid Overdose dataset is sourced from state drug utilization data and provides breakdowns by state, year, quarter, drug type, and brand name or generic drug status. For more information and to view our data visualization using the data, see the complete project page.
– Nonprofit and foundation data: Members of Urban’s National Center for Charitable Statistics (NCCS) compile, clean, and standardize data from the Internal Revenue Service (IRS) on organizations filing IRS forms 990 or 990-EZ, including private charities, foundations, and other tax-exempt organizations. To read more about these data, see our previous blog posts on redesigning our Nonprofit Sector in Brief Report in R and repurposing our open code and data to create your own custom summary tables….(More)”.
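The block-to-tract rollup described in the LODES entry above follows directly from Census GEOID structure: a 15-digit block GEOID contains the 11-digit GEOID of its census tract as a prefix, so block-level counts aggregate to tracts with a string slice and a group-by. The sketch below illustrates the idea in pandas; the column names and values are illustrative assumptions, not the actual LODES schema or Urban’s processing code.

```python
import pandas as pd

# Illustrative LODES-style block-level records: 15-digit workplace block
# GEOIDs and a job count. (Column names here are assumptions for the
# sketch, not the official LODES field names.)
blocks = pd.DataFrame({
    "w_geocode": ["360470001001000", "360470001001001", "360470002002000"],
    "jobs": [12, 7, 30],
})

# A census tract GEOID is the first 11 digits of the 15-digit block GEOID
# (state 2 + county 3 + tract 6), so blocks roll up to tracts by prefix.
blocks["tract"] = blocks["w_geocode"].str[:11]
tracts = blocks.groupby("tract", as_index=False)["jobs"].sum()
print(tracts)
```

The same prefix trick extends to counties (first 5 digits), which is one reason block-level releases like LODES can be repackaged at several geographies from a single source file.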
Big Data Analytics in Healthcare
Book edited by Anand J. Kulkarni, Patrick Siarry, Pramod Kumar Singh, Ajith Abraham, Mengjie Zhang, Albert Zomaya and Fazle Baki: “This book includes state-of-the-art discussions on various issues and aspects of the implementation, testing, validation, and application of big data in the context of healthcare. The concept of big data is revolutionary, both from a technological and a societal well-being standpoint. This book provides a comprehensive reference guide for engineers, scientists, and students studying or involved in the development of big data tools in the areas of healthcare and medicine. It also features a multifaceted and state-of-the-art literature review on healthcare data, its modalities, complexities, and methodologies, along with mathematical formulations.
The book is divided into two main sections, the first of which discusses the challenges and opportunities associated with the implementation of big data in the healthcare sector. In turn, the second addresses the mathematical modeling of healthcare problems, as well as current and potential future big data applications and platforms…(More)”.
Risk identification and management for the research use of government administrative data
Paper by Elizabeth Shepherd, Anna Sexton, Oliver Duke-Williams, and Alexandra Eveleigh: “Government administrative data have enormous potential for public and individual benefit through improved educational and health services to citizens, medical research, environmental and climate interventions and exploitation of scarce energy resources. Administrative data is usually “collected primarily for administrative (not research) purposes by government departments and other organizations for the purposes of registration, transaction and record keeping, during the delivery of a service” such as health care, vehicle licensing, tax and social security systems (https://esrc.ukri.org/funding/guidance-for-applicants/research-ethics/useful-resources/key-terms-glossary/). Administrative data are usually distinguished from data collected for statistical use such as the census. Unlike administrative records, administrative data do not provide evidence of activities and generally lack metadata and context relating to provenance. Administrative data, unlike open data, are not routinely made open or accessible; access can be provided only on request to named researchers for specified research projects through research access protocols that often take months to negotiate and are subject to significant constraints around re-use, such as the use of safe havens. Researchers seldom make use of freedom of information or access to information protocols to access such data because they need specific datasets, particular levels of granularity and an ability to re-process data, which are not made generally available. This study draws on research undertaken by the authors as part of the Administrative Data Research Centre in England (ADRC-E). The research examined perspectives on the sharing, linking and re-use (secondary use) of administrative data in England, viewed through three analytical themes: trust, consent and risk.
This study presents an analysis of the identification and management of risk in the research use of government administrative data and presents a risk framework. Risk management (i.e. coordinated activities that allow organizations to control risks, Lemieux, 2010) enables us to think about the balance between risk and benefit for the public good and for other stakeholders. Mitigating activities or management mechanisms used to control the identified risks depend on the resources available to implement the options, on the risk appetite or tolerance of the community and on the cost and likely effectiveness of the mitigation. Mitigation and risk do not work in isolation and should be viewed holistically, keeping the whole information infrastructure in balance across the administrative data system and between multiple stakeholders.
This study seeks to establish a clearer picture of risk with regard to government administrative data in England. It identifies and categorizes the risks arising from the research use of government administrative data. It identifies mitigating risk management activities linked to five key stakeholder communities, and discusses the locus of responsibility for risk management actions. The identification of the risks and of mitigation strategies is derived from the viewpoints of the interviewees and associated documentation; therefore, they reflect their lived experience. The five stakeholder groups identified from the data are as follows: individual researchers; employers of researchers; the wider research community; data creators and providers; and data subjects and the broader public. The primary sections of the study, following the methodology and research context, set out the seven identified types of risk events in the research use of administrative data, present a stakeholder mapping of the communities in this research affected by the risks and discuss the findings related to managing and mitigating the risks identified. The conclusion presents the elements of a new risk framework to inform future actions by the government data community and enable researchers to exploit the power of administrative data for public good….(More)”.