Responsible Data Governance of Neuroscience Big Data


Paper by B. Tyr Fothergill et al.: “Current discussions of the ethical aspects of big data are shaped by concerns regarding the social consequences of both the widespread adoption of machine learning and the ways in which biases in data can be replicated and perpetuated. We instead focus here on the ethical issues arising from the use of big data in international neuroscience collaborations.

Neuroscience innovation relies upon neuroinformatics, large-scale data collection and analysis enabled by novel and emergent technologies. Each step of this work involves aspects of ethics, ranging from concerns for adherence to informed consent or animal protection principles and issues of data re-use at the stage of data collection, to data protection and privacy during data processing and analysis, and issues of attribution and intellectual property at the data-sharing and publication stages.

Significant dilemmas and challenges with far-reaching implications are also inherent, including reconciling the ethical imperative for openness and validation with data protection compliance, and considering future innovation trajectories or the potential for misuse of research results. Furthermore, these issues are subject to local interpretations within different ethical cultures applying diverse legal systems emphasising different aspects. Neuroscience big data require a concerted approach to research across boundaries, wherein ethical aspects are integrated within a transparent, dialogical data governance process. We address this by developing the concept of ‘responsible data governance’, applying the principles of Responsible Research and Innovation (RRI) to the challenges presented by governance of neuroscience big data in the Human Brain Project (HBP)….(More)”.

Responsible data sharing in international health research: a systematic review of principles and norms


Paper by Shona Kalkman, Menno Mostert, Christoph Gerlinger, Johannes J. M. van Delden and Ghislaine J. M. W. van Thiel: “Large-scale linkage of international clinical datasets could lead to unique insights into disease aetiology and facilitate treatment evaluation and drug development. To this end, multi-stakeholder consortia are currently designing several disease-specific translational research platforms to enable international health data sharing. Despite the recent adoption of the EU General Data Protection Regulation (GDPR), the procedures for governing responsible data sharing in such projects have not yet been spelled out. In search of a first, basic outline of an ethical governance framework, we set out to explore relevant ethical principles and norms…

We observed an abundance of principles and norms with considerable convergence at the aggregate level of four overarching themes: societal benefits and value; distribution of risks, benefits and burdens; respect for individuals and groups; and public trust and engagement. However, at the level of principles and norms we identified substantial variation in the phrasing and level of detail, the number and content of norms considered necessary to protect a principle, and the contextual approaches in which principles and norms are used....

While providing some helpful leads for further work on a coherent governance framework for data sharing, the current collection of principles and norms prompts important questions about how to streamline terminology regarding de-identification and how to harmonise the identified principles and norms into a coherent governance framework that promotes data sharing while securing public trust….(More)”

Data-driven models of governance across borders


Introduction to Special Issue of First Monday, edited by Payal Arora and Hallam Stevens: “This special issue looks closely at contemporary data systems in diverse global contexts and, through this set of papers, highlights the struggles we face as we negotiate efficiency and innovation with universal human rights and social inclusion. The studies presented in these essays are situated in diverse models of policy-making, governance, and/or activism across borders. Attention to big data governance in western contexts has tended to highlight how data increases state and corporate surveillance of citizens, affecting rights to privacy. By moving beyond Euro-American borders — to places such as Africa, India, China, and Singapore — we show here how data regimes are motivated and understood on very different terms….

To establish a kind of baseline, the special issue opens by considering attitudes toward big data in Europe. René König’s essay examines the role of “citizen conferences” in understanding the public’s view of big data in Germany. These “participatory technology assessments” demonstrated that citizens were concerned about the control of big data (should it be under the control of the government or individuals?), about the need for more education about big data technologies, and the need for more government regulation. Participants expressed, in many ways, traditional liberal democratic views and concerns about these technologies centered on individual rights, individual responsibilities, and education. Their proposed solutions too — more education and more government regulation — fit squarely within western liberal democratic traditions.

In contrast to this, Payal Arora’s essay draws us immediately into the vastly different contexts of data governance in India and China. India’s Aadhaar biometric identification system, through tracking its citizens with iris scanning and other measures, promises to root out corruption and provide social services to those most in need. Likewise, China’s emerging “social credit system,” while having immense potential for increasing citizen surveillance, offers ways of increasing social trust and fostering more responsible social behavior online and offline. Although the potential for authoritarian abuses of both systems is high, Arora focuses on how these technologies are locally understood and lived on an everyday basis, in ways that range from empowering to oppressive. From this perspective, the technologies offer modes of “disrupt[ing] systems of inequality and oppression” that should open up new conversations about what democratic participation can and should look like in China and India.

If China and India offer contrasting non-democratic and democratic cases, we turn next to a context that is neither completely western nor completely non-western, neither completely democratic nor completely liberal. Hallam Stevens’ account of government data in Singapore suggests the very different role that data can play in this unique political and social context. Although the island state’s data.gov.sg participates in global discourses of sharing, “open data,” and transparency, much of the data made available by the government is oriented towards the solution of particular economic and social problems. Ultimately, the ways in which data are presented may contribute to entrenching — rather than undermining or transforming — existing forms of governance. The account of data and its meanings that is offered here once again challenges the notion that such data systems can or should be understood in the same ways that similar systems have been understood in the western world.

If systems such as Aadhaar, “social credit,” and data.gov.sg profess to make citizens and governments more visible and legible, Rolien Hoyng examines what may remain invisible even within highly pervasive data-driven systems. In the world of e-waste, data-driven modes of surveillance and logistics are critical for recycling. But many blind spots remain. Hoyng’s account reminds us that despite the often-supposed all-seeing-ness of big data, we should remain attentive to what escapes the data’s gaze. Here, in the midst of datafication, we find “invisibility, uncertainty, and, therewith, uncontrollability.” This points also to the gap between the fantasies of how data-driven systems are supposed to work, and their realization in the world. Such interstices allow individuals — those working with e-waste in Shenzhen or Africa, for example — to find and leverage hidden opportunities. From this perspective, the “blind spots of big data” take on a very different significance.

Big data systems provide opportunities for some, but reduce those for others. Mark Graham and Mohammad Amir Anwar examine what happens when online outsourcing platforms create a “planetary labor market.” Although providing opportunities for many people to make money via their Internet connection, Graham and Anwar’s interviews with workers across sub-Saharan Africa demonstrate how “platform work” alters the balance of power between labor and capital. For many low-wage workers across the globe, the platform- and data-driven planetary labor market means downward pressure on wages, fewer opportunities to collectively organize, less worker agency, and less transparency about the nature of the work itself. Moving beyond bold pronouncements that the “world is flat” and big data as empowering, Graham and Anwar show how data-driven systems of employment can act to reduce opportunities for those residing in the poorest parts of the world. The affordances of data and platforms create a planetary labor market for global capital but tie workers ever-more tightly to their own localities. Once again, the valences of global data systems look very different from this “bottom-up” perspective.

Philippa Metcalfe and Lina Dencik shift this conversation from the global movement of labor to that of people, as they write about the implications of European datafication systems on the governance of refugees entering this region. This work highlights how intrinsic to datafication systems is the classification, coding, and collating of people to legitimize the extent of their belonging in the society they seek to live in. The authors argue that these datafied regimes of power have substantively increased their role in regulating human mobility under the guise of national security. These means of data surveillance can foster new forms of containment and entrapment of entire groups of people, creating further divides between “us” and “them.” Through vast interoperable databases, digital registration processes, biometric data collection, and social media identity verification, refugees have become some of the most monitored groups at a global level while at the same time, their struggles remain the most invisible in popular discourse….(More)”.

Privacy-Preserved Data Sharing for Evidence-Based Policy Decisions: A Demonstration Project Using Human Services Administrative Records for Evidence-Building Activities


Paper by the Bipartisan Policy Center: “Emerging privacy-preserving technologies and approaches hold considerable promise for improving data privacy and confidentiality in the 21st century. At the same time, more information is becoming accessible to support evidence-based policymaking.

In 2017, the U.S. Commission on Evidence-Based Policymaking unanimously recommended that further attention be given to the deployment of privacy-preserving data-sharing applications. If these types of applications can be tested and scaled in the near-term, they could vastly improve insights about important policy problems by using disparate datasets. At the same time, the approaches could promote substantial gains in privacy for the American public.

There are numerous ways to engage in privacy-preserving data sharing. This paper primarily focuses on secure computation, which allows information to be accessed securely, guarantees privacy, and permits analysis without making private information available. Three key issues motivated the launch of a domestic secure computation demonstration project using real government-collected data:

  • Using new privacy-preserving approaches addresses pressing needs in society. Current widely accepted approaches to managing privacy risks—like preventing the identification of individuals or organizations in public datasets—will become less effective over time. While there are many practices currently in use to keep government-collected data confidential, they do not often incorporate modern developments in computer science, mathematics, and statistics in a timely way. New approaches can enable researchers to combine datasets to improve the capability for insights, without being impeded by traditional concerns about bringing large, identifiable datasets together. In fact, if these new approaches succeed, traditional methods of combining data for analysis may no longer be necessary.
  • There are emerging technical applications to deploy certain privacy-preserving approaches in targeted settings. These emerging procedures are increasingly enabling larger-scale testing of privacy-preserving approaches across a variety of policy domains, governmental jurisdictions, and agency settings to demonstrate the privacy guarantees that accompany data access and use.
  • Widespread adoption and use by public administrators will only follow meaningful and successful demonstration projects. For example, secure computation approaches are complex and can be difficult to understand for those unfamiliar with their potential. Implementing new privacy-preserving approaches will require thoughtful attention to public policy implications, public opinions, legal restrictions, and other administrative limitations that vary by agency and governmental entity.
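To make the idea of secure computation less abstract, here is a minimal sketch of additive secret sharing, one common building block of secure computation. It is an illustration of the general technique only, not the BPC project's actual protocol or software; the field size, party count, and the `secure_sum` scenario (three agencies pooling caseload counts) are assumptions for the example.

```python
import secrets

# Illustrative additive secret sharing: each data holder splits its private
# value into random shares, so no single share reveals anything, yet the
# shares can be summed to recover only the aggregate.

PRIME = 2**61 - 1  # arithmetic over a finite field keeps shares uniform


def share(value, n_parties=3):
    """Split `value` into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Each holder distributes shares; parties add shares locally, and
    only the combined total is ever reconstructed."""
    party_totals = [0] * n_parties  # party_totals[i]: sum of i-th shares
    for v in private_values:
        for i, s in enumerate(share(v, n_parties)):
            party_totals[i] = (party_totals[i] + s) % PRIME
    return reconstruct(party_totals)


if __name__ == "__main__":
    caseloads = [1200, 875, 430]  # hypothetical counts held by 3 agencies
    print(secure_sum(caseloads))  # total is computed without pooling raw data
```

Each individual share is a uniformly random field element, which is why analysis can proceed "without making private information available" in the sense described above.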

This project used real-world government data to illustrate the applicability of secure computation compared to the classic data infrastructure available to some local governments. The project took place in a domestic, non-intelligence setting to increase the salience of potential lessons for public agencies….(More)”.

Data: The Lever to Promote Innovation in the EU


Blog Post by Juan Murillo Arias: “…But in order for data to truly become a lever that foments innovation in benefit of society as a whole, we must understand and address the following factors:

1. Disconnected, dispersed sources. As users of digital services (transportation, finance, telecommunications, news or entertainment) we leave a different digital footprint for each service that we use. These footprints, which are different facets of the same polyhedron, can even be contradictory on occasion. For this reason, they must be seen as complementary. Analysts should be aware that they must combine data sources from different origins in order to create a reliable picture of our preferences; otherwise we will be basing decisions on partial or biased information. How many times do we receive advertising for items we have already purchased, or tourist destinations where we have already been? And this is just one example of digital marketing. When scoring financial solvency, or monitoring health, the more complete the digital picture is of the person, the more accurate the diagnosis will be.

Furthermore, from the user’s standpoint, proper management of their entire, dispersed digital footprint is a challenge. Perhaps centralized consent would be very beneficial. In the financial world, the PSD2 regulations have already forced banks to open this information to other banks if customers so desire. Fostering competition and facilitating portability is the purpose, but this opening up has also enabled the development of new information-aggregation services that are very useful to financial services users. It would be ideal if this step of breaking down barriers and moving toward a more transparent market took place simultaneously in all sectors in order to avoid possible distortions to competition and, by extension, consumer harm. Therefore, customer consent would open the door to building a more accurate picture of our preferences.

2. The public and private sectors’ asymmetric capacity to gather data. This is related to citizens using public services less frequently than private services in the new digital channels. However, governments could benefit from the information possessed by private companies. These anonymous, aggregated data can help to ensure a more dynamic public management. Even personal data could open the door to customized education or healthcare on an individual level. In order to analyze all of this, the European Commission has created a working group of 23 experts. The purpose is to come up with a series of recommendations regarding the best legal, technical and economic framework to encourage this information transfer across sectors.

3. The lack of incentives for companies and citizens to encourage the reuse of their data. The reality today is that most companies use these sources solely internally. Only a few have decided to explore data sharing through different models (for academic research or for the development of commercial services). As a result of this and other factors, the public sector largely continues using the survey method to gather information instead of reading the digital footprint citizens produce. Multiple studies have demonstrated that this digital footprint would be useful to describe socioeconomic dynamics and monitor the evolution of official statistical indicators. However, these studies have rarely gone on to become pilot projects due to the lack of incentives for a private company to open up to the public sector, or to society in general, in a way that makes this new activity sustainable.

4. Limited commitment to the diversification of services. Another barrier is the fact that information-based product development is somewhat removed from the type of services that the main data generators (telecommunications, banks, commerce, electricity, transportation, etc.) traditionally provide. Therefore, these data-based initiatives are not part of their main business and are more closely tied to companies’ innovation areas, where exploratory proofs of concept are often not consolidated into a new line of business.

5. Bidirectionality. Data should also flow from the public sector to the rest of society. The first regulatory framework was created for this purpose. Although it is still very recent (the PSI Directive on the re-use of public sector data was passed in 2013), it is currently being revised in an attempt to foster the consolidation of an open data ecosystem that emanates from the public sector as well. On the one hand it would enable greater transparency, and on the other, the development of solutions to improve multiple fields in which public actors are key, such as the environment, transportation and mobility, health, education, justice and the planning and execution of public works. Special emphasis will be placed on high-value data sets, such as statistical or geospatial data — data with tremendous potential to accelerate the emergence of a wide variety of information-based data products and services that add value. The Commission will begin working with the Member States to identify these data sets.

In its report, Creating Value through Open Data, the European Data Portal estimates that government agencies making their data accessible will inject an extra €65 billion into the EU economy this year.

6. The commitment to analytical training and financial incentives for innovation. These are the key factors behind the digital unicorns that have emerged, more so in the U.S. and China than in Europe….(More)”

Fearful of fake news blitz, U.S. Census enlists help of tech giants


Nick Brown at Reuters: “The U.S. Census Bureau has asked tech giants Google, Facebook and Twitter to help it fend off “fake news” campaigns it fears could disrupt the upcoming 2020 count, according to Census officials and multiple sources briefed on the matter.

The push, the details of which have not been previously reported, follows warnings from data and cybersecurity experts dating back to 2016 that right-wing groups and foreign actors may borrow the “fake news” playbook from the last presidential election to dissuade immigrants from participating in the decennial count, the officials and sources told Reuters.

The sources, who asked not to be named, said evidence included increasing chatter on platforms like “4chan” by domestic and foreign networks keen to undermine the survey. The census, they said, is a powerful target because it shapes U.S. election districts and the allocation of more than $800 billion a year in federal spending.

Ron Jarmin, the Deputy Director of the Census Bureau, confirmed the bureau was anticipating disinformation campaigns, and was enlisting the help of big tech companies to fend off the threat.

“We expect that (the census) will be a target for those sorts of efforts in 2020,” he said.

Census Bureau officials have held multiple meetings with tech companies since 2017 to discuss ways they could help, including as recently as last week, Jarmin said.

So far, the bureau has gotten initial commitments from Alphabet Inc’s Google, Twitter Inc and Facebook Inc to help quash disinformation campaigns online, according to documents summarizing some of those meetings reviewed by Reuters.

But neither Census nor the companies have said how advanced any of the efforts are….(More)”.

How the NYPD is using machine learning to spot crime patterns


Colin Wood at StateScoop: “Civilian analysts and officers within the New York City Police Department are using a unique computational tool to spot patterns in crime data, helping close cases.

A collection of machine-learning models, which the department calls Patternizr, was first deployed in December 2016, but the department only revealed the system last month when its developers published a research paper in the INFORMS Journal on Applied Analytics. Drawing on 10 years of historical data about burglary, robbery and grand larceny, the tool is the first of its kind to be used by law enforcement, the developers wrote.

The NYPD hired 100 civilian analysts in 2017 to use Patternizr. It’s also available to all officers through the department’s Domain Awareness System, a citywide network of sensors, databases, devices, software and other technical infrastructure. Researchers told StateScoop the tool has generated leads on several cases that traditionally would have stretched officers’ memories and traditional evidence-gathering abilities.

Connecting similar crimes into patterns is a crucial part of gathering evidence and eventually closing in on an arrest, said Evan Levine, the NYPD’s assistant commissioner of data analytics and one of Patternizr’s developers. Taken independently, each crime in a string of crimes may not yield enough evidence to identify a perpetrator, but the work of finding patterns is slow and each officer only has a limited amount of working knowledge surrounding an incident, he said.

“The goal here is to alleviate all that kind of busywork you might have to do to find hits on a pattern,” said Alex Chohlas-Wood, a Patternizr researcher and deputy director of the Computational Policy Lab at Stanford University.

The knowledge of individual officers is limited in scope by dint of the NYPD’s organizational structure. The department divides New York into 77 precincts, and a person who commits crimes across precincts, which often have arbitrary boundaries, is often more difficult to catch because individual beat officers are typically focused on a single neighborhood.

There’s also a lot of data to sift through. In 2016 alone, about 13,000 burglaries, 15,000 robberies and 44,000 grand larcenies were reported across the five boroughs.

Levine said that last month, police used Patternizr to spot a pattern of three knife-point robberies around a Bronx subway station. It would have taken police much longer to connect those crimes manually, Levine said.

The software works by an analyst feeding it a “seed” case, which is then compared against a database of hundreds of thousands of crime records that Patternizr has already processed. The tool generates a “similarity score” and returns a rank-ordered list and a map. Analysts can read a few details of each complaint before examining the seed complaint and similar complaints in a detailed side-by-side view or filtering results….(More)”.
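The seed-and-rank workflow described above can be sketched in a few lines. To be clear, this is not the NYPD's actual Patternizr model: the record fields, the hand-set weights, and the scoring rule below are illustrative stand-ins for what is, in the real system, a learned similarity model over far richer complaint data.

```python
from math import hypot

# Toy seed-and-rank similarity search: score how "pattern-like" each
# historical record is relative to a seed complaint, then rank.
# Fields (x, y, method, category) and weights are hypothetical.


def similarity(seed, candidate):
    """Return a score in [0, 1]; higher means more similar to the seed."""
    dist = hypot(seed["x"] - candidate["x"], seed["y"] - candidate["y"])
    loc_score = 1.0 / (1.0 + dist)  # nearby incidents score higher
    mo_score = 1.0 if seed["method"] == candidate["method"] else 0.0
    cat_score = 1.0 if seed["category"] == candidate["category"] else 0.0
    return 0.4 * loc_score + 0.3 * mo_score + 0.3 * cat_score


def rank_matches(seed, records, top_k=10):
    """Return the top-k (score, record id) pairs, best match first."""
    scored = [(similarity(seed, r), r["id"]) for r in records]
    return sorted(scored, reverse=True)[:top_k]
```

An analyst-style usage would feed a seed robbery complaint plus the historical database to `rank_matches` and review the highest-scoring complaints side by side, which is the "busywork" the researchers quoted above say the tool alleviates.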

Big Data in the U.S. Consumer Price Index: Experiences & Plans


Paper by Crystal G. Konny, Brendan K. Williams, and David M. Friedman: “The Bureau of Labor Statistics (BLS) has generally relied on its own sample surveys to collect the price and expenditure information necessary to produce the Consumer Price Index (CPI). The burgeoning availability of big data has created a proliferation of information that could lead to methodological improvements and cost savings in the CPI. The BLS has undertaken several pilot projects in an attempt to supplement and/or replace its traditional field collection of price data with alternative sources. In addition to cost reductions, these projects have demonstrated the potential to expand sample size, reduce respondent burden, obtain transaction prices more consistently, and improve price index estimation by incorporating real-time expenditure information—a foundational component of price index theory that has not been practical until now. In CPI, we use the term alternative data to refer to any data not collected through traditional field collection procedures by CPI staff, including third-party datasets, corporate data, and data collected through web scraping or retailer APIs. We review how the CPI program is adapting to work with alternative data, followed by discussion of the three main sources of alternative data under consideration by the CPI with a description of research and other steps taken to date for each source. We conclude with some words about future plans… (More)”.

New Data Tools Connect American Workers to Education and Job Opportunities


Department of Commerce: “These are the real stories of people who recently participated in the Census Bureau initiative called The Opportunity Project—a novel, collaborative effort between government agencies, technology companies, and nongovernment organizations to translate government open data into user-friendly tools that solve real-world problems for families, communities, and businesses nationwide.  On March 1, they came together to share their projects at The Opportunity Project’s Demo Day. Projects like theirs help veterans, aspiring technologists, and all Americans connect with career and educational opportunities, as Bryan and Olivia did.

One barrier for many American students and workers is the lack of clear data to help match them with educational opportunities and jobs.  Students want information on the best courses that lead to high-paying, high-demand jobs. Job seekers want to find the jobs that best match their skills, or where to find new skills that open up career development opportunities.  Despite the increasing availability of big data and the long-standing, highly regarded federal statistical system, there remain significant data gaps about basic labor market questions.

  • What is the payoff of a bachelor’s degree versus an apprenticeship, 2-year degree, industry certification, or other credential?
  • What are the jobs of the future?  Which jobs of today also will be the jobs of the future? What skills and experience do companies value most?

The Opportunity Project brings government, communities, and companies like IBM, the veteran-led Shift.org, and Nepris together to create tools to answer simple questions related to education, employment, health, transportation, housing, and many other matters that are critical to helping Americans advance in their lives and careers….(More)”.

Nearly Half of Canadian Consumers Willing to Share Significant Personal Data with Banks and Insurers in Exchange for Lower Pricing, Accenture Study Finds


Press Release: “Nearly half of Canadian consumers would be willing to share significant personal information, such as location data and lifestyle information, with their bank and insurer in exchange for lower pricing on products and services, according to a new report from Accenture (NYSE: ACN).

Consumers willing to share personal data in select scenarios. (CNW Group/Accenture)

Accenture’s global Financial Services Consumer Study, based on a survey of 47,000 consumers in 28 countries, including 2,000 Canadians, found that more than half of consumers would share that data for benefits including more-rapid loan approvals, discounts on gym memberships and personalized offers based on current location.

At the same time, however, Canadian consumers believe that privacy is paramount, with nearly three quarters (72 per cent) saying they are very cautious about the privacy of their personal data. In fact, data security breaches were the second-biggest concern for consumers, behind only increasing costs, when asked what would make them leave their bank or insurer.

“Canadian consumers are willing to share their personal data in instances where it makes their lives easier but remain cautious of exactly how their information is being used,” said Robert Vokes, managing director of financial services at Accenture in Canada. “With this in mind, banks and insurers need to deliver hyper-relevant and highly convenient experiences in order to remain relevant, retain trust and win customer loyalty in a digital economy.”

Consumers globally showed strong support for personalized insurance premiums, with 64 per cent interested in receiving adjusted car insurance premiums based on safe driving and 52 per cent in exchange for life insurance premiums tied to a healthy lifestyle. Four in five consumers (79 per cent) would provide personal data, including income, location and lifestyle habits, to their insurer if they believe it would help reduce the possibility of injury or loss.

In banking, 81 per cent of consumers would be willing to share income, location and lifestyle habit data for rapid loan approval, and 76 per cent would do so to receive personalized offers based on their location, such as discounts from a retailer. Approximately two-fifths (42 per cent) of Canadian consumers want their bank to provide updates on how much money they have based on spending that month, and 46 per cent want savings tips based on their spending habits.

Appetite for data sharing differs around the world

Appetite for sharing significant personal data with financial firms was highest in China, with 67 per cent of consumers there willing to share more data for personalized services. Half (50 per cent) of consumers in the U.S. said they were willing to share more data for personalized services, and in Europe — where the General Data Protection Regulation took effect in May — consumers were more skeptical. For instance, only 40 per cent of consumers in both the U.K. and Germany said they would be willing to share more data with banks and insurers in return for personalized services…(More)”.