Big data needs big governance: best practices from Brain-CODE, the Ontario Brain Institute’s neuroinformatics platform


Shannon C. Lefaivre et al. in Frontiers in Genetics: “The Ontario Brain Institute (OBI) has begun to catalyze scientific discovery in the field of neuroscience through its large-scale informatics platform, known as Brain-CODE. The platform supports the capture, storage, federation, sharing and analysis of different data types across several brain disorders. Underlying the platform is a robust and scalable data governance structure which allows for the flexibility to advance scientific understanding, while protecting the privacy of research participants.

Recognizing the value of an open science approach to enabling discovery, OBI designed the governance structure not only to support collaborative research programs, but also to support open science by making all data open and accessible in the future. OBI’s rigorous approach to data sharing maintains the accessibility of research data for big discoveries without compromising privacy and security. Taking a Privacy by Design approach to both data sharing and development of the platform has allowed OBI to establish some best practices related to large-scale data sharing within Canada. The aim of this report is to highlight these best practices and develop a key open resource which may be referenced during the development of similar open science initiatives….(More)”.

Using Data Sharing Agreements as Tools of Indigenous Data Governance: Current Uses and Future Options


Paper by Martinez, A. and Rainie, S. C.: “Indigenous communities and scholars have been influencing a shift in participation and inclusion in academic and agency research over the past two decades. As a response, Indigenous peoples are increasingly asking research questions and developing their own studies rooted in their cultural values. They use the study results to rebuild their communities and to protect their lands. This process of Indigenous-driven research has led to partnering with academic institutions, establishing research review boards, and entering into data sharing agreements to protect environmental data, community information, and local and traditional knowledges.

Data sharing agreements provide insight into how Indigenous nations are addressing the key areas of data collection, ownership, application, storage, and the potential for data reuse in the future. By understanding how these mainstream data governance mechanisms have been applied and used in the past, we aim to describe how Indigenous nations and communities negotiate data protection and control with researchers.

The project described here reviewed publicly available data sharing agreements that focus on research with Indigenous nations and communities in the United States. We utilized qualitative analysis methods to identify specific areas of focus in the data sharing agreements, whether or not traditional or cultural values were included in the language of the data sharing agreements, and how the agreements defined data. The results detail how Indigenous peoples currently use data sharing agreements and potential areas of expansion for language to include in data sharing agreements as Indigenous peoples address the research needs of their communities and the protection of community and cultural data….(More)”.

Shutting down the internet doesn’t work – but governments keep doing it


George Ogola in The Conversation: “As the internet continues to gain considerable power and agency around the world, many governments have moved to regulate it. And where regulation fails, some states resort to internet shutdowns or deliberate disruptions.

The statistics are staggering. In India alone, there were 154 internet shutdowns between January 2016 and May 2018. This is the most of any country in the world.

But similar shutdowns are becoming common on the African continent. Already in 2019 there have been shutdowns in Cameroon, the Democratic Republic of Congo, Republic of Congo, Chad, Sudan and Zimbabwe. Last year there were 21 such shutdowns on the continent, in countries including Togo, Sierra Leone, Sudan and Ethiopia.

The justifications for such shutdowns are usually relatively predictable. Governments often claim that internet access is blocked in the interest of public security and order. In some instances, however, their reasoning borders on the curious, if not the downright absurd, as in Ethiopia in 2017 and Algeria in 2018, when the internet was shut down apparently to curb cheating in national examinations.

Whatever their reasons, governments have three general approaches to controlling citizens’ access to the web.

How they do it

Internet shutdowns or disruptions usually take three forms. The first and probably the most serious is where the state completely blocks access to the internet on all platforms. It’s arguably the most punitive, with significant socioeconomic and political costs.

The financial costs can run into millions of dollars for each day the internet is blocked. A Deloitte report on the issue estimates that a country with high connectivity could lose at least 1.9% of its daily GDP for each day all internet services are shut down.

For countries with medium-level connectivity the loss is 1% of daily GDP, and for countries with low connectivity it’s 0.4%. It’s estimated that Ethiopia, for example, could lose up to US$500,000 a day whenever there is a shutdown. These shutdowns, then, damage businesses, discourage investment, and hinder economic growth.

The second way that governments restrict internet access is by applying content blocking techniques. They restrict access to particular sites or applications. This is the most common strategy and it’s usually targeted at social media platforms. The idea is to stop or limit conversations on these platforms.

Online spaces have become the platform for various forms of political expression that many states, especially those with authoritarian leanings, consider subversive. Governments argue, for example, that social media platforms encourage the spread of rumours which can trigger public unrest.

This was the case in 2016 in Uganda during the country’s presidential elections. The government restricted access to social media, describing the shutdown as a “security measure to avert lies … intended to incite violence and illegal declaration of election results”.

In Zimbabwe, the government blocked social media following demonstrations over an increase in fuel prices. It argued that the January 2019 ban was because the platforms were being “used to coordinate the violence”.

The third strategy, done almost by stealth, is the use of what is generally known as “bandwidth throttling”. In this case telecom operators or internet service providers are forced to lower the quality of their cell signals or internet speed. This makes the internet too slow to use. “Throttling” can also target particular online destinations such as social media sites….(More)”
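
To make the arithmetic behind the Deloitte figures quoted above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Deloitte’s actual methodology, and the GDP value in the example is a hypothetical placeholder rather than a real statistic.

```python
# Rough estimate of the daily cost of a full internet shutdown, using the
# connectivity-tier percentages cited from the Deloitte report above.
# The annual GDP figure below is a hypothetical placeholder, not real data.

LOSS_RATE_BY_CONNECTIVITY = {
    "high": 0.019,    # 1.9% of daily GDP
    "medium": 0.010,  # 1.0% of daily GDP
    "low": 0.004,     # 0.4% of daily GDP
}

def daily_shutdown_cost(annual_gdp_usd: float, connectivity: str) -> float:
    """Estimate the economic loss from one day of a nationwide shutdown."""
    daily_gdp = annual_gdp_usd / 365
    return daily_gdp * LOSS_RATE_BY_CONNECTIVITY[connectivity]

# Hypothetical example: a low-connectivity country with US$50 billion annual GDP
print(f"US${daily_shutdown_cost(50e9, 'low'):,.0f} per day")  # ≈ US$548,000
```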

Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice


Paper by Rashida Richardson, Jason Schultz, and Kate Crawford: “Law enforcement agencies are increasingly using algorithmic predictive policing systems to forecast criminal activity and allocate police resources. Yet in numerous jurisdictions, these systems are built on data produced within the context of flawed, racially fraught and sometimes unlawful practices (‘dirty policing’). This can include systemic data manipulation, falsifying police reports, unlawful use of force, planted evidence, and unconstitutional searches. These policing practices shape the environment and the methodology by which data is created, which leads to inaccuracies, skews, and forms of systemic bias embedded in the data (‘dirty data’). Predictive policing systems informed by such data cannot escape the legacy of unlawful or biased policing practices that they are built on. Nor do claims by predictive policing vendors that these systems provide greater objectivity, transparency, or accountability hold up. While some systems offer the ability to see the algorithms used and even occasionally access to the data itself, there is no evidence to suggest that vendors independently or adequately assess the impact that unlawful and biased policing practices have on their systems, or otherwise assess how broader societal biases may affect their systems.

In our research, we examine the implications of using dirty data with predictive policing, and look at jurisdictions that (1) have utilized predictive policing systems and (2) have done so while under government commission investigations or federal court monitored settlements, consent decrees, or memoranda of agreement stemming from corrupt, racially biased, or otherwise illegal policing practices. In particular, we examine the link between unlawful and biased police practices and the data used to train or implement these systems across thirteen case studies. We highlight three of these: (1) Chicago, an example of where dirty data was ingested directly into the city’s predictive system; (2) New Orleans, an example where the extensive evidence of dirty policing practices suggests an extremely high risk that dirty data was or will be used in any predictive policing application; and (3) Maricopa County, where, despite extensive evidence of dirty policing practices, a lack of transparency and public accountability surrounding predictive policing inhibits the public from assessing the risks of dirty data within such systems. The implications of these findings have widespread ramifications for predictive policing writ large. Deploying predictive policing systems in jurisdictions with extensive histories of unlawful police practices presents elevated risks that dirty data will lead to flawed, biased, and unlawful predictions, which in turn risk perpetuating additional harm via feedback loops throughout the criminal justice system. Thus, for any jurisdiction where police have been found to engage in such practices, the use of predictive policing in any context must be treated with skepticism, and mechanisms for the public to examine and reject such systems are imperative….(More)”.
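
The feedback loop the authors describe can be illustrated with a toy simulation: if recorded incidents over-represent one neighborhood because of biased enforcement, then allocating patrols in proportion to those records keeps enforcement concentrated there, and the skew never corrects itself even when underlying rates are identical. This is a deliberately simplified sketch with made-up numbers, not any vendor’s actual system.

```python
import random

# Toy model of a predictive-policing feedback loop. Neighborhoods A and B have
# identical true incident rates, but the historical record ("dirty data")
# over-represents A because it was over-policed. Patrols are allocated in
# proportion to recorded incidents, so the initial distortion persists.
random.seed(0)
TRUE_INCIDENT_RATE = {"A": 0.10, "B": 0.10}   # identical underlying rates (by assumption)
recorded = {"A": 60, "B": 40}                 # skewed historical record

for year in range(5):
    total = sum(recorded.values())
    # Allocate 100 patrols according to past recorded incidents
    patrols = {n: round(100 * recorded[n] / total) for n in recorded}
    for n, p in patrols.items():
        # More patrols -> more incidents observed, even with equal true rates,
        # so neighborhood A's inflated share is locked in year after year.
        recorded[n] += sum(random.random() < TRUE_INCIDENT_RATE[n] for _ in range(p))
    print(f"year {year}: patrol allocation {patrols}")
```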

Fact-Based Policy: How Do State and Local Governments Accomplish It?


Report and Proposal by Justine Hastings: “Fact-based policy is essential to making government more effective and more efficient, and many states could benefit from more extensive use of data and evidence when making policy. Private companies have taken advantage of declining computing costs and vast data resources to solve problems in a fact-based way, but state and local governments have not made as much progress….

Drawing on her experience in Rhode Island, Hastings proposes that states build secure, comprehensive, integrated databases, and that they transform those databases into data lakes that are optimized for developing insights. Policymakers can then use the insights from this work to sharpen policy goals, create policy solutions, and measure progress against those goals. Policymakers, computer scientists, engineers, and economists will work together to build the data lake and analyze the data to generate policy insights….(More)”.
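
As an illustration of the kind of cross-agency linkage such an integrated database makes possible, here is a minimal sketch in Python using pandas. The table names, columns, and values are hypothetical and do not reflect Rhode Island’s actual schema or data.

```python
import pandas as pd

# Hypothetical administrative extracts from three agencies, already keyed to a
# de-identified person ID so records can be linked without exposing identities.
snap = pd.DataFrame({"person_id": [1, 2, 3, 4], "on_snap": [True, True, False, True]})
wages = pd.DataFrame({"person_id": [1, 2, 3, 4], "quarterly_wages": [3200, 0, 9100, 1500]})
training = pd.DataFrame({"person_id": [1, 3], "completed_training": [True, True]})

# Link the records and derive a simple policy metric: average quarterly wages
# for SNAP recipients with and without job-training completion.
linked = snap.merge(wages, on="person_id").merge(training, on="person_id", how="left")
linked["completed_training"] = linked["completed_training"].fillna(False)
print(linked[linked["on_snap"]].groupby("completed_training")["quarterly_wages"].mean())
```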

Saying yes to State Longitudinal Data Systems: building and maintaining cross agency relationships


Report by the National Skills Coalition: “In order to provide actionable information to stakeholders, state longitudinal data systems use administrative data that state agencies collect through administering programs. Thus, state longitudinal data systems must maintain strong working relationships with the state agencies collecting necessary administrative data. These state agencies can include K-12 and higher education agencies, workforce agencies, and those administering social service programs such as the Supplemental Nutrition Assistance Program or Temporary Assistance for Needy Families.

When state longitudinal data systems have strong relationships with agencies, agencies willingly and promptly share their data with the system, engage with data governance when needed, approve research requests in a timely manner, and continue to cooperate with the system over the long term. If state agencies do not participate in their state’s longitudinal data system, the work of the system is put in jeopardy. States may find that research and performance reporting can be stalled or stopped outright.

Kentucky and Virginia have been able to build and maintain support for their systems among state agencies. Their examples demonstrate how states can effectively utilize their state longitudinal data systems….(More)”.

Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year by which all EU28+ countries will have a fully operating open data portal: 2020

  • Projected market size of open data in Europe by 2020, after an expected increase of 36.9% between 2016 and 2020: EUR 75.7 billion

Public Views on and Use of Open Government Data

  • Share of Americans who do not trust the federal government or social media sites to protect their data: approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that’s passed since open government advocates convened to create a set of principles for open government data – the event that started the open government data movement: 10 years

  • Participants in the Open Government Partnership today: 79 countries and 20 subnational governments

  • Average “open data readiness” score in Europe according to the European Data Portal: 72%

    • Open data readiness comprises four indicators: presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    • France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA requests processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among the top 5 agencies with the highest number of requests:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Proportion of datasets that are open in the top 30 governments according to ODB Leaders Edition: fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%


  • Percentage of datasets classed as “open” in the 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners-Lee’s 5-star Open Data scheme: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

Distributed, privacy-enhancing technologies in the 2017 Catalan referendum on independence: New tactics and models of participatory democracy


M. Poblet at First Monday: “This paper examines new civic engagement practices unfolding during the 2017 referendum on independence in Catalonia. These practices constitute one of the first signs of some emerging trends in the use of the Internet for civic and political action: the adoption of horizontal, distributed, and privacy-enhancing technologies that rely on P2P networks and advanced cryptographic tools. In this regard, the case of the 2017 Catalan referendum, framed within conflicting political dynamics, can be considered a first of its kind in participatory democracy. The case also offers an opportunity to reflect on an interesting paradox that twenty-first century activism will face: the more it will rely on privacy-friendly, secured, and encrypted networks, the more open, inclusive, ethical, and transparent it will need to be….(More)”.

Waze-fed AI platform helps Las Vegas cut car crashes by almost 20%


Liam Tung at ZDNet: “An AI-led road-safety pilot program between analytics firm Waycare and Nevada transportation agencies has helped reduce crashes along the busy I-15 in Las Vegas.

The Silicon Valley-based Waycare system uses data from connected cars, road cameras and apps like Waze to build an overview of a city’s roads, and then shares that data with local authorities to improve road safety.

Waycare struck a deal with Google-owned Waze earlier this year to “enable cities to communicate back with drivers and warn of dangerous roads, hazards, and incidents ahead”. Waze’s crowdsourced data also feeds into Waycare’s traffic management system, offering more data for cities to manage traffic.

Waycare has now wrapped up a year-long pilot with the Regional Transportation Commission of Southern Nevada (RTC), Nevada Highway Patrol (NHP), and the Nevada Department of Transportation (NDOT).

RTC reports that Waycare helped the city reduce the number of primary crashes by 17 percent along Interstate 15 in Las Vegas.

Waycare’s data, as well as its predictive analytics, gave the city’s safety and traffic management agencies the ability to take preventative measures in high-risk areas….(More)”.

Using Data to Raise the Voices of Working Americans


Ida Rademacher at the Aspen Institute: “…At the Aspen Institute Financial Security Program, we sense a growing need to ground these numbers in what people experience day-to-day. We’re inspired by projects like the Financial Diaries that helped create empathy for what the statistics mean. …the Diaries was a time-delimited project, and the insights we can gain from major banking institutions are somewhat limited in their ability to show the challenges of economically marginalized populations. That’s why we’ve recently launched a consumer insights initiative to develop and translate a more broadly sourced set of data that lifts the curtain on the financial lives of low- and moderate-income US consumers. What does it really mean to lack $400 when you need it? How do people cope? What are the aspirations and anxieties that fuel choices? Which strategies work and which fall flat? Our work exists to focus the dialogue about financial insecurity by keeping an ear to the ground and amplifying what we hear. Our ultimate goal: Inspire new solutions that react to reality, ones that can genuinely improve the financial well-being of many.

Our consumer insights initiative sees power in partnerships and collaboration. We’re building a big tent for a range of actors to query and share what their data says: private sector companies, public programs, and others who see unique angles into the financial lives of low- and moderate-income households. We are creating a new forum to lift up these firms serving consumers – and in doing so, we’re raising the voices of consumers themselves.

One example of this work is our Consumer Insights Collaborative (CIC), a group of nine leading non-profits from across the country. Each has a strong sense of the challenges and opportunities on the ground, because its day-to-day work brings it face-to-face with a wide array of consumers, many of whom are low- and moderate-income families. And most already work independently to learn from their data. Take EARN and its Big Data on Small Savings project; the Financial Clinic’s insights series called Change Matters; Mission Asset Fund’s R&D Lab focused on human-centered design; and FII, which uses data collection as part of its main service.

Through the CIC, they join forces to see more than any one nonprofit can on its own. Together, CIC members articulate common questions and synthesize collective answers. In the coming months we will publish a first-of-its-kind report on a jointly posed question: What are the dimensions and drivers of short-term financial stability?

An added bonus of partnerships like the CIC is the community of practice that naturally emerges. We believe that data scientists from all walks can, and indeed must, learn from each other to have the greatest impact. Our initiative especially encourages cooperative capacity-building around data security and privacy. We acknowledge that as access to information grows, so does the risk to consumers themselves. We endorse collaborative projects that value ethics, respect, and integrity as much as they value cross-organizational learning.

As our portfolio grows, we will invite an even broader network to engage. We’re already working with NEST Insights to draw on NEST’s extensive administrative data on retirement savings, with an aim to understand more about the long-term implications of non-traditional work and unstable household balance sheets on financial security….(More)”.