Replicating the Justice Data Lab in the USA: Key Considerations


Blog by Tracey Gyateng and Tris Lumley: “Since 2011, NPC has researched, supported and advocated for the development of impact-focussed Data Labs in the UK. The goal has been to unlock government administrative data so that organisations (primarily nonprofits) who provide a social service can understand the impact of their services on the people who use them.

So far, one of these Data Labs has been developed to measure reoffending outcomes (the Justice Data Lab), and others are currently being piloted for employment and education. Given our seven years of work in this area, we at NPC have decided to reflect on the key factors needed to create a Data Lab with our report: How to Create an Impact Data Lab. This blog outlines these factors, examines whether they are present in the USA, and asks what the next steps should be — drawing on the research undertaken with the Governance Lab….Below we examine the key factors and to what extent they appear to be present within the USA.

Environment: A broad culture that supports impact measurement. As in the UK, nonprofits in the USA are increasingly measuring the impact of their services on participants and sharing the difficulties of undertaking robust, high-quality evaluations.

Data: Individual person-level administrative data. A key difference between the two countries is that, in the USA, personal data on social services tends to be held at a local, rather than central, level. In the UK, social services data on outcomes such as reoffending, education and employment is collated into a central database. In the USA, the federal government holds limited centrally collated personal data; instead, this data can be found at state/city level….

A leading advocate: A Data Lab project team, and strong networks. Data Labs do not manifest by themselves. They require a lead agency to campaign with, and on behalf of, nonprofits to set out a persuasive case for their development. In the USA, we have developed a partnership with the Governance Lab to seek out opportunities where Data Labs can be established, but given the size of the country, there is scope for further collaborations and/or advocates to be identified and supported.

Customers: Identifiable organisations that would use the Data Lab. Initial discussions with several US nonprofits and academia indicate support for a Data Lab in their context. Broad consultation based on an agreed region and outcome(s) will be needed to fully assess the potential customer base.

Data owners: Engaged civil servants. Generating buy-in and persuading various stakeholders, including data owners, analysts and politicians, is a critical part of setting up a Data Lab. While the exact profiles of the right people to approach can only be assessed once a region and outcome(s) of interest have been chosen, there are encouraging signs, such as the passing of the Foundations for Evidence-Based Policymaking Act of 2017 in the House of Representatives, which, among other things, mandates the appointment of “Chief Evaluation Officers” in government departments, suggesting that there is bipartisan support for increased data-driven policy evaluation.

Legal and ethical governance: A legal framework for sharing data. In the UK, all personal data is subject to data protection legislation, which provides standardised governance for how personal data can be processed across the country and within the European Union. A universal data protection framework does not exist within the USA, therefore data sharing agreements between customers and government data-owners will need to be designed for the purposes of Data Labs, unless there are existing agreements that enable data sharing for research purposes. This will need to be investigated at the state/city level of a desired Data Lab.

Funding: Resource and support for driving the set-up of the Data Lab. Most of our policy lab case studies were funded by a mixture of philanthropy and government grants. It is expected that a similar mixed funding model will need to be created to establish Data Labs. One alternative is the model adopted by the Washington State Institute for Public Policy (WSIPP), which was created by the Washington State Legislature and is funded on a project basis, primarily by the state. Additionally, funding will be needed to enable advocates of a Data Lab to campaign for the service….(More)”.

Lessons from Cambridge Analytica: one way to protect your data


Julia Apostle in the Financial Times: “The unsettling revelations about how data firm Cambridge Analytica surreptitiously exploited the personal information of Facebook users are yet another demoralising reminder of how much data has been amassed about us, and of how little control we have over it.

Unfortunately, the General Data Protection Regulation privacy laws that are coming into force across Europe — with more demanding consent, transparency and accountability requirements, backed by huge fines — may improve practices, but they will not change the governing paradigm: the law labels those who gather our data as “controllers”. We are merely “subjects”.

But if the past 20 years have taught us anything, it is that when business and legislators have been too slow to adapt to public demand — for goods and services that we did not even know we needed, such as Amazon, Uber and bitcoin — computer scientists have stepped in to fill the void. And so it appears that the realms of data privacy and security are deserving of some disruption. This might come in the form of “self-sovereign identity” systems.

The theory behind self-sovereign identity is that individuals should control the data elements that form the basis of their digital identities, and not centralised authorities such as governments and private companies. In the current online environment, we all have multiple log-ins, usernames, customer IDs and personal data spread across countless platforms and stored in myriad repositories.

Instead of this scattered approach, we should each possess the digital equivalent of a wallet that contains verified pieces of our identities. We can then choose which identification to share, with whom, and when. Self-sovereign identity systems are currently being developed.

They involve the creation of a unique and persistent identifier attributed to an individual (called a decentralised identity), which cannot be taken away. The systems use public/private key cryptography: a user holds a private key (a long string of numbers) and publishes a corresponding public key, so that any number of recipients can verify information the user has signed with the private key, while data encrypted under the public key can be read only by the private key holder.
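To make the public/private key relationship concrete, here is a toy sketch of the idea using textbook RSA with deliberately tiny, insecure numbers. It is purely illustrative (real self-sovereign identity systems use vetted cryptographic libraries and much larger keys), but it shows how one private key can produce signatures that any holder of the public key can verify, and how the public key lets anyone encrypt data that only the private key can decrypt.

```python
# Toy RSA-style keypair -- tiny primes, insecure, for illustration only.
p, q = 61, 53
n = p * q                 # public modulus (part of the public key)
phi = (p - 1) * (q - 1)
e = 17                    # public exponent (part of the public key)
d = pow(e, -1, phi)       # private exponent (the secret "string of numbers")

message = 42              # a piece of identity data, encoded as a number

# The holder signs with the private key; anyone with (n, e) can verify.
signature = pow(message, d, n)
verified = pow(signature, e, n)
assert verified == message

# Conversely, anyone can encrypt to the holder with the public key;
# only the private key recovers the plaintext.
ciphertext = pow(message, e, n)
assert pow(ciphertext, d, n) == message
```

The same key pair thus supports both directions the article gestures at: authenticity (signing) and confidentiality (encryption).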

The systems also rely on decentralised ledger applications like blockchain. While key cryptography has been around for a long time, it is the development of decentralised ledger technology, which also supports the trading of cryptocurrencies without the involvement of intermediaries, that will allow self-sovereign identity systems to take off. The potential uses for decentralised identity are legion and small-scale implementation is already happening. The Swiss municipality of Zug started using a decentralised identity system called uPort last year, to allow residents access to certain government services. The municipality announced it will also use the system for voting this spring….

Decentralised identity data is more difficult to access in bulk, so there is less financial incentive for hackers to target it. Self-sovereign identity systems could eliminate many of our data privacy concerns while empowering individuals in the online world and turning the established data order on its head. But the success of the technology depends on its widespread adoption….(More)

Launching the Data Culture Project


New project by MIT Center for Civic Media and the Engagement Lab@Emerson College: “Learning to work with data is like learning a new language — immersing yourself in the culture is the best way to do it. For some individuals, this means jumping into tools like Excel, Tableau, programming, or R Studio. But what does this mean for a group of people that work together? We often talk about data literacy as if it’s an individual capacity, but what about data literacy for a community? How does an organization learn how to work with data?

About a year ago we (Rahul Bhargava and Catherine D’Ignazio) found that more and more users of our DataBasic.io suite of tools and activities were asking this question — online and in workshops. In response, with support from the Stanford Center on Philanthropy and Civil Society, we’ve worked together with 25 organizations to create the Data Culture Project. We’re happy to launch it publicly today! Visit datacultureproject.org to learn more.

The Data Culture Project is a hands-on learning program to kickstart a data culture within your organization. We provide facilitation videos to help you run creative introductions to get people across your organization talking to each other — from IT to marketing to programs to evaluation. These are not boring spreadsheet trainings! Try running our fun activities — one per month works as a brown bag lunch to focus people on a common learning goal. For example, “Sketch a Story” brings people together around basic concepts of quantitative text analysis and visual storytelling. “Asking Good Questions” introduces principles of exploratory data analysis in a fun environment. What’s more, you can use the sample data that we provide, or you can integrate your organization’s data as the topic of conversation and learning….(More)”.

Regulatory sandbox lessons learned report


Financial Conduct Authority (UK): “The sandbox allows firms to test innovative products, services or business models in a live market environment, while ensuring that appropriate protections are in place. It was established to support the FCA’s objective of promoting effective competition in the interests of consumers and opened for applications in June 2016.

The sandbox has supported 50 firms from 146 applications received across the first two cohorts. This report sets out the sandbox’s overall impact on the market including the adoption of new technologies, increasing access and improving experiences for vulnerable consumers as well as lessons learnt from individual tests that have been, or are being, conducted as part of the sandbox.

Early indications suggest the sandbox is providing the benefits it set out to achieve with evidence of the sandbox enabling new products to be tested, reducing time and cost of getting innovative ideas to market, improving access to finance for innovators, and ensuring appropriate safeguards are built into new products and services.

We will be using these lessons to inform any future sandbox developments as well as our ongoing policymaking and supervision work….(More)”.

How Universities Are Tackling Society’s Grand Challenges


Michelle Popowitz and Cristin Dorgelo in Scientific American: “…Universities embarking on Grand Challenge efforts are traversing new terrain—they are making commitments about research deliverables rather than simply committing to invest in efforts related to a particular subject. To mitigate risk, the universities that have entered this space are informally consulting with others regarding effective strategies, but the entire community would benefit from a more formal structure for identifying and sharing “what works.” To address this need, the new Community of Practice for University-Led Grand Challenges—launched at the October 2017 workshop—aims to provide peer support to leaders of university Grand Challenge programs, and to accelerate the adoption of Grand Challenge approaches at more universities supported by cross-sector partnerships.

The university community has identified extensive opportunities for collaboration on these Grand Challenge programs with other sectors:

  • Philanthropy can support the development of new Grand Challenge programs at more universities by establishing planning and administration grant programs, convening experts, and providing funding support for documenting these models through white papers and other publications and for evaluation of these programs over time.
  • Relevant associations and professional development organizations can host learning sessions about Grand Challenges for university leaders and professionals.
  • Companies can collaborate with universities on Grand Challenges research, act as sponsors and hosts for university-led programs and activities, and offer leaders, experts, and other personnel for volunteer advisory roles and tours of duty at universities.
  • Federal, state, and local governments and elected officials can provide support for collaboration among government agencies and offices and the research community on Grand Challenges.

Today’s global society faces pressing, complex challenges across many domains—including health, environment, and social justice. Science (including social sciences), technology, the arts, and humanities have critical roles to play in addressing these challenges and building a bright and prosperous future. Universities are hubs for discovery, building new knowledge, and changing understanding of the world. The public values the role universities play in education; yet as a sector, universities are less effective at highlighting their roles as the catalysts of new industries, homes for the fundamental science that leads to new treatments and products, or sources of the evidence on which policy decisions should be made.

By coming together as universities, collaborating with partners, and aiming for ambitious goals to address problems that might seem unsolvable, universities can show commitment to their communities and become beacons of hope….(More)”.

World’s biggest city database shines light on our increasingly urbanised planet


EU Joint Research Centers: “The JRC has launched a new tool with data on all 10,000 urban centres scattered across the globe. It is the largest and most comprehensive database on cities ever published.

With data derived from the JRC’s Global Human Settlement Layer (GHSL), researchers have discovered that the world has become even more urbanised than previously thought.

Populations in urban areas doubled in Africa and grew by 1.1 billion in Asia between 1990 and 2015.

Globally, more than 400 cities have a population between 1 and 5 million. More than 40 cities have 5 to 10 million people, and there are 32 ‘megacities’ with more than 10 million inhabitants.
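The population bands above amount to a simple classification rule. The sketch below applies it to a handful of urban centres; the city names and population figures are rough, hypothetical values for illustration, not GHSL data.

```python
# Illustrative only: bucket urban centres into the population bands the
# article cites (1-5M, 5-10M, and 10M+ "megacities"). Figures are
# hypothetical, not taken from the GHSL database.
centres = {
    "Tokyo": 37_400_000,
    "Delhi": 28_500_000,
    "Barcelona": 4_800_000,
    "Philadelphia": 5_700_000,
    "Nairobi": 4_400_000,
}

def band(population: int) -> str:
    """Return the population band for one urban centre."""
    if population >= 10_000_000:
        return "megacity (10M+)"
    if population >= 5_000_000:
        return "large city (5-10M)"
    if population >= 1_000_000:
        return "city (1-5M)"
    return "below 1M"

# Count how many sample centres fall into each band.
counts = {}
for name, pop in centres.items():
    counts[band(pop)] = counts.get(band(pop), 0) + 1

print(counts)  # {'megacity (10M+)': 2, 'city (1-5M)': 2, 'large city (5-10M)': 1}
```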

There are some promising signs for the environment: cities became 25% greener between 2000 and 2015. And although air pollution in urban centres had been rising since 1990, the trend reversed between 2000 and 2015.

With every high-density area of at least 50,000 inhabitants covered, the city centres database shows growth in population and built-up areas over the past 40 years. Environmental factors tracked include:

  • ‘Greenness’: the estimated amount of healthy vegetation in the city centre
  • Soil sealing: the covering of the soil surface with materials like concrete and stone, as a result of new buildings, roads and other public and private spaces
  • Air pollution: the level of polluting particles such as PM2.5 in the air
  • Vicinity to protected areas: the percentage of natural protected space within 30 km of the city centre’s border
  • Disaster risk-related exposure of population and buildings in low lying areas and on steep slopes.

The data is free to access and open to everyone. It applies big data analytics and a global, people-based definition of cities, providing support to monitor global urbanisation and the 2030 Sustainable Development Agenda.

The information gained from the GHSL is used to map out population density and settlement maps. Satellite, census and local geographic information are used to create the maps….(More)”.

Republics of Makers: From the Digital Commons to a Flat Marginal Cost Society


Mario Carpo at eFlux: “…as the costs of electronic computation have been steadily decreasing for the last forty years at least, many have recently come to the conclusion that, for most practical purposes, the cost of computation is asymptotically tending to zero. Indeed, the current notion of Big Data is based on the assumption that an almost unlimited amount of digital data will soon be available at almost no cost, and similar premises have further fueled the expectation of a forthcoming “zero marginal cost society”: a society where, except for some upfront and overhead costs (the costs of building and maintaining some facilities), many goods and services will be free for all. And indeed, against all odds, an almost zero marginal cost society is already a reality in the case of many services based on the production and delivery of electricity: from the recording, transmission, and processing of electrically encoded digital information (bits) to the production and consumption of electrical power itself. Using renewable energies (solar, wind, hydro), the generation of electrical power is free, except for the cost of building and maintaining installations and infrastructure. And given the recent progress in the micro-management of intelligent electrical grids, it is easy to imagine that in the near future the cost of servicing a network of very small, local hydro-electric generators, for example, could easily be devolved to local communities of prosumers who would take care of those installations as they tend to their living environment, on an almost voluntary, communal basis. This was already often the case during the early stages of electrification, before the rise of AC (alternating current, which, unlike DC, or direct current, could be carried over long distances): AC became the industry’s choice only after Galileo Ferraris’s and Nikola Tesla’s developments in AC technologies in the 1880s.

Likewise, at the micro-scale of the electronic production and processing of bits and bytes of information, the Open Source movement and the phenomenal surge of some crowdsourced digital media (including some so-called social media) in the first decade of the twenty-first century have already proven that a collaborative, zero-cost business model can effectively compete with products priced for profit on a traditional marketplace. As the success of Wikipedia, Linux, or Firefox proves, many are happy to volunteer their time and labor for free when all can profit from the collective work of an entire community without having to pay for it. This is now technically possible precisely because the fixed costs of building, maintaining, and delivering these services are very small; hence, from the point of view of the end-user, negligible.

Yet, regardless of the fixed costs of the infrastructure, content—even user-generated content—has costs, albeit for the time being these are mostly hidden, voluntarily borne, or inadvertently absorbed by the prosumers themselves. For example, the wisdom of Wikipedia is not really a wisdom of crowds: most Wikipedia entries are de facto curated by fairly traditional scholarly communities, and these communities can contribute their expertise for free only because their work has already been paid for by others—often by universities. In this sense, Wikipedia is only piggybacking on someone else’s research investments (but multiplying their outreach, which is one reason for its success). Ditto for most Open Source software, as training a software engineer, coder, or hacker takes time and money—an investment for future returns that in many countries around the world is still borne, at least in part, by public institutions….(More)”.

Mobile Devices as Stigmatizing Security Sensors: The GDPR and a Future of Crowdsourced ‘Broken Windows’


Paper by Oskar Josef Gstrein and Gerard Jan Ritsema van Eck: “Various smartphone apps and services are available which encourage users to report where and when they feel they are in an unsafe or threatening environment. This user generated content may be used to build datasets, which can show areas that are considered ‘bad,’ and to map out ‘safe’ routes through such neighbourhoods.

Despite certain advantages, this data inherently carries the danger that streets or neighbourhoods become stigmatized and already existing prejudices might be reinforced. Such stigmas might also result in negative consequences for property values and businesses, causing irreversible damage to certain parts of a municipality. Overcoming such an “evidence-based stigma” — even if based on biased, unreviewed, outdated, or inaccurate data — becomes nearly impossible and raises the question of how such data should be managed….(More)”.

Eight great applications of simulation in the policymaking process


Florence Engasser and Sonia Nasser at Nesta: “In a context where complexity and unpredictability increasingly form part of the decision-making process, our policymakers need new tools to help them experiment, explore different scenarios and weigh the trade-offs of a decision in a safe, pressure-free environment.

Simulation brings the potential for more creative, efficient and effective policymaking

The best way to understand how simulation can be used as a policy method is to look at examples. We’ve found eight really great examples from around the world, giving us a sense of the broad range of applications simulation can have in the policymaking process, from board games through to more traditional modelling techniques applied to new fields, and all the way to virtual reality….(More)”.

Open Data Risk Assessment


Report by the Future of Privacy Forum: “The transparency goals of the open data movement serve important social, economic, and democratic functions in cities like Seattle. At the same time, some municipal datasets about the city and its citizens’ activities carry inherent risks to individual privacy when shared publicly. In 2016, the City of Seattle declared in its Open Data Policy that the city’s data would be “open by preference,” except when doing so may affect individual privacy. To ensure its Open Data Program effectively protects individuals, Seattle committed to performing an annual risk assessment and tasked the Future of Privacy Forum (FPF) with creating and deploying an initial privacy risk assessment methodology for open data.

This Report provides tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacy-protective open data programs. Although there is a growing body of research regarding open data privacy, open data managers and departmental data owners need to be able to employ a standardized methodology for assessing the privacy risks and benefits of particular datasets internally, without access to a bevy of expert statisticians, privacy lawyers, or philosophers. By optimizing its internal processes and procedures, developing and investing in advanced statistical disclosure control strategies, and following a flexible, risk-based assessment process, the City of Seattle – and other municipalities – can build mature open data programs that maximize the utility and openness of civic data while minimizing privacy risks to individuals and addressing community concerns about ethical challenges, fairness, and equity.

This Report first describes inherent privacy risks in an open data landscape, with an emphasis on potential harms related to re-identification, data quality, and fairness. To address these risks, the Report includes a Model Open Data Benefit-Risk Analysis (“Model Analysis”). The Model Analysis evaluates the types of data contained in a proposed open dataset, the potential benefits – and concomitant risks – of releasing the dataset publicly, and strategies for effective de-identification and risk mitigation. This holistic assessment guides city officials to determine whether to release the dataset openly, in a limited access environment, or to withhold it from publication (absent countervailing public policy considerations). …(More)”.
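One concrete re-identification check an open data manager might run as part of such an analysis is k-anonymity over quasi-identifiers. The sketch below is a minimal illustration of that idea, not FPF's actual methodology; the dataset rows and column names are hypothetical.

```python
# Minimal k-anonymity check: find the smallest group of records that share
# a combination of quasi-identifiers. If that group has only one member,
# the record is unique on those fields and easier to re-identify.
from collections import Counter

# Hypothetical rows from a candidate open dataset.
records = [
    {"zip": "98101", "age_band": "30-39", "service": "housing"},
    {"zip": "98101", "age_band": "30-39", "service": "food"},
    {"zip": "98101", "age_band": "30-39", "service": "transit"},
    {"zip": "98122", "age_band": "60-69", "service": "housing"},
]

def min_group_size(rows, quasi_identifiers):
    """Smallest number of records sharing one quasi-identifier combination.
    A dataset is k-anonymous on these fields if this value is at least k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

k = min_group_size(records, ["zip", "age_band"])
print(k)  # 1 here: the lone 98122 record is unique, hence re-identifiable
```

A value below the chosen k threshold would argue for aggregating, suppressing, or generalizing fields before open release, or for limited-access publication instead.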