How can stakeholder engagement and mini-publics better inform the use of data for pandemic response?


Andrew Zahuranec, Andrew Young and Stefaan G. Verhulst at the OECD Participo Blog Series:

“What does the public expect from data-driven responses to the COVID-19 pandemic? And under what conditions?” These are the motivating questions behind The Data Assembly, a recent initiative by The GovLab at New York University Tandon School of Engineering — an action research center that aims to help institutions work more openly, collaboratively, effectively, and legitimately.

Launched with support from The Henry Luce Foundation, The Data Assembly solicited diverse, actionable public input on data re-use for crisis response in the United States. In particular, we sought to engage the public on how to facilitate, where deemed acceptable, the re-use of data collected for other purposes to inform the COVID-19 response. An additional objective was to inform the broader emergence of data collaboration — formal and ad hoc arrangements between the public sector, civil society, and the private sector — by evaluating public expectations of, and concerns with, the institutional, contractual, and technical structures and instruments that may underpin these partnerships.

The Data Assembly used a new methodology that re-imagines how organisations can engage with society to better understand local expectations regarding data re-use and related issues. This work goes beyond soliciting input from just the “usual suspects”. Instead, data assemblies provide a forum for a much more diverse set of participants to share their insights and voice their concerns.

This article is informed by our experience piloting The Data Assembly in New York City in summer 2020. It provides an overview of The Data Assembly’s methodology and outcomes and describes major elements of the effort to support organisations working on similar issues in other cities, regions, and countries….(More)”.

How a Google Street View image of your house predicts your risk of a car accident


MIT Technology Review: “Google Street View has become a surprisingly useful way to learn about the world without stepping into it. People use it to plan journeys, to explore holiday destinations, and to virtually stalk friends and enemies alike.

But researchers have found more insidious uses. In 2017 a team of researchers used the images to study the distribution of car types in the US and then used that data to determine the demographic makeup of the country. It turns out that the car you drive is a surprisingly reliable proxy for your income level, your education, your occupation, and even the way you vote in elections.

Street view of houses in Poland

Now a different group has gone even further. Łukasz Kidziński at Stanford University in California and Kinga Kita-Wojciechowska at the University of Warsaw in Poland have used Street View images of people’s houses to determine how likely they are to be involved in a car accident. That’s valuable information that an insurance company could use to set premiums.

The result raises important questions about the way personal information can leak from seemingly innocent data sets and whether organizations should be able to use it for commercial purposes.

Insurance data

The researchers’ method is straightforward. They began with a data set of 20,000 records of people who had taken out car insurance in Poland between 2013 and 2015. These were randomly selected from the database of an undisclosed insurance company.

Each record included the address of the policyholder and the number of damage claims he or she made during the 2013–2015 period. The insurer also shared its own prediction of future claims, calculated using its state-of-the-art risk model that takes into account the policyholder’s zip code and the driver’s age, sex, claim history, and so on.

The question that Kidziński and Kita-Wojciechowska investigated is whether they could make a more accurate prediction using a Google Street View image of the policyholder’s house….(More)”.
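
The paper’s full pipeline isn’t reproduced in the excerpt, but the general approach can be sketched: fit a classic claims-frequency model on the insurer’s rating variables, then refit it with image-derived features added and compare predictive accuracy. A minimal illustration, assuming a Poisson regression baseline (all file and feature names are invented):

```python
# A minimal sketch of the general approach (not the authors' code): augment a
# standard claims-frequency model with features derived from a Street View
# image of the policyholder's house. All file and feature names are invented.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split

X_base = np.load("rating_variables.npy")      # age, sex, zip code, claim history
X_img  = np.load("house_image_features.npy")  # e.g. a CNN embedding of the house photo
y      = np.load("claim_counts.npy")          # claims per policyholder, 2013-2015

k = X_base.shape[1]
X = np.hstack([X_base, X_img])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline  = PoissonRegressor().fit(X_tr[:, :k], y_tr)  # insurer-style risk model
augmented = PoissonRegressor().fit(X_tr, y_tr)         # same model + image features

print("baseline deviance: ", mean_poisson_deviance(y_te, baseline.predict(X_te[:, :k])))
print("augmented deviance:", mean_poisson_deviance(y_te, augmented.predict(X_te)))
```

If the augmented model’s hold-out deviance is lower, the image carries predictive signal beyond the insurer’s existing variables — which is precisely the privacy concern the article raises.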

The Role of the Private Sector in Protecting Civic Space


Chatham House Report by Bennett Freeman, Harriet Moynihan and Thiago Alves Pinto: “Robust civic space is essential for good governance, the rule of law and for enabling citizens to shape their societies. However, civil society space around the world is under significant pressure and the COVID-19 pandemic has exacerbated this situation.

The weakening of international institutions and democratic norms worldwide has resulted in fewer constraints on autocracies. Meanwhile, the rise of nationalism, populism and illiberalism is taking its toll on civil liberties.

The private sector is in a unique position to work with civil society organizations to uphold and defend civic freedoms and support sustainable and profitable business environments. Companies have the capacity, resources and expertise to enhance the protection of civic space….(More)”.

This is how we lost control of our faces


Karen Hao at MIT Technology Review: “In 1964, mathematician and computer scientist Woodrow Bledsoe first attempted the task of matching suspects’ faces to mugshots. He measured out the distances between different facial features in printed photographs and fed them into a computer program. His rudimentary successes would set off decades of research into teaching machines to recognize human faces.
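
Bledsoe’s method is easy to illustrate in miniature: reduce each face to a vector of distances between hand-measured landmarks, then match a new face to the nearest record on file. A toy sketch of that idea, not his actual program:

```python
# A toy illustration of the 1960s approach: reduce each face to hand-measured
# distances between landmarks, then match a probe face to the closest record
# on file. A loose sketch of the idea, not Bledsoe's actual program.
import numpy as np

def feature_vector(landmarks: dict) -> np.ndarray:
    """Pairwise distances between a fixed, ordered set of facial landmarks."""
    pts = np.array([landmarks[name] for name in sorted(landmarks)])
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists[np.triu_indices(len(pts), k=1)]  # upper triangle, no duplicates

mugshots = {  # landmark coordinates measured off printed photographs
    "suspect_a": {"l_eye": (30, 40), "r_eye": (70, 41), "nose": (50, 60), "mouth": (50, 85)},
    "suspect_b": {"l_eye": (28, 38), "r_eye": (75, 39), "nose": (51, 66), "mouth": (52, 92)},
}
probe = {"l_eye": (31, 40), "r_eye": (71, 42), "nose": (50, 61), "mouth": (49, 86)}

v = feature_vector(probe)
best = min(mugshots, key=lambda name: np.linalg.norm(feature_vector(mugshots[name]) - v))
print("closest match:", best)  # -> suspect_a
```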

Now a new study shows just how much this enterprise has eroded our privacy. It hasn’t just fueled an increasingly powerful tool of surveillance. The latest generation of deep-learning-based facial recognition has completely disrupted our norms of consent.

Deborah Raji, a fellow at nonprofit Mozilla, and Genevieve Fried, who advises members of the US Congress on algorithmic accountability, examined over 130 facial-recognition data sets compiled over 43 years. They found that researchers, driven by the exploding data requirements of deep learning, gradually abandoned asking for people’s consent. This has led more and more of people’s personal photos to be incorporated into systems of surveillance without their knowledge.

It has also led to far messier data sets: they may unintentionally include photos of minors, use racist and sexist labels, or have inconsistent quality and lighting. The trend could help explain the growing number of cases in which facial-recognition systems have failed with troubling consequences, such as the false arrests of two Black men in the Detroit area last year.

People were extremely cautious about collecting, documenting, and verifying face data in the early days, says Raji. “Now we don’t care anymore. All of that has been abandoned,” she says. “You just can’t keep track of a million faces. After a certain point, you can’t even pretend that you have control.”…(More)”.

Democratizing data in a 5G world


Blog by Dimitrios Dosis at Mastercard: “The next generation of mobile technology has arrived, and it’s more powerful than anything we’ve experienced before. 5G can move data faster, with little delay — in fact, with 5G, you could’ve downloaded a movie in the time you’ve read this far. 5G will also create a vast network of connected machines. The Internet of Things will finally deliver on its promise to fuse all our smart products — vehicles, appliances, personal devices — into a single streamlined ecosystem.

My smartwatch could monitor my blood pressure and schedule a doctor’s appointment, while my car could collect data on how I drive and how much gas I use while behind the wheel. In some cities, petrol trucks already act as roving gas stations, receiving pings when cars are low on gas and refueling them as needed, wherever they are.

This amounts to an incredible proliferation of data. By 2025, every connected person will conduct nearly 5,000 data interactions every day — one every 18 seconds — whether they know it or not. 

Enticing and convenient as new 5G-powered developments may be, they also raise complex questions about data. Namely, who is privy to our personal information? As your smart refrigerator records the foods you buy, will the refrigerator’s manufacturer be able to see your eating habits? Could it sell that information to a consumer food product company for market research without your knowledge? And where would the information go from there?

People are already asking critical questions about data privacy. In fact, 72% of them say they are paying attention to how companies collect and use their data, according to a global survey released last year by the Harvard Business Review Analytic Services. The survey, sponsored by Mastercard, also found that while 60% of executives believed consumers think the value they get in exchange for sharing their data is worthwhile, only 44% of consumers actually felt that way.

There are many reasons for this data disconnect, including the lack of transparency that currently exists in data sharing and the tension between an individual’s need for privacy and his or her desire for personalization.

This paradox can be solved by putting data in the hands of the people who create it — giving consumers the ability to manage, control and share their own personal information when they want to, with whom they want to, and in a way that benefits them.

That’s the basis of Mastercard’s core set of principles regarding data responsibility – and in this 5G world, it’s more important than ever. We will be able to gain from these new technologies, but this change must come with trust and user control at its core. The data ecosystem needs to evolve from schemes dominated by third parties, where some data brokers collect inferred, often unreliable and inaccurate data, then share it without the consumer’s knowledge….(More)”.

Governance models for redistribution of data value


Essay by Maria Savona: “The growth of interest in personal data has been unprecedented. Issues of privacy violation, power abuse, practices of electoral behaviour manipulation unveiled in the Cambridge Analytica scandal, and a sense of imminent impingement of our democracies are at the forefront of policy debates. Yet, these concerns seem to overlook the issue of concentration of equity value (stemming from data value, which I use interchangeably here) that underpins the current structure of big tech business models. Whilst these quasi-monopolies own the digital infrastructure, they do not own the personal data that provide the raw material for data analytics. 

The European Commission has been at the forefront of global action to promote convergence of the governance of data (privacy), including, but not limited to, the General Data Protection Regulation (GDPR) (European Commission 2016), enforced in May 2018. Attempts to enforce similar regulations are emerging around the world, including the California Consumer Privacy Act, which came into effect on 1 January 2020. Notwithstanding greater awareness among citizens around the use of their data, companies find that complying with GDPR is, at best, a useless nuisance. 

Data have been seen as ‘innovation investment’ since the beginning of the 1990s. The first edition of the Oslo Manual, the OECD’s international guidelines for collecting and using data on innovation in firms, dates back to 1992 and included the collection of databases on employee best practices as innovation investments. Data are also measured as an ‘intangible asset’ (Corrado et al. 2009 was one of the pioneering studies). What has changed over the last decade? The scale of data generation today is such that its management and control might have already gone well beyond the capacity of the very tech giants we are all feeding. Concerns around data governance and data privacy might be too little and too late.

In this column, I argue that economists have failed twice: first, to predict the massive concentration of data value in the hands of large platforms; and second, to account for the complexity of the political economy aspects of data accumulation. Based on a pair of recent papers (Savona 2019a, 2019b), I systematise recent research and propose a novel data rights approach to redistribute data value whilst not undermining the range of ethical, legal, and governance challenges that this poses….(More)”.

From satisficing to artificing: The evolution of administrative decision-making in the age of the algorithm


Paper by Thea Snow at Data & Policy: “Algorithmic decision tools (ADTs) are being introduced into public sector organizations to support more accurate and consistent decision-making. Whether they succeed turns, in large part, on how administrators use these tools. This is one of the first empirical studies to explore how ADTs are being used by Street Level Bureaucrats (SLBs). The author develops an original conceptual framework and uses in-depth interviews to explore whether SLBs are ignoring ADTs (algorithm aversion); deferring to ADTs (automation bias); or using ADTs together with their own judgment (an approach the author calls “artificing”). Interviews reveal that artificing is the most common use-type, followed by aversion, while deference is rare. Five conditions appear to influence how practitioners use ADTs: (a) understanding of the tool; (b) perception of human judgment; (c) seeing value in the tool; (d) being offered opportunities to modify the tool; and (e) alignment of the tool with expectations….(More)”.

Using “Big Data” to forecast migration


Blog post by Jasper Tjaden, Andres Arau, Muertizha Nuermaimaiti, Imge Cetin, Eduardo Acostamadiedo and Marzia Rango:

Act 1 — High Expectations

“Data is the new oil,” they say. ‘Big Data’ is even bigger than that. The “data revolution” will contribute to solving societies’ problems and help governments adopt better policies and run more effective programs. In the migration field, digital trace data are seen as a potentially powerful tool to improve migration management processes (visa applications, asylum decisions and the geographic allocation of asylum seekers, facilitating integration, “smart borders”, etc.).

Forecasting migration is one particular area where big data seems to excite data nerds (like us) and policymakers alike. If there is one way big data has already made a difference, it is its ability to bring different actors together — data scientists, business people and policymakers — to sit through countless slides with numbers, tables and graphs. Traditional migration data sources, like censuses, administrative data and surveys, have never quite managed to generate the same level of excitement.

Many EU countries are currently heavily investing in new ways to forecast migration. Relatively large numbers of asylum seekers in 2014, 2015 and 2016 strained the capacity of many EU governments. Better forecasting tools are meant to help governments prepare in advance.

In a recent European Migration Network study, 10 out of the 22 EU governments surveyed said they make use of forecasting methods, many using open source data for “early warning and risk analysis” purposes. The 2020 European Migration Network conference was dedicated entirely to the theme of forecasting migration, hosting more than 15 expert presentations on the topic. The recently proposed EU Pact on Migration and Asylum outlines a “Migration Preparedness and Crisis Blueprint” which “should provide timely and adequate information in order to establish the updated migration situational awareness and provide for early warning/forecasting, as well as increase resilience to efficiently deal with any type of migration crisis.” (p. 4) The European Commission is currently finalizing a feasibility study on the use of artificial intelligence for predicting migration to the EU; Frontex — the EU Border Agency — is scaling up efforts to forecast irregular border crossings; EASO — the European Asylum Support Office — is devising a composite “push-factor index” and experimenting with forecasting asylum-related migration flows using machine learning and data at scale. In Fall 2020, during Germany’s EU Council Presidency, the German Interior Ministry organized a workshop series around Migration 4.0 highlighting the benefits of various ways to “digitalize” migration management. At the same time, the EU is investing substantial resources in migration forecasting research under its Horizon 2020 programme, including QuantMig, ITFLOWS, and HumMingBird.
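
None of these systems’ internals are public in the excerpt, but the basic shape of such a forecasting pipeline can be sketched: predict next month’s applications from lagged counts plus an open-source early-warning signal. A rough, hypothetical illustration — the data file, the “push factor” index, and all column names are invented:

```python
# A rough, hypothetical sketch of such a forecasting pipeline: predict next
# month's asylum applications from lagged counts plus an open-source "push
# factor" signal. The data file and column names are invented for illustration.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("monthly_asylum_applications.csv", parse_dates=["month"])
df = df.sort_values("month")

for lag in (1, 2, 3, 6, 12):                         # lagged application counts
    df[f"apps_lag_{lag}"] = df["applications"].shift(lag)
df["push_lag_1"] = df["push_factor_index"].shift(1)  # e.g. conflict or search-trend index
df = df.dropna()

features = [c for c in df.columns if "_lag_" in c]

train, test = df.iloc[:-12], df.iloc[-12:]           # time-ordered split: hold out last year
model = GradientBoostingRegressor().fit(train[features], train["applications"])

pred = model.predict(test[features])
mae = (pred - test["applications"]).abs().mean()
print(f"mean absolute error on hold-out year: {mae:.0f} applications/month")
```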

Is all this excitement warranted?

Yes, it is….(More)” See also: Big Data for Migration Alliance

These crowdsourced maps will show exactly where surveillance cameras are watching


Mark Sullivan at FastCompany: “Amnesty International is producing a map of all the places in New York City where surveillance cameras are scanning residents’ faces.

The project will enlist volunteers to use their smartphones to identify, photograph, and locate government-owned surveillance cameras capable of shooting video that could be matched against people’s faces in a database through AI-powered facial recognition.

The map that will eventually result is meant to give New Yorkers the power of information against an invasive technology whose usage and purpose are often not fully disclosed to the public. It’s also meant to put pressure on the New York City Council to write and pass a law restricting or banning facial recognition. Other U.S. cities, such as Boston, Portland, and San Francisco, have already passed such laws.

Facial recognition technology can be developed by scraping millions of images from social media profiles and driver’s licenses without people’s consent, Amnesty says. Software from companies like Clearview AI can then use computer vision algorithms to match those images against facial images captured by closed-circuit television (CCTV) or other video surveillance cameras and stored in a database.
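
The commercial systems are proprietary, but the pipeline the article describes is generic: embed every scraped photo as a vector, then match a CCTV face crop against the collection by similarity. A sketch under that assumption, with a trivial stand-in for the embedding network:

```python
# A generic sketch of the matching pipeline described above, not any vendor's
# actual system: a model maps each face image to an embedding vector, and a
# CCTV face crop is matched by cosine similarity against scraped photos.
# `embed` stands in for a real face-embedding network (e.g. a FaceNet-style
# CNN); here it is just a normalised pixel histogram so the sketch runs.
import numpy as np

def embed(face_image: np.ndarray) -> np.ndarray:
    hist, _ = np.histogram(face_image, bins=64, range=(0, 255))
    v = hist.astype(float)
    return v / (np.linalg.norm(v) or 1.0)

def best_match(probe_image: np.ndarray, gallery: dict, threshold: float = 0.6):
    """Return the gallery identity most similar to the probe, if above threshold."""
    p = embed(probe_image)
    scores = {name: float(np.dot(p, e)) for name, e in gallery.items()}  # cosine (unit vectors)
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return (name, score) if score >= threshold else (None, score)

rng = np.random.default_rng(0)
gallery = {f"scraped_photo_{i}": embed(rng.integers(0, 256, (112, 112))) for i in range(3)}
cctv_crop = rng.integers(0, 256, (112, 112))  # a face crop from a surveillance frame
print(best_match(cctv_crop, gallery))         # the threshold is an arbitrary choice
```

Note that a crude embedding like this one produces spuriously high similarity scores between unrelated faces — a small-scale version of the misidentification risk the article describes.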

Starting in May, volunteers will be able to use a software tool to identify all the facial recognition cameras within their view—like at an intersection where numerous cameras can often be found. The tool, which runs on a phone’s browser, lets users place a square around any cameras they see. The software integrates Google Street View and Google Earth to help volunteers label and attach geolocation data to the cameras they spot.
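
The volunteer tool itself isn’t shown in the excerpt; conceptually, each spotted camera becomes a geotagged record, something like this hypothetical schema:

```python
# A hypothetical sketch of the record a volunteer submission might produce:
# one geotagged GeoJSON feature per spotted camera, carrying the square the
# volunteer drew and their labels. All field names are invented.
import json

def camera_report(lat, lon, camera_type, bbox_px, photo_url):
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: lon, lat
        "properties": {
            "camera_type": camera_type,        # e.g. "pole-mounted", "attached to building"
            "bounding_box_px": list(bbox_px),  # the square drawn around the camera on screen
            "photo": photo_url,
            "source": "volunteer_submission",
        },
    }

report = camera_report(40.7128, -74.0060, "pole-mounted",
                       (120, 80, 260, 220), "https://example.org/cam.jpg")
print(json.dumps(report, indent=2))
```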

The map is part of a larger campaign called “Ban the Scan” that’s meant to educate people around the world on the civil rights dangers of facial recognition. Research has shown that facial recognition systems aren’t as accurate when it comes to analyzing dark-skinned faces, putting Black people at risk of being misidentified. Even when accurate, the technology exacerbates systemic racism because it is disproportionately used to identify people of color, who are already subject to discrimination by law enforcement officials. The campaign is sponsored by Amnesty in partnership with a number of other tech advocacy, privacy, and civil liberties groups.

In the initial phase of the project, which was announced last Thursday, Amnesty and its partners launched a website that New Yorkers can use to generate public comments on the New York Police Department’s (NYPD’s) use of facial recognition….(More)”.

Inside India’s booming dark data economy


Snigdha Poonam and Samarath Bansal at the Rest of the World: “…The black market for data, as it exists online in India, resembles those for wholesale vegetables or smuggled goods. Customers are encouraged to buy in bulk, and the variety of what’s on offer is mind-boggling: There are databases about parents, cable customers, pregnant women, pizza eaters, mutual funds investors, and almost any niche group one can imagine. A typical database consists of a spreadsheet with row after row of names and key details: Sheila Gupta, 35, lives in Kolkata, runs a travel agency, and owns a BMW; Irfaan Khan, 52, lives in Greater Noida, and has a son who just applied to engineering college. The databases are usually updated every three months (the older one is, the less it is worth), and if you buy several at the same time, you’ll get a discount. Business is always brisk, and transactions are conducted quickly. No one will ask you for your name, let alone inquire why you want the phone numbers of five million people who have applied for bank loans.

There isn’t a reliable estimate of the size of India’s data economy or of how much money it generates annually. Regarding the former, each broker we spoke to had a different guess: One said only about one or two hundred professionals make up the top tier, another that every big Indian city has at least a thousand people trading data. To find them, potential customers need only look for their ads on social media or run searches with industry keywords and hashtags — “data,” “leads,” “database” — combined with detailed information about the kind of data they want and the city they want it from.

Privacy experts believe that the data-brokering industry has existed since the early days of the internet’s arrival in India. “Databases have been bought and sold in India for at least 15 years now. I remember a case from way back in 2006 of leaked employee data from Naukri.com (one of India’s first online job portals) being sold on CDs,” says Nikhil Pahwa, the editor and publisher of MediaNama, which covers technology policy. By 2009, data brokers were running SMS-marketing companies that offered complementary services: procuring targeted data and sending text messages in bulk. Back then, there was simply less data, “and those who had it could sell it at whatever price,” says Himanshu Bhatt, a data broker who claims to be retired. That is no longer the case: “Today, everyone has every kind of data,” he said.

No broker we contacted would openly discuss their methods of hunting, harvesting, and selling data. But the day-to-day work generally consists of following the trails that people leave during their travels around the internet. Brokers trawl data storage websites armed with a digital fishing net. “I was shocked when I was surfing [cloud-hosted data sites] one day and came across Aadhaar cards,” Bhatt remarked, referring to India’s state-issued biometric ID cards. Images of them were available to download in bulk, alongside completed loan applications and salary sheets.

Again, the legal boundaries here are far from clear. Anybody who has ever filled out a form on a coupon website or requested a refund for a movie ticket has effectively entered their information into a database that can be sold without their consent by the company it belongs to. A neighborhood cell phone store can sell demographic information to a political party for hyperlocal campaigning, and a fintech company can stealthily transfer an individual’s details from an astrology app onto its own server, to gauge that person’s creditworthiness. When somebody shares employment history on LinkedIn or contact details on a public directory, brokers can use basic software such as web scrapers to extract that data.

But why bother hacking into a database when you can buy it outright? More often, “brokers will directly approach a bank employee and tell them, ‘I need the high-end database’,” Bhatt said. And as demand for information increases, so, too, does data vulnerability. A 2019 survey found that 69% of Indian companies haven’t set up reliable data security systems; 44% have experienced at least one breach already. “In the past 12 months, we have seen an increasing trend of Indians’ data [appearing] on the dark web,” says Beenu Arora, the CEO of the global cyberintelligence firm Cyble….(More)”.