data collaboratives

The potential of Data Collaboratives for COVID19

Curated on April 2, 2020 by Stefaan Verhulst

Blog post by Stefaan Verhulst: “We live in almost unimaginable times. The spread of COVID-19 is a human tragedy and global crisis that will impact our communities for many years to come. The social and economic costs are huge and mounting, and they are already contributing to a global slowdown. Every day, the emerging pandemic reveals new vulnerabilities in various aspects of our economic, political and social lives. These include our vastly overstretched public health services, our dysfunctional political climate, and our fragile global supply chains and financial markets.

The unfolding crisis is also making shortcomings clear in another area: the way we re-use data responsibly. Although this aspect of the crisis has been less remarked upon than other, more obvious failures, those who work with data—and who have seen its potential to impact the public good—understand that we have failed to create the necessary governance and institutional structures that would allow us to harness data responsibly to halt or at least limit this pandemic. A recent article in Stat, an online journal dedicated to health news, characterized the COVID-19 outbreak as “a once-in-a-century evidence fiasco.” The article continues:

“At a time when everyone needs better information, […] we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.”

It doesn’t have to be this way, and these data challenges are not an excuse for inaction. As we explain in what follows, there is ample evidence that the re-use of data can help mitigate health pandemics. A robust (if somewhat unsystematized) body of knowledge could direct policymakers and others in their efforts. In the second part of this article, we outline eight steps that key stakeholders can and should take to better re-use data in the fight against COVID-19. In particular, we argue that more responsible data stewardship and increased use of data collaboratives are critical….(More)”.

Mobile phone data and COVID-19: Missing an opportunity?

Curated on April 2, 2020 by Stefaan Verhulst

Paper by Nuria Oliver, et al: “This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although their value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups on national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic….(More)”.

Why isn’t the government publishing more data about coronavirus deaths?

Curated on April 2, 2020 by Stefaan Verhulst

Article by Jeni Tennison: “Studying the past is futile in an unprecedented crisis. Science is the answer – and open-source information is paramount…Data is a necessary ingredient in day-to-day decision-making – but in this rapidly evolving situation, it’s especially vital. Everything has changed, almost overnight. Demands for food, transport, and energy have been overhauled as more people stop travelling and work from home. Jobs have been lost in some sectors, and workers are desperately needed in others. Historic experience can no longer tell us how our society or economy is working. Past models hold little predictive power in an unprecedented situation. To know what is happening right now, we need up-to-date information….

This data is also crucial for scientists, who can use it to replicate and build upon each other’s work. Yet no open data has been published alongside the evidence for the UK government’s coronavirus response. While a model that informed the US government’s response is freely available as a Google spreadsheet, the Imperial College London model that prompted the current lockdown has still not been published as open-source code. Making data open – publishing it on the web, in spreadsheets, without restrictions on access – is the best way to ensure it can be used by the people who need it most.

There is currently no open data available on UK hospitalisation rates; no regional, age or gender breakdown of daily deaths. The more granular breakdown of registered deaths provided by the Office for National Statistics is only published on a weekly basis, and with a delay. It is hard to tell whether this data does not exist or the NHS has prioritised creating dashboards for government decision makers rather than informing the rest of the country. But the UK is making progress with regard to data: potential Covid-19 cases identified through online and call-centre triage are now being published daily by NHS Digital.

Of course, not all data should be open. Singapore has been publishing detailed data about every infected person, including their age, gender, workplace, where they have visited and whether they had contact with other infected people. This can both harm the people who are documented and incentivise others to lie to authorities, undermining the quality of data.

When people are concerned about how data about them is handled, they demand transparency. To retain our trust, governments need to be open about how data is collected and used, how it’s being shared, with whom, and for what purpose. Openness about the use of personal data to help tackle the Covid-19 crisis will become more pressing as governments seek to develop contact tracing apps and immunity passports….(More)”.

Urgently Needed for Policy Guidance: An Operational Tool for Monitoring the COVID-19 Pandemic

Curated on April 2, 2020 by Stefaan Verhulst

Paper by Stephane Luchini et al: “The radical uncertainty around the current COVID19 pandemics requires that governments around the world should be able to track in real time not only how the virus spreads but, most importantly, what policies are effective in keeping the spread of the disease under check. To improve the quality of health decision-making, we argue that it is necessary to monitor and compare acceleration/deceleration of confirmed cases over health policy responses, across countries. To do so, we provide a simple mathematical tool to estimate the convexity/concavity of trends in epidemiological surveillance data. Had it been applied at the onset of the crisis, it would have offered more opportunities to measure the impact of the policies undertaken in different Asian countries, and to allow European and North-American governments to draw quicker lessons from these Asian experiences when making policy decisions. Our tool can be especially useful as the epidemic is currently extending to lower-income African and South American countries, some of which have weaker health systems….(More)”.
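
The excerpt does not spell out the authors' estimator, so the following is only a rough illustration of what "convexity/concavity of trends" can mean in practice, not a reproduction of their tool. It assumes a daily series of cumulative confirmed cases and uses the discrete second difference as a stand-in for acceleration; the function names and the sample numbers are hypothetical.

```python
def second_differences(cumulative_cases):
    """Discrete second difference: c[i+1] - 2*c[i] + c[i-1] for each interior day."""
    return [
        cumulative_cases[i + 1] - 2 * cumulative_cases[i] + cumulative_cases[i - 1]
        for i in range(1, len(cumulative_cases) - 1)
    ]


def classify_trend(cumulative_cases):
    """Label a series by the sign of its average second difference.

    Positive average: the curve is convex (daily new cases are growing).
    Negative average: the curve is concave (daily new cases are shrinking).
    """
    acceleration = second_differences(cumulative_cases)
    if not acceleration:
        raise ValueError("need at least three observations")
    mean_acceleration = sum(acceleration) / len(acceleration)
    if mean_acceleration > 0:
        return "accelerating"
    if mean_acceleration < 0:
        return "decelerating"
    return "steady"


# Made-up series, for illustration only.
print(classify_trend([1, 2, 4, 8, 16]))       # daily increases keep growing
print(classify_trend([10, 20, 28, 34, 38]))   # daily increases keep shrinking
```

Real surveillance data are noisy and depend on testing volume, so a sign check like this would need smoothing and uncertainty estimates before it could support a policy comparison.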

Human migration: the big data perspective

Curated on March 28, 2020 by Stefaan Verhulst

Alina Sîrbu et al at the International Journal of Data Science and Analytics: “How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants….(More)”.

Why we need responsible data for children

Curated on March 26, 2020 by Stefaan Verhulst

Andrew Young and Stefaan Verhulst at The Conversation: “…Without question, the increased use of data poses unique risks for and responsibilities to children. While practitioners may have well-intended purposes to leverage data for and about children, the data systems used are often designed with (consenting) adults in mind without a focus on the unique needs and vulnerabilities of children. This can lead to the collection of inaccurate and unreliable data as well as the inappropriate and potentially harmful use of data for and about children….

Research undertaken in the context of the RD4C initiative uncovered the following trends and realities. These issues make clear why we need a dedicated data responsibility approach for children.

Today’s children are the first generation growing up at a time of rapid datafication where almost all aspects of their lives, both on and off-line, are turned into data points. An entire generation of young people is being datafied – often starting even before birth. Every year the average child will have more data collected about them in their lifetime than would a similar child born any year prior. The potential uses of such large volumes of data and the impact on children’s lives are unpredictable, and could potentially be used against them.
Children typically do not have full agency to make decisions about their participation in programs or services which may generate and record personal data. Children may also lack the understanding to assess a decision’s purported risks and benefits. Privacy terms and conditions are often barely understood by educated adults, let alone children. As a result, there is a higher duty of care for children’s data.
Disaggregating data according to socio-demographic characteristics can improve service delivery and assist with policy development. However, it also creates risks for group privacy. Children can be identified, exposing them to possible harms. Disaggregated data for groups such as child-headed households and children experiencing gender-based violence can put vulnerable communities and children at risk. Data about children’s location itself can be risky, especially if they have some additional vulnerability that could expose them to harm.
Mishandling data can cause children to lose trust in institutions that deliver essential services including vaccines, medicine, and nutrition supplies. For organizations dealing with child well-being, these retreats can have severe consequences. Distrust can cause families and children to refuse health, education, child protection and other public services. Such privacy protective behavior can impact children throughout the course of their lifetime, and potentially exacerbate existing inequities and vulnerabilities.
As volumes of collected and stored data increase, obligations and protections traditionally put in place for children may be difficult or impossible to uphold. The interests of children are not always prioritized when organizations define their legitimate interest to access or share personal information of children. The immediate benefit of a service provided does not always justify the risk or harm that might be caused by it in the future. Data analysis may be undertaken by people who do not have expertise in the area of child rights, as opposed to traditional research where practitioners are specifically educated in child subject research. Similarly, service providers collecting children’s data are not always specially trained to handle it, as international standards recommend.
Recent events around the world reveal the promise and pitfalls of algorithmic decision-making. While it can expedite certain processes, algorithms and their inferences can possess biases that can have adverse effects on people, for example those seeking medical care and attempting to secure jobs. The danger posed by algorithmic bias is especially pronounced for children and other vulnerable populations. These groups often lack the awareness or resources necessary to respond to instances of bias or to rectify any misconceptions or inaccuracies in their data.
Many of the children served by child welfare organizations have suffered trauma. Whether physical, social, emotional in nature, repeatedly making children register for services or provide confidential personal information can amount to revictimization – re-exposing them to traumas or instigating unwarranted feelings of shame and guilt.

These trends and realities make clear the need for new approaches for maximizing the value of data to improve children’s lives, while mitigating the risks posed by our increasingly datafied society….(More)”.

Data Collaboratives in Response to COVID19

Curated on March 26, 2020 by Stefaan Verhulst

Living Repository: “This document is part of a call for action to build a responsible infrastructure for data-driven pandemic response.

It serves as a living repository for data collaboratives seeking to address the spread of COVID-19 and its secondary effects.

> You can find ongoing data collaborative projects here

> Requests for data and expertise that might lead to data collaboratives can be found here.

> Data competitions, challenges, and calls for proposals, which can lead to useful tools to combat COVID-19, can be found here.

The repository aims to include projects that show a commitment to privacy protection, data responsibility, and overall user well-being.

It will be updated regularly as we receive projects and proposals or otherwise become aware of them.

HELP US MAKE THIS REPOSITORY BETTER: Individuals are encouraged to edit the repo and/or suggest additions to this document if a project is not currently listed.

See full Living Repository here.

Location Surveillance to Counter COVID-19: Efficacy Is What Matters

Curated on March 25, 2020 by Stefaan Verhulst

Susan Landau at Lawfare: “…Some government officials believe that the location information that phones can provide will be useful in the current crisis. After all, if cellphone location information can be used to track terrorists and discover who robbed a bank, perhaps it can be used to determine whether you rubbed shoulders yesterday with someone who today was diagnosed as having COVID-19, the respiratory disease that the novel coronavirus causes. But such thinking ignores the reality of how phone-tracking technology works.

Let’s look at the details of what we can glean from cellphone location information. Cell towers track which phones are in their locale—but that is a very rough measure, useful perhaps for tracking bank robbers, but not for the six-foot proximity one wants in order to determine who might have been infected by the coronavirus.

Finer precision comes from GPS signals, but these can only work outside. That means the location information supplied by your phone—if your phone and that of another person are both on—can tell you if you both went into the same subway stop around the same time. But it won’t tell you whether you rode the same subway car. And the location information from your phone isn’t fully precise. So not only can’t it reveal if, for example, you were in the same aisle in the supermarket as the ill person, but sometimes it will make errors about whether you made it into the store, as opposed to just sitting on a bench outside. What’s more, many people won’t have the location information available because GPS drains the battery, so they’ll shut it off when they’re not using it. Their phones don’t have the location information—and neither do the providers, at least not at the granularity to determine coronavirus exposure.

GPS is not the only way that cellphones can collect location information. Various other ways exist, including through the WiFi network to which a phone is connected. But while two individuals using the same WiFi network are likely to be close together inside a building, the WiFi data would typically not be able to determine whether they were in that important six-foot proximity range.

Other devices can also get within that range, including Bluetooth beacons. These are used within stores, seeking to determine precisely what people are—and aren’t—buying; they track peoples’ locations indoors within inches. But like WiFi, they’re not ubiquitous, so their ability to track exposure will be limited.

If the apps lead to the government’s dogging people’s whereabouts at work, school, in the supermarket and at church, will people still be willing to download the tracking apps that get them discounts when they’re passing the beer aisle? China follows this kind of surveillance model, but such a surveillance-state solution is highly unlikely to be acceptable in the United States. Yet anything less is unlikely to pinpoint individuals exposed to the virus.

South Korea took a different route. In precisely tracking coronavirus exposure, the country used additional digital records, including documentation of medical and pharmacy visits, history of credit card transactions, and CCTV videos, to determine where potentially exposed people had been—then followed up with interviews not just of infected people but also of their acquaintances, to determine where they had traveled.

Validating such records is labor intensive. And for the United States, it may not be the best use of resources at this time. There’s an even more critical reason that the Korean solution won’t work for the U.S.: South Korea was able to test exposed people. The U.S. can’t do this. Currently the country has a critical shortage of test kits; patients who are not sufficiently ill as to be hospitalized are not being tested. The shortage of test kits is sufficiently acute that in New York City, the current epicenter of the pandemic, the rule is, “unless you are hospitalized and a diagnosis will impact your care, you will not be tested.” With this in mind, moving to the South Korean model of tracking potentially exposed individuals won’t change the advice from federal and state governments that everyone should engage in social distancing—but employing such tracking would divert government resources and thus be counterproductive.

Currently, phone tracking in the United States is not efficacious. It cannot be unless all people are required to carry such location-tracking devices at all times; have location tracking on; and other forms of information tracking, including much wider use of CCTV cameras, Bluetooth beacons, and the like, are also in use. There are societies like this. But so far, even in the current crisis, no one is seriously contemplating the U.S. heading in that direction….(More)”.
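
Landau's argument about precision can be made concrete with a small sketch. It assumes each phone reports a latitude, a longitude and an error radius in metres (a hypothetical input format), and asks whether two such fixes can settle the six-foot question at all. The 5-metre error radius in the example is an assumed value chosen for illustration, not a measured one.

```python
import math

SIX_FEET_M = 1.8288  # six feet expressed in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def proximity_verdict(fix_a, fix_b, threshold_m=SIX_FEET_M):
    """Each fix is (lat, lon, error_radius_m).

    Returns "within" or "apart" only when the answer holds for every position
    inside both error circles; otherwise returns "undetermined".
    """
    lat_a, lon_a, err_a = fix_a
    lat_b, lon_b, err_b = fix_b
    reported = haversine_m(lat_a, lon_a, lat_b, lon_b)
    slack = err_a + err_b
    if reported + slack <= threshold_m:
        return "within"
    if reported - slack > threshold_m:
        return "apart"
    return "undetermined"


# Two phones reporting the very same spot, each with an assumed 5 m error radius.
print(proximity_verdict((40.0, -74.0, 5.0), (40.0, -74.0, 5.0)))
```

Once the combined error radius exceeds the six-foot threshold, nearby phones can never be confirmed as "within" range: the data can rule distant pairs out, but cannot confirm a close contact.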

Cellphone tracking could help stem the spread of coronavirus. Is privacy the price?

Curated on March 25, 2020 by Stefaan Verhulst

Kelly Servick at Science: “…At its simplest, digital contact tracing might work like this: Phones log their own locations; when the owner of a phone tests positive for COVID-19, a record of their recent movements is shared with health officials; owners of any other phones that recently came close to that phone get notified of their risk of infection and are advised to self-isolate. But designers of a tracking system will have to work out key details: how to determine the proximity among phones and the health status of users, where that information gets stored, who sees it, and in what format.

Digital contact tracing systems are already running in several countries, but details are scarce and privacy concerns abound. Protests greeted Israeli Prime Minister Benjamin Netanyahu’s rollout this week of a surveillance program that uses the country’s domestic security agency to track the locations of people potentially infected with the virus. South Korea has released detailed information on infected individuals—including their recent movements—viewable through multiple private apps that send alerts to users in their vicinity. “They’re essentially texting people, saying, ‘Hey, there’s been a 60-year-old woman who’s positive for COVID. Click this for more information about her path,’” says Anne Liu, a global health expert at Columbia University. She warns that the South Korean approach risks unmasking and stigmatizing infected people and the businesses they frequent.

But digital tracking is probably “identifying more contacts than you would with traditional methods,” Liu says. A contact-tracing app might not have much impact in a city where a high volume of coronavirus cases and extensive community transmission has already shuttered businesses and forced citizens inside, she adds. But it could be powerful in areas, such as in sub-Saharan Africa, that are at an earlier stage of the outbreak, and where isolating potential cases could avert the need to shut down all schools and businesses. “If you can package this type of information in a way that protects individual privacy as best you can, it can be something positive,” she says.

Navigating privacy laws

In countries with strict data privacy laws, one option for collecting data is to ask telecommunications and other tech companies to share anonymous, aggregated information they’ve already gathered. Laws in the United States and the European Union are very specific about how app and device users must consent to the use of their data—and how much information companies must disclose about how those data will be used, stored, and shared. Working within those constraints, mobile carriers in Germany and Italy have started to share cellphone location data with health officials in an aggregated, anonymized format. Even though individual users aren’t identified, the data could reveal general trends about where and when people are congregating and risk spreading infection.

Google and Facebook are both in discussions with the U.S. government about sharing anonymized location data, The Washington Post reported this week. U.S. companies have to deal with a patchwork of state and federal privacy regulations, says Melissa Krasnow, a privacy and data security partner at VLP Law Group. App and devicemakers could face user lawsuits for sharing data in a way that wasn’t originally specified in their terms of service—unless federal or local officials pass legislation that would free them from liability. “Now you’ve got a global pandemic, so you would think that [you] would be able to use this information for the global good, but you can’t,” Krasnow says. “There’s expectations about privacy.”

Another option is to start fresh with a coronavirus-specific app that asks users to voluntarily share their location and health data. For example, a basic symptom-checking app could do more than just keeping people who don’t need urgent care out of overstretched emergency rooms, says Samuel Scarpino, an epidemiologist at Northeastern University. Health researchers could also use location data from the app to estimate the size of an outbreak. “That could be done, I think, without risking being evil,” he says.

For Scarpino, the calculus changes if governments want to track the movements of a specific person who has coronavirus relative to the paths of other people, as China and South Korea have apparently done. That kind of tracking “could easily swing towards a privacy violation that isn’t justified by the potential public health benefit,” he says….(More)”.
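
The simplest scheme Servick describes (phones log locations, a positive test releases one phone's trail, nearby phones are notified) can be sketched in a few lines. This is a toy model under stated assumptions, not any deployed system: positions are planar coordinates in metres, and the `Ping` record, the `exposed_phones` function and the distance and time thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    phone_id: str
    t: int      # timestamp in seconds
    x: float    # position in metres (assumed local planar coordinates)
    y: float


def exposed_phones(pings, positive_id, max_dist_m=2.0, max_gap_s=300):
    """Return ids of phones that logged a position close to the positive
    phone's trail, both in space and in time."""
    case_trail = [p for p in pings if p.phone_id == positive_id]
    exposed = set()
    for ping in pings:
        if ping.phone_id == positive_id:
            continue
        for case in case_trail:
            close_in_time = abs(ping.t - case.t) <= max_gap_s
            close_in_space = math.hypot(ping.x - case.x, ping.y - case.y) <= max_dist_m
            if close_in_time and close_in_space:
                exposed.add(ping.phone_id)
                break
    return exposed


log = [
    Ping("A", 0, 0.0, 0.0),       # phone whose owner later tests positive
    Ping("B", 60, 1.0, 0.0),      # nearby a minute later
    Ping("C", 60, 50.0, 0.0),     # same time, far away
    Ping("D", 10000, 1.0, 0.0),   # same place, hours later
]
print(exposed_phones(log, "A"))
```

The design questions raised in the article sit outside this function: who holds `log`, whether the trail of phone "A" leaves the device, and what the notified owners are told.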

Governments could track COVID-19 lockdowns through social media posts

Curated on March 25, 2020 (updated March 26, 2020) by Stefaan Verhulst

Alfred Ng at CNET: “Your posts on social media have been harvested for advertising. They’ve been taken to build up a massive facial recognition database. Now that same data could be used by companies and governments to help maintain quarantines during the coronavirus outbreak.

Ghost Data, a research group in Italy and the US, collected more than half a million Instagram posts in March, targeting regions in Italy where residents were supposed to be on lockdown. It provided those images and videos to LogoGrab, an image recognition company that can automatically identify people and places. The company found at least 33,120 people violated Italy’s quarantine orders.

Andrea Stroppa, the founder of Ghost Data, said his group has offered its research to the Italian government. Stroppa doesn’t consider the social media scraping to be a privacy concern because researchers anonymized the data by removing profile and specific location data before analyzing it. He also has public health on his mind.

“In our view, privacy is very important. It’s a fundamental human right,” Stroppa said. “However, it’s important to give our support to help the government and the authorities. Hundreds of people are dying every day.”…(More)”.