Measuring Commuting and Economic Activity Inside Cities with Cell Phone Records


Paper by Gabriel Kreindler and Yuhei Miyauchi: “We show how to use commuting flows to infer the spatial distribution of income within a city. A simple workplace choice model predicts a gravity equation for commuting flows whose destination fixed effects correspond to wages. We implement this method with cell phone transaction data from Dhaka and Colombo. Model-predicted income predicts separate income data, at the workplace and residential level, and by skill group. Unlike machine learning approaches, our method does not require training data, yet achieves comparable predictive power. We show that hartals (transportation strikes) in Dhaka reduce commuting more for high model-predicted wage and high-skill commuters….(More)”.
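The abstract's key step (estimating a gravity equation for commuting flows and reading the destination fixed effects as workplace wages) can be illustrated with a toy regression. The sketch below simulates noiseless commuting flows and recovers the relative destination effects by least squares; it is a minimal illustration under invented parameter values, not the authors' estimation code.

```python
import numpy as np

# Toy gravity model: log(flow_ij) = origin_i + wage_j - beta * cost_ij.
# All numbers below are simulated for illustration only.
rng = np.random.default_rng(0)
n = 6                                     # hypothetical number of city zones
true_wage = rng.normal(size=n)            # latent log wages by workplace
origin_fe = rng.normal(size=n)            # residence-side fixed effects
cost = np.abs(rng.normal(2.0, 0.5, size=(n, n)))  # commuting cost matrix
beta = 1.5

log_flow = origin_fe[:, None] + true_wage[None, :] - beta * cost

# Design matrix: n origin dummies, n-1 destination dummies (zone 0 is
# the omitted base category), plus the commuting-cost regressor.
X = np.zeros((n * n, 2 * n))
y = np.empty(n * n)
for i in range(n):
    for j in range(n):
        r = i * n + j
        X[r, i] = 1.0                 # origin fixed effect
        if j > 0:
            X[r, n + j - 1] = 1.0     # destination fixed effect
        X[r, -1] = -cost[i, j]        # cost enters with a minus sign
        y[r] = log_flow[i, j]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
dest_rel = coef[n:2 * n - 1]          # log wage of zone j relative to zone 0
```

With noiseless simulated flows, `dest_rel` recovers `true_wage[1:] - true_wage[0]` exactly and `coef[-1]` recovers `beta`; the paper's point is that the same destination fixed effects, estimated from real commuting data, proxy for workplace income.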

The Landscape of Big Data and Gender


Report by Data2X: “This report draws out six observations about trends in big data and gender:

– The current environment of COVID-19 and the global economic recession is stimulating groundbreaking gender research.

– Where we’re progressing, where we’re lagging: Some gendered topics—especially mobility, health, and social norms—are increasingly well-studied through the combination of big data and traditional data. However, worrying gaps remain, especially around the subjects of economic opportunity, human security, and public participation.

– Capturing gender-representative samples using big data continues to be a challenge, but progress is being made.

– Large technology firms generate an immense volume of gender data critical for policymaking, and researchers are finding ways to reuse this data safely.

– Data collaboratives that bring private sector data-holders, researchers, and public policymakers together in a formal, enduring relationship can help big data make a practical difference in the lives of women and girls….(More)”.

COVID vaccination studies: plan now to pool data, or be bogged down in confusion


Natalie Dean at Nature: “More and more COVID-19 vaccines are rolling out safely around the world; just last month, the United States authorized one produced by Johnson & Johnson. But there is still much to be learnt. How long does protection last? How much does it vary by age? How well do vaccines work against various circulating variants, and how well will they work against future ones? Do vaccinated people transmit less of the virus?

Answers to these questions will help regulators to set the best policies. Now is the time to make sure that those answers are as reliable as possible, and I worry that we are not laying the essential groundwork. Our current trajectory has us on course for confusion: we must plan ahead to pool data.

Many questions remain after vaccines are approved. Randomized trials generate the best evidence to answer targeted questions, such as how effective booster doses are. But for others, randomized trials will become too difficult as more and more people are vaccinated. To fill in our knowledge gaps, observational studies of the millions of vaccinated people worldwide will be essential….

Perhaps most importantly, we must coordinate now on plans to combine data. We must take measures to counter the long-standing siloed approach to research. Investigators should be discouraged from setting up single-site studies and encouraged to contribute to a larger effort. Funding agencies should favour studies with plans for collaborating or for sharing de-identified individual-level data.

Even when studies do not officially pool data, they should make their designs compatible with others. That means up-front discussions about standardization and data-quality thresholds. Ideally, this will lead to a minimum common set of variables to be collected, which the WHO has already hammered out for COVID-19 clinical outcomes. Categories include clinical severity (such as all infections, symptomatic disease or critical/fatal disease) and patient characteristics, such as comorbidities. This will help researchers to conduct meta-analyses of even narrow subgroups. Efforts are under way to develop reporting guidelines for test-negative studies, but these will be most successful when there is broad engagement.

There are many important questions that will be addressed only by observational studies, and data that can be combined are much more powerful than lone results. We need to plan these studies with as much care and intentionality as we would for randomized trials….(More)”.

Narratives and Counternarratives on Data Sharing in Africa


Paper by Rediet Abebe et al: “As machine learning and data science applications grow ever more prevalent, there is an increased focus on data sharing and open data initiatives, particularly in the context of the African continent. Many argue that data sharing can support research and policy design to alleviate poverty, inequality, and derivative effects in Africa. Despite the fact that the datasets in question are often extracted from African communities, conversations around the challenges of accessing and sharing African data are too often driven by non-African stakeholders. These perspectives frequently employ deficit narratives, often focusing on lack of education, training, and technological resources in the continent as the leading causes of friction in the data ecosystem.

We argue that these narratives obfuscate and distort the full complexity of the African data sharing landscape. In particular, we use storytelling via fictional personas built from a series of interviews with African data experts to complicate dominant narratives and to provide counternarratives. Coupling these personas with research on data practices within the continent, we identify recurring barriers to data sharing as well as inequities in the distribution of data sharing benefits. In particular, we discuss issues arising from power imbalances resulting from the legacies of colonialism, ethno-centrism, and slavery, disinvestment in building trust, lack of acknowledgement of historical and present-day extractive practices, and Western-centric policies that are ill-suited to the African context. After outlining these problems, we discuss avenues for addressing them when sharing data generated in the continent….(More)”.

New approach to data is a great opportunity for the UK post-Brexit


Oliver Dowden at the Financial Times: “As you read this, thousands of people are receiving a message that will change their lives: a simple email or text, inviting them to book their Covid jab. But what has powered the UK’s remarkable vaccine rollout isn’t just our NHS, but the data that sits underneath it — from the genetic data used to develop the vaccine right through to the personal health data enabling that “ping” on their smartphone.

After years of seeing data solely through the lens of risk, Covid-19 has taught us just how much we have to lose when we don’t use it.

As I launch the competition to find the next Information Commissioner, I want to set a bold new approach that capitalises on all we’ve learnt during the pandemic, which forced us to share data quickly, efficiently and responsibly for the public good. It is one that no longer sees data as a threat, but as the great opportunity of our time.

Until now, the conversation about data has revolved around privacy — and with good reason. A person’s digital footprint can tell you not just vital statistics like age and gender, but their personal habits.

Our first priority is securing this valuable personal information. The UK has a long and proud tradition of defending privacy, and a commitment to maintaining world-class data protection standards now that we’re outside the EU. That was recognised last week in the bloc’s draft decisions on the ‘adequacy’ of our data protection rules — the agreement that data can keep flowing freely between the EU and UK.

We fully intend to maintain those world-class standards. But to do so, we do not need to copy and paste the EU’s rule book, the General Data Protection Regulation (GDPR), word-for-word. Countries as diverse as Israel and Uruguay have successfully secured adequacy with Brussels despite having their own data regimes. Not all of those were identical to GDPR, but equal doesn’t have to mean the same. The EU doesn’t hold the monopoly on data protection.

So, having come a long way in learning how to manage data’s risks, the UK is going to start making more of its opportunities….(More)”.

Balancing Privacy With Data Sharing for the Public Good


David Deming at the New York Times: “Governments and technology companies are increasingly collecting vast amounts of personal data, prompting new laws, myriad investigations and calls for stricter regulation to protect individual privacy.

Yet despite these issues, economics tells us that society needs more data sharing rather than less, because the benefits of publicly available data often outweigh the costs. Public access to sensitive health records sped up the development of lifesaving medical treatments like the messenger-RNA coronavirus vaccines produced by Moderna and Pfizer. Better economic data could vastly improve policy responses to the next crisis.

Data increasingly powers innovation, and it needs to be used for the public good, while individual privacy is protected. This is new and unfamiliar terrain for policymaking, and it requires a careful approach.

The pandemic has brought the increasing dominance of big, data-gobbling tech companies into sharp focus. From online retail to home entertainment, digitally savvy businesses are collecting data and deploying it to anticipate product demand and set prices, lowering costs and outwitting more traditional competitors.

Data provides a record of what has already happened, but its main value comes from improving predictions. Companies like Amazon choose products and prices based on what you — and others like you — bought in the past. Your data improves their decision-making, boosting corporate profits.

Private companies also depend on public data to power their businesses. Redfin and Zillow disrupted the real estate industry thanks to access to public property databases. Investment banks and consulting firms make economic forecasts and sell insights to clients using unemployment and earnings data collected by the Department of Labor. By 2013, one study estimated, public data contributed at least $3 trillion per year to seven sectors of the economy worldwide.

The buzzy refrain of the digital age is that “data is the new oil,” but this metaphor is inaccurate. Data is indeed the fuel of the information economy, but it is more like solar energy than oil — a renewable resource that can benefit everyone at once, without being diminished….(More)”.

My Data, My Choice? – German Patient Organizations’ Attitudes towards Big Data-Driven Approaches in Personalized Medicine. An Empirical-Ethical Study


Paper by Carolin Martina Rauter, Sabine Wöhlke & Silke Schicktanz: “Personalized medicine (PM) operates with biological data to optimize therapy or prevention and to achieve cost reduction. Associated data may consist of large variations of informational subtypes e.g. genetic characteristics and their epigenetic modifications, biomarkers or even individual lifestyle factors. Present innovations in the field of information technology have already enabled the processing of increasingly large amounts of such data (‘volume’) from various sources (‘variety’) and varying quality in terms of data accuracy (‘veracity’) to facilitate the generation and analysis of messy data sets within a short and highly efficient time period (‘velocity’) to provide insights into previously unknown connections and correlations between different items (‘value’). As such developments are characteristics of Big Data approaches, Big Data itself has become an important catchphrase that is closely linked to the emerging foundations and approaches of PM. However, as ethical concerns have been pointed out by experts in the debate already, moral concerns by stakeholders such as patient organizations (POs) need to be reflected in this context as well. We used an empirical-ethical approach including a website analysis and 27 telephone interviews for gaining in-depth insight into German POs’ perspectives on PM and Big Data. Our results show that not all POs are stakeholders in the same way. Comparing the perspectives and political engagement of the minority of POs currently actively involved in PM and Big Data-driven research led to four stakeholder sub-classifications: ‘mediators’ support research projects by facilitating researchers’ access to the patient community while selecting the projects they prefer to support, while ‘cooperators’ tend to contribute more directly to research projects by providing and implementing patient perspectives.
‘Financers’ provide financial resources. ‘Independents’ keep control over their collected samples and associated patient-related information, with a strong interest in making autonomous decisions about their scientific use. A more detailed terminology for the involvement of POs as stakeholders facilitates addressing their aims and goals. Based on our results, the ‘independents’ subgroup is a promising candidate for future collaborations in scientific research. Additionally, we identified gaps in POs’ knowledge about PM and Big Data. Based on these findings, approaches can be developed to increase data and statistical literacy. This way, the full potential of stakeholder involvement of POs can be made accessible in discourses around PM and Big Data….(More)”.

Designing Data Trusts. Why We Need to Test Consumer Data Trusts Now


Policy Brief by Aline Blankertz: “Data about individuals, about their preferences and behaviors, has become an increasingly important resource for companies, public agencies, and research institutions. Consumers carry the burden of having to decide which data about them is shared for which purpose. They want to make sure that data about them is not used to infer intimate details of their private life or to pursue other undesirable purposes. At the same time, they want to benefit from personalized products and innovation driven by the same data. The complexity of how data is collected and used overwhelms consumers, many of whom wearily accept privacy policies and lose trust that those who gain effective control over the data will use it for the consumers’ benefit.

At the same time, a few large companies accumulate and lock in vast amounts of data that enable them to use insights across markets and across consumers. In Europe, the General Data Protection Regulation (GDPR) has given data rights to consumers to assert their interests vis-à-vis those companies, but it gives consumers neither enough information nor enough power to make themselves heard. Other organizations, especially small businesses or start-ups, do not have access to the data (unless individual consumers laboriously exercise their right to portability), which often inhibits competition and innovation. Governments across Europe would like to tackle the challenge of reconciling productive data use with privacy. In recent months, data trusts have emerged as a promising solution to enable data-sharing for the benefit of consumers.

The concept has been endorsed by a broad range of stakeholders, including privacy advocates, companies and expert commissions. In Germany, for example, the Data Ethics Commission and the Competition Law 4.0 Commission have recommended further exploring data trusts, and the government is incorporating the concept into its data strategy.

There is no common understanding yet what consumer data trusts are and what they do. In order for them to address the problems mentioned, it is helpful to use as a working definition: consumer data trusts are intermediaries that aggregate consumers’ interests and represent them vis-à-vis data-using organizations. Data trusts use more technical and legal expertise, as well as greater bargaining power, to negotiate with organizations on the conditions of data use to achieve better outcomes than those that individual consumers can achieve. To achieve their consumer-oriented mission, data trusts should be able to assign access rights, audit data practices, and support enforcement. They may or may not need to hold data…(More)”.

Inside the ‘Wikipedia of Maps,’ Tensions Grow Over Corporate Influence


Corey Dickinson at Bloomberg: “What do Lyft, Facebook, the International Red Cross, the U.N., the government of Nepal and Pokémon Go have in common? They all use the same source of geospatial data: OpenStreetMap, a free, open-source online mapping service akin to Google Maps or Apple Maps. But unlike those corporate-owned mapping platforms, OSM is built on a network of mostly volunteer contributors. Researchers have described it as the “Wikipedia for maps.”

Since it launched in 2004, OpenStreetMap has become an essential part of the world’s technology infrastructure. Hundreds of millions of monthly users interact with services derived from its data, from ride-hailing apps to social media geotagging on Snapchat and Instagram, to humanitarian relief operations in the wake of natural disasters.

But recently the map has been changing, due to the growing impact of private-sector companies that rely on it. In a 2019 paper published in the ISPRS International Journal of Geo-Information, a cross-institutional team of researchers traced how Facebook, Apple, Microsoft and other companies have gained prominence as editors of the map. Their priorities, the researchers say, are driving significant changes to what is being mapped compared to the past.

“OpenStreetMap’s data is crowdsourced, which has always made spectators to the project a bit wary about the quality of the data,” says Dipto Sarkar, a professor of geoscience at Carleton University in Ottawa, and one of the paper’s co-authors. “As the data becomes more valuable and is used for an ever-increasing list of projects, the integrity of the information has to be almost perfect. These companies need to make sure there’s a good map of the places they want to expand in, and nobody else is offering that, so they’ve decided to fill it in themselves.”…(More)”.

Collective bargaining on digital platforms and data stewardship


Paper by Astha Kapoor: “… there is a need to think of exploitation on platforms not only through the lens of labour rights but also that of data rights. In the current context, it is impossible to imagine well-being without more agency on the way data are collected, stored and used. It is imperative to envision structures through which worker communities and representatives can be more involved in determining their own data lives on platforms. There is a need to organize and mobilize workers on data rights.

One of the ways in which this can be done is through a mechanism of community data stewards who represent the needs and interests of workers to their platforms, thus negotiating and navigating data-based decisions. This paper examines the need for data rights as a critical requirement for worker well-being in the platform economy and the ways in which they can be actualized. It argues, given that workers on platforms produce data through collective labour on and off the platform, that worker data are a community resource and should be governed by representatives of workers who can negotiate with platforms on the use of that data for workers and for the public interest. The paper analyses the opportunity for a community data steward mechanism that represents workers’ interests and intermediates on data issues, such as transparency and accountability, with offline support systems, and that also gives voice to online action to address some of the injustices of the data economy. Thus, a data steward is a tool through which workers better control their data—consent, privacy and rights—and organize online. Essentially, it is a way forward for workers to mobilize collective bargaining on data rights.

The paper covers the impact of the COVID-19 pandemic on workers’ rights and well-being. It explores the idea of community data rights on the platform economy and why collective bargaining on data is imperative for any kind of meaningful negotiation with technology companies. The role of a community data steward in reclaiming workers’ power in the platform economy is explained, concluding with policy recommendations for a community data steward structure in the Indian context….(More)”.