To harness telecom data for good, there are six challenges to overcome


Blog by Anat Lewin and Sveta Milusheva: “The global use of mobile phones generates a vast amount of data. What good can be done with these data? During the COVID-19 pandemic, we saw that aggregated data from mobile phones can tell us where groups of humans are going, how many of them are there, and how they are behaving as a cluster. When used effectively and responsibly, mobile phone data can be immensely helpful for development work and emergency response — particularly in resource-constrained countries.  For example, an African country that had, in recent years, experienced a cholera outbreak was ahead of the game. Since the legal and practical agreements were already in place to safely share aggregated mobile data, accessing newer information to support epidemiological modeling for COVID-19 was a straightforward exercise. The resulting datasets were used to produce insightful analyses that could better inform health, lockdown, and preventive policy measures in the country.

To better understand such challenges and opportunities, we led an effort to access and use anonymized, aggregated mobile phone data across 41 countries. During this process, we identified several recurring roadblocks and replicable successes, which we summarized in a paper along with our lessons learned. …(More)”.

Data Collaborative Case Study: NYC Recovery Data Partnership


Report by the Open Data Policy Lab (The GovLab): “In July 2020, following severe economic and social losses due to the COVID-19 pandemic, the administration of New York City Mayor Bill de Blasio announced the NYC Recovery Data Partnership. This data collaborative asked private and civic organizations with assets relevant to New York City to provide their data to the city. Senior city leaders from the First Deputy Mayor’s Office, the Mayor’s Office of Operations, Mayor’s Office of Information Privacy and Mayor’s Office of Data Analytics formed an internal coalition which served as trusted intermediaries, assessing requests from city agencies to use the data provided and allocating access accordingly. The data informed internal research conducted by various city agencies, including New York City Emergency Management’s Recovery Team and the NYC Department of City Planning. The experience reveals the ability of crises to spur innovation, the value of responsiveness from both data users and data suppliers, the importance of technical capacity, and the value of a network of peers. In terms of challenges, the experience also exposes the limitations of data, the challenges of compiling complex datasets, and the role of resource constraints…(More)”.

Ten (not so) simple rules for clinical trial data-sharing


Paper by Claude Pellen et al: “Clinical trial data-sharing is seen as an imperative for research integrity and is becoming increasingly encouraged or even required by funders, journals, and other stakeholders. However, early experiences with data-sharing have been disappointing because they are not always conducted properly. Health data is indeed sensitive and not always easy to share in a responsible way. We propose 10 rules for researchers wishing to share their data. These rules cover the majority of elements to be considered in order to start the commendable process of clinical trial data-sharing:

  • Rule 1: Abide by local legal and regulatory data protection requirements
  • Rule 2: Anticipate the possibility of clinical trial data-sharing before obtaining funding
  • Rule 3: Declare your intent to share data in the registration step
  • Rule 4: Involve research participants
  • Rule 5: Determine the method of data access
  • Rule 6: Remember there are several other elements to share
  • Rule 7: Do not proceed alone
  • Rule 8: Deploy optimal data management to ensure that the data shared is useful
  • Rule 9: Minimize risks
  • Rule 10: Strive for excellence…(More)”

Examining public views on decentralised health data sharing


Paper by Victoria Neumann et al: “In recent years, researchers have begun to explore the use of Distributed Ledger Technologies (DLT), also known as blockchain, in health data sharing contexts. However, there is a significant lack of research that examines public attitudes towards the use of this technology. In this paper, we begin to address this issue and present results from a series of focus groups which explored public views and concerns about engaging with new models of personal health data sharing in the UK. We found that participants were broadly in favour of a shift towards new decentralised models of data sharing. Retaining ‘proof’ of health information stored about patients and the capacity to provide permanent audit trails, enabled by immutable and transparent properties of DLT, were regarded as particularly valuable for our participants and prospective data custodians. Participants also identified other potential benefits such as supporting people to become more health data literate and enabling patients to make informed decisions about how their data was shared and with whom. However, participants also voiced concerns about the potential to further exacerbate existing health and digital inequalities. Participants were also apprehensive about the removal of intermediaries in the design of personal health informatics systems…(More)”.

Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good


Report by National Academies of Sciences, Engineering, and Medicine: “Historically, the U.S. national data infrastructure has relied on the operations of the federal statistical system and the data assets that it holds. Throughout the 20th century, federal statistical agencies aggregated survey responses of households and businesses to produce information about the nation and diverse subpopulations. The statistics created from such surveys provide most of what people know about the well-being of society, including health, education, employment, safety, housing, and food security. The surveys also contribute to an infrastructure for empirical social- and economic-sciences research. Research using survey-response data, with strict privacy protections, led to important discoveries about the causes and consequences of important societal challenges and also informed policymakers. Like other infrastructure, people can easily take these essential statistics for granted. Only when they are threatened do people recognize the need to protect them…(More)”.

Ten lessons for data sharing with a data commons


Article by Robert L. Grossman: “..Lesson 1. Build a commons for a specific community with a specific set of research challenges

Although there are a few data repositories that serve the general scientific community that have proved successful, in general data commons that target a specific user community have proven to be the most successful. The first lesson is to build a data commons for a specific research community that is struggling to answer specific research challenges with data. As a consequence, a data commons is a partnership between the data scientists developing and supporting the commons and the disciplinary scientists with the research challenges.

Lesson 2. Successful commons curate and harmonize the data

Successful commons curate and harmonize the data and produce data products of broad interest to the community. It’s time consuming, expensive, and labor intensive to curate and harmonize data, but much of the value of data commons is centralizing this work so that it can be done once instead of many times by each group that needs the data. These days, it is very easy to think of a data commons as a platform containing data, not spend the time curating or harmonizing it, and then be surprised that the data in the commons is not more widely used and its impact is not as high as expected.

Lesson 3. It’s ultimately about the data and its value to generate new research discoveries

Despite the importance of a study, few scientists will try to replicate previously published studies. Instead, data is usually accessed if it can lead to a new high impact paper. For this reason, data commons play two different but related roles. First, they preserve data for reproducible science. This is a small fraction of the data access, but plays a critical role in reproducible science. Second, data commons make data available for new high value science.

Lesson 4. Reduce barriers to access to increase usage

A useful rule of thumb is that every barrier to data access cuts down access by a factor of 10. Common barriers that reduce use of a commons include: registration vs no-registration; open access vs controlled access; click through agreements vs signing of data usage agreements and approval by data access committees; license restrictions on the use of the data vs no license restrictions…(More)”.
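The rule of thumb in Lesson 4 can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the baseline figure, and the assumption that barriers compound multiplicatively are not from the article, which offers the factor of 10 as a heuristic rather than a measured law.

```python
def expected_accesses(baseline: float, barriers: int, factor: float = 10.0) -> float:
    """Estimate data accesses after stacking `barriers` access barriers.

    Assumes (hypothetically) that each barrier divides usage by `factor`
    and that barriers compound independently.
    """
    if barriers < 0:
        raise ValueError("barriers must be non-negative")
    return baseline / factor ** barriers


# Hypothetical baseline: 10,000 accesses per year with no barriers at all.
for n, label in enumerate(["open", "+ registration", "+ signed data usage agreement"]):
    print(f"{label:32s} ~{expected_accesses(10_000, n):,.0f} accesses")
```

Under these assumptions, a commons that adds both registration and a signed agreement would see roughly one hundredth of the usage of a fully open one, which is why the lesson treats each barrier as a deliberate cost.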

Satellite data: The other type of smartphone data you might not know about


Article by Tommy Cooke et al: “Smartphones determine your location in several ways. The first way involves phones triangulating distances between cell towers or Wi-Fi routers.

The second way involves smartphones interacting with navigation satellites. When satellites pass overhead, they transmit signals to smartphones, which allows smartphones to calculate their own location. This process uses a specialized piece of hardware called the Global Navigation Satellite System (GNSS) chipset. Every smartphone has one.

When these GNSS chipsets calculate navigation satellite signals, they output data in two standardized formats (known as protocols or languages): the GNSS raw measurement protocol and the National Marine Electronics Association protocol (NMEA 0183).

GNSS raw measurements include data such as the distance between satellites and cellphones and measurements of the signal itself.

NMEA 0183 contains similar information to GNSS raw measurements, but also includes additional information such as satellite identification numbers, the number of satellites in a constellation, what country owns a satellite, and the position of a satellite.
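To illustrate the kind of satellite information an NMEA 0183 stream carries, here is a minimal Python sketch that reads one "satellites in view" (GSV) sentence. It is not from the article. The field layout and the XOR checksum follow widely published descriptions of the format rather than the licensed standard itself, the sample sentence is invented, and the talker table covers only three well-known prefixes.

```python
from functools import reduce

# Two-letter talker prefixes identify the constellation (partial, illustrative table).
TALKERS = {"GP": "GPS", "GL": "GLONASS", "GA": "Galileo"}


def nmea_checksum(payload: str) -> str:
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), payload, 0):02X}"


def to_int(field: str):
    """NMEA fields may be empty (e.g. no signal strength while not tracking)."""
    return int(field) if field else None


def parse_gsv(sentence: str) -> dict:
    """Parse a single GSV sentence into constellation and per-satellite data."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA 0183 sentence")
    payload, checksum = sentence[1:].rsplit("*", 1)
    if nmea_checksum(payload) != checksum.upper():
        raise ValueError("checksum mismatch")
    fields = payload.split(",")
    if not fields[0].endswith("GSV"):
        raise ValueError("not a GSV sentence")
    satellites = []
    # After the header fields, satellites arrive in groups of four values.
    for i in range(4, len(fields) - 3, 4):
        sat_id, elevation, azimuth, snr = fields[i:i + 4]
        satellites.append({
            "id": to_int(sat_id),
            "elevation_deg": to_int(elevation),
            "azimuth_deg": to_int(azimuth),
            "snr_db": to_int(snr),
        })
    talker = fields[0][:2]
    return {
        "talker": talker,
        "constellation": TALKERS.get(talker, "unknown"),
        "in_view": to_int(fields[3]),
        "satellites": satellites,
    }


# Invented sample: one message reporting two satellites; checksum computed here.
payload = "GPGSV,1,1,02,05,40,083,46,12,17,308,41"
sentence = f"${payload}*{nmea_checksum(payload)}"
report = parse_gsv(sentence)
print(report["constellation"], report["in_view"], report["satellites"])
```

Even this single sentence exposes which constellation was heard, which satellites were visible, and where they sat in the sky relative to the device.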

NMEA 0183 was created and is governed by the NMEA, a not-for-profit lobby group that is also a marine electronics trade organization. The NMEA was formed at the 1957 New York Boat Show when boating equipment manufacturers decided to build stronger relationships within the electronic manufacturing industry.

In the decades since, the NMEA 0183 data standard has improved marine electronics communications and is now found on a wide variety of non-marine communications devices, including smartphones…

It is difficult to know who has access to data produced by these protocols. Access to NMEA protocols is only available under licence to businesses for a fee.

GNSS raw measurements, on the other hand, are a universal standard and can be read by different devices in the same way without a license. In 2016, Google allowed industries to have open access to it to foster innovation around device tracking accuracy, precision, analytics about how we move in real-time, and predictions about our movements in the future.

While automated processes can quietly harvest location data — like when a French-based company extracted location data from Salaat First, a Muslim prayer app — these data don’t need to be taken directly from smartphones to be exploited.

Data can be modelled, experimented with, or emulated in licensed devices in labs for innovation and algorithmic development.

Satellite-driven raw measurements from our devices were used to power global surveillance networks like STRIKE3, a now defunct European-led initiative that monitored and reported perceived threats to navigation satellites…(More)”.

Data sharing during coronavirus: lessons for government


Report by Gavin Freeguard and Paul Shepley: “This report synthesises the lessons from six case studies and other research on government data sharing during the pandemic. It finds that current legislation, such as the Digital Economy Act and UK General Data Protection Regulation (GDPR), does not constitute a barrier to data sharing and that while technical barriers – incompatible IT systems, for example – can slow data sharing, they do not prevent it. 

Instead, the pandemic forced changes to standard working practice that enabled new data sharing agreements to be created quickly. This report focuses on what these changes were and how they can lead to improvements in future practice.

The report recommends: 

  • The government should retain data protection officers and data protection impact assessments within the Data Protection and Digital Information Bill, and consider strengthening provisions around citizen engagement and how to ensure data flows during emergency response.
  • The Department for Levelling Up, Housing and Communities should consult on how to improve working around data between central and local government in England. This should include the role of the proposed Office for Local Government, data skills and capabilities at the local level, reform of the Single Data List and the creation of a data brokering function to facilitate two-way data sharing between national and local government.
  • The Central Digital and Data Office (CDDO) should create a data sharing ‘playbook’ to support public servants building new services founded on data. The playbook should contain templates for standard documents, links to relevant legislation and codes of practice (like those from the Information Commissioner’s Office), guidance on public engagement and case studies covering who to engage and when whilst setting up a new service.
  • The Centre for Data Ethics and Innovation, working with CDDO, should take the lead on guidance and resources on how to engage the public at every stage of data sharing…(More)”.

How an Open-Source Disaster Map Helped Thousands of Earthquake Survivors


Article by Eray Gündoğmuş: “On February 6, 2023, earthquakes measuring 7.7 and 7.6 hit the Kahramanmaraş region of Turkey, affecting 10 cities and resulting in more than 42,000 deaths and 120,000 injuries as of February 21.

In the hours following the earthquake, a group of programmers quickly came together on the Discord server called “Açık Yazılım Ağı”, inviting IT professionals to volunteer and develop a project that could serve as a resource for rescue teams, earthquake survivors, and those who wanted to help: afetharita.com. It literally means “disaster map”.

As there was a lack of preparation for the first few days of such a huge earthquake, disaster victims in distress started making urgent aid requests on social media. With the help of thousands of volunteers, we utilized technologies such as artificial intelligence and machine learning to transform these aid requests into readable data and visualized them on afetharita.com. Later, we gathered critical data related to the disaster from necessary institutions and added them to the map.

Disaster Map, which received a total of 35 million requests and 627,000 unique visitors, played a significant role in providing software support during the most urgent and critical periods of the disaster, and helped NGOs, volunteers, and disaster victims to access important information. I wanted to share the process, our experiences, and technical details of this project clearly in writing…(More)”.

COVID isn’t going anywhere, neither should our efforts to increase responsible access to data


Article by Andrew J. Zahuranec, Hannah Chafetz and Stefaan Verhulst: “..Moving forward, institutions will need to consider how to embed non-traditional data capacity into their decision-making to better understand the world around them and respond to it.

For example, wastewater surveillance programmes that emerged during the pandemic continue to provide valuable insights about outbreaks before they are reported by clinical testing and have the potential to be used for other emerging diseases.

We need these and other programmes now more than ever. Governments and their partners need to maintain and, in many cases, strengthen the collaborations they established through the pandemic.

To address future crises, we need to institutionalize new data capacities – particularly those involving non-traditional datasets that may capture digital information that traditional health surveys and statistical methods often miss.

The figure above summarizes the types and sources of non-traditional data sources that stood out most during the COVID-19 response.

The types and sources of non-traditional data sources that stood out most during the COVID-19 response. Image: The GovLab

In our report, we suggest four pathways to advance the responsible access to non-traditional data during future health crises…(More)”.