Seize the Future by Harnessing the Power of Data

Essay by Kriss Deiglmeier: “…Data is a form of power. And the sad reality is that power is being held increasingly by the commercial sector and not by organizations seeking to create a more just, sustainable, and prosperous world. A year into my tenure as the chief global impact officer at Splunk, I became consumed with the new era driven by data. Specifically, I was concerned with the emerging data divide, which I defined as “the disparity between the expanding use of data to create commercial value, and the comparatively weak use of data to solve social and environmental challenges.”…

To effectively address the emerging data future, the social impact sector must build an entire impact data ecosystem for this moment in time—and the next moment in time. The way to do that is by investing in those areas where we currently lag the commercial sector. Consider the following gaps:

  • Nonprofits lack the financial and technical resources they need to make full use of data, often due to underfunding.
  • Compared with the commercial sector, the sector’s pool of technical and data talent is a desert.
  • While the sector is rich with output and service-delivery data, that data is locked away or is unusable in its current form.
  • The sector lacks living data platforms (collaboratives and data refineries) that can make use of sector-wide data in a way that helps improve service delivery, maximize impact, and create radical innovation.

The harsh realities of the sector’s disparate data skills, infrastructure, and competencies show the dire current state. For the impact sector to transition to a place of power, it must jump without hesitation into the arena of the Data Age—and invest time, talent, and money in filling in these gaps.

Regardless of our lagging position, the social sector has both an incredible opportunity and a unique capacity to drive the power of data into the emerging and unimaginable. The good news is that there’s pivotal work already happening in the sector that is making it easier to build the kind of impact data ecosystem needed to join the Data Age. The framing and terms used to describe this work are many—data for good, data science for impact, open data, public interest technology, data lakes, ethical data, and artificial intelligence ethics.

These individual pieces, while important, are not enough. To fully exploit the power of data for a more just, sustainable, and prosperous world, we need to be bold enough to build the full ecosystem and not be satisfied with piecemeal work. To do that we should begin by looking at the assets that we have and build on those.

People. There are dedicated leaders in the field of social innovation who are committed to using data for impact and who have been doing that for many years. We need to support them by investing in their work at scale. The list of people leading the way is constantly growing, but to name a few: Stefaan G. Verhulst, Joy Buolamwini, Jim Fruchterman, Katara McCarty, Geoff Mulgan, Rediet Abebe, Jason Saul, and Jake Porway….(More)”.

Could a Global “Wicked Problems Agency” Incentivize Data Sharing?

Paper by Susan Ariel Aaronson: “Global data sharing could help solve “wicked” problems (problems such as climate change, terrorism and global poverty that no one knows how to solve without creating further problems). There is no one or best way to address wicked problems because they have many different causes and manifest in different contexts. By mixing vast troves of data, policy makers and researchers may find new insights and strategies to address these complex problems. National and international government agencies and large corporations generally control the use of such data, and the world has made little progress in encouraging cross-sectoral and international data sharing. This paper proposes a new international cloud-based organization, the “Wicked Problems Agency,” to catalyze both data sharing and data analysis in the interest of mitigating wicked problems. This organization would work to prod societal entities — firms, individuals, civil society groups and governments — to share and analyze various types of data. The Wicked Problems Agency could provide a practical example of how data sharing can yield both economic and public good benefits…(More)”.

An agenda for advancing trusted data collaboration in cities

Report by Hannah Chafetz, Sampriti Saxena, Adrienne Schmoeker, Stefaan G. Verhulst, & Andrew J. Zahuranec: “… Joined by experts across several domains including smart cities, the law, and data ecosystems, this effort was focused on developing solutions that could improve the design of Data Sharing Agreements…we assessed what is needed to implement each aspect of our Contractual Wheel of Data Collaboration–a tool developed as a part of the Contracts for Data Collaborations initiative that seeks to capture the elements involved in data collaborations and Data Sharing Agreements.

In what follows, we provide key suggestions from this Action Lab…

  1. The Elements of Principled Negotiations: Those seeking to develop a Data Sharing Agreement often struggle to work with collaborators or agree to common ends. There is a need for a common resource that Data Stewards can use to initiate a principled negotiation process. To address this need, we would identify the principles to inform negotiations and the elements that could help achieve those principles. For example, participants voiced a need for fairness, transparency, and reciprocity principles. These principles could be supported by having a shared language or outlining the minimum legal documents required for each party. The final product would be a checklist or visualization of principles and their associated elements.
  2. Data Responsibility Principles by Design: …
  3. Readiness Matrix: …
  4. A Decision Provenance Approach for Data Collaboration: …
  5. The Contractual Wheel of Data Collaboration 2.0: …
  6. A Repository of Legal Drafting Technologies: …(More)”.

To harness telecom data for good, there are six challenges to overcome

Blog by Anat Lewin and Sveta Milusheva: “The global use of mobile phones generates a vast amount of data. What good can be done with these data? During the COVID-19 pandemic, we saw that aggregated data from mobile phones can tell us where groups of humans are going, how many of them are there, and how they are behaving as a cluster. When used effectively and responsibly, mobile phone data can be immensely helpful for development work and emergency response — particularly in resource-constrained countries.  For example, an African country that had, in recent years, experienced a cholera outbreak was ahead of the game. Since the legal and practical agreements were already in place to safely share aggregated mobile data, accessing newer information to support epidemiological modeling for COVID-19 was a straightforward exercise. The resulting datasets were used to produce insightful analyses that could better inform health, lockdown, and preventive policy measures in the country.
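The kind of privacy-protective aggregation the authors describe can be sketched as follows. The record format and the small-cell suppression threshold are illustrative assumptions, not the actual pipeline used in the countries studied:

```python
from collections import Counter

# Hypothetical record format: (home_region, current_region) pairs derived
# from anonymized mobile phone data. The threshold is an illustrative
# small-cell suppression rule, not an actual operator or regulator policy.
SUPPRESSION_THRESHOLD = 15

def aggregate_flows(records, threshold=SUPPRESSION_THRESHOLD):
    """Count movements between regions, suppressing small groups so that
    no published cell describes fewer than `threshold` individuals."""
    counts = Counter((home, current) for home, current in records)
    return {flow: n for flow, n in counts.items() if n >= threshold}
```

Flows below the threshold are dropped before the aggregate is shared, so epidemiological modelers see group-level movement counts rather than individual traces.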

To better understand such challenges and opportunities, we led an effort to access and use anonymized, aggregated mobile phone data across 41 countries. During this process, we identified several recurring roadblocks and replicable successes, which we summarized in a paper along with our lessons learned. …(More)”.

Data Collaborative Case Study: NYC Recovery Data Partnership

Report by the Open Data Policy Lab (The GovLab): “In July 2020, following severe economic and social losses due to the COVID-19 pandemic, the administration of New York City Mayor Bill de Blasio announced the NYC Recovery Data Partnership. This data collaborative asked private and civic organizations with assets relevant to New York City to provide their data to the city. Senior city leaders from the First Deputy Mayor’s Office, the Mayor’s Office of Operations, the Mayor’s Office of Information Privacy, and the Mayor’s Office of Data Analytics formed an internal coalition which served as trusted intermediaries, assessing requests from city agencies to use the data provided and allocating access accordingly. The data informed internal research conducted by various city agencies, including New York City Emergency Management’s Recovery Team and the NYC Department of City Planning. The experience reveals the ability of crises to spur innovation, the value of responsiveness from both data users and data suppliers, the importance of technical capacity, and the value of a network of peers. In terms of challenges, the experience also exposes the limitations of data, the challenges of compiling complex datasets, and the role of resource constraints…(More)”.

Ten (not so) simple rules for clinical trial data-sharing

Paper by Claude Pellen et al: “Clinical trial data-sharing is seen as an imperative for research integrity and is becoming increasingly encouraged or even required by funders, journals, and other stakeholders. However, early experiences with data-sharing have been disappointing because it is not always conducted properly. Health data is indeed sensitive and not always easy to share in a responsible way. We propose 10 rules for researchers wishing to share their data. These rules cover the majority of elements to be considered in order to start the commendable process of clinical trial data-sharing:

  • Rule 1: Abide by local legal and regulatory data protection requirements
  • Rule 2: Anticipate the possibility of clinical trial data-sharing before obtaining funding
  • Rule 3: Declare your intent to share data in the registration step
  • Rule 4: Involve research participants
  • Rule 5: Determine the method of data access
  • Rule 6: Remember there are several other elements to share
  • Rule 7: Do not proceed alone
  • Rule 8: Deploy optimal data management to ensure that the data shared is useful
  • Rule 9: Minimize risks
  • Rule 10: Strive for excellence…(More)”

Examining public views on decentralised health data sharing

Paper by Victoria Neumann et al: “In recent years, researchers have begun to explore the use of Distributed Ledger Technologies (DLT), also known as blockchain, in health data sharing contexts. However, there is a significant lack of research that examines public attitudes towards the use of this technology. In this paper, we begin to address this issue and present results from a series of focus groups which explored public views and concerns about engaging with new models of personal health data sharing in the UK. We found that participants were broadly in favour of a shift towards new decentralised models of data sharing. Retaining ‘proof’ of health information stored about patients and the capacity to provide permanent audit trails, enabled by immutable and transparent properties of DLT, were regarded as particularly valuable for our participants and prospective data custodians. Participants also identified other potential benefits such as supporting people to become more health data literate and enabling patients to make informed decisions about how their data was shared and with whom. However, participants also voiced concerns about the potential to further exacerbate existing health and digital inequalities. Participants were also apprehensive about the removal of intermediaries in the design of personal health informatics systems…(More)”.
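The audit-trail property participants valued rests on hash chaining: each entry commits to the hash of the one before it, so altering any earlier record invalidates every later hash. A minimal illustration of the mechanism (a sketch of the general technique, not any production health-data or DLT system):

```python
import hashlib
import json

def append_entry(chain, record):
    """Append an audit-log entry whose hash commits to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"record": record, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify(chain):
    """Recompute every hash; any tampered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {"record": entry["record"], "prev_hash": prev_hash}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on everything before it, a custodian cannot quietly rewrite who accessed a patient record; verification fails for every entry after the tampered one.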

Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good

Report by National Academies of Sciences, Engineering, and Medicine: “Historically, the U.S. national data infrastructure has relied on the operations of the federal statistical system and the data assets that it holds. Throughout the 20th century, federal statistical agencies aggregated survey responses of households and businesses to produce information about the nation and diverse subpopulations. The statistics created from such surveys provide most of what people know about the well-being of society, including health, education, employment, safety, housing, and food security. The surveys also contribute to an infrastructure for empirical social- and economic-sciences research. Research using survey-response data, with strict privacy protections, led to important discoveries about the causes and consequences of societal challenges and also informed policymakers. Like other infrastructure, people can easily take these essential statistics for granted. Only when they are threatened do people recognize the need to protect them…(More)”.

Ten lessons for data sharing with a data commons

Article by Robert L. Grossman: “..Lesson 1. Build a commons for a specific community with a specific set of research challenges

Although a few data repositories that serve the general scientific community have proved successful, data commons that target a specific user community have generally been the most successful. The first lesson is to build a data commons for a specific research community that is struggling to answer specific research challenges with data. As a consequence, a data commons is a partnership between the data scientists developing and supporting the commons and the disciplinary scientists with the research challenges.

Lesson 2. Successful commons curate and harmonize the data

Successful commons curate and harmonize the data and produce data products of broad interest to the community. It’s time-consuming, expensive, and labor-intensive to curate and harmonize data, but much of the value of a data commons lies in centralizing this work so that it can be done once instead of many times by each group that needs the data. These days, it is very easy to think of a data commons as a platform containing data, not spend the time curating or harmonizing it, and then be surprised that the data in the commons is not as widely used and its impact is not as high as expected.

Lesson 3. It’s ultimately about the data and its value to generate new research discoveries

Despite the importance of a study, few scientists will try to replicate previously published studies. Instead, data is usually accessed if it can lead to a new high-impact paper. For this reason, data commons play two different but related roles. First, they preserve data for reproducible science; this accounts for a small fraction of data access but plays a critical role. Second, data commons make data available for new high-value science.

Lesson 4. Reduce barriers to access to increase usage

A useful rule of thumb is that every barrier to data access cuts down access by a factor of 10. Common barriers that reduce use of a commons include: registration vs. no registration; open access vs. controlled access; click-through agreements vs. signed data-use agreements and approval by data access committees; license restrictions on the use of the data vs. no license restrictions…(More)”.
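Taken literally, the rule of thumb compounds multiplicatively. A toy calculation (the pool of potential users is an assumed figure for illustration):

```python
def expected_users(potential_users, num_barriers, attrition_factor=10):
    """Apply the factor-of-ten rule of thumb once per access barrier."""
    return potential_users / attrition_factor ** num_barriers

# With a notional pool of 100,000 potential users, three stacked barriers
# (registration, data access committee approval, a restrictive license)
# would leave an expected 100 users under the rule of thumb.
```

The point of the arithmetic is that barriers multiply rather than add, which is why removing even one of them can matter so much for a commons.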

Satellite data: The other type of smartphone data you might not know about

Article by Tommy Cooke et al: “Smartphones determine your location in several ways. The first way involves phones triangulating distances between cell towers or Wi-Fi routers.
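In the idealized two-dimensional case, distance measurements to three known towers pin down a single point. This sketch (illustrative, not a carrier's actual positioning code) linearizes the three circle equations and solves the resulting 2×2 linear system:

```python
def trilaterate(p1, p2, p3, d1, d2, d3):
    """Locate a point from its distances to three known (x, y) positions.

    Subtracting the circle equation for p1 from those for p2 and p3
    yields two linear equations in (x, y), solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1  # zero if the three towers are collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
```

Real positioning must also cope with noisy distance estimates and a third dimension, but the geometry is the same: each measured distance constrains the phone to a circle, and the circles intersect at the phone's location.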

The second way involves smartphones interacting with navigation satellites. When satellites pass overhead, they transmit signals to smartphones, which allows smartphones to calculate their own location. This process uses a specialized piece of hardware called the Global Navigation Satellite System (GNSS) chipset. Every smartphone has one.

When these GNSS chipsets process navigation satellite signals, they output data in two standardized formats (known as protocols or languages): the GNSS raw measurement protocol and the National Marine Electronics Association protocol (NMEA 0183).

GNSS raw measurements include data such as the distance between satellites and cellphones and measurements of the signal itself.

NMEA 0183 contains similar information to GNSS raw measurements, but also includes additional information such as satellite identification numbers, the number of satellites in a constellation, which country owns a satellite, and the satellite’s position.
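NMEA 0183 sentences are human-readable: comma-delimited fields terminated by an XOR checksum. A minimal parser for the commonly cited textbook GGA fix sentence (an illustrative example sentence, not output captured from a real device):

```python
def nmea_checksum_ok(sentence):
    """Verify the XOR of all characters between '$' and '*' against the hex suffix."""
    body, _, checksum = sentence[1:].partition("*")
    calc = 0
    for ch in body:
        calc ^= ord(ch)
    return calc == int(checksum, 16)

def parse_gga(sentence):
    """Extract latitude, longitude, and satellites-in-use from a GGA sentence."""
    fields = sentence.split(",")

    def to_degrees(value, hemisphere, degree_digits):
        # NMEA encodes coordinates as ddmm.mmm (lat) / dddmm.mmm (lon).
        degrees = float(value[:degree_digits]) + float(value[degree_digits:]) / 60
        return -degrees if hemisphere in ("S", "W") else degrees

    return {
        "lat": to_degrees(fields[2], fields[3], 2),
        "lon": to_degrees(fields[4], fields[5], 3),
        "satellites_in_use": int(fields[7]),
    }

# A widely reproduced textbook GGA sentence:
EXAMPLE = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
```

Because the format is plain text, any application granted access to the NMEA stream can read out position fixes this easily, which is part of why access to the stream matters.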

NMEA 0183 was created and is governed by the NMEA, a not-for-profit lobby group that is also a marine electronics trade organization. The NMEA was formed at the 1957 New York Boat Show when boating equipment manufacturers decided to build stronger relationships within the electronic manufacturing industry.

In the decades since, the NMEA 0183 data standard has improved marine electronics communications and is now found on a wide variety of non-marine communications devices today, including smartphones…

It is difficult to know who has access to data produced by these protocols. Access to NMEA protocols is only available under license to businesses for a fee.

GNSS raw measurements, on the other hand, are a universal standard and can be read by different devices in the same way without a license. In 2016, Google allowed industries to have open access to it to foster innovation around device tracking accuracy, precision, analytics about how we move in real time, and predictions about our movements in the future.

While automated processes can quietly harvest location data — like when a French-based company extracted location data from Salaat First, a Muslim prayer app — these data don’t need to be taken directly from smartphones to be exploited.

Data can be modelled, experimented with, or emulated in licensed devices in labs for innovation and algorithmic development.

Satellite-driven raw measurements from our devices were used to power global surveillance networks like STRIKE3, a now defunct European-led initiative that monitored and reported perceived threats to navigation satellites…(More)”.