Data Collaboratives can transform the way civil society organisations find solutions


Stefaan G. Verhulst at Disrupt & Innovate: “The need for innovation is clear: The twenty-first century is shaping up to be one of the most challenging in recent history. From climate change to income inequality to geopolitical upheaval and terrorism: the difficulties confronting International Civil Society Organisations (ICSOs) are unprecedented not only in their variety but also in their complexity. At the same time, today’s practices and tools used by ICSOs seem stale and outdated. Increasingly, it is clear, we need not only new solutions but new methods for arriving at solutions.

Data will likely become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in just the last two years. We know that this data can help us understand the world in new ways and meet the challenges mentioned above. However, we need new methods of data collaboration to extract insights from that data.

UNTAPPED DATA POTENTIAL

For all of data’s potential to address public challenges, the truth remains that most data generated today is in fact collected by the private sector – including ICSOs, which often collect vast amounts of data themselves; the International Committee of the Red Cross, for instance, generates various (often sensitive) data related to its humanitarian activities. This data, typically ensconced in tightly held databases to maintain competitive advantage or to protect against harmful intrusion, contains tremendous possible insights and avenues for innovation in how we solve public problems. But because of access restrictions and often limited data science capacity, its vast potential often goes untapped.

DATA COLLABORATIVES AS A SOLUTION

Data Collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas — including the private sector, government, and civil society — come together to exchange data and pool analytical expertise.

While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains. Importantly, several ICSOs have started to collaborate with others around their own data and that of the private and public sectors. For example:

  • Several civil society organisations, academics, and donor agencies are partnering in the Health Data Collaborative to improve the global data infrastructure necessary to make smarter global and local health decisions and to track progress against the Sustainable Development Goals (SDGs).
  • Additionally, the UN Office for the Coordination of Humanitarian Affairs (UNOCHA) built the Humanitarian Data Exchange (HDX), a platform for sharing humanitarian data from and for ICSOs – including Caritas, InterAction and others – donor agencies, national and international bodies, and other humanitarian organisations.

These are a few examples of Data Collaboratives that ICSOs are participating in. Yet, the potential for collaboration goes beyond these examples. Likewise, so do the concerns regarding data protection and privacy….(More)”.

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, the seasons, and hence future agricultural yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, the sheer richness of those attribute combinations will itself erode privacy, but our wish to control what we do with data will go beyond that….
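
The privacy point can be made concrete with a toy simulation: the more attributes recorded per household, the larger the share of records whose attribute combination is unique in the dataset, and a unique combination is a potential re-identification handle. The sketch below is illustrative only, using synthetic categorical attributes rather than any real household data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_records = 10_000           # synthetic "households"
levels = 5                   # each attribute takes one of 5 values

# Fraction of records whose attribute combination is unique in the dataset,
# as a function of how many attributes we record per household.
for n_attrs in (1, 3, 6, 10, 15):
    data = rng.integers(0, levels, size=(n_records, n_attrs))
    _, counts = np.unique(data, axis=0, return_counts=True)
    unique_share = (counts == 1).sum() / n_records
    print(f"{n_attrs:2d} attributes -> {unique_share:.1%} of records are unique")
```

Even with only five possible values per attribute, a handful of attributes is enough to single out most records, which is why high-dimensional data resists simple anonymisation.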

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to a single scalar value, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….
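
A quick simulation (with synthetic numbers only) shows why “found” data matter statistically: if the chance that an observation is recorded depends on its value, even a simple mean comes out biased relative to a designed random sample.

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=50, scale=10, size=100_000)  # true mean ~= 50

# A designed simple random sample is unbiased.
srs = rng.choice(population, size=1_000, replace=False)

# "Found" data: the probability of appearing in the dataset grows with
# the value itself (e.g., heavier users generate more records).
weights = (population - population.min()) ** 2
weights /= weights.sum()
found = rng.choice(population, size=1_000, replace=False, p=weights)

print(f"true mean:      {population.mean():.1f}")
print(f"random sample:  {srs.mean():.1f}")    # close to the truth
print(f"'found' sample: {found.mean():.1f}")  # biased upward
```

No amount of extra “found” data fixes this; the bias comes from the sampling mechanism, not the sample size.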

 Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

Data journalism and the ethics of publishing Twitter data


Matthew L. Williams at Data Driven Journalism: “Collecting and publishing data from social media sites such as Twitter are everyday practices for the data journalist. Recent findings from Cardiff University’s Social Data Science Lab question the practice of publishing Twitter content without seeking some form of informed consent from users beforehand. Researchers found that tweets collected around certain topics, such as those related to terrorism, political votes, changes in the law and health problems, create datasets that might contain sensitive content, such as extreme political opinion, grossly offensive comments, overly personal revelations and threats to life (both to oneself and to others). Handling these data in the process of analysis (such as classifying content as hateful and potentially illegal) and reporting has brought the ethics of using social media in social research and journalism into sharp focus.

Ethics is an issue that is becoming increasingly salient in research and journalism using social media data. The digital revolution has outpaced parallel developments in research governance and agreed good practice. Codes of ethical conduct that were written in the mid twentieth century are being relied upon to guide the collection, analysis and representation of digital data in the twenty-first century. Social media is particularly ethically challenging because of the open availability of the data (particularly from Twitter). Many platforms’ terms of service specifically state that users’ public data will be made available to third parties, and by accepting these terms users legally consent to this. However, researchers and data journalists must interpret and engage with these commercially motivated terms of service through a more reflexive lens, which implies a context-sensitive approach, rather than focusing on the legally permissible uses of these data.

Social media researchers and data journalists have experimented with data from a range of sources, including Facebook, YouTube, Flickr, Tumblr and Twitter to name a few. Twitter is by far the most studied of all these networks. This is because Twitter differs from other networks, such as Facebook, that are organised around groups of ‘friends’, in that it is more ‘open’ and the data (in part) are freely available to researchers. This makes Twitter a more public digital space that promotes the free exchange of opinions and ideas. Twitter has become the primary space for online citizens to publicly express their reaction to events of national significance, and also the primary source of data for social science research into digital publics.

The Twitter streaming API provides three levels of data access: the free random 1% that provides ~5M tweets daily, and the random 10% and 100% (chargeable, or free to academic researchers upon request). Datasets on social interactions of this scale, speed and ease of access have been hitherto unrealisable in the social sciences and journalism, and have led to a flood of journal articles and news pieces, many of which include tweets with full text content and author identity without informed consent. This is presumably because of Twitter’s ‘open’ nature, which leads to the assumption that ‘these are public data’ and that using them does not require the rigor and scrutiny of ethical oversight. Even when these data are scrutinised, journalists rarely need to look beyond the ‘public data’ argument, given the lack of a framework for evaluating the potential harms to users. The Social Data Science Lab takes a more ethically reflexive approach to the use of social media data in social research, and carefully considers users’ perceptions, online context and the role of algorithms in estimating potentially sensitive user characteristics.
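
One practical step that follows from such a reflexive approach is to anonymise tweets before analysis or publication. The sketch below is a minimal illustration of that idea, not the Lab’s actual pipeline; it assumes tweet objects carrying the standard ‘text’ and ‘user.screen_name’ fields, pseudonymises the author with a salted hash, and strips @-mentions of other accounts.

```python
import hashlib
import re

def anonymise_tweet(tweet: dict, salt: str = "replace-with-a-secret") -> dict:
    """Pseudonymise the author and strip @-mentions from a tweet dict."""
    screen_name = tweet["user"]["screen_name"]
    # One-way pseudonym: the same user always maps to the same opaque token,
    # so per-user analysis still works, but the handle is not published.
    pseudonym = hashlib.sha256((salt + screen_name).encode()).hexdigest()[:12]
    text = re.sub(r"@\w+", "@user", tweet["text"])  # mask mentions of others
    return {"author": f"user_{pseudonym}", "text": text}

# Example with a fabricated tweet object:
tweet = {"user": {"screen_name": "jane_doe"},
         "text": "@some_mp this policy is a disgrace"}
print(anonymise_tweet(tweet))
# {'author': 'user_…', 'text': '@user this policy is a disgrace'}
```

Note that verbatim tweet text remains searchable on the platform, so pseudonymising the author is at best a partial safeguard; the survey results below suggest users expect consent and anonymisation together.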

A recent Lab survey into users’ perceptions of the use of their social media posts found the following:

  • 94% were aware that social media companies had Terms of Service
  • 65% had read the Terms of Service in whole or in part
  • 76% knew that when accepting Terms of Service they were giving permission for some of their information to be accessed by third parties
  • 80% agreed that if their social media information is used in a publication they would expect to be asked for consent
  • 90% agreed that if their tweets were used without their consent they should be anonymised…(More)”.

Can Crowdsourcing and Collaboration Improve the Future of Human Health?


Ben Wiegand at Scientific American: “The process of medical research has been likened to searching for a needle in a haystack. With the continued acceleration of novel science and health care technologies in areas like artificial intelligence, digital therapeutics and the human microbiome, we have a tremendous opportunity to search the haystack in new and exciting ways. Applying these high-tech advances to today’s most pressing health issues increases our ability to address the root cause of disease, intervene earlier and change the trajectory of human health.

Global crowdsourcing forums, like the Johnson & Johnson Innovation QuickFire Challenges, can be incredibly valuable tools for searching the “haystack.” An initiative of JLABS—the no-strings-attached incubators of Johnson & Johnson Innovation—these contests spur scientific diversity through crowdsourcing, inspiring and attracting fresh thinking. They seek to stimulate the global innovation ecosystem through funding, mentorship and access to resources that can kick-start breakthrough ideas.

Our most recent challenge, the Next-Gen Baby Box QuickFire Challenge, focused on updating the 80-year-old “Finnish baby box,” a free, government-issued maternity supply kit for new parents containing such essentials as baby clothing, bath and sleep supplies packaged in a sleep-safe cardboard box. Since it first launched, the baby box has, together with increased use of maternal healthcare services early in pregnancy, helped to significantly reduce the Finnish infant mortality rate from 65 in every 1,000 live births in the 1930s to 2.5 per 1,000 today—one of the lowest rates in the world.

Partnering with Finnish innovation and government groups, we set out to see if updating this popular early parenting tool with the power of personalized health technology might one day impact Finland’s rate of type 1 diabetes, the highest in the world. We issued the call globally to help create “the Baby Box of the future” as part of the Janssen and Johnson & Johnson Innovation vision to create a world without disease by accelerating science and delivering novel solutions to prevent, intercept and cure disease. The contest brought together entrepreneurs, researchers and innovators to focus on ideas with the potential to promote child health, detect childhood disease earlier and facilitate healthy parenting.

Incentive challenges like this award participants who have most effectively met a predefined objective or task. It’s a concept that emerged well before our time—as far back as the 18th century—from Napoleon’s Food Preservation Prize, meant to find a way to keep troops fed during battle, to the Longitude Prize for improved marine navigation.

Research shows that prize-based challenges that attract talent across a wide range of disciplines can generate greater risk-taking and yield more dramatic solutions….(More)”.

How Universities Are Tackling Society’s Grand Challenges


Michelle Popowitz and Cristin Dorgelo in Scientific American: “…Universities embarking on Grand Challenge efforts are traversing new terrain—they are making commitments about research deliverables rather than simply committing to invest in efforts related to a particular subject. To mitigate risk, the universities that have entered this space are informally consulting with others regarding effective strategies, but the entire community would benefit from a more formal structure for identifying and sharing “what works.” To address this need, the new Community of Practice for University-Led Grand Challenges—launched at the October 2017 workshop—aims to provide peer support to leaders of university Grand Challenge programs, and to accelerate the adoption of Grand Challenge approaches at more universities supported by cross-sector partnerships.

The university community has identified extensive opportunities for collaboration on these Grand Challenge programs with other sectors:

  • Philanthropy can support the development of new Grand Challenge programs at more universities by establishing planning and administration grant programs, convening experts, and providing funding support for documenting these models through white papers and other publications and for evaluation of these programs over time.
  • Relevant associations and professional development organizations can host learning sessions about Grand Challenges for university leaders and professionals.
  • Companies can collaborate with universities on Grand Challenges research, act as sponsors and hosts for university-led programs and activities, and offer leaders, experts, and other personnel for volunteer advisory roles and tours of duty at universities.
  • Federal, state, and local governments and elected officials can provide support for collaboration among government agencies and offices and the research community on Grand Challenges.

Today’s global society faces pressing, complex challenges across many domains—including health, environment, and social justice. Science (including social sciences), technology, the arts, and humanities have critical roles to play in addressing these challenges and building a bright and prosperous future. Universities are hubs for discovery, building new knowledge, and changing understanding of the world. The public values the role universities play in education; yet as a sector, universities are less effective at highlighting their roles as the catalysts of new industries, homes for the fundamental science that leads to new treatments and products, or sources of the evidence on which policy decisions should be made.

By coming together as universities, collaborating with partners, and aiming for ambitious goals to address problems that might seem unsolvable, universities can show commitment to their communities and become beacons of hope….(More)”.

A science that knows no country: Pandemic preparedness, global risk, sovereign science


Paper by J. Benjamin Hurlbut: “… examines political norms and relationships associated with governance of pandemic risk. Through a pair of linked controversies over scientific access to H5N1 flu virus and genomic data, it examines the duties, obligations, and allocations of authority articulated around the imperative for globally free-flowing information and around the corollary imperative for a science that is set free to produce such information.

It argues that scientific regimes are laying claim to a kind of sovereignty, particularly in moments where scientific experts call into question the legitimacy of claims grounded in national sovereignty, by positioning the norms of scientific practice, including a commitment to unfettered access to scientific information and to the authority of science to declare what needs to be known, as essential to global governance. Scientific authority occupies a constitutional position insofar as it figures centrally in the repertoire of imaginaries that shape how a global community is imagined: what binds that community together and what shared political commitments, norms, and subjection to delegated authority are seen as necessary for it to be rightly governed….(More)”.

Invisible Algorithms, Invisible Politics


Laura Forlano at Public Books: “Over the past several decades, politicians and business leaders, technology pundits and the mainstream media, engineers and computer scientists—as well as science fiction and Hollywood films—have repeated a troubling refrain, championing the shift away from the material and toward the virtual, the networked, the digital, the online. It is as if all of life could be reduced to 1s and 0s, rendering it computable….

Today, it is in design criteria and engineering specifications—such as “invisibility” and “seamlessness,” which aim to improve the human experience with technology—that ethical decisions are negotiated….

Take this example. In late July 2017, the City of Chicago agreed to settle a $38.75 million class-action lawsuit related to its red-light-camera program. Under the settlement, the city will repay drivers who were unfairly ticketed a portion of the cost of their ticket. Over the past five years, the program, ostensibly implemented to make Chicago’s intersections safer, has been mired in corruption, bribery, mismanagement, malfunction, and moral wrongdoing. This confluence of factors has resulted in a great deal of negative press about the project.

The red-light-camera program is just one of many examples of such technologies being adopted by cities in their quest to become “smart” and, at the same time, increase revenue. Others include ticketless parking, intelligent traffic management, ride-sharing platforms, wireless networks, sensor-embedded devices, surveillance cameras, predictive policing software, driverless car testbeds, and digital-fabrication facilities.

The company that produced the red-light cameras, Redflex, claims on its website that its technology can “reliably and consistently address negative driving behaviors and effectively enforce traffic laws on roadways and intersections with a history of crashes and incidents.” Nothing could be further from the truth. Instead, the cameras were unnecessarily installed at some intersections without a history of problems; they malfunctioned; they issued illegal tickets due to yellow lights shorter than federal limits; and they issued tickets after enforcement hours. And, due to existing structural inequalities, these difficulties were more likely to negatively impact poorer and less advantaged city residents.

The controversies surrounding red-light cameras in Chicago make visible the ways in which design criteria and engineering specifications—concepts including safety and efficiency, seamlessness and stickiness, convenience and security—are themselves ways of defining the ethics, values, and politics of our cities and citizens. To be sure, these qualities seem clean, comforting, and cuddly at first glance. They are difficult to argue against.

But, like wolves in sheep’s clothing, they gnash their political-economic teeth, and show their insatiable desire to further the goals of neoliberal capitalism. Rather than merely slick marketing, these mundane infrastructures (hardware, software, data, and services) negotiate ethical questions around what kinds of societies we aspire to, what kind of cities we want to live in, what kinds of citizens we can become, who will benefit from these tradeoffs, and who will be left out….(More)

Managing Democracy in the Digital Age


Book edited by Julia Schwanholz, Todd Graham and Peter-Tobias Stoll: “In light of the increased utilization of information technologies, such as social media and the ‘Internet of Things,’ this book investigates how this digital transformation process creates new challenges and opportunities for political participation, political election campaigns and political regulation of the Internet. Within the context of Western democracies and China, the contributors analyze these challenges and opportunities from three perspectives: the regulatory state, the political use of social media, and through the lens of the public sphere.

The first part of the book discusses key challenges for Internet regulation, such as data protection and censorship, while the second addresses the use of social media in political communication and political elections. In turn, the third and last part highlights various opportunities offered by digital media for online civic engagement and protest in the public sphere. Drawing on different academic fields, including political science, communication science, and journalism studies, the contributors raise a number of innovative research questions and provide fascinating theoretical and empirical insights into the topic of digital transformation….(More)”.

A Really Bad Blockchain Idea: Digital Identity Cards for Rohingya Refugees


Wayan Vota at ICTworks: “The Rohingya Project claims to be a grassroots initiative that will empower Rohingya refugees with a blockchain-leveraged financial ecosystem tied to digital identity cards….

What Could Possibly Go Wrong?

Concerns about Rohingya data collection are not new, so Linda Raftree’s Facebook post about blockchain for biometrics started a spirited discussion on this escalation of techno-utopia. Several people put forth great points about the Rohingya Project’s potential failings. For me, there were four key questions originating in the discussion that we should all be debating:

1. Who Determines Ethnicity?

Ethnicity isn’t a scientific way to categorize humans. Ethnic groups are based on human constructs such as common ancestry, language, society, culture, or nationality. Who are the Rohingya Project to be the ones determining who is Rohingya or not? And what is this rigorous assessment they have that will do what science cannot?

Might it be better not to perpetuate the very divisions that cause these issues? Or at the very least, let people self-determine their own ethnicity.

2. Why Digitally Identify Refugees?

Let’s say that we could group a people based on objective metrics. Should we? Especially if that group is persecuted where it currently lives and in many of its surrounding countries? Wouldn’t making a list of who is persecuted be a handy reference for those who seek to persecute more?

Instead, shouldn’t we focus on changing the mindset of the persecutors and stop the persecution?

3. Why Blockchain for Biometrics?

How could linking a highly persecuted people’s biometric information, such as fingerprints, iris scans, and photographs, to a public, universal, and immutable distributed ledger be a good thing?
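
The word “immutable” is doing a lot of work here: once a record is committed to a blockchain-style ledger, it cannot later be edited or deleted, only flagged as inconsistent. A minimal hash-chain sketch (illustrative only, with fabricated records; real blockchains add consensus and replication on top) shows why:

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous block's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

# Append two (fabricated) identity records to a chain.
chain = []
prev = "0" * 64  # genesis
for record in [{"id": 1, "iris_scan": "…"}, {"id": 2, "iris_scan": "…"}]:
    prev = block_hash(record, prev)
    chain.append({"record": record, "hash": prev})

# Altering (or deleting) the first record invalidates its stored hash and
# every hash after it, so the change is immediately detectable.
chain[0]["record"]["iris_scan"] = "REDACTED"
recomputed = block_hash(chain[0]["record"], "0" * 64)
print(recomputed == chain[0]["hash"])  # False: the chain no longer verifies
```

On a replicated public ledger the original data also survives on every other copy, so there is no technical way to take a record back once it is committed.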

Might it be highly irresponsible to digitize all that information? Couldn’t that data be used by nefarious actors to perpetuate new and worse exploitation of the Rohingya? India has already lost Aadhaar data, and Equifax lost Americans’ data. How will the small, lightly funded Rohingya Project do better?

Could it be possible that old-fashioned paper forms are a better solution than digital identity cards? Maybe laminate them for greater durability, but paper identity cards can be hidden, even destroyed if needed, to conceal information that could be used against the owner.

4. Why Experiment on the Powerless?

Rohingya refugees already suffer from massive power imbalances, and now they’ll be asked to give up their digital privacy, and use experimental technology, as part of an NGO’s experiment, in order to get needed services.

It’s not like they’ll have the agency to say no. They are homeless, often penniless refugees, who will probably have no realistic way to opt out of digital identity cards, even if they don’t want to be experimented on while they flee persecution….(More)”

Our Hackable Political Future


Henry J. Farrell and Rick Perlstein at the New York Times: “….A program called Face2Face, developed at Stanford, films one person speaking, then manipulates that person’s image to resemble someone else’s. Throw in voice manipulation technology, and you can literally make anyone say anything — or at least seem to….

Another harrowing possibility is the ability to trick the algorithms behind self-driving cars into not recognizing traffic signs. Computer scientists have shown that nearly invisible changes to a stop sign can fool algorithms into thinking it says yield instead. Imagine if one of these cars contained a dissident challenging a dictator.
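
These attacks typically work by nudging each input value a tiny, bounded amount in whichever direction most increases the classifier’s error, the “fast gradient sign” idea. The numpy sketch below shows the mechanism on a made-up linear classifier; a real attack targets a deep vision model, but the principle is the same.

```python
import numpy as np

# Toy linear classifier over 64 "pixels": score < 0 -> "stop", else "yield".
rng = np.random.default_rng(0)
w = rng.normal(size=64)                              # fabricated model weights
x = -0.1 * np.sign(w) + 0.01 * rng.normal(size=64)   # input firmly classified "stop"

def predict(x: np.ndarray) -> str:
    return "yield" if w @ x >= 0 else "stop"

print(predict(x))                # "stop"

# Fast-gradient-sign step: the gradient of the score w @ x is just w, so
# moving each pixel by eps in the direction of sign(w) raises the "yield"
# score as much as any eps-bounded change possibly can.
eps = 0.15
x_adv = x + eps * np.sign(w)

print(np.abs(x_adv - x).max())   # every pixel changed by at most eps
print(predict(x_adv))            # "yield": the decision flips
```

Because each pixel moves by at most eps, the perturbed image can look essentially unchanged to a human while the model’s decision flips.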

In 2007, Barack Obama’s political opponents insisted that footage existed of Michelle Obama ranting against “whitey.” In the future, they may not have to worry about whether such footage actually exists. If someone called their bluff, they could simply invent it, using data from stock photos and pre-existing footage.

The next step would be one we are already familiar with: the exploitation of the algorithms used by social media sites like Twitter and Facebook to spread stories virally to those most inclined to show interest in them, even if those stories are fake.

It might be impossible to stop the advance of this kind of technology. But the relevant algorithms here aren’t only the ones that run on computer hardware. They are also the ones that undergird our too easily hacked media system, where garbage acquires the perfumed scent of legitimacy with all too much ease. Editors, journalists and news producers can play a role here — for good or for bad.

Outlets like Fox News spread stories about the murder of Democratic staff members and F.B.I. conspiracies to frame the president. Traditional news organizations, fearing that they might be left behind in the new attention economy, struggle to maximize “engagement with content.”

This gives them a built-in incentive to spread informational viruses that enfeeble the very democratic institutions that allow a free media to thrive. Cable news shows consider it their professional duty to provide “balance” by giving partisan talking heads free rein to spout nonsense — or amplify the nonsense of our current president.

It already feels as though we are living in an alternative science-fiction universe where no one agrees on what is true. Just think how much worse it will be when fake news becomes fake video. Democracy assumes that its citizens share the same reality. We’re about to find out whether democracy can be preserved when this assumption no longer holds….(More)”.