Mobile Devices as Stigmatizing Security Sensors: The GDPR and a Future of Crowdsourced ‘Broken Windows’


Paper by Oskar Josef Gstrein and Gerard Jan Ritsema van Eck: “Various smartphone apps and services are available that encourage users to report where and when they feel they are in an unsafe or threatening environment. This user-generated content may be used to build datasets, which can show areas that are considered ‘bad,’ and to map out ‘safe’ routes through such neighbourhoods.

Despite certain advantages, this data inherently carries the danger that streets or neighbourhoods become stigmatized and already existing prejudices might be reinforced. Such stigmas might also result in negative consequences for property values and businesses, causing irreversible damage to certain parts of a municipality. Overcoming such an “evidence-based stigma” — even if based on biased, unreviewed, outdated, or inaccurate data — becomes nearly impossible and raises the question of how such data should be managed….(More)”.
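As a rough, hypothetical sketch of how such crowdsourced reports typically become a "neighbourhood safety" dataset (this is not code from the paper), consider binning geotagged reports into grid cells and counting them; every column name and threshold below is invented for illustration:

```python
# Hypothetical sketch: aggregating geotagged "felt unsafe" reports into
# per-cell counts, the kind of dataset that can end up labelling whole
# neighbourhoods as 'bad'. Column names and the grid size are assumptions.
import pandas as pd

reports = pd.DataFrame({
    "lat": [53.215, 53.216, 53.219, 53.230],
    "lon": [6.565, 6.566, 6.568, 6.580],
})

CELL = 0.005  # roughly 500 m grid cells at this latitude, purely illustrative

# Snap each report to a grid cell and count reports per cell.
reports["cell_lat"] = (reports["lat"] // CELL) * CELL
reports["cell_lon"] = (reports["lon"] // CELL) * CELL
heatmap = (
    reports.groupby(["cell_lat", "cell_lon"])
    .size()
    .rename("unsafe_reports")
    .reset_index()
)
print(heatmap)
# Note that raw counts reflect who chooses to report, not how safe a street
# actually is, which is exactly the bias the authors warn can harden into an
# "evidence-based stigma".
```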

Can scientists learn to make ‘nature forecasts’ just as we forecast the weather?


From The Conversation: “We all take weather forecasts for granted, so why isn’t there a ‘nature forecast’ to answer these questions? Enter the new scientific field of ecological forecasting. Ecologists have long sought to understand the natural world, but only recently have they begun to think systematically about forecasting.

Much of the current research in ecological forecasting is focused on long-term projections. It considers questions that play out over decades to centuries, such as how species may shift their ranges in response to climate change, or whether forests will continue to take up carbon dioxide from the atmosphere.

However, in a new article that I co-authored with 18 other scientists from universities, private research institutes and the U.S. Geological Survey, we argue that focusing on near-term forecasts over spans of days, seasons and years will help us better understand, manage and conserve ecosystems. Developing this ability would be a win-win for both science and society….

Big data is driving many of the advances in ecological forecasting. Today ecologists have orders of magnitude more data compared to just a decade ago, thanks to sustained public funding for basic science and environmental monitoring. This investment has given us better sensors, satellites and organizations such as the National Ecological Observatory Network, which collects high-quality data from 81 field sites across the United States and Puerto Rico. At the same time, cultural shifts across funding agencies, research networks and journals have made that data more open and available.

Digital technologies make it possible to access this information more quickly than in the past. Field notebooks have given way to tablets and cell networks that can stream new data into supercomputers in real time. Computing advances allow us to build better models and use more sophisticated statistical methods to produce forecasts….(More)”.
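To make the idea of a near-term, continuously updated forecast concrete, here is a minimal sketch (not taken from the article) of a persistence-style ensemble forecast that is re-run each time a new observation arrives; the data, noise level, and horizon are all assumptions:

```python
# A minimal sketch of a near-term "nature forecast": a persistence (random
# walk) model that is re-run each time a new observation streams in, carrying
# an estimate forward with growing uncertainty. Purely illustrative; real
# ecological forecasts rely on much richer state-space models and data.
import numpy as np

rng = np.random.default_rng(42)

# Pretend daily observations of some ecological state, for example canopy
# green-ness from a sensor network (values and noise level are made up).
observations = [0.61, 0.63, 0.62, 0.66]
process_sd = 0.02    # assumed day-to-day variability of the true state
horizon_days = 7

def forecast(last_obs, horizon, sd, n_ensemble=1000):
    """Random-walk ensemble forecast starting from the latest observation."""
    steps = rng.normal(0.0, sd, size=(n_ensemble, horizon)).cumsum(axis=1)
    return last_obs + steps  # trajectories, shape (n_ensemble, horizon)

# Each new observation triggers a fresh forecast: the "streaming" loop.
for day, obs in enumerate(observations):
    ens = forecast(obs, horizon_days, process_sd)
    mean = ens[:, -1].mean()
    lo, hi = np.percentile(ens[:, -1], [2.5, 97.5])
    print(f"day {day}: 7-day-ahead mean {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```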

The Modern Research Data Portal: a design pattern for networked, data-intensive science


Paper in PeerJ Computer Science: “We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs.

We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals….(More)”.
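Since the companion site is built around Globus services, a minimal sketch of the pattern's data-transfer step using the Globus Python SDK (globus_sdk) might look like the following; the client ID, endpoint UUIDs, and paths are placeholders, and the full, authoritative skeletons live at https://docs.globus.org/mrdp:

```python
# Minimal sketch of the MRDP pattern's transfer step using the Globus Python
# SDK (pip install globus-sdk). Client ID, endpoint UUIDs, and paths are
# placeholders; see https://docs.globus.org/mrdp for the complete skeletons.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"     # registered at developers.globus.org
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"       # e.g. the portal's data endpoint
DST_ENDPOINT = "DESTINATION-ENDPOINT-UUID"  # e.g. the user's own endpoint

# 1. Authenticate the user (native-app OAuth2 flow) and obtain a transfer token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all"
)
print("Log in at:", auth_client.oauth2_get_authorize_url())
code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# 2. Submit an asynchronous, third-party transfer: data flows endpoint to
#    endpoint, not through the portal's web server (the core MRDP idea).
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)
task = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT, label="MRDP demo")
task.add_item("/published/dataset1/", "/~/dataset1/", recursive=True)
result = tc.submit_transfer(task)
print("Transfer submitted, task id:", result["task_id"])
```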

Studying Migrant Assimilation Through Facebook Interests


Antoine Dubois, Emilio Zagheni, Kiran Garimella, and Ingmar Weber at arXiv: “Migrants’ assimilation is a major challenge for European societies, in part because of the sudden surge of refugees in recent years and in part because of long-term demographic trends. In this paper, we use Facebook’s data for advertisers to study the levels of assimilation of Arabic-speaking migrants in Germany, as seen through the interests they express online. Our results indicate a gradient of assimilation along demographic lines, language spoken and country of origin. Given the difficulty of collecting timely migration data, in particular for traits related to cultural assimilation, the methods that we develop and the results that we provide open new lines of research that computational social scientists are well-positioned to address….(More)”.
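The abstract does not spell out the computation, but as a loose illustration (not the authors' method), one could compare interest profiles derived from advertising-audience estimates for two groups, for instance with a cosine similarity over interest shares; all figures below are made up:

```python
# Rough illustration only (not the authors' method): compare how an
# "interest profile" estimated from advertising-audience counts for one
# group resembles that of another group. All numbers are invented.
import math

def interest_shares(audience_by_interest):
    total = sum(audience_by_interest.values())
    return {k: v / total for k, v in audience_by_interest.items()}

def cosine_similarity(p, q):
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0) * q.get(k, 0) for k in keys)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Hypothetical audience estimates (e.g. from an ads-platform reach tool).
germany_overall = {"football": 900, "schlager": 400, "oktoberfest": 300, "arabic_music": 50}
arabic_speakers_de = {"football": 80, "schlager": 10, "oktoberfest": 15, "arabic_music": 120}

score = cosine_similarity(
    interest_shares(germany_overall), interest_shares(arabic_speakers_de)
)
print(f"interest-profile similarity: {score:.2f}")  # closer to 1 means more similar profiles
```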

Rights-Based and Tech-Driven: Open Data, Freedom of Information, and the Future of Government Transparency


Beth Noveck at the Yale Human Rights and Development Journal: “Open data policy mandates that government proactively publish its data online for the public to reuse. It is a radically different approach to transparency than traditional right-to-know strategies as embodied in Freedom of Information Act (FOIA) legislation in that it involves ex ante rather than ex post disclosure of whole datasets. Although both open data and FOIA deal with information sharing, the normative essence of open data is participation rather than litigation. By fostering public engagement, open data shifts the relationship between state and citizen from a monitorial to a collaborative one, centered around using information to solve problems together. This Essay explores the theory and practice of open data in comparison to FOIA and highlights its uses as a tool for advancing human rights, saving lives, and strengthening democracy. Although open data undoubtedly builds upon the fifty-year legal tradition of the right to know about the workings of one’s government, open data does more than advance government accountability. Rather, it is a distinctly twenty-first century governing practice borne out of the potential of big data to help solve society’s biggest problems. Thus, this Essay charts a thoughtful path toward a twenty-first century transparency regime that takes advantage of and blends the strengths of open data’s collaborative and innovation-centric approach and the adversarial and monitorial tactics of freedom of information regimes….(More)”.

How AI Could Help the Public Sector


Emma Martinho-Truswell in the Harvard Business Review: “A public school teacher grading papers faster is a small example of the wide-ranging benefits that artificial intelligence could bring to the public sector. AI could be used to make government agencies more efficient, to improve the job satisfaction of public servants, and to increase the quality of services offered. Talent and motivation are wasted on routine tasks when they could be applied to more creative ones.

Applications of artificial intelligence to the public sector are broad and growing, with early experiments taking place around the world. In addition to education, public servants are using AI to help them make welfare payments and immigration decisions, detect fraud, plan new infrastructure projects, answer citizen queries, adjudicate bail hearings, triage health care cases, and establish drone paths. The decisions we are making now will shape the impact of artificial intelligence on these and other government functions. Which tasks will be handed over to machines? And how should governments spend the labor time saved by artificial intelligence?

So far, the most promising applications of artificial intelligence use machine learning, in which a computer program learns and improves its own answers to a question by creating and iterating algorithms from a collection of data. This data is often in enormous quantities and from many sources, and a machine learning algorithm can find new connections among data that humans might not have expected. IBM’s Watson, for example, is a treatment-recommendation bot, sometimes finding treatments that human doctors might not have considered or known about.

A machine learning program may be better, cheaper, faster, or more accurate than humans at tasks that involve lots of data, complicated calculations, or repetition governed by clear rules. Those in public service, and in many other big organizations, may recognize part of their job in that description. The very fact that government workers are often following a set of rules — a policy or set of procedures — already presents many opportunities for automation.

To be useful, a machine learning program does not need to be better than a human in every case. In my work, we expect that much of the “low hanging fruit” of government use of machine learning will be as a first line of analysis or decision-making. Human judgment will then be critical to interpret results, manage harder cases, or hear appeals.
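A minimal sketch of this "first line of analysis" idea, with a model trained on past decisions handling clear-cut cases and routing ambiguous ones to a human caseworker, could look like the following; the features, thresholds, and data are invented, and this is not code from the article:

```python
# Minimal sketch of machine learning as a "first line of analysis": a model
# trained on past decisions handles clear-cut cases, while low-confidence
# cases are routed to a human caseworker. Feature names, data, and the
# threshold are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Toy historical data: [years_of_records, documents_complete] -> approved?
X_train = [[5, 1], [7, 1], [1, 0], [2, 0], [6, 1], [0, 0]]
y_train = [1, 1, 0, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)

CONFIDENCE_THRESHOLD = 0.85  # below this, a human makes the call

def triage(case_features):
    prob_approve = model.predict_proba([case_features])[0][1]
    if prob_approve >= CONFIDENCE_THRESHOLD:
        return "auto-approve (human can audit)"
    if prob_approve <= 1 - CONFIDENCE_THRESHOLD:
        return "auto-flag for routine review"
    return "refer to human caseworker"  # the hard, ambiguous middle

for case in ([6, 1], [1, 1], [0, 0]):
    print(case, "->", triage(case))
```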

When the work of public servants can be done in less time, a government might reduce its staff numbers and return the money saved to taxpayers — and I am sure that some governments will pursue that option. But it’s not necessarily the one I would recommend. Governments could instead choose to invest in the quality of their services. They can redirect workers’ time toward more rewarding work that requires lateral thinking, empathy, and creativity — all things at which humans continue to outperform even the most sophisticated AI program….(More)”.

Algorithms show potential in measuring diagnostic errors using big data


Greg Slabodkin at Information Management: “While the problem of diagnostic errors is widespread in medicine, with an estimated 12 million Americans affected annually, a new approach to quantifying and monitoring these errors has the potential to prevent serious patient injuries, including disability or death.

“The single biggest impediment to making progress is the lack of operational measures of diagnostic errors,” says David Newman-Toker, MD, director of the Johns Hopkins Armstrong Institute Center for Diagnostic Excellence. “It’s very difficult to measure because we haven’t had the tools to look for it in a systematic way. And most of the methods that look for diagnostic errors involve training people to do labor-intensive chart reviews.”

However, a new method—called the Symptom-Disease Pair Analysis of Diagnostic Error (SPADE)—uncovers misdiagnosis-related harms using specific algorithms and big data. The automated approach could replace labor-intensive reviews of medical records by hospital staff, which researchers contend are limited by poor clinical documentation, low reliability and inherent bias.

According to Newman-Toker, SPADE utilizes statistical analyses to identify critical patterns that measure the rate of diagnostic error by analyzing large, existing clinical and claims datasets containing hundreds of thousands of patient visits. Specifically, algorithms are leveraged to look for common symptoms prompting a physician visit and then pair them with one or more diseases that could be misdiagnosed in those clinical contexts….(More)”.
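In the spirit of that look-forward pairing (though not the published SPADE implementation), a toy analysis over visit records might count how often a treat-and-release symptom visit is followed within 30 days by an admission for a related disease; the columns and data below are invented:

```python
# Illustrative sketch of a symptom-disease pair ("look-forward") analysis in
# the spirit of SPADE, not the published implementation. It counts how often
# a visit for a symptom (e.g. dizziness) is followed within 30 days by an
# admission for a related disease (e.g. stroke), a pattern that can flag
# possible missed diagnoses. Column names and data are invented.
import pandas as pd

visits = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "date": pd.to_datetime(
        ["2018-01-02", "2018-01-20", "2018-02-05", "2018-03-01", "2018-05-15"]
    ),
    "diagnosis": ["dizziness", "stroke", "dizziness", "dizziness", "stroke"],
})

symptom_visits = visits[visits["diagnosis"] == "dizziness"]
disease_visits = visits[visits["diagnosis"] == "stroke"]

# Join each symptom visit to later disease visits for the same patient and
# keep the pairs that fall within a 30-day window.
pairs = symptom_visits.merge(disease_visits, on="patient_id", suffixes=("_sym", "_dis"))
pairs["days_between"] = (pairs["date_dis"] - pairs["date_sym"]).dt.days
flagged = pairs[(pairs["days_between"] > 0) & (pairs["days_between"] <= 30)]

rate = len(flagged) / len(symptom_visits)
print(f"{len(flagged)} of {len(symptom_visits)} dizziness visits "
      f"({rate:.0%}) were followed by a stroke admission within 30 days")
```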

How the Data That Internet Companies Collect Can Be Used for the Public Good


Stefaan G. Verhulst and Andrew Young at Harvard Business Review: “…In particular, the vast streams of data generated through social media platforms, when analyzed responsibly, can offer insights into societal patterns and behaviors. Such insights are hard to generate with existing social science methods. All this information poses its own problems, of complexity and noise, of risks to privacy and security, but it also represents tremendous potential for mobilizing new forms of intelligence.

In a recent report, we examine ways to harness this potential while limiting and addressing the challenges. Developed in collaboration with Facebook, the report seeks to understand how public and private organizations can join forces to use social media data — through data collaboratives — to mitigate and perhaps solve some of our most intractable policy dilemmas.

Data Collaboratives: Public-Private Partnerships for Our Data Age 

For all of data’s potential to address public challenges, most data generated today is collected by the private sector. Typically ensconced in corporate databases, and tightly held in order to maintain competitive advantage, this data contains tremendous possible insights and avenues for policy innovation. But because the analytical expertise brought to bear on it is narrow, and limited by private ownership and access restrictions, its vast potential often goes untapped.

Data collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas, including the private sector, government, and civil society, can come together to exchange data and pool analytical expertise in order to create new public value. While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains….

Professionalizing the Responsible Use of Private Data for Public Good

For all its promise, the practice of data collaboratives remains ad hoc and limited. In part, this is a result of the lack of a well-defined, professionalized concept of data stewardship within corporations. Today, each attempt to establish a cross-sector partnership built on the analysis of social media data requires significant and time-consuming efforts, and businesses rarely have personnel tasked with undertaking such efforts and making relevant decisions.

As a consequence, the process of establishing data collaboratives and leveraging privately held data for evidence-based policy making and service delivery is onerous, generally one-off, not informed by best practices or any shared knowledge base, and prone to dissolution when the champions involved move on to other functions.

By establishing data stewardship as a corporate function, recognized within corporations as a valued responsibility, and by creating the methods and tools needed for responsible data-sharing, the practice of data collaboratives can become regularized, predictable, and de-risked.

If early efforts toward this end — from initiatives such as Facebook’s Data for Good efforts in the social media space and MasterCard’s Data Philanthropy approach around finance data — are meaningfully scaled and expanded, data stewards across the private sector can act as change agents responsible for determining what data to share and when, how to protect data, and how to act on insights gathered from the data.

Still, many companies (and others) continue to balk at the prospect of sharing “their” data, which is an understandable response given the reflex to guard corporate interests. But our research has indicated that many benefits can accrue not only to data recipients but also to those who share it. Data collaboration is not a zero-sum game.

With support from the Hewlett Foundation, we are embarking on a two-year project toward professionalizing data stewardship (and the use of data collaboratives) and establishing well-defined data responsibility approaches. We invite others to join us in working to transform this practice into a widespread, impactful means of leveraging private-sector assets, including social media data, to create positive public-sector outcomes around the world….(More)”.


Open Data Risk Assessment


Report by the Future of Privacy Forum: “The transparency goals of the open data movement serve important social, economic, and democratic functions in cities like Seattle. At the same time, some municipal datasets about the city and its citizens’ activities carry inherent risks to individual privacy when shared publicly. In 2016, the City of Seattle declared in its Open Data Policy that the city’s data would be “open by preference,” except when doing so may affect individual privacy. To ensure its Open Data Program effectively protects individuals, Seattle committed to performing an annual risk assessment and tasked the Future of Privacy Forum (FPF) with creating and deploying an initial privacy risk assessment methodology for open data.

This Report provides tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacy-protective open data programs. Although there is a growing body of research regarding open data privacy, open data managers and departmental data owners need to be able to employ a standardized methodology for assessing the privacy risks and benefits of particular datasets internally, without access to a bevy of expert statisticians, privacy lawyers, or philosophers. By optimizing its internal processes and procedures, developing and investing in advanced statistical disclosure control strategies, and following a flexible, risk-based assessment process, the City of Seattle – and other municipalities – can build mature open data programs that maximize the utility and openness of civic data while minimizing privacy risks to individuals and addressing community concerns about ethical challenges, fairness, and equity.

This Report first describes inherent privacy risks in an open data landscape, with an emphasis on potential harms related to re-identification, data quality, and fairness. To address these risks, the Report includes a Model Open Data Benefit-Risk Analysis (“Model Analysis”). The Model Analysis evaluates the types of data contained in a proposed open dataset, the potential benefits – and concomitant risks – of releasing the dataset publicly, and strategies for effective de-identification and risk mitigation. This holistic assessment guides city officials to determine whether to release the dataset openly, in a limited access environment, or to withhold it from publication (absent countervailing public policy considerations). …(More)”.
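As one concrete example of the kind of disclosure-control check such a process can include (an illustration, not the FPF Model Analysis itself), a data owner might test a candidate dataset for k-anonymity over its quasi-identifiers before release; the column names and the value of k below are assumptions:

```python
# Illustrative disclosure-control check (not the FPF Model Analysis itself):
# before publishing a dataset, verify that every combination of quasi-
# identifiers appears at least k times, a basic k-anonymity test. Column
# names and the k value are assumptions for the example.
import pandas as pd

K = 5  # minimum group size considered acceptable for release

candidate_release = pd.DataFrame({
    "neighborhood": ["Ballard", "Ballard", "Fremont", "Fremont", "Fremont"],
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "service_request": ["pothole", "graffiti", "pothole", "noise", "noise"],
})

quasi_identifiers = ["neighborhood", "age_band"]

group_sizes = candidate_release.groupby(quasi_identifiers).size()
risky_groups = group_sizes[group_sizes < K]

if risky_groups.empty:
    print(f"Dataset satisfies {K}-anonymity on {quasi_identifiers}.")
else:
    print(f"{len(risky_groups)} quasi-identifier groups have fewer than {K} records:")
    print(risky_groups)
    print("Consider generalizing, suppressing, or aggregating before release.")
```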

After Big Data: The Coming Age of “Big Indicators”


Andrew Zolli at the Stanford Social Innovation Review: “Consider, for a moment, some of the most pernicious challenges facing humanity today: the increasing prevalence of natural disasters; the systemic overfishing of the world’s oceans; the clear-cutting of primeval forests; the maddening persistence of poverty; and above all, the accelerating effects of global climate change.

Each item in this dark litany inflicts suffering on the world in its own, awful way. Yet as a group, they share some common characteristics. Each problem is messy, with lots of moving parts. Each is riddled with perverse incentives, which can lead local actors to behave in a way that is not in the common interest. Each is opaque, with dynamics that are only partially understood, even by experts; each can, as a result, often be made worse by seemingly rational and well-intentioned interventions. When things do go wrong, each has consequences that diverge dramatically from our day-to-day experiences, making their full effects hard to imagine, predict, and rehearse. And each is global in scale, raising questions about who has the legal obligation to act—and creating incentives for leaders to disavow responsibility (and sometimes even question the legitimacy of the problem itself).

With dynamics like these, it’s little wonder systems theorists label these kinds of problems “wicked” or even “super wicked.” It’s even less surprising that these challenges remain, by and large, externalities to the global system—inadequately measured, perennially underinvested in, and poorly accounted for—until their consequences spill disastrously and expensively into view.

For real progress to occur, we’ve got to move these externalities into the global system, so that we can fully assess their costs, and so that we can sufficiently incentivize and reward stakeholders for addressing them and penalize them if they don’t. And that’s going to require a revolution in measurement, reporting, and financial instrumentation—the mechanisms by which we connect global problems with the resources required to address them at scale.

Thankfully, just such a revolution is under way.

It’s a complex story with several moving parts, but it begins with important new technical developments in three critical areas of technology: remote sensing and big data, artificial intelligence, and cloud computing.

Remote sensing and big data allow us to collect unprecedented streams of observations about our planet and our impacts upon it, and dramatic advances in AI enable us to extract the deeper meaning and patterns contained in those vast data streams. The rise of the cloud empowers anyone with an Internet connection to access and interact with these insights, at a fraction of the traditional cost.

In the years to come, these technologies will shift much of the current conversation focused on big data to one focused on “big indicators”—highly detailed, continuously produced, global indicators that track change in the health of the Earth’s most important systems, in real time. Big indicators will form an important mechanism for guiding human action, allow us to track the impact of our collective actions and interventions as never before, enable better and more timely decisions, transform reporting, and empower new kinds of policy and financing instruments. In short, they will reshape how we tackle a number of global problems, and everyone—especially nonprofits, NGOs, and actors within the social and environmental sectors—will play a role in shaping and using them….(More)”.
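To ground the idea, a toy version of one such indicator, vegetation health via the Normalized Difference Vegetation Index (NDVI) computed from red and near-infrared reflectance, might look like the sketch below; the pixel values are made up, and real pipelines would of course run this over cloud-hosted satellite archives, continuously and at scale:

```python
# Toy sketch of a "big indicator" computed from remote-sensing data: NDVI,
# a standard proxy for vegetation health, calculated here from invented red
# and near-infrared reflectance grids for a single tiny scene.
import numpy as np

# Hypothetical 3x3 pixel reflectance values (0 to 1) for one scene.
red = np.array([[0.10, 0.12, 0.30],
                [0.11, 0.25, 0.32],
                [0.09, 0.28, 0.35]])
nir = np.array([[0.60, 0.58, 0.33],
                [0.62, 0.40, 0.31],
                [0.65, 0.38, 0.30]])

ndvi = (nir - red) / (nir + red)  # ranges roughly from -1 to 1
indicator = float(ndvi.mean())    # one number summarizing the scene

print(np.round(ndvi, 2))
print(f"scene-level vegetation indicator: {indicator:.2f}")
# Tracking this value over time, and over many scenes, is the kind of
# continuously produced signal the article calls a "big indicator".
```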