Illuminating Big Data will leave governments in the dark


Robin Wigglesworth in the Financial Times: “Imagine a world where interminable waits for backward-looking, frequently-revised economic data seem as archaically quaint as floppy disks, beepers and a civil internet. This fantasy realm may be closer than you think.

The Bureau of Economic Analysis will soon publish its preliminary estimate for US economic growth in the first three months of the year, finally catching up on its regular schedule after a government shutdown paralysed the agency. But other data are still delayed, and the final official result for US gross domestic product won’t be available until July. Along the way there are likely to be many tweaks.

Collecting timely and accurate data is a Herculean task, especially for an economy as vast and varied as the US’s. But last week’s World Bank-International Monetary Fund annual spring meetings offered some clues on a brighter, more digital future for economic data.

The IMF hosted a series of seminars and discussions exploring how the hot new world of Big Data could be harnessed to produce more timely economic figures — and improve economic forecasts. Jiaxiong Yao, an IMF official in its African department, explained how it could use satellites to measure the intensity of night-time lights, and derive a real-time gauge of economic health.

“If a country gets brighter over time, it is growing. If it is getting darker then it probably needs an IMF programme,” he noted. Further sessions explored how the IMF could use machine learning — a popular field of artificial intelligence — to improve its influential but often faulty economic forecasts; and real-time shipping data to map global trade flows.
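The night-lights idea can be illustrated with a minimal sketch. It assumes per-pixel radiance readings for the same region in two periods and compares their means. The function name `brightness_change` and the readings are hypothetical; the excerpt does not describe the IMF's actual method, and brightness is only a rough proxy for economic activity.

```python
from statistics import mean


def brightness_change(earlier, later):
    """Return the relative change in mean night-time light intensity.

    `earlier` and `later` are sequences of per-pixel radiance readings
    for the same region in two periods (same sensor, same units).
    """
    base = mean(earlier)
    if base == 0:
        raise ValueError("earlier period has zero mean radiance")
    return (mean(later) - base) / base


# Made-up readings for illustration only; not real satellite data.
readings_earlier = [10.0, 12.0, 8.0, 10.0]
readings_later = [11.0, 13.0, 9.0, 11.0]

change = brightness_change(readings_earlier, readings_later)
print(f"Change in mean radiance: {change:+.1%}")
```

A positive value means the region got brighter between the two periods. A real pipeline would also have to handle cloud cover, sensor calibration and seasonal effects, none of which is modelled here.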

Sophisticated hedge funds have been mining some of these new “alternative” data sets for some time, but statistical agencies, central banks and multinational organisations such as the IMF and the World Bank are also starting to embrace the potential.

The amount of digital data around the world is already unimaginably vast. As more of our social and economic activity migrates online, the quantity and quality are going to increase exponentially. The potential is mind-boggling. Setting aside the obvious and thorny privacy issues, it is likely to lead to a revolution in the world of economic statistics. …

Yet the biggest issues are not the weaknesses of these new data sets — all statistics have inherent flaws — but their nature and location.

Firstly, it depends on the lax regulatory and personal attitudes towards personal data continuing, and there are signs of a (healthy) backlash brewing.

Secondly, almost all of this alternative data is being generated and stored in the private sector, not by government bodies such as the Bureau of Economic Analysis, Eurostat or the UK’s Office for National Statistics.

Public bodies are generally too poorly funded to buy or clean all this data themselves, meaning hedge funds will benefit from better economic data than the broader public. We might, in fact, need legislation mandating that statistical agencies receive free access to any aggregated private sector data sets that might be useful to their work.

That would ensure that our economic officials and policymakers don’t fly blind in an increasingly illuminated world….(More)”.

Data Collaboratives as an enabling infrastructure for AI for Good


Blog Post by Stefaan G. Verhulst: “…The value of data collaboratives stems from the fact that the supply of and demand for data are generally widely dispersed — spread across government, the private sector, and civil society — and often poorly matched. This failure (a form of “market failure”) results in tremendous inefficiencies and lost potential. Much data that is released is never used. And much data that is actually needed is never made accessible to those who could productively put it to use.

Data collaboratives, when designed responsibly, are the key to addressing this shortcoming. They draw together otherwise siloed data and a dispersed range of expertise, helping match supply and demand, and ensuring that the correct institutions and individuals are using and analyzing data in ways that maximize the possibility of new, innovative social solutions.

Roadmap for Data Collaboratives

Despite their clear potential, the evidence base for data collaboratives is thin. There’s an absence of a systemic, structured framework that can be replicated across projects and geographies, and there’s a lack of clear understanding about what works, what doesn’t, and how best to maximize the potential of data collaboratives.

At the GovLab, we’ve been working to address these shortcomings. For emerging economies considering the use of data collaboratives, whether in pursuit of Artificial Intelligence or other solutions, we present six steps that can be considered in order to create data collaboratives that are more systematic, sustainable, and responsible.

The need for making Data Collaboratives Systematic, Sustainable and Responsible
  • Increase Evidence and Awareness
  • Increase Readiness and Capacity
  • Address Data Supply and Demand Inefficiencies and Uncertainties
  • Establish a New “Data Stewards” Function
  • Develop and strengthen policies and governance practices for data collaboration

Renovating Democracy: Governing in the Age of Globalization and Digital Capitalism


Book by Nathan Gardels and Nicolas Berggruen: “The rise of populism in the West and the rise of China in the East have stirred a rethinking of how democratic systems work—and how they fail. The impact of globalism and digital capitalism is forcing worldwide attention to the starker divide between the “haves” and the “have-nots,” challenging how we think about the social contract.

With fierce clarity and conviction, Renovating Democracy tears down our basic structures and challenges us to conceive of an alternative framework for governance. To truly renovate our global systems, the authors argue for empowering participation without populism by integrating social networks and direct democracy into the system with new mediating institutions that complement representative government. They outline steps to reconfigure the social contract to protect workers instead of jobs, shifting from a “redistribution” after wealth to “pre-distribution” with the aim to enhance the skills and assets of those less well-off. Lastly, they argue for harnessing globalization through “positive nationalism” at home while advocating for global cooperation—specifically with a partnership with China—to create a viable rules-based world order. 

Thought provoking and persuasive, Renovating Democracy serves as a point of departure that deepens and expands the discourse for positive change in governance….(More)”.

Safeguards for human studies can’t cope with big data


Nathaniel Raymond at Nature: “One of the primary documents aiming to protect human research participants was published in the US Federal Register 40 years ago this week. The Belmont Report was commissioned by Congress in the wake of the notorious Tuskegee syphilis study, in which researchers withheld treatment from African American men for years and observed how the disease caused blindness, heart disease, dementia and, in some cases, death.

The Belmont Report lays out core principles now generally required for human research to be considered ethical. Although technically governing only US federally supported research, its influence reverberates across academia and industry globally. Before academics with US government funding can begin research involving humans, their institutional review boards (IRBs) must determine that the studies comply with regulation largely derived from a document that was written more than a decade before the World Wide Web and nearly a quarter of a century before Facebook.

It is past time for a Belmont 2.0. We should not be asking those tasked with protecting human participants to single-handedly identify and contend with the implications of the digital revolution. Technological progress, including machine learning, data analytics and artificial intelligence, has altered the potential risks of research in ways that the authors of the first Belmont report could not have predicted. For example, Muslim cab drivers can be identified from patterns indicating that they stop to pray; the Ugandan government can try to identify gay men from their social-media habits; and researchers can monitor and influence individuals’ behaviour online without enrolling them in a study.

Consider the 2014 Facebook ‘emotional contagion study’, which manipulated users’ exposure to emotional content to evaluate effects on mood. That project, a collaboration with academic researchers, led the US Department of Health and Human Services to launch a long rule-making process that tweaked some regulations governing IRBs.

A broader fix is needed. Right now, data science overlooks risks to human participants by default….(More)”.

Data Cultures, Culture as Data


Introduction to Special Issue of Cultural Analytics by Amelia Acker and Tanya Clement: “Data have become pervasive in research in the humanities and the social sciences. New areas, objects, and situations for study have developed; and new methods for working with data are shepherded by new epistemologies and (potential) paradigm shifts. But data didn’t just happen to us. We have happened to data. In every field, scholars are drawing boundaries between data and humans as if making meaning with data is innocent work. But these boundaries are never innocent. Questions are emerging about the relationships of culture to data—urgent questions that focus on the codification (or code-ification) of social and cultural bias and the erosion of human agency, subjectivity, and identity.

For this special issue of Cultural Analytics we invited submissions to respond to these concerns as they relate to the proximity and distance between the creation of data and its collection; the nature of data as object or content; modes and contexts of data circulation, dissemination and preservation; histories and imaginary data futures; data expertise; data and technological progressivism; the cultivation and standardization of data; and the cultures, communities, and consciousness of data production. The contributions we received ranged in type from research or theory articles to data reviews and opinion pieces responding to the theme of “data cultures”. Each contribution asks questions we should all be asking: What is the role we play in the data cultures/culture as data we form around sociomaterial practices? How can we better understand how these practices effect, and affect, the materialization of subjects, objects, and the relations between them? How can we engage our data culture(s) in practical, critical, and generative ways? As Karen Barad writes, “We are responsible for the world in which we live not because it is an arbitrary construction of our choosing, but because it is sedimented out of particular practices that we have a role in shaping.” Ultimately, our contributors are focused on this central concern: where is our agency in the responsibility of shaping data cultures? What role can scholarship play in better understanding our culture as data?…(More)”.

There Are Better Ways to Do Democracy


Article by Peter Coy: “The Brexit disaster has stained the reputation of direct democracy. The United Kingdom’s trauma began in 2016, when then-Prime Minister David Cameron miscalculated that he could strengthen Britain’s attachment to the European Union by calling a referendum on it. The Leave campaign made unkeepable promises about Brexit’s benefits. Voters spent little time studying the facts because there was a vanishingly small chance that any given vote would make the difference by breaking a tie. Leave won—and Google searches for “What is the EU” spiked after the polls closed.

Brexit is only one manifestation of a global problem. Citizens want elected officials to be as responsive as Uber drivers, but they don’t always take their own responsibilities seriously. This problem isn’t new. America’s Founding Fathers worried that democracy would devolve into mob rule; the word “democracy” appears nowhere in the Declaration of Independence or the Constitution.

While fears about democratic dysfunction are understandable, there are ways to make voters into real participants in the democratic process without giving in to mobocracy. Instead of referendums, which often become lightning rods for extremism, political scientists say it’s better to make voters think like jurors, whose decisions affect the lives and fortunes of others.

Guided deliberation, also known as deliberative democracy, is one way to achieve that. Ireland used it in 2016 and 2017 to help decide whether to repeal a constitutional amendment that banned abortion in most cases. A 99-person Citizens’ Assembly was selected to mirror the Irish population. It met over five weekends to evaluate input from lawyers and obstetricians, pro-life and pro-choice groups, and more than 13,000 written submissions from the public, guided by a chairperson from the Irish supreme court. Together they concluded that the legislature should have the power to allow abortion under a broader set of conditions, a recommendation that voters approved in a 2018 referendum; abortion in Ireland became legal in January 2019.

Done right, deliberative democracy brings out the best in citizens. “My experience shows that some of the most polarising issues can be tackled in this manner,” Louise Caldwell, an Irish assembly member, wrote in a column for the Guardian in January. India’s village assemblies, which involve all the adults in local decision-making, are a form of deliberative democracy on a grand scale. A March article in the journal Science says that “evidence from places such as Colombia, Belgium, Northern Ireland, and Bosnia shows that properly structured deliberation can promote recognition, understanding, and learning.” Even French President Emmanuel Macron has used it, convening a three-month “great debate” to solicit the public’s views on some of the issues raised by the sometimes-violent Yellow Vest movement. On April 8, Prime Minister Edouard Philippe presented one key finding: The French have “zero tolerance” for new taxes…(More)”.

The Technology Fallacy: How People Are the Real Key to Digital Transformation


Book by Gerald C. Kane, Anh Nguyen Phillips, Jonathan R. Copulsky and Garth R. Andrus: “Digital technologies are disrupting organizations of every size and shape, leaving managers scrambling to find a technology fix that will help their organizations compete. This book offers managers and business leaders a guide for surviving digital disruptions—but it is not a book about technology. It is about the organizational changes required to harness the power of technology. The authors argue that digital disruption is primarily about people and that effective digital transformation involves changes to organizational dynamics and how work gets done. A focus only on selecting and implementing the right digital technologies is not likely to lead to success. The best way to respond to digital disruption is by changing the company culture to be more agile, risk tolerant, and experimental.

The authors draw on four years of research, conducted in partnership with MIT Sloan Management Review and Deloitte, surveying more than 16,000 people and conducting interviews with managers at such companies as Walmart, Google, and Salesforce. They introduce the concept of digital maturity—the ability to take advantage of opportunities offered by the new technology—and address the specifics of digital transformation, including cultivating a digital environment, enabling intentional collaboration, and fostering an experimental mindset. Every organization needs to understand its “digital DNA” in order to stop “doing digital” and start “being digital.”

Digital disruption won’t end anytime soon; the average worker will probably experience numerous waves of disruption during the course of a career. The insights offered by The Technology Fallacy will hold true through them all….(More)”.

Digital Health Data And Information Sharing: A New Frontier For Health Care Competition?


Paper by Lucia Savage, Martin Gaynor and Julie Adler-Milstein: “There are obvious benefits to having patients’ health information flow across health providers. Providers will have more complete information about patients’ health and treatment histories, allowing them to make better treatment recommendations, and avoid unnecessary and duplicative testing or treatment. This should result in better and more efficient treatment, and better health outcomes. Moreover, the federal government has provided substantial incentives for the exchange of health information. Since 2009, the federal government has spent more than $40 billion to ensure that most physicians and hospitals use electronic health records, and to incentivize the use of electronic health information and health information exchange (the enabling statute is the Health Information Technology for Clinical Health Act), and in 2016 authorized substantial fines for failing to share appropriate information.

Yet, in spite of these incentives and the clear benefits to patients, the exchange of health information remains limited. There is evidence that this limited exchange is due in part to providers and platforms attempting to retain, rather than share, information (“information blocking”). In this article we examine legal and business reasons why health information may not be flowing. In particular, we discuss incentives providers and platforms can have for information blocking as a means to maintain or enhance their market position and thwart competition. Finally, we recommend steps to better understand whether the absence of information exchange is due to information blocking that harms competition and consumers….(More)”

Synthetic data: innovation for public good


Blog Post by Catrin Cheung: “What is synthetic data, and how can it be used for public good? ….Synthetic data are artificially generated data that have the look and structure of real data, but do not contain any information on individuals. They also contain more general characteristics that are used to find patterns in the data.

They are modelled on real data, but designed in a way which safeguards the legal, ethical and confidentiality requirements of the original data. Given their resemblance to the original data, synthetic data are useful in a range of situations, for example when data is sensitive or missing. They are used widely as teaching materials, to test code or mathematical models, or as training data for machine learning models….

There’s currently a wealth of research emerging from the health sector, as the nature of data published is often sensitive. Public Health England have synthesised cancer data which can be freely accessed online. NHS Scotland are making advances in cutting-edge machine learning methods such as Variational Auto Encoders and Generative Adversarial Networks (GANs).

There is growing interest in this area of research, and its influence extends beyond the statistical community. While the Data Science Campus have also used GANs to generate synthetic data in their latest research, its power is not limited to data generation. It can be trained to construct features almost identical to our own across imagery, music, speech and text. In fact, GANs have been used to create a painting of Edmond de Belamy, which sold for $432,500 in 2018!

Within the ONS, a pilot to create synthetic versions of securely held Labour Force Survey data has been carried out using a package in R called “synthpop”. This synthetic dataset can be shared with approved researchers to de-bug codes, prior to analysis of data held in the Secure Research Service….

Although much progress has been made in this field, one challenge that persists is guaranteeing the accuracy of synthetic data. We must ensure that the statistical properties of synthetic data match the properties of the original data.
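One simple way to picture this check is to generate synthetic values for a single numeric column and compare summary statistics with the original. The sketch below assumes a normal distribution fitted to the column. This is an illustrative simplification and not the method used by “synthpop” or by the ONS pilot; the function names and the figures are hypothetical.

```python
import random
from statistics import mean, stdev


def synthesise(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`.

    Assumption: the column is roughly normal. Real synthesisers model
    joint distributions across many columns, which this does not.
    """
    rng = random.Random(seed)
    mu, sigma = mean(values), stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def summary_gap(original, synthetic):
    """Return absolute differences in mean and standard deviation."""
    return (
        abs(mean(original) - mean(synthetic)),
        abs(stdev(original) - stdev(synthetic)),
    )


# Invented weekly-hours figures, for illustration only.
original = [35.0, 40.0, 37.5, 42.0, 30.0, 45.0, 38.0, 36.0]
synthetic = synthesise(original, n=1000)

mean_gap, sd_gap = summary_gap(original, synthetic)
print(f"mean gap: {mean_gap:.2f}, sd gap: {sd_gap:.2f}")
```

Small gaps suggest the synthetic column preserves these two properties, but matching a mean and a standard deviation says nothing about correlations between columns or about disclosure risk, which need separate checks.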

Additional features, such as the presence of non-numerical data, add to this difficult task. For example, if something is listed as “animal” and can take the possible values “dog”, “cat” or “elephant”, it is difficult to convert this information into a format suitable for precise calculations. Furthermore, given that datasets have different characteristics, there is no straightforward solution that can be applied to all types of data….particular focus was also placed on the use of synthetic data in the field of privacy, following from the challenges and opportunities identified by the National Statistician’s Quality Review of privacy and data confidentiality methods published in December 2018….(More)”.
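The “animal” example in the excerpt above can be made concrete with one-hot encoding, a common way to turn categories into numbers. The excerpt does not say which encoding the authors have in mind, so this is a sketch of one option, with a hypothetical helper name `one_hot`.

```python
def one_hot(values):
    """Encode categorical values as rows of 0/1 indicators.

    Returns the sorted list of categories and one row per input value,
    with a 1 in the column of that value's category.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


animals = ["dog", "cat", "elephant", "dog"]
categories, encoded = one_hot(animals)

print(categories)
for animal, row in zip(animals, encoded):
    print(animal, row)
```

Each category gets its own column, so no artificial ordering (such as dog < cat < elephant) is introduced. The cost is that the number of columns grows with the number of distinct values, which is part of why non-numerical fields complicate synthesis.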

e-Democracy: Toward a New Model of (Inter)active Society


Book by Alfredo M. Ronchi: “This book explores the main elements of e-Democracy, the term normally used to describe the implementation of democratic government processes by electronic means. It provides insights into the main technological and human issues regarding governance, government, participation, inclusion, empowerment, procurement and, last but not least, ethical and privacy issues. Its main aim is to bridge the gap between technological solutions, their successful implementation, and the fruitful utilization of the main set of e-Services totally or partially delivered by governments or non-government organizations.


Today, various parameters actively influence e-Services’ success or failure: cultural aspects, organisational issues, bureaucracy and workflows, infrastructure and technology in general, user habits, literacy, capacity or merely interaction design. This includes having a significant population of citizens who are willing and able to adopt and use online services; as well as developing the managerial and technical capability to implement applications that meet citizens’ needs. This book helps readers understand the mutual dependencies involved; further, a selection of success stories and failures, duly commented on, enables readers to identify the right approach to innovation in governmental e-Services….(More)”