Data-Driven Government: The Role of Chief Data Officers


Jane Wiseman for IBM Center for The Business of Government: “Governments at all levels have seen dramatic increases in availability and use of data over the past decade.

The push for data-driven government is currently of intense interest at the federal level, as the government develops an integrated federal data strategy as part of its goal to “leverage data as a strategic asset.” There is also pending legislation that would require agencies to designate chief data officers (CDOs).

This report focuses on the expanding use of data at the federal level and how to best manage it. Ms. Wiseman says: “The purpose of this report is to advance the use of data in government by describing the work of pioneering federal CDOs and providing a framework for thinking about how a new analytics leader might establish his or her office and use data to advance the mission of the agency.”

Ms. Wiseman’s report provides rich profiles of five pioneering CDOs in the federal government and how they have defined their new roles. Based on her research and interviews, she offers insights into how the role of agency CDOs is evolving in different agencies and the reasons agency leaders are establishing these roles. She also offers advice on how new CDOs can be successful at the federal level, based on the experiences of the pioneers as well as the experiences of state and local CDOs….(More)”.

The Cost-Benefit Revolution


Book by Cass Sunstein: “Why policies should be based on careful consideration of their costs and benefits rather than on intuition, popular opinion, interest groups, and anecdotes.

Opinions on government policies vary widely. Some people feel passionately about the child obesity epidemic and support government regulation of sugary drinks. Others argue that people should be able to eat and drink whatever they like. Some people are alarmed about climate change and favor aggressive government intervention. Others don’t feel the need for any sort of climate regulation. In The Cost-Benefit Revolution, Cass Sunstein argues that our major disagreements really involve facts, not values. It follows that government policy should not be based on public opinion, intuitions, or pressure from interest groups, but on numbers—meaning careful consideration of costs and benefits. Will a policy save one life, or one thousand lives? Will it impose costs on consumers, and if so, will the costs be high or negligible? Will it hurt workers and small businesses, and, if so, precisely how much?

As the Obama administration’s “regulatory czar,” Sunstein knows his subject in both theory and practice. Drawing on behavioral economics and his well-known emphasis on “nudging,” he celebrates the cost-benefit revolution in policy making, tracing its defining moments in the Reagan, Clinton, and Obama administrations (and pondering its uncertain future in the Trump administration). He acknowledges that public officials often lack information about costs and benefits, and outlines state-of-the-art techniques for acquiring that information. Policies should make people’s lives better. Quantitative cost-benefit analysis, Sunstein argues, is the best available method for making this happen—even if, in the future, new measures of human well-being, also explored in this book, may be better still…(More)”.

Swarm AI Outperforms in Stanford Medical Study


Press Release: “Stanford University School of Medicine and Unanimous AI presented a new study today showing that a small group of doctors, connected by intelligence algorithms that enable them to work together as a “hive mind,” could achieve higher diagnostic accuracy than the individual doctors or machine learning algorithms alone. The technology used is called Swarm AI, and it empowers networked human groups to combine their individual insights in real time, using AI algorithms to converge on optimal solutions.

As presented at the 2018 SIIM Conference on Machine Intelligence in Medical Imaging, the study tasked a group of experienced radiologists with diagnosing the presence of pneumonia in chest X-rays. This is one of the most widely performed imaging procedures in the US, with more than 1 million adults hospitalized with pneumonia each year. But despite this prevalence, accurately diagnosing X-rays is highly challenging, with significant variability across radiologists. This makes it both an optimal task for applying new AI technologies and an important problem to solve for the medical community.

When diagnoses were generated using Swarm AI technology, the average error rate was 33% lower than that of traditional diagnoses by individual practitioners. This is an exciting result, showing the potential of AI technologies to amplify the accuracy of human practitioners while maintaining their direct participation in the diagnostic process.

Swarm AI technology was also compared to the state of the art in automated diagnosis: software algorithms that do not employ human practitioners. Currently, the best system in the world for automated diagnosis of pneumonia from chest X-rays is the CheXNet system from Stanford University, which made headlines in 2017 when its deep-learning-derived algorithms significantly outperformed individual practitioners.

The Swarm AI system, which combines real-time human insights with AI technology, was 22% more accurate in binary classification than the software-only CheXNet system. In other words, by connecting a group of radiologists into a medical “hive mind”, the hybrid human-machine system was able to outperform individual human doctors as well as the state-of-the-art deep-learning-derived algorithms….(More)”.

Is the Government More Entrepreneurial Than You Think?


Freakonomics Radio (Podcast): We all know the standard story: our economy would be more dynamic if only the government would get out of the way. The economist Mariana Mazzucato says we’ve got that story backward. She argues that the government, by funding so much early-stage research, is hugely responsible for big successes in tech, pharma, energy, and more. But the government also does a terrible job of claiming credit and, more important, of getting a return on its investment….

Quote:

MAZZUCATO: “…And I’ve been thinking about this especially around the big data and the kind of new questions around privacy with Facebook, etc. Instead of having a situation where all the data basically gets captured, which is citizens’ data, by companies which then, in some way, we have to pay into in terms of accessing these great new services — whether they’re free or not, we’re still indirectly paying. We should have the data in some sort of public repository because it’s citizens’ data. The technology itself was funded by the citizens. What would Uber be without GPS, publicly financed? What would Google be without the Internet, publicly financed? So, the tech was financed from the state, the citizens; it’s their data. Why not completely reverse the current relationship and have that data in a public repository which companies actually have to pay into to get access to it under certain strict conditions which could be set by an independent advisory council?… (More)”

Protecting the Confidentiality of America’s Statistics: Adopting Modern Disclosure Avoidance Methods at the Census Bureau


John Abowd at US Census: “…Throughout our history, we have been leaders in statistical data protection, which we call disclosure avoidance. Other statistical agencies use the terms “disclosure limitation” and “disclosure control.” These terms are all synonymous. Disclosure avoidance methods have evolved since the censuses of the early 1800s, when the only protection used was simply removing names. Executive orders and a series of laws modified the legal basis for these protections, which were finally codified in the 1954 Census Act (13 U.S.C. Sections 8(b) and 9). We have continually added better and stronger protections to keep the data we publish anonymous and underlying records confidential.

However, historical methods cannot completely defend against the threats posed by today’s technology. Growth in computing power, advances in mathematics, and easy access to large, public databases pose a significant threat to confidentiality. These forces have made it possible for sophisticated users to ferret out common data points between databases using only our published statistics. If left unchecked, those users might be able to stitch together these common threads to identify the people or businesses behind the statistics, as was done in the case of the Netflix Prize.

The Census Bureau has been addressing these issues from every feasible angle and changing rapidly with the times to ensure that we protect the data our census and survey respondents provide us. We are doing this by moving to a new, advanced, and far more powerful confidentiality protection system, which uses a rigorous mathematical process that protects respondents’ information and identity in all of our publications.

The new tool is based on the concept known in scientific and academic circles as “differential privacy.” It is also called “formal privacy” because it provides provable mathematical guarantees, similar to those found in modern cryptography, about the confidentiality protections that can be independently verified without compromising the underlying protections.

“Differential privacy” is based on the cryptographic principle that an attacker should not be able to learn any more about you from the statistics we publish using your data than from statistics that did not use your data. After tabulating the data, we apply carefully constructed algorithms to modify the statistics in a way that protects individuals while continuing to yield accurate results. We assume that everyone’s data are vulnerable and provide the same strong, state-of-the-art protection to every record in our database.
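The mechanism described above can be illustrated concretely. Below is a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy; this is an illustration of the general technique only, not the Census Bureau's production system, and the function names and parameters are hypothetical:

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    """Return a noisy count of records matching `predicate`.

    A counting query has sensitivity 1: adding or removing any one
    person's record changes the true count by at most 1. Adding Laplace
    noise with scale 1/epsilon therefore satisfies epsilon-differential
    privacy: the published statistic is almost equally likely whether or
    not any particular record is in the database.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical micro-example: how many respondents are over 65?
random.seed(42)  # seeded only to make the illustration reproducible
ages = [23, 67, 41, 72, 35, 68, 29, 80]
noisy = private_count(ages, lambda a: a > 65, epsilon=0.5)
```

A smaller epsilon means more noise and stronger protection; the agency's task is to tune that trade-off so published tables remain accurate in aggregate while no individual record can be inferred.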

The Census Bureau did not invent the science behind differential privacy. However, we were the first organization anywhere to use it when we incorporated differential privacy into the OnTheMap application in 2008, where it protects block-level residential population data. Recently, Google, Apple, Microsoft, and Uber have all followed the Census Bureau’s lead, adopting differentially private systems as the standard for protecting user data confidentiality inside their browsers (Chrome), products (iPhones), operating systems (Windows 10), and apps (Uber)….(More)”.

Social media big data analytics: A survey


Norjihan Abdul Ghani et al. in Computers in Human Behavior: “Big data analytics has recently emerged as an important research area due to the popularity of the Internet and the advent of Web 2.0 technologies. Moreover, the proliferation and adoption of social media applications have provided extensive opportunities and challenges for researchers and practitioners. The massive amount of data generated by users using social media platforms is the result of the integration of their background details and daily activities.

This enormous volume of generated data, known as “big data,” has been intensively researched recently. A review of the recent works is presented to obtain a broad perspective of the social media big data analytics research topic. We classify the literature based on important aspects. This study also compares possible big data analytics techniques and their quality attributes. Moreover, we provide a discussion on the applications of social media big data analytics by highlighting the state-of-the-art techniques, methods, and the quality attributes of various studies. Open research challenges in big data analytics are described as well….(More)”.

How Social Media Came To The Rescue After Kerala’s Floods


Kamala Thiagarajan at NPR: Devastating rainfall followed by treacherous landslides has killed 210 people since August 8 and displaced over a million in the southern Indian state of Kerala. India’s National Disaster Response Force launched its biggest-ever rescue operation in the state, evacuating over 10,000 people. The Indian army and navy were deployed as well.

But they had some unexpected assistance.

Thousands of Indian citizens used mobile phone technology and social media platforms to mobilize relief efforts….

In many other cases, it was ordinary folk who harnessed social media and their own resources to play a role in relief and rescue efforts.

As the scope of the disaster became clear, the state government of Kerala reached out to software engineers from around the world. They joined hands with the state-government-run Information Technology Cell, coming together on Slack, a communications platform, to create the website www.keralarescue.in.

The website allowed volunteers who were helping with disaster relief in Kerala’s many flood-affected districts to share the needs of stranded people so that authorities could act.

Johann Binny Kuruvilla, a travel blogger, was one of many volunteers. He put in 14-hour shifts at the District Emergency Operations Center in Ernakulam, Kochi.

The first thing he did, he says, was to harness the power of WhatsApp, a critical platform for dispensing information in India. He joined five key WhatsApp groups with hundreds of members who were coordinating rescue and relief efforts. He sent them his number and mentioned that he would be in a position to communicate with a network of police, army and navy personnel. Soon he was receiving an average of 300 distress calls a day from people marooned at home and faced with medical emergencies.

No one trained volunteers like Kuruvilla. “We improvised and devised our own systems to store data,” he says. He documented the information he received on Excel spreadsheets before passing them on to authorities.

He was also the contact point for INSPIRE, a fraternity of mechanical engineering students at a government-run engineering college at Barton Hill in Kerala. The students told him they had made nearly 300 power banks for charging phones, using four 1.5-volt batteries and cables, and, he says, “asked us if we could help them airdrop it to those stranded in flood-affected areas.” A power bank could boost a mobile phone’s charge by 20 percent in minutes, which could be critical for people without access to electricity. Authorities agreed to distribute the power banks, wrapping them in bubble wrap and airdropping them to areas where people were marooned.

Some people took to social media to create awareness of the aftereffects of the flooding.

Anand Appukuttan, 38, is a communications designer. Working as a consultant, he currently lives in Chennai, 500 miles by road from Kerala, and designs infographics, mobile apps and software for tech companies. Appukuttan was born and brought up in Kottayam, a city in southwest Kerala. When he heard of the devastation caused by the floods, he longed to help. A group of experts on disaster management reached out to him over Facebook on August 18, asking if he would share his time and expertise in creating flyers for awareness; he immediately agreed….(More)”.

Behavioural science and policy: where are we now and where are we going?


Michael Sanders et al. in Behavioral Public Policy: “The use of behavioural sciences in government has expanded and matured in the last decade. Since the Behavioural Insights Team (BIT) has been part of this movement, we sketch out the history of the team and the current state of behavioural public policy, recognising that other works have already told this story in detail. We then set out two clusters of issues that have emerged from our work at BIT. The first cluster concerns current challenges facing behavioural public policy: the long-term effects of interventions; repeated exposure effects; problems with proxy measures; spillovers and general equilibrium effects and unintended consequences; cultural variation; ‘reverse impact’; and the replication crisis. The second cluster concerns opportunities: influencing the behaviour of government itself; scaling interventions; social diffusion; nudging organisations; and dealing with thorny problems. We conclude that the field will need to address these challenges and take these opportunities in order to realise the full potential of behavioural public policy….(More)”.

The Risks of Dangerous Dashboards in Basic Education


Lant Pritchett at the Center for Global Development: “On June 1, 2009, Air France flight 447 from Rio de Janeiro to Paris crashed into the Atlantic Ocean, killing all 228 people on board. While the Airbus 330 was flying on autopilot, the speed readings received by the on-board navigation computers began to conflict, almost certainly because the pitot tubes responsible for measuring airspeed had iced over. Since the autopilot could not resolve the conflicting signals and hence did not know how fast the plane was actually going, it turned control of the plane over to the two first officers (the captain was out of the cockpit). Subsequent flight-simulator trials replicating the conditions of the flight concluded that had the pilots done nothing at all, everyone would have lived: nothing was actually wrong; only the indicators were faulty, not the actual speed. But, tragically, the pilots didn’t do nothing….

What is the connection to education?

Many countries’ systems of basic education are in “stall” condition.

A recent paper of Beatty et al. (2018) uses information from the Indonesia Family Life Survey, a representative household survey that has been carried out in several waves with the same individuals since 2000 and contains information on whether individuals can answer simple arithmetic questions. Figure 1, showing the relationship between the level of schooling and the probability of answering a typical question correctly, has two shocking results.

First, the likelihood that a person can answer a simple mathematics question correctly differs by only 20 percentage points between individuals who have completed less than primary school (<PS), who answer correctly (adjusted for guessing) about 20 percent of the time, and those who have completed senior secondary school or more (>=SSS), who answer correctly only about 40 percent of the time. These are simple multiple-choice questions, like whether 56/84 is the same fraction as (can be reduced to) 2/3, and whether 1/3-1/6 equals 1/6. This means that in an entire year of schooling, fewer than 2 additional children per 100 gain the ability to answer simple arithmetic questions.
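Both example items do check out; the arithmetic can be verified mechanically (a quick sanity check, illustrative only, using Python's standard fractions module):

```python
from fractions import Fraction

# Item 1: 56/84 reduces to 2/3 (divide numerator and denominator by 28).
assert Fraction(56, 84) == Fraction(2, 3)

# Item 2: 1/3 - 1/6 = 2/6 - 1/6 = 1/6.
assert Fraction(1, 3) - Fraction(1, 6) == Fraction(1, 6)
```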

Second, this incredibly poor performance in 2000 got worse by 2014. …

What has this got to do with education dashboards? The way large bureaucracies prefer to work is to specify process compliance and inputs and then measure those as a means of driving performance. This logistical mode of managing an organization works best when both process compliance and inputs are easily “observable” in the economist’s sense: easily verifiable, contractible, adjudicated. This leads to attention to processes and inputs that are “thin” in the Clifford Geertz sense (adopted by James Scott as his primary definition of how a “high modern” bureaucracy, and hence the state, “sees” the world). So in education one would specify easily observable inputs like textbook availability, class size, and school infrastructure. Even if one were talking about the “quality” of schooling, a large bureaucracy would want this, too, reduced to “thin” indicators, like the fraction of teachers with a given type of formal degree, or to process-compliance measures, like whether teachers were hired based on some formal assessment.

Those involved in schooling can then become obsessed with their dashboards and the “thin” progress that is being tracked and easily ignore the loud warning signals saying: Stall!…(More)”.

Countries Can Learn from France’s Plan for Public Interest Data and AI


Nick Wallace at the Center for Data Innovation: “French President Emmanuel Macron recently endorsed a national AI strategy that includes plans for the French state to make public and private sector datasets available for reuse by others in applications of artificial intelligence (AI) that serve the public interest, such as for healthcare or environmental protection. Although this strategy fails to set out how the French government should promote widespread use of AI throughout the economy, it will nevertheless give a boost to AI in some areas, particularly public services. Furthermore, the plan for promoting the wider reuse of datasets, particularly in areas where the government already calls most of the shots, is a practical idea that other countries should consider as they develop their own comprehensive AI strategies.

The French strategy, drafted by mathematician and Member of Parliament Cédric Villani, calls for legislation to mandate repurposing both public and private sector data, including personal data, to enable public-interest uses of AI by government or others, depending on the sensitivity of the data. For example, public health services could use data generated by Internet of Things (IoT) devices to help doctors better treat and diagnose patients. Researchers could use data captured by motorway CCTV to train driverless cars. Energy distributors could manage peaks and troughs in demand using data from smart meters.

Repurposed data held by private companies could be made publicly available, shared with other companies, or processed securely by the public sector, depending on the extent to which sharing the data presents privacy risks or undermines competition. The report suggests that the government would not require companies to share data publicly when doing so would impact legitimate business interests, nor would it require that any personal data be made public. Instead, Dr. Villani argues that, if wider data sharing would do unreasonable damage to a company’s commercial interests, it may be appropriate to only give public authorities access to the data. But where the stakes are lower, companies could be required to share the data more widely, to maximize reuse. Villani rightly argues that it is virtually impossible to come up with generalizable rules for how data should be shared that would work across all sectors. Instead, he argues for a sector-specific approach to determining how and when data should be shared.

After making the case for state-mandated repurposing of data, the report goes on to highlight four key sectors as priorities: health, transport, the environment, and defense. Since these all have clear implications for the public interest, France can create national laws authorizing extensive repurposing of personal data without violating the General Data Protection Regulation (GDPR), which allows national laws that permit the repurposing of personal data where it serves the public interest. The French strategy is the first clear effort by an EU member state to proactively use this clause in aid of national efforts to bolster AI….(More)”.