The Causes, Costs and Consequences of Bad Government Data


Katherine Barrett & Richard Greene in Governing: “Data is the lifeblood of state government. It’s the crucial commodity that’s necessary to manage projects, avoid fraud, assess program performance, keep the books in balance and deliver services efficiently. But even as the trend toward greater reliance on data has accelerated over the past decades, the information itself has fallen dangerously short of the mark. Sometimes it doesn’t exist at all. But worse than that, all too often it’s just wrong.

There are examples everywhere. Last year, the California auditor’s office issued a report that looked at accounting records at the State Controller’s Office to see whether it was accurately recording sick leave and vacation credits. “We found circumstances where instead of eight hours, it was 80 and in one case, 800,” says Elaine Howle, the California state auditor. “And the system didn’t have controls to say that’s impossible.” The audit found 200,000 questionable hours of leave due to data entry errors, with a value of $6 million.

Mistakes like that are embarrassing, and can lead to unequal treatment of valued employees. Sometimes, however, decisions made with bad data can have deeper consequences. In 2012, the secretary of environmental protection in Pennsylvania told Congress that there was no evidence the state’s water quality had been affected by fracking. “Tens of thousands of wells have been hydraulically fractured in Pennsylvania,” he said, “without any indication that groundwater quality has been impacted.”

But by August 2014, the same department published a list of 248 incidents of damage to well water due to gas development. Why didn’t the department pick up on the water problems sooner? A key reason was that the data collected by its six regional offices had not been forwarded to the central office. At the same time, the regions differed greatly in how they collected, stored, transmitted and dealt with the information. An audit concluded that Pennsylvania’s complaint tracking system for water quality was ineffective and failed to provide “reliable information to effectively manage the program.”

When data is flawed, the consequences can reach throughout the entire government enterprise. Services are needlessly duplicated; evaluation of successful programs is difficult; tax dollars go uncollected; infrastructure maintenance is conducted inefficiently; health-care dollars are wasted. The list goes on and on. Increasingly, states are becoming aware of just how serious the problem is. “The poor quality of government data,” says Dave Yost, Ohio’s state auditor, “is probably the most important emerging trend for government executives, across the board, at all levels.”

Just how widespread a problem is data quality? In a Governing telephone survey of more than 75 officials in 46 states, about 7 out of 10 said that data problems were frequently or often an impediment to doing their business effectively. No one who worked with program data said this was rarely the case. (View the full results of the survey in this infographic.)…(More)

See also: Bad Data Is at All Levels of Government and The Next Big Thing in Data Analytics

Disruptive Technology that Could Transform Government-Citizen Relationships


David Raths at GovTech: “William Gibson, the science fiction writer who coined the term “cyberspace,” once said: “The future is already here — it’s just not very evenly distributed.” That may be exactly the way to look at the selection of disruptive technologies we have chosen to highlight in eight critical areas of government, ranging from public safety to health to transportation. ….

PUBLIC SAFETY: WEARABLE TECH IS TRANSFORMING EMERGENCY RESPONSE

The wearable technology market is expected to grow from $20 billion in 2015 to almost $70 billion in 2025, according to research firm IDTechEx. As commercial applications bloom, more will find their way into the public sector and emergency response.

This year has seen an increase in the number of police departments using body cameras. And already under development are wireless devices that monitor a responder’s breathing, heart rate and blood pressure, as well as potentially harmful environmental conditions, and relay concerns back to incident command.

But rather than sitting back and waiting for the market to develop, the U.S. Department of Homeland Security is determined to spur innovation in the field. DHS’ research and development arm is funding a startup accelerator program called Emerge managed by the Center for Innovative Technology (CIT), a Virginia-based nonprofit. Two accelerators, in Texas and Illinois, will work with 10 to 15 startups this year to develop wearable products and adapt them for first responder use….

HEALTH & HUMAN SERVICES: ‘HOT-SPOTTING’ FOR POPULATION HEALTH MANAGEMENT

A hot health-care trend is population health management: using data to improve health at a community level as well as an individual level. The growth in sophistication of GIS tools has allowed public health researchers to more clearly identify and start addressing health resource disparities.

Dr. Jeffrey Brenner, a Camden, N.J.-based physician, uses data gathered in a health information exchange (HIE) to target high-cost individuals. The Camden Coalition of Healthcare Providers uses the HIE data to identify high-cost “hot spots” — high-rise buildings where a large number of hospital emergency room “super users” live. By identifying and working with these individuals on patient-centered care coordination issues, the coalition has been able to reduce emergency room use and in-patient stays….

PARKS & RECREATION: TRACKING TREES FOR A BETTER FUTURE

A combination of advances in mobile data collection systems and geocoding lets natural resources and parks agencies be more proactive about collecting tree data, managing urban forests and quantifying their value, as forests become increasingly important resources in an era of climate change.

Philadelphia Parks and Recreation has added approximately 2 million trees to its database in the past few years. It plans to create a digital management system for all of them. Los Angeles City Parks uses the Davey Tree Expert Co.’s Web-based TreeKeeper management software to manage existing tree inventories and administer work orders. The department can also more easily look at species balance to manage against pests, disease and drought….

CORRECTIONS: VIDEO-BASED TOOLS TRANSFORM PRISONS AND JAILS

Videoconferencing is disrupting business as usual in U.S. jails and prisons in two ways: One is the rising use of telemedicine to reduce inmate health-care costs and to increase access to certain types of care for prisoners. The other is video visitation between inmates and families.

A March 2015 report by Southern California Public Radio noted that the federal court-appointed receiver overseeing inmate health care in California is reviewing telemedicine capabilities to reduce costly overtime billing by physicians and nurses at prisons. In one year, overtime more than doubled for this branch of corrections, from more than $12 million to nearly $30 million….

FINANCE & BUDGETING: DATA PORTALS OFFER TRANSPARENCY AT UNPRECEDENTED LEVELS

The transparency and open data movements have hit the government finance sector in a big way and promise to be an area of innovation in the years ahead.

A partnership between Ohio Treasurer Josh Mandel and the finance visualization startup OpenGov will result in one of the most sweeping statewide transparency efforts to date.

The initiative offers 3,900-plus local governments — from townships, cities and counties to school districts and more — a chance to place revenues and expenditures online free of charge through the state’s budget transparency site OhioCheckbook.com. Citizens will be able to track local government revenues and expenditures via interactive graphs that illustrate not only a bird’s-eye view of a budget, but also the granular details of check-by-check spending….

DMV: DRIVERS’ LICENSES: THERE WILL SOON BE AN APP FOR THAT

The laminated driver’s license you keep in your wallet may eventually give way to an app on your smartphone, and that change may have wider significance for how citizens interact digitally with their government. Legislatures in at least three states have seen bills introduced authorizing their transportation departments to begin piloting digital drivers’ licenses…..

TRANSPORTATION & MASS TRANSIT: BIG BREAKTHROUGHS ARE JUST AROUND THE CORNER

Nothing is likely to be more disruptive to transportation, mass transit and urban planning than the double whammy of connected vehicle technology and autonomous vehicles.

The U.S. Department of Transportation expects great things from the connected vehicles of the future — and that future may be just around the corner. Vehicle-to-infrastructure communication capabilities and anonymous information from passengers’ wireless devices relayed through dedicated short-range connections could provide transportation agencies with improved traffic, transit and parking data, making it easier to manage transportation systems and improve traffic safety….. (More)”

How Our Days Became Numbered


Review by Clive Cookson of ‘How Our Days Became Numbered’, by Dan Bouk in the Financial Times: “The unemployed lumber worker whose 1939 portrait adorns the cover of How Our Days Became Numbered has a “face fit for a film star”, as Dan Bouk puts it. But he is not there for his looks. Bouk wants us to focus on his bulging bicep, across which is tattooed “SSN 535-07-5248”: his social security number.

The photograph of Thomas Cave by documentary photographer Dorothea Lange illustrates the high water mark of American respect for statistical labelling. Cave was so proud of his newly issued number that he had it inked forever on his arm.

When the Roosevelt administration introduced the federal social security system in the 1930s, it worked out rates of contribution and benefit on the basis of statistical practices already established by life insurance companies. The industry is at the heart of Bouk’s history of personal data collection and analysis — because it worked out how to measure and predict the health of ordinary Americans in the late 19th and early 20th centuries. (More)”

Scientists Are Hoarding Data And It’s Ruining Medical Research


Ben Goldacre at Buzzfeed: “We like to imagine that science is a world of clean answers, with priestly personnel in white coats, emitting perfect outputs, from glass and metal buildings full of blinking lights.

The reality is a mess. A collection of papers published on Wednesday — on one of the most commonly used medical treatments in the world — show just how bad things have become. But they also give hope.

The papers are about deworming pills that kill parasites in the gut, at extremely low cost. In developing countries, battles over the usefulness of these drugs have become so contentious that some people call them “The Worm Wars.”…

This “deworm everybody” approach has been driven by a single, hugely influential trial published in 2004 by two economists, Edward Miguel and Michael Kremer. This trial, done in Kenya, found that deworming whole schools improved children’s health, school performance, and school attendance. What’s more, these benefits apparently extended to children in schools several miles away, even when those children didn’t get any deworming tablets (presumably, people assumed, by interrupting worm transmission from one child to the next).

A decade later, in 2013, these two economists did something that very few researchers have ever done. They handed over their entire dataset to independent researchers on the other side of the world, so that their analyses could be checked in public. What happened next has every right to kick through a revolution in science and medicine….

This kind of statistical replication is almost vanishingly rare. A recent study set out to find all well-documented cases in which the raw data from a randomized trial had been reanalysed. It found just 37, out of many thousands. What’s more, only five were conducted by entirely independent researchers, people not involved in the original trial.

These reanalyses were more than mere academic fun and games. The ultimate outcomes of the trials changed, with terrifying frequency: One-third of them were so different that the take-home message of the trial shifted.

This matters. Medical trials aren’t conducted out of an abstract philosophical interest, for the intellectual benefit of some rarefied class in ivory towers. Researchers do trials as a service, to find out what works, because they intend to act on the results. It matters that trials get an answer that is not just accurate, but also reliable.

So here we have an odd situation. Independent reanalysis can improve the results of clinical trials, and help us not go down blind alleys, or give the wrong treatment to the wrong people. It’s pretty cheap, compared to the phenomenal administrative cost of conducting a trial. And it spots problems at an alarmingly high rate.

And yet, this kind of independent check is almost never done. Why not? Partly, it’s resources. But more than that, when people do request raw data, all too often the original researchers duck, dive, or simply ignore requests….

Two years ago I published a book on problems in medicine. Front and center in this howl was “publication bias,” the problem of clinical trial results being routinely and legally withheld from doctors, researchers, and patients. The best available evidence — from dozens of studies chasing results for completed trials — shows that around half of all clinical trials fail to report their results. The same is true of industry trials and academic trials. What’s more, trials with positive results are about twice as likely to post results, so we see a biased half of the literature.

This is a cancer at the core of evidence-based medicine. When half the evidence is withheld, doctors and patients cannot make informed decisions about which treatment is best. When I wrote about this, various people from the pharmaceutical industry cropped up to claim that the problem was all in the past. So I befriended some campaigners, we assembled a group of senior academics, and started the AllTrials.net campaign with one clear message: “All trials must be registered, with their full methods and results reported.”

Dozens of academic studies had been published on the issue, and that alone clearly wasn’t enough. So we started collecting signatures, and we now have more than 85,000 supporters. At the same time we sought out institutional support. Eighty patient groups signed up in the first month, with hundreds more since then. Some of the biggest research funders, and even government bodies, have now signed up.

This week we’re announcing support from a group of 85 pension funds and asset managers, representing more than 3.5 trillion euros in funds, who will be asking the pharma companies they invest in to make plans to ensure that all trials — past, present, and future — report their results properly. Next week, after two years of activity in Europe, we launch our campaign in the U.S….(More)”

Setting High and Compatible Standards


Laura Bacon at Omidyar Network: “…Standards enable interoperability, replicability, and efficiency. Airplane travel would be chaotic at best and deadly at worst if flights and air traffic control did not use common codes for call signs, flight numbers, location, date, and time. Trains that cross national borders need tracks built to a standard gauge, as evidenced by Spain’s experience in making its trains interoperable with those of the rest of the continent.

Standards matter in data collection and publication as well.  This is especially true for those datasets that matter most to people’s lives, such as health, education, agriculture, and water. Disparate standards for basic category definitions like geography and organizations mean that data sources cannot be easily or cost-effectively analyzed for cross-comparison and decision making.

Compatible data standards that allow data to be ‘joined up’ would make it possible to log and use immunization records more effectively, control the spread of infectious disease, help educators prioritize spending based on the greatest needs, and identify the beneficial owners of companies to help ensure transparent and legal business transactions.

Data: More Valuable When Joined Up

A great deal of effort, time, and money is poured into the generation and publication of open data. And while open data is valuable in itself, the biggest return on investment potentially comes from the inter-linkages among datasets. It is very difficult to realize this return, however, because the standards and building blocks (e.g., geodata, organizational identifiers, project identifiers) that would enable the joining up of data are still missing.
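To make this concrete, here is a minimal sketch, in Python with pandas, of the kind of join a shared organizational identifier enables. The datasets, identifier values, and figures below are all hypothetical, invented purely for illustration.

```python
# A minimal, hypothetical sketch of "joining up" two open datasets
# via a shared organizational identifier. All names, IDs, and
# figures are invented for illustration.
import pandas as pd

# Dataset A: projects published by one initiative, keyed by org ID
projects = pd.DataFrame({
    "org_id": ["ORG-001", "ORG-002", "ORG-001"],
    "project": ["Water access", "Immunisation drive", "School feeding"],
    "budget_usd": [1_200_000, 800_000, 450_000],
})

# Dataset B: contracts published by a second initiative, same org IDs
contracts = pd.DataFrame({
    "org_id": ["ORG-001", "ORG-002"],
    "contracts_awarded": [14, 6],
})

# Because both datasets follow one identifier standard, joining them
# is a single cheap operation; without it, the same join would need
# error-prone fuzzy matching of free-text organization names.
joined = projects.merge(contracts, on="org_id", how="left")
print(joined)
```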

Omidyar Network currently supports open data standards for contracting, extractives, budgets, and others. If “joining up” work is not considered and executed at an early stage, these standards 1) could evolve in silos and 2) may never reach their full potential.

Interoperability will not happen automatically; specific investments and efforts must be made to develop the public good infrastructure for the joining up of key datasets….The two organizations leading this project have an impressive track record working in this area. Development Initiatives is a global organization working to empower people to make more effective use of information. In 2013, it commissioned Open Knowledge Foundation to publish a cross-initiative scoping study, Joined-Up Data: Building Blocks for Common Standards, which recommended focus areas, shared learning, and the adoption of joined-up data and common standards for all publishers. Partnering with Development Initiatives is Publish What You Fund,…(More)”

Collective Intelligence in Patient Organisations


New report by Lydia Nicholas and Stefana Broadbent (Nesta):”… examines patient organisations’ ever more critical role as knowledge brokers in an increasingly complex, data-rich healthcare system.

Key findings

  • Patient organisations are important examples of collective intelligence practiced in challenging conditions with the aim of tackling complex problems.
  • With more long term conditions and multimorbidities, more data, more available options in diagnostics, treatments, and care, knowledge is becoming one of the most critical assets of patients seeking optimal care.
  • Patient organisations, working as collectives, are in an excellent position to support the work of translating, assembling and analysing the information involved in healthcare.
  • Innovative patient organisations are already supporting the development of peer relationships, driving ambitious research programmes, sharing skills and unlocking the energy and expertise of patients. But they need support from better tools to extend this critical work.

Unlike many popular examples of collective intelligence, such as open source software, people coming to patient organisations are motivated not by pre-existing technical skills but by urgent personal needs. This makes these organisations a hugely productive site of research.

The ‘thinking challenges’ patients face are enormous and complex, involving an ever-growing store of medical information and the practical and bureaucratic skills of living with a condition. Many patients go beyond adherence, to understanding and partaking in research.

The health care system is under strain from increasing demand and resource pressure. The NHS and other healthcare networks have committed to engage and empower patients and support them in developing expertise, enabling them to take a more active role in their own care. But knowledge tools and systems that engage only with individuals tend to exacerbate existing health care divides. Health knowledge work is hard, and requires time and resources.

In this report we argue that patient organisations have a pivotal role to play in distributing the burden and benefit of knowledge work amongst participants. They need new and better tools to support their work developing connections between the many individuals and institutions of the healthcare system, driving ambitious research programmes, and facilitating peer support….(More)


The digital revolution liberating Latin American people


Luis Alberto Moreno in the Financial Times: “Imagine a place where citizens can deal with the state entirely online, where all health records are electronic and the wait for emergency care is just seven minutes. Singapore? Switzerland? Try Colima, Mexico.

Pessimists fear the digital revolution will only widen social and economic disparities in the developing world — particularly in Latin America, the world’s most unequal region. But Colima, though small and relatively prosperous, shows how some of the region’s governments are harnessing these tools to modernise services, improve quality of life and share the benefits of technology more equitably.

In the past 10 years, this state of about 600,000 people has transformed the way government works, going completely digital. Its citizens can carry out 62 procedures online, from applying for permits to filing crime reports. No internet at home? Colima offers hundreds of free WiFi hotspots.

Colombia and Peru are taking broadband to remote corners of their rugged territories. Bogotá has subsidised the expansion of its fibre optic network, which now links virtually every town in the country. Peru is expanding a programme that aims to bring WiFi to schools, hospitals and other public buildings in each of its 25 regions. The Colombian plan, Vive Digital, fosters internet adoption among all its citizens. Taxes on computers, tablets and smartphones have been scrapped. Low-income families have been given vouchers to sign up for broadband. In five years, the percentage of households connected to the internet jumped from 16 per cent to 50 per cent. Among small businesses it soared from 7 per cent to 61 per cent.

Inexpensive devices and ubiquitous WiFi, however, do not guarantee widespread usage. Diego Molano Vega, an architect of Vive Digital, found that many programs designed for customers in developed countries were ill suited to most Colombians. “There are no poor people in Silicon Valley,” he says. Latin American governments should use their purchasing power to push for development of digital services easily adopted by their citizens and businesses. Chile is a leader: it has digitised hundreds of trámites — bureaucratic procedures involving endless forms and queues. In a 4,300km-long country of mountains, deserts and forests, this enables access to all sorts of services through the internet. Entrepreneurs can now register businesses online for free in a single day.

Technology can be harnessed to boost equity in education. Brazil’s Mato Grosso do Sul state launched a free online service to prepare high school students for a tough national exam in which a good grade is a prerequisite for admission to federal universities. On average the results of the students who used the service were 31 per cent higher than those of their peers, prompting 10 other states to adopt the system.

Digital tools can also help raise competitiveness in business. Uruguay’s livestock information system keeps track of the country’s cattle. The publicly financed electronic registry ensures every beast can be traced, making it easier to monitor outbreaks of diseases….(More)”


Collaborative Innovation


Book by Mitsuru Kodama on Developing Health Support Ecosystems: “…With the development of the aging society and the increased importance of emergency risk management in recent years, a large number of medical care challenges – advancing medical treatments, care & support, pharmacological treatments, greater health awareness, emergency treatments, telemedical treatment and care, the introduction of electronic charts, and rising costs – are emerging as social issues throughout the whole world. Hospitals and other medical institutions must develop and maintain superior management to achieve systems that can provide better medical care, welfare and health while enabling “support innovation.” Key medical care, welfare and health industries play a crucial role in this, but also of importance are management innovation models that enable “collaborative innovation” by closely linking diverse fields such as ICT, energy, electric equipment, machinery and transport.

Looking across different industries, Collaborative Innovation offers new knowledge and insights on the extraordinary value and increasing necessity of collaboration across different organizations in improving the health and lives of people. It breaks new ground with its research theme of building “health support ecosystems,” focusing on protecting people through collaborative innovation. This book opens up new, wide-ranging interdisciplinary academic research domains combining the humanities with science across various areas including general business administration, economics, information technology, medical informatics and drug information science….(More)”

Using Twitter as a data source: An overview of current social media research tools


Wasim Ahmed at the LSE Impact Blog: “I have a social media research blog where I find and write about tools that can be used to capture and analyse data from social media platforms. My PhD looks at Twitter data on health topics such as the Ebola outbreak in West Africa. I am increasingly asked why I am looking at Twitter, and what tools and methods there are for capturing and analysing data from other platforms such as Facebook, or even less traditional platforms such as Amazon book reviews. Having brainstormed responses to this question with members of the New Social Media New Social Science network, I can suggest at least six reasons:

  1. Twitter is a popular platform in terms of the media attention it receives and it therefore attracts more research due to its cultural status
  2. Twitter makes it easier to find and follow conversations (i.e., by both its search feature and by tweets appearing in Google search results)
  3. Twitter has hashtag norms, which make it easier to gather, sort, and expand searches when collecting data
  4. Twitter data is easy to retrieve, as major incidents, news stories and events on Twitter tend to be centred around a hashtag
  5. The Twitter API is more open and accessible compared to other social media platforms, which makes Twitter more favourable to developers creating tools to access data. This consequently increases the availability of tools to researchers.
  6. Many researchers themselves are using Twitter and because of their favourable personal experiences, they feel more comfortable with researching a familiar platform.

It is probable that a combination of responses 1 to 6 has led to more research on Twitter. However, this raises another distinct but closely related question: when research is focused so heavily on Twitter, what (if any) are the implications for our methods?

As for the methods currently used in analysing Twitter data (sentiment analysis, time series analysis examining peaks in tweets, network analysis, etc.), can these be applied to other platforms, or are different tools, methods and techniques required? In addition to qualitative methods such as content analysis, I have used the following four methods in analysing Twitter data for the purposes of my PhD; below I consider whether each would work for other social media platforms:

  1. Sentiment analysis works well with Twitter data, as tweets are consistent in length (i.e., <= 140 characters). But would it work as well with, for example, Facebook data, where posts may be longer?
  2. Time series analysis is normally used when examining tweets over time to see when a peak of tweets occurs. Would examining time stamps in Facebook or Instagram posts, for example, produce the same results? Or is this only a viable method because of the real-time nature of Twitter data?
  3. Network analysis is used to visualize the connections between people and to better understand the structure of the conversation. Would this work as well on platforms where users may not be connected to each other, e.g., public Facebook pages?
  4. Machine learning methods may work well with Twitter data due to the length of tweets (i.e., <= 140 characters), but would they work for longer posts and for platforms that are not text-based, e.g., Instagram?

It may well be that at least some of these methods can be applied to other platforms; however, they may not be the best methods, and they may require the formulation of new methods, techniques, and tools.
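By way of illustration, below is a minimal sketch of the first three of these methods applied to a toy dataset, in Python with pandas and networkx. The tweets, handles, timestamps, and the tiny sentiment lexicon are all invented; a real study would use an established lexicon (such as VADER) and data collected via the Twitter API.

```python
# Toy versions of three common Twitter-analysis methods.
# All data below is invented for illustration.
import pandas as pd
import networkx as nx

tweets = pd.DataFrame({
    "author": ["@alice", "@bob", "@carol", "@alice"],
    "mentions": [["@bob"], ["@carol"], ["@bob"], ["@carol"]],
    "text": ["Great talk on open data", "Terrible delays again",
             "Good news on the health campaign", "Bad data everywhere"],
    "time": pd.to_datetime(["2015-08-01 09:05", "2015-08-01 09:40",
                            "2015-08-01 10:10", "2015-08-01 10:55"]),
})

# 1. Sentiment analysis: a toy lexicon scorer (short texts suit this)
LEXICON = {"great": 1, "good": 1, "terrible": -1, "bad": -1}
def score(text):
    return sum(LEXICON.get(word, 0) for word in text.lower().split())
tweets["sentiment"] = tweets["text"].apply(score)

# 2. Time series analysis: count tweets per hour to look for peaks
per_hour = tweets.set_index("time").resample("1H").size()

# 3. Network analysis: a directed graph of who mentions whom
G = nx.DiGraph()
for _, row in tweets.iterrows():
    for mentioned in row["mentions"]:
        G.add_edge(row["author"], mentioned)

print(tweets[["author", "sentiment"]])
print(per_hour)
print(nx.degree_centrality(G))
```

Mechanically, nothing in this sketch is Twitter-specific; it would run on any timestamped text dataset with user mentions. The open question raised above is whether each method’s assumptions (short texts, real-time posting, visible user-to-user links) still hold on other platforms.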

So, what are some of the tools available to social scientists for social media data? In the table below I provide an overview of some of the tools I have been using (which require no programming knowledge and can be used by social scientists):…(More)”

Democratising the Data Revolution


Jonathan Gray at Open Knowledge: “What will the “data revolution” do? What will it be about? What will it count? What kinds of risks and harms might it bring? Whom and what will it serve? And who will get to decide?

Today we are launching a new discussion paper on “Democratising the Data Revolution”, which is intended to advance thinking and action around civil society engagement with the data revolution. It looks beyond the disclosure of existing information, towards more ambitious and substantive forms of democratic engagement with data infrastructures.

It concludes with a series of questions about what practical steps institutions and civil society organisations might take to change what is measured and how, and how these measurements are put to work.

You can download the full PDF report here, or continue to read on in this blog post.

What Counts?

How might civil society actors shape the data revolution? In particular, how might they go beyond the question of what data is disclosed towards looking at what is measured in the first place? To kickstart discussion around this topic, we will look at three kinds of intervention: changing existing forms of measurement, advocating new forms of measurement and undertaking new forms of measurement.

Changing Existing Forms of Measurement

Rather than just focusing on the transparency, disclosure and openness of public information, civil society groups can argue for changing what is measured with existing data infrastructures. One example of this is recent campaigning around company ownership in the UK. Advocacy groups wanted to unpick networks of corporate ownership and control in order to support their campaigning and investigations around tax avoidance, tax evasion and illicit financial flows.

While the UK company register recorded information about “nominal ownership”, it did not include information about so-called “beneficial ownership”, or who ultimately benefits from the ownership and control of companies. Campaigners undertook an extensive programme of activities to advocate for changes and extensions to existing data infrastructures – including via legislation, software systems, and administrative protocols.

Advocating New Forms of Measurement

As well as changing or recalibrating existing forms of measurement, campaigners and civil society organisations can make the case for the measurement of things which were not previously measured. For example, over the past several decades social and political campaigning has resulted in new indicators about many different issues – such as gender inequality, health, work, disability, pollution or education. In such cases activists aimed to establish a given indicator as important and relevant for public institutions, decision makers, and broader publics – in order to, for example, inform policy development or resource allocation.

Undertaking New Forms of Measurement

Historically, many civil society organisations and advocacy groups have collected their own data to make the case for action on issues that they work on – from human rights abuses to endangered species….(More)”