The age of analytics: Competing in a data-driven world


Updated report by the McKinsey Global Institute: “Back in 2011, the McKinsey Global Institute published a report highlighting the transformational potential of big data. Five years later, we remain convinced that this potential has not been overhyped. In fact, we now believe that our 2011 analyses gave only a partial view. The range of applications and opportunities has grown even larger today. The convergence of several technology trends is accelerating progress. The volume of data continues to double every three years as information pours in from digital platforms, wireless sensors, and billions of mobile phones. Data storage capacity has increased, while its cost has plummeted. Data scientists now have unprecedented computing power at their disposal, and they are devising ever more sophisticated algorithms….

There has been uneven progress in capturing value from data and analytics…

  • The EU public sector: Our 2011 report analyzed how the European Union’s public sector could use data and analytics to make government services more efficient, reduce fraud and errors in transfer payments, and improve tax collection, potentially achieving some €250 billion worth of annual savings. But only about 10 to 20 percent of this has materialized. Some agencies have moved more interactions online, and many (particularly tax agencies) have introduced pre-filled forms. But across Europe and other advanced economies, adoption and capabilities vary greatly. The complexity of existing systems and the difficulty of attracting scarce analytics talent with public-sector salaries have slowed progress. Despite this, we see even wider potential today for societies to use analytics to make more evidence-based decisions in many aspects of government.

  • US health care: To date, only 10 to 20 percent of the opportunities we outlined in 2011 have been realized by the US health-care sector. A range of barriers—including a lack of incentives, the difficulty of process and organizational changes, a shortage of technical talent, data-sharing challenges, and regulations—have combined to limit adoption. Within clinical operations, the biggest success has been the shift to electronic medical records, although the vast stores of data they contain have not yet been fully mined. While payers have been slow to capitalize on big data for accounting and pricing, a growing industry now aggregates and synthesizes clinical records, and analytics have taken on new importance in public health surveillance. Many pharmaceutical firms are using analytics in R&D, particularly in streamlining clinical trials. While the health-care sector continues to lag in adoption, there are enormous unrealized opportunities to transform clinical care and deliver personalized medicine… (More)”

What does Big Data mean to public affairs research?


Ines Mergel, R. Karl Rethemeyer, and Kimberley R. Isett at LSE’s The Impact Blog: “…Big Data promises access to vast amounts of real-time information from public and private sources that should allow insights into behavioral preferences, policy options, and methods for public service improvement. In the private sector, marketing preferences can be aligned with customer insights gleaned from Big Data. In the public sector, however, government agencies are less responsive and agile in their real-time interactions by design – instead using time for deliberation to respond to broader public goods. The responsiveness Big Data promises is a virtue in the private sector but could be a vice in the public.

Moreover, we raise several important concerns with respect to relying on Big Data as a decision and policymaking tool. While in the abstract Big Data is comprehensive and complete, in practice today’s version of Big Data has several features that should give public sector practitioners and scholars pause. First, most of what we think of as Big Data is really ‘digital exhaust’ – that is, data collected for purposes other than public sector operations or research. Data sets that might be publicly available from social networking sites such as Facebook or Twitter were designed for purely technical reasons. The degree to which this data lines up conceptually and operationally with public sector questions is purely coincidental. Use of digital exhaust for purposes not previously envisioned can go awry. A good example is Google’s attempt to predict the flu based on search terms.

Second, we believe there are ethical issues that may arise when researchers use data that was created as a byproduct of citizens’ interactions with each other or with a government social media account. Citizens are not able to understand or control how their data is used and have not given consent for storage and re-use of their data. We believe that research institutions need to examine their institutional review board processes to help researchers and their subjects understand important privacy issues that may arise. Too often it is possible to infer individual-level insights about private citizens from a combination of data points and thus predict their behaviors or choices.

Lastly, Big Data can only represent those that spend some part of their life online. Yet we know that certain segments of society opt in to life online (by using social media or network-connected devices), opt out (either knowingly or passively), or lack the resources to participate at all. The demography of the internet matters. For instance, researchers tend to use Twitter data because its API allows data collection for research purposes, but many forget that Twitter users are not representative of the overall population. Instead, as a recent Pew Social Media 2016 update shows, only 24% of all online adults use Twitter. Internet participation generally is biased in terms of age, educational attainment, and income – all of which correlate with gender, race, and ethnicity. We believe therefore that predictive insights are potentially biased toward certain parts of the population, making generalisations highly problematic at this time….(More)”

Just good enough data: Figuring data citizenships through air pollution sensing and data stories


Jennifer Gabrys, Helen Pritchard, and Benjamin Barratt in Big Data & Society: “Citizen sensing, or the use of low-cost and accessible digital technologies to monitor environments, has contributed to new types of environmental data and data practices. Through a discussion of participatory research into air pollution sensing with residents of northeastern Pennsylvania concerned about the effects of hydraulic fracturing, we examine how new technologies for generating environmental data also give rise to new problems for analysing and making sense of citizen-gathered data. After first outlining the citizen data practices we collaboratively developed with residents for monitoring air quality, we then describe the data stories that we created along with citizens as a method and technique for composing data. We further mobilise the concept of ‘just good enough data’ to discuss the ways in which citizen data gives rise to alternative ways of creating, valuing and interpreting datasets. We specifically consider how environmental data raises different concerns and possibilities in relation to Big Data, which can be distinct from security or social media studies. We then suggest ways in which citizen datasets could generate different practices and interpretive insights that go beyond the usual uses of environmental data for regulation, compliance and modelling to generate expanded data citizenships….(More)”

From policing to news, how algorithms are changing our lives


Carl Miller at The National: “First, write out the numbers one to 100 in 10 rows. Cross out the one. Then circle the two, and cross out all of the multiples of two. Circle the three, and do likewise. Follow those instructions, and you’ve just completed the first three steps of an algorithm, and an incredibly ancient one. Twenty-three centuries ago, Eratosthenes was sat in the great library of Alexandria, using this process (it is called Eratosthenes’ Sieve) to find and separate prime numbers. Algorithms are nothing new, indeed even the word itself is old. Fifteen centuries after Eratosthenes, Algoritmi de numero Indorum appeared on the bookshelves of European monks, and with it, the word to describe something very simple in essence: follow a series of fixed steps, in order, to achieve a given answer to a given problem. That’s it, that’s an algorithm. Simple.

Apart from, of course, the story of algorithms is not so simple, nor so humble. In the shocked wake of Donald Trump’s victory in the United States presidential election, a culprit needed to be found to explain what had happened. What had, against the odds, and in the face of thousands of polls, caused this tectonic shift in US political opinion? Soon the finger was pointed. On social media, and especially on Facebook, it was alleged that pro-Trump stories, based on inaccurate information, had spread like wildfire, often eclipsing real news and honestly-checked facts.

But no human editor was thrust into the spotlight. What took centre stage was an algorithm; Facebook’s news algorithm. It was this, critics said, that was responsible for allowing the “fake news” to circulate. This algorithm wasn’t humbly finding prime numbers; it was responsible for the news that you saw (and of course didn’t see) on the largest source of news in the world. This algorithm had somehow risen to become more powerful than any newspaper editor in the world, powerful enough to possibly throw an election.

So why all the fuss? Something is now happening in society that is throwing algorithms into the spotlight. They have taken on a new significance, even an allure and mystique. Algorithms are simply tools but a web of new technologies is vastly increasing the power that these tools have over our lives. The startling leaps forward in artificial intelligence have meant that algorithms have learned how to learn, and to become capable of accomplishing tasks and tackling problems that they were never able to achieve before. Their learning is fuelled with more data than ever before, collected, stored and connected with the constellations of sensors, data farms and services that have ushered in the age of big data.

Algorithms are also doing more things; whether welding, driving or cooking, thanks to robotics. Wherever there is some kind of exciting innovation happening, algorithms are rarely far away. They are being used in more fields, for more things, than ever before and are incomparably, incomprehensibly more capable than the algorithms recognisable to Eratosthenes….(More)”
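
The crossing-out procedure described at the start of the excerpt (Eratosthenes’ Sieve) can be written out in a few lines. What follows is only a minimal sketch in Python, not anything taken from the article; the function name and the choice of a boolean list to track crossed-out numbers are assumptions made for illustration.

```python
def sieve_of_eratosthenes(limit: int) -> list[int]:
    """Return every prime <= limit by circling a number and crossing out its multiples."""
    if limit < 2:
        return []
    # crossed_out[n] becomes True once n has been crossed out.
    # 0 and 1 start crossed out ("cross out the one").
    crossed_out = [False] * (limit + 1)
    crossed_out[0] = crossed_out[1] = True
    primes = []
    for n in range(2, limit + 1):
        if crossed_out[n]:
            continue
        primes.append(n)  # "circle" n: it survived every earlier pass
        for multiple in range(2 * n, limit + 1, n):
            crossed_out[multiple] = True  # cross out the multiples of n
    return primes


primes = sieve_of_eratosthenes(100)
print(primes[:10])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(len(primes))  # 25
```

Run on the numbers one to 100, as in the excerpt, the sketch leaves 25 circled numbers: the primes below 100.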

Big Data Coming In Faster Than Biomedical Researchers Can Process It


Richard Harris at NPR: “Biomedical research is going big-time: Megaprojects that collect vast stores of data are proliferating rapidly. But scientists’ ability to make sense of all that information isn’t keeping up.

This conundrum took center stage at a meeting of patient advocates, called Partnering For Cures, in New York City on Nov. 15.

On the one hand, there’s an embarrassment of riches, as billions of dollars are spent on these megaprojects.

There’s the White House’s Cancer Moonshot (which seeks to make 10 years of progress in cancer research over the next five years), the Precision Medicine Initiative (which is trying to recruit a million Americans to glean hints about health and disease from their data), The BRAIN Initiative (to map the neural circuits and understand the mechanics of thought and memory) and the International Human Cell Atlas Initiative (to identify and describe all human cell types).

“It’s not just that any one data repository is growing exponentially, the number of data repositories is growing exponentially,” said Dr. Atul Butte, who leads the Institute for Computational Health Sciences at the University of California, San Francisco.

One of the most remarkable efforts is the federal government’s push to get doctors and hospitals to put medical records in digital form. That shift to electronic records is costing billions of dollars — including more than $28 billion alone in federal incentives to hospitals, doctors and others to adopt them. The investment is creating a vast data repository that could potentially be mined for clues about health and disease, the way websites and merchants gather data about you to personalize the online ads you see and for other commercial purposes.

But, unlike the data scientists at Google and Facebook, medical researchers have done almost nothing as yet to systematically analyze the information in these records, Butte said. “As a country, I think we’re investing close to zero analyzing any of that data,” he said.

Prospecting for hints about health and disease isn’t going to be easy. The raw data aren’t very robust and reliable. Electronic medical records are often kept in databases that aren’t compatible with one another, at least without a struggle. Some of the potentially revealing details are also kept as free-form notes, which can be hard to extract and interpret. Errors commonly creep into these records….(More)”

How Should a Society Be?


Brian Christian: “This is another example where AI—in this case, machine-learning methods—intersects with these ethical and civic questions in an ultimately promising and potentially productive way. As a society we have these values in maxim form, like equal opportunity, justice, fairness, and in many ways they’re deliberately vague. This deliberate flexibility and ambiguity are what allows things to be a living document that stays relevant. But here we are in this world where we have to say of some machine-learning model, is this racially fair? We have to define these terms, computationally or numerically.

It’s problematic in the short term because we have no idea what we’re doing; we don’t have a way to approach that problem yet. In the slightly longer term—five or ten years—there’s a profound opportunity to come together as a polis and get precise about what we mean by justice or fairness with respect to certain protected classes. Does that mean it’s got an equal false positive rate? Does that mean it has an equal false negative rate? What is the tradeoff that we’re willing to make? What are the constraints that we want to put on this model-building process? That’s a profound question, and we haven’t needed to address it until now. There’s going to be a civic conversation in the next few years about how to make these concepts explicit….(More) (Video)”
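
To make the two candidate definitions Christian mentions concrete, the sketch below computes a false positive rate and a false negative rate separately for each group. It is only an illustration in Python under stated assumptions: the function name, the record layout (group, actual outcome, predicted outcome) and the sample data are all hypothetical and do not come from the talk.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group false positive and false negative rates.

    records: iterable of (group, actual, predicted) with boolean outcomes.
    Returns {group: {"fpr": float | None, "fnr": float | None}};
    a rate is None when the group has no cases of the relevant kind.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual:
            counts[group]["tp" if predicted else "fn"] += 1
        else:
            counts[group]["fp" if predicted else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # cases whose actual outcome was False
        positives = c["tp"] + c["fn"]  # cases whose actual outcome was True
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Hypothetical data: (group, actual outcome, model prediction).
records = [
    ("A", False, True), ("A", False, False), ("A", True, True), ("A", True, False),
    ("B", False, True), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, False),
]
rates = error_rates_by_group(records)
print(rates["A"])  # {'fpr': 0.5, 'fnr': 0.5}
print(rates["B"])  # {'fpr': 0.25, 'fnr': 0.5}
```

In this made-up sample the two groups have the same false negative rate but different false positive rates, so the model would pass one of the two definitions and fail the other. That is the kind of tradeoff the excerpt says has to be decided explicitly.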

Esri, Waze Partnership: A Growing Trend in Sharing Data for the Benefit of All?


Justine Brown at GovTech: “Esri and Waze announced in mid-October that they’re partnering to help local governments alleviate traffic congestion and analyze congestion patterns. Called the Waze Connected Citizens Program, the program — which enables local governments that use the Esri ArcGIS platform to exchange publicly available traffic data with Waze — may represent a growing trend in which citizens and government share data for the benefit of all.

Connecting Esri and Waze data will allow cities to easily share information about the conditions of their roads with drivers, while drivers anonymously report accidents, potholes and other road condition information back to the cities. Local governments can then merge that data into their existing emergency dispatch and street maintenance systems….

Through the Connected Citizens Program, Waze shares two main data sets with its government partners: Jams and Alerts….If there’s a major traffic jam in an unusual area, a traffic management center operator might be triggered to examine that area further. For example, Boston recently used Waze jam data to identify a couple of traffic-prone intersections in the Seaport district….Similarly, if a Waze user reports a crash, that information shows up on the city’s existing ArcGIS map. City personnel can assess the crash and combine the Waze data with its existing data sets, if desired. The city can then notify emergency response, for example, to address the accident and send out emergency vehicles if necessary….

The Connected Citizens Program could also provide local governments an alternative to IoT investments, because a city can utilize real-time reports from the road rather than investing in sensors and IoT infrastructure. The Kentucky Transportation Cabinet, for instance, uses data from the Connected Citizens Program in several ways, including to monitor and detect automobile accidents on its roadways….(More)”

How to Hold Algorithms Accountable


Nicholas Diakopoulos and Sorelle Friedler at MIT Technology Review: “Algorithms are now used throughout the public and private sectors, informing decisions on everything from education and employment to criminal justice. But despite the potential for efficiency gains, algorithms fed by big data can also amplify structural discrimination, produce errors that deny services to individuals, or even seduce an electorate into a false sense of security. Indeed, there is growing awareness that the public should be wary of the societal risks posed by over-reliance on these systems and work to hold them accountable.

Various industry efforts, including a consortium of Silicon Valley behemoths, are beginning to grapple with the ethics of deploying algorithms that can have unanticipated effects on society. Algorithm developers and product managers need new ways to think about, design, and implement algorithmic systems in publicly accountable ways. Over the past several months, we and some colleagues have been trying to address these goals by crafting a set of principles for accountable algorithms….

Accountability implies an obligation to report and justify algorithmic decision-making, and to mitigate any negative social impacts or potential harms. We’ll consider accountability through the lens of five core principles: responsibility, explainability, accuracy, auditability, and fairness.

Responsibility. For any algorithmic system, there needs to be a person with the authority to deal with its adverse individual or societal effects in a timely fashion. This is not a statement about legal responsibility but, rather, a focus on avenues for redress, public dialogue, and internal authority for change. This could be as straightforward as giving someone on your technical team the internal power and resources to change the system, making sure that person’s contact information is publicly available.

Explainability. Any decisions produced by an algorithmic system should be explainable to the people affected by those decisions. These explanations must be accessible and understandable to the target audience; purely technical descriptions are not appropriate for the general public. Explaining risk assessment scores to defendants and their legal counsel would promote greater understanding and help them challenge apparent mistakes or faulty data. Some machine-learning models are more explainable than others, but just because there’s a fancy neural net involved doesn’t mean that a meaningful explanation can’t be produced.

Accuracy. Algorithms make mistakes, whether because of data errors in their inputs (garbage in, garbage out) or statistical uncertainty in their outputs. The principle of accuracy suggests that sources of error and uncertainty throughout an algorithm and its data sources need to be identified, logged, and benchmarked. Understanding the nature of errors produced by an algorithmic system can inform mitigation procedures.

Auditability. The principle of auditability states that algorithms should be developed to enable third parties to probe and review the behavior of an algorithm. Enabling algorithms to be monitored, checked, and criticized would lead to more conscious design and course correction in the event of failure. While there may be technical challenges in allowing public auditing while protecting proprietary information, private auditing (as in accounting) could provide some public assurance. Where possible, even limited access (e.g., via an API) would allow the public a valuable chance to audit these socially significant algorithms.

Fairness. As algorithms increasingly make decisions based on historical and societal data, existing biases and historically discriminatory human decisions risk being “baked in” to automated decisions. All algorithms making decisions about individuals should be evaluated for discriminatory effects. The results of the evaluation and the criteria used should be publicly released and explained….(More)”

Big data promise exponential change in healthcare


Gonzalo Viña in the Financial Times (Special Report): “When a top Formula One team is using pit stop data-gathering technology to help a drugmaker improve the way it makes ventilators for asthma sufferers, there can be few doubts that big data are transforming pharmaceutical and healthcare systems.

GlaxoSmithKline employs online technology and a data algorithm developed by F1’s elite McLaren Applied Technologies team to minimise the risk of leakage from its best-selling Ventolin (salbutamol) bronchodilator drug.

Using multiple sensors and hundreds of thousands of readings, the potential for leakage is coming down to “close to zero”, says Brian Neill, diagnostics director in GSK’s programme and risk management division.

This apparently unlikely venture for McLaren, known more as the team of such star drivers as Fernando Alonso and Jenson Button, extends beyond the work it does with GSK. It has partnered with Birmingham Children’s hospital in a £1.8m project utilising McLaren’s expertise in analysing data during a motor race to collect such information from patients as their heart and breathing rates and oxygen levels. Imperial College London, meanwhile, is making use of F1 sensor technology to detect neurological dysfunction….

Big data analysis is already helping to reshape sales and marketing within the pharmaceuticals business. Great potential, however, lies in its ability to fine tune research and clinical trials, as well as providing new measurement capabilities for doctors, insurers and regulators and even patients themselves. Its applications seem infinite….

The OECD last year said governments needed better data governance rules given the “high variability” among OECD countries about protecting patient privacy. Recently, DeepMind, the artificial intelligence company owned by Google, signed a deal with a UK NHS trust to process, via a mobile app, medical data relating to 1.6m patients. Privacy advocates describe this as “worrying”. Julia Powles, a University of Cambridge technology law expert, asks if the company is being given “a free pass” on the back of “unproven promises of efficiency and innovation”.

Brian Hengesbaugh, partner at law firm Baker & McKenzie in Chicago, says the process of solving such problems remains “under-developed”… (More)”

New Data Portal to analyze governance in Africa