The deception that lurks in our data-driven world


Alexis C. Madrigal at Fusion: “…There’s this amazing book called Seeing Like a State, which shows how governments and other big institutions try to reduce the vast complexity of the world into a series of statistics that their leaders use to try to comprehend what’s happening.

The author, James C. Scott, opens the book with an extended anecdote about the Normalbaum. In the second half of the 18th century, Prussian rulers wanted to know how many “natural resources” they had in the tangled woods of the country. So, they started counting. And they came up with these huge tables that would let them calculate how many board-feet of wood they could pull from a given plot of forest. All the rest of the forest, everything it did for the people and the animals and general ecology of the place was discarded from the analysis.

But the world proved too unruly. Their data wasn’t perfect. So they started creating new forests, the Normalbaum, planting all the trees at the same time, and monoculturing them so that there were no trees in the forest that couldn’t be monetized for wood. “The fact is that forest science and geometry, backed by state power, had the capacity to transform the real, diverse, and chaotic old-growth forest into a new, more uniform forest that closely resembled the administrative grid of its techniques,” Scott wrote.

[Image: a “normal forest” plan]

The spreadsheet became the world! They even planted the trees in rows, like a grid.

German foresters got very scientific with their fertilizer applications and management practices. And the scheme really worked—at least for a hundred years. Pretty much everyone across the world adopted their methods.

Then the forests started dying.

“In the German case, the negative biological and ultimately commercial consequences of the stripped-down forest became painfully obvious only after the second rotation of conifers had been planted,” Scott wrote.

The complex ecosystem that underpinned the growth of these trees through generations—all the microbial and inter-species relationships—were torn apart by the rigor of the Normalbaum. The nutrient cycles were broken. Resilience was lost. The hidden underpinnings of the world were revealed only when they were gone. The Germans, like they do, came up with a new word for what happened: Waldsterben, or forest death.

Sometimes, when I look out at our world—at the highest level—in which thin data have come to stand in for huge complex systems of human and biological relationships, I wonder if we’re currently deep in the Normalbaum phase of things, awaiting the moment when Waldsterben sets in.

Take the ad-supported digital media ecosystem. The idea is brilliant: capture data on people all over the web and then use what you know to show them relevant ads, ads they want to see. Not only that, but because it’s all tracked, unlike broadcast or print media, an advertiser can measure what they’re getting more precisely. And certainly the digital advertising market has grown, taking share from most other forms of media. The spreadsheet makes a ton of sense—which is one reason for the growth predictions that underpin the massive valuations of new media companies.

But scratch the surface, like Businessweek recently did, and the problems are obvious. A large percentage of the traffic to many stories and videos consists of software pretending to be human.

“The art is making the fake traffic look real, often by sprucing up websites with just enough content to make them appear authentic,” Businessweek says. “Programmatic ad-buying systems don’t necessarily differentiate between real users and bots, or between websites with fresh, original work, and Potemkin sites camouflaged with stock photos and cut-and-paste articles.”

Of course, that’s not what high-end media players are doing. But the cheap programmatic ads, fueled by fake traffic, drive down the prices across the digital media industry, making it harder to support good journalism. Meanwhile, users of many sites are rebelling against the business model by installing ad blockers.

The advertisers and ad-tech firms just wanted to capture user data to show them relevant ads. They just wanted to measure their ads more effectively. But placed into the real world, the system that grew up around these desires has reshaped the media landscape in unpredictable ways.

We’ve deceived ourselves into thinking data is a camera, but it’s really an engine. Capturing data about something changes the way that something works. Even the mere collection of stats is not a neutral act, but a way of reshaping the thing itself….(More)”

Governments’ Self-Disruption Challenge


Mohamed A. El-Erian at Project Syndicate: “One of the most difficult challenges facing Western governments today is to enable and channel the transformative – and, for individuals and companies, self-empowering – forces of technological innovation. They will not succeed unless they become more open to creative destruction, allowing not only tools and procedures, but also mindsets, to be revamped and upgraded. The longer it takes them to meet this challenge, the bigger the lost opportunities for current and future generations.
Self-empowering technological innovation is all around us, affecting a growing number of people, sectors, and activities worldwide. Through an ever-increasing number of platforms, it is now easier than ever for households and corporations to access and engage in an expanding range of activities – from urban transportation to accommodation, entertainment, and media. Even the regulation-reinforced, fortress-like walls that have traditionally surrounded finance and medicine are being eroded.

…In fact, Western political and economic structures are, in some ways, specifically designed to resist deep and rapid change, if only to prevent temporary and reversible fluctuations from having an undue influence on underlying systems. This works well when politics and economies are operating in cyclical mode, as they usually have been in the West. But when major structural and secular challenges arise, as is the case today, the advanced countries’ institutional architecture acts as a major obstacle to effective action….Against this background, a rapid and comprehensive transformation is clearly not feasible. (In fact, it may not even be desirable, given the possibility of collateral damage and unintended consequences.) The best option for Western governments is thus to pursue gradual change, propelled by a variety of adaptive instruments, which would reach a critical mass over time.
Such tools include well-designed public-private partnerships, especially when it comes to modernizing infrastructure; disruptive outside advisers – selected not for what they think, but for how they think – in the government decision-making process; mechanisms to strengthen inter-agency coordination so that it enhances, rather than retards, policy responsiveness; and broader cross-border private-sector linkages to enhance multilateral coordination.
How economies function is changing, as relative power shifts from established, centralized forces toward those that respond to the unprecedented empowerment of individuals. If governments are to overcome the challenges they face and maximize the benefits of this shift for their societies, they need to be a lot more open to self-disruption. Otherwise, the transformative forces will leave them and their citizens behind….(More)”

Big Data and Mass Shootings


Holman W. Jenkins in the Wall Street Journal: “As always, the dots are connected after the fact, when the connecting is easy. …The day may be coming, sooner than we think, when such incidents can be stopped before they get started. A software program alerts police to a social-media posting by an individual of interest in their jurisdiction. An algorithm reminds them why the individual had become a person of interest—a history of mental illness, an episode involving a neighbor. Months earlier, discreet inquiries by police had revealed an unhealthy obsession with weapons—key word, unhealthy. There’s no reason why gun owners, range operators and firearms dealers shouldn’t be a source of information for local police seeking information about who might merit special attention.

Sound scary? Big data exists to find the signal among the noise. Your data is the noise. It’s what computerized systems seek to disregard in their quest for information that actually would be useful to act on. Big data is interested in needles, not hay.

Still don’t trust the government? You’re barking up an outdated tree. Consider the absurdly ancillary debate last year on whether the government should be allowed to hold telephone “metadata” when the government already holds vastly more sensitive data on all of us in the form of tax, medical, legal and census records.

All this seems doubly silly given the copious information about each of us contained in private databases, freely bought and sold by marketers. Bizarre is the idea that Facebook should be able to use our voluntary Facebook postings to decide what we might like to buy, but police shouldn’t use the same information to prevent crime.

Hitachi, the big Japanese company, began testing its crime-prediction software in several unnamed American cities this month. The project, called Hitachi Visualization Predictive Crime Analytics, culls crime records, map and transit data, weather reports, social media and other sources for patterns that might otherwise go unnoticed by police.

Colorado-based Intrado, working with LexisNexis and Motorola Solutions, already sells police a service that instantly scans legal, business and social-media records for information about persons and circumstances that officers may encounter when responding to a 911 call at a specific address. Hundreds of public safety agencies find the system invaluable, though that didn’t stop the city of Bellingham, Wash., from rejecting it last year on the odd grounds that such software must be guilty of racial profiling.

Big data is changing how police allocate resources and go about fighting crime. …It once was freely asserted that police weren’t supposed to prevent crime, only solve it. But recent research shows investment in policing actually does reduce crime rates—and produces a large positive return measured in dollars and cents. A day will come when failing to connect the dots in advance of a mass shooting won’t be a matter for upturned hands. It will be a matter for serious recrimination…(More)

The importance of human innovation in A.I. ethics


John C. Havens at Mashable: “….While welcoming the feedback that sensors, data and Artificial Intelligence provide, we’re at a critical inflection point. Demarcating the parameters between assistance and automation has never been more central to human well-being. But today, beauty is in the AI of the beholder. Desensitized to the value of personal data, we hemorrhage precious insights regarding our identity that define the moral nuances necessary to navigate algorithmic modernity.

If no values-based standards exist for Artificial Intelligence, then the biases of its manufacturers will define our universal code of human ethics. But this should not be their cross to bear alone. It’s time to stop vilifying the AI community and start defining in concert with their creations what the good life means surrounding our consciousness and code.

The intention of the ethics

“Begin as you mean to go forward.” Michael Stewart is founder, chairman & CEO of Lucid, an Artificial Intelligence company based in Austin that recently announced the formation of the industry’s first Ethics Advisory Panel (EAP). While Google claimed creation of a similar board when acquiring AI firm DeepMind in January 2014, no public realization of its efforts currently exists (as confirmed by a PR rep from Google for this piece). Lucid’s Panel, by comparison, has already begun functioning as a separate organization from the analytics side of the business and provides oversight for the company and its customers. “Our efforts,” Stewart says, “are guided by the principle that our ethics group is obsessed with making sure the impact of our technology is good.”

Kay Firth-Butterfield is chief officer of the EAP, and is charged with being on the vanguard of the ethical issues affecting the AI industry and society as a whole. Internally, the EAP provides the hub of ethical behavior for the company. Someone from Firth-Butterfield’s office even sits on all core product development teams. “Externally,” she notes, “we plan to apply Cyc intelligence (shorthand for ‘encyclopedia,’ Lucid’s AI causal reasoning platform) for research to demonstrate the benefits of AI and to advise Lucid’s leadership on key decisions, such as the recent signing of the LAWS letter and the end use of customer applications.”

Ensuring the impact of AI technology is positive doesn’t happen by default. But as Lucid is demonstrating, ethics doesn’t have to stymie innovation by dwelling solely in the realm of risk mitigation. Ethical processes aligning with a company’s core values can provide more deeply relevant products and increased public trust. Transparently including your customer’s values in these processes puts the person back into personalization….(Mashable)”

Five principles for applying data science for social good


Jake Porway at O’Reilly: “….Every week, a data or technology company declares that it wants to “do good” and there are countless workshops hosted by major foundations musing on what “big data can do for society.” Add to that a growing number of data-for-good programs, from Data Science for Social Good’s fantastic summer program to Bayes Impact’s data science fellowships to DrivenData’s data-science-for-good competitions, and you can see how quickly this idea of “data for good” is growing.

Yes, it’s an exciting time to be exploring the ways new datasets, new techniques, and new scientists could be deployed to “make the world a better place.” We’ve already seen deep learning applied to ocean health, satellite imagery used to estimate poverty levels, and cellphone data used to elucidate Nairobi’s hidden public transportation routes. And yet, for all this excitement about the potential of this “data for good movement,” we are still desperately far from creating lasting impact. Many efforts will not only fall short of lasting impact — they will make no change at all….

So how can these well-intentioned efforts reach their full potential for real impact? Embracing the following five principles can drastically accelerate a world in which we truly use data to serve humanity.

1. “Statistics” is so much more than “percentages”

We must convey what constitutes data, what it can be used for, and why it’s valuable.

There was a packed house for the March 2015 release of the No Ceilings Full Participation Report. Hillary Clinton, Melinda Gates, and Chelsea Clinton stood on stage and lauded the report, the culmination of a year-long effort to aggregate and analyze new and existing global data, as the biggest, most comprehensive data collection effort about women and gender ever attempted. One of the most trumpeted parts of the effort was the release of the data in an open and easily accessible way.

I ran home and excitedly pulled up the data from the No Ceilings GitHub, giddy to use it for our DataKind projects. As I downloaded each file, my heart sunk. The 6MB size of the entire global dataset told me what I would find inside before I even opened the first file. Like a familiar ache, the first row of the spreadsheet said it all: “USA, 2009, 84.4%.”

What I’d encountered was a common situation when it comes to data in the social sector: the prevalence of inert, aggregate data. ….
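
To make the contrast concrete, here is a minimal sketch, in Python with pandas, of the difference between an inert, pre-aggregated table and the record-level data an analyst could actually work with. The columns and values are purely hypothetical, not taken from the No Ceilings release.

```python
# A minimal sketch of the gap between pre-aggregated and record-level data.
# Column names and values are hypothetical, not from the No Ceilings dataset.
import pandas as pd

# What the downloaded file looked like: one pre-computed statistic per
# country-year. There is nothing left to analyze except the number itself.
aggregate = pd.DataFrame({
    "country": ["USA", "USA"],
    "year": [2009, 2010],
    "indicator_value_pct": [84.4, 84.9],
})

# What record-level (disaggregated) data would look like: one row per
# respondent, which lets analysts compute their own cuts of the data.
records = pd.DataFrame({
    "country": ["USA"] * 6,
    "year": [2009] * 6,
    "age_group": ["15-24", "15-24", "25-54", "25-54", "55+", "55+"],
    "urban": [True, False, True, False, True, False],
    "meets_indicator": [1, 0, 1, 1, 0, 1],
})

# With records you can recompute the headline figure...
print(records["meets_indicator"].mean() * 100)  # ~66.7% in this toy example

# ...and answer new questions the aggregate table cannot, e.g. an
# urban/rural breakdown.
print(records.groupby("urban")["meets_indicator"].mean() * 100)
```

The aggregate table can only be re-read; the record-level table can be re-analyzed, which is the difference Porway is pointing at.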

2. Finding problems can be harder than finding solutions

We must scale the process of problem discovery through deeper collaboration between the problem holders, the data holders, and the skills holders.

In the immortal words of Henry Ford, “If I’d asked people what they wanted, they would have said a faster horse.” Right now, the field of data science is in a similar position. Framing data solutions for organizations that don’t realize how much is now possible can be a frustrating search for faster horses. If data cleaning is 80% of the hard work in data science, then problem discovery makes up nearly the remaining 20% when doing data science for good.

The plague here is one of education. …

3. Communication is more important than technology

We must foster environments in which people can speak openly, honestly, and without judgment. We must be constantly curious about each other.

At the conclusion of one of our recent DataKind events, one of our partner nonprofit organizations lined up to hear the results from their volunteer team of data scientists. Everyone was all smiles — the nonprofit leaders had loved the project experience, the data scientists were excited with their results. The presentations began. “We used Amazon RedShift to store the data, which allowed us to quickly build a multinomial regression. The p-value of 0.002 shows …” Eyes glazed over. The nonprofit leaders furrowed their brows in telegraphed concentration. The jargon was standing in the way of understanding the true utility of the project’s findings. It was clear that, like so many other well-intentioned efforts, the project was at risk of gathering dust on a shelf if the team of volunteers couldn’t help the organization understand what they had learned and how it could be integrated into the organization’s ongoing work…..

4. We need diverse viewpoints

To tackle sector-wide challenges, we need a range of voices involved.

One of the most challenging aspects to making change at the sector level is the range of diverse viewpoints necessary to understand a problem in its entirety. In the business world, profit, revenue, or output can be valid metrics of success. Rarely, if ever, are metrics for social change so cleanly defined….

Challenging this paradigm requires diverse, or “collective impact,” approaches to problem solving. The idea has been around for a while (h/t Chris Diehl), but has not yet been widely implemented due to the challenges in successful collective impact. Moreover, while there are many diverse collectives committed to social change, few have the voice of expert data scientists involved. DataKind is piloting a collective impact model called DataKind Labs that seeks to bring together diverse problem holders, data holders, and data science experts to co-create solutions that can be applied across an entire sector-wide challenge. We just launched our first project with Microsoft to increase traffic safety and are hopeful that this effort will demonstrate how vital a role data science can play in a collective impact approach.

5. We must design for people

Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.

So many of the data projects making headlines — a new app for finding public services, a new probabilistic model for predicting weather patterns for subsistence farmers, a visualization of government spending — are great and interesting accomplishments, but don’t seem to have an end user in mind. The current approach appears to be “get the tech geeks to hack on this problem, and we’ll have cool new solutions!” I’ve opined that, though there are many benefits to hackathons, you can’t just hack your way to social change….(More)”

Data-Driven Innovation: Big Data for Growth and Well-Being


“A new OECD report on data-driven innovation finds that countries could be getting much more out of data analytics in terms of economic and social gains if governments did more to encourage investment in “Big Data” and promote data sharing and reuse.

The migration of economic and social activities to the Internet and the advent of the Internet of Things – along with dramatically lower costs of data collection, storage and processing and rising computing power – means that data analytics is increasingly driving innovation and is potentially an important new source of growth.

The report suggests countries act to seize these benefits by training more and better data scientists, reducing barriers to cross-border data flows, and encouraging investment in business processes to incorporate data analytics.

Few companies outside of the ICT sector are changing internal procedures to take advantage of data. For example, data gathered by companies’ marketing departments is not always used by other departments to drive decisions and innovation. And in particular, small and medium-sized companies face barriers to the adoption of data-related technologies such as cloud computing, partly because they have difficulty implementing organisational change due to limited resources, including the shortage of skilled personnel.

At the same time, governments will need to anticipate and address the disruptive effects of big data on the economy and overall well-being, as issues as broad as privacy, jobs, intellectual property rights, competition and taxation will be impacted. Read the Policy Brief

TABLE OF CONTENTS
Preface
Foreword
Executive summary
The phenomenon of data-driven innovation
Mapping the global data ecosystem and its points of control
How data now drive innovation
Drawing value from data as an infrastructure
Building trust for data-driven innovation
Skills and employment in a data-driven economy
Promoting data-driven scientific research
The evolution of health care in a data-rich environment
Cities as hubs for data-driven innovation
Governments leading by example with public sector data

 

A matter of public trust: measuring how government performs


Gai Brodtmann at the Sydney Morning Herald: “…Getting trust is hard. Losing it is easy. And the work of maintaining trust in our democracy, and the public institutions it rests on, is constant, quiet and careful.

That trust is built on accountability and transparency. It relies on an assurance that government programs are well managed and delivered efficiently and effectively to give the best results for Australians.

And it demands impartial adjudicators to provide that assurance.

The first is getting the metrics, the key performance indicators, right. The indicators should be a fundamental way of judging whether a program is being implemented effectively and achieving its aims. If significant variations from expected performance are observed, it’s a sure sign that closer examination of the program is needed.
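
The logic here can be reduced to a very small check. The sketch below is an illustration only: the programs, targets and 15% tolerance are invented for the example rather than drawn from any agency’s framework. It simply flags an indicator for closer examination when actual performance strays too far from the expected value.

```python
# A minimal sketch of the idea that significant variation from expected KPI
# performance should trigger closer examination. Programs, targets and the
# 15% tolerance are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Indicator:
    program: str
    target: float   # expected performance for the period
    actual: float   # observed performance for the period

def needs_review(ind: Indicator, tolerance: float = 0.15) -> bool:
    """Flag the program if actual performance deviates from target by more
    than the given relative tolerance, in either direction."""
    if ind.target == 0:
        return ind.actual != 0
    return abs(ind.actual - ind.target) / abs(ind.target) > tolerance

indicators = [
    Indicator("Jobseeker placement rate", target=0.60, actual=0.41),
    Indicator("Claims processed within 30 days", target=0.90, actual=0.93),
]

for ind in indicators:
    status = "closer examination needed" if needs_review(ind) else "on track"
    print(f"{ind.program}: {status}")
```

In practice the tolerance would differ by indicator, and a flag is a prompt for examination, not a verdict on the program.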

A lot of effort has gone into indicators in recent years. Progress has been made, but everyone recognises the issue is complex. Establishing meaningful indicators, which are aligned across and up and down agencies, and become business as usual, is not easy. And it cannot be done independently of other public sector reform.

Cultural change is inevitably at the heart of all these discussions and two aspects of that strike me. The first is risk-aversion. The second is the silo problem.

A crucial challenge in overcoming a too-timid approach to doing business is that we do not, on the whole, have incentives in the system that encourage taking risks. In fact, many of the incentives do the opposite….

But a more balanced risk-management culture will only germinate if both the government and the Parliament – including its committees – change their ways to recognise that innovative policy design and complex program implementation need to embrace risk to be successful.

The second aspect of cultural change is the problem of too many silos. Perhaps, in some simpler past, public service agencies could generally operate with exclusive rights and functions within their own well-defined boundaries. But as social and economic challenges become more complex, this isn’t feasible.

Modern government in Australia is still coming to grips with this new imperative. Programs often involve multi-agency collaboration across jurisdictions, where the boundaries are well and truly crossed both within jurisdictions and across them. Unsurprisingly, ensuring a consistent approach and assessing outcomes has been difficult to achieve.

Collaboration between different entities is not strange to the private sector. One lesson we can draw is that, in collaborations, it is important to have a clear line of authority and control….(More)”

New Frontiers in Social Innovation Research


Book edited by Alex Nicholls, Julie Simon, Madeleine Gabriel: “Interest in social innovation continues to rise, from governments setting up social innovation ‘labs’ to large corporations developing social innovation strategies. Yet theory lags behind practice, and this hampers our ability to understand social innovation and make the most of its potential. This collection brings together work by leading social innovation researchers globally, exploring the practice and process of researching social innovation, its nature and effects. Combining theoretical chapters and empirical studies, it shows how social innovation is blurring traditional boundaries between the market, the state and civil society, thereby developing new forms of services, relationships and collaborations. It takes a critical perspective, analyzing potential downsides of social innovation that often remain unexplored or are glossed over, yet concludes with a powerful vision of the potential for social innovation to transform society. It aims to be a valuable resource for students and researchers, as well as policymakers and others supporting and leading social innovation….(More)”

What we can learn from the failure of Google Flu Trends


David Lazer and Ryan Kennedy at Wired: “….The issue of using big data for the common good is far more general than Google—which deserves credit, after all, for offering the occasional peek at their data. These records exist because of a compact between individual consumers and the corporation. The legalese of that compact is typically obscure (how many people carefully read terms and conditions?), but the essential bargain is that the individual gets some service, and the corporation gets some data.

What is left out of that bargain is the public interest. Corporations and consumers are part of a broader society, and many of these big data archives offer insights that could benefit us all. As Eric Schmidt, CEO of Google, has said, “We must remember that technology remains a tool of humanity.” How can we, and corporate giants, then use these big data archives as a tool to serve humanity?

Google’s sequel to GFT, done right, could serve as a model for collaboration around big data for the public good. Google is making flu-related search data available to the CDC as well as select research groups. A key question going forward will be whether Google works with these groups to improve the methodology underlying GFT. Future versions should, for example, continually update the fit of the data to flu prevalence—otherwise, the value of the data stream will rapidly decay.
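
What “continually updating the fit” might look like in code: the sketch below is an illustration under assumed data, not the actual GFT methodology. It re-estimates a simple least-squares mapping from query volumes to reported flu prevalence on a rolling one-year window, so the model tracks shifts in search behaviour instead of decaying.

```python
# A minimal sketch of continually updating the fit: re-estimate a simple
# linear mapping from search-query volumes to observed flu prevalence on a
# rolling window, instead of freezing the model once. The query columns,
# window length and linear model are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_queries = 120, 5

# Synthetic stand-ins for weekly query volumes and reported prevalence.
X = rng.random((n_weeks, n_queries))
true_w = rng.random(n_queries)
y = X @ true_w + 0.05 * rng.standard_normal(n_weeks)

def fit_weights(X_win, y_win):
    """Ordinary least squares on the window of recent weeks."""
    w, *_ = np.linalg.lstsq(X_win, y_win, rcond=None)
    return w

window = 52  # refit on roughly the last year of surveillance data
predictions = []
for t in range(window, n_weeks):
    w = fit_weights(X[t - window:t], y[t - window:t])  # refit every week
    predictions.append(X[t] @ w)                       # nowcast for week t

errors = np.abs(np.array(predictions) - y[window:])
print(f"mean absolute nowcast error: {errors.mean():.3f}")
```

Anything more faithful would need the real query series and CDC surveillance data, but the structure (refit on recent weeks, then nowcast the current one) is the point the authors are making.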

This is just an example, however, of the general challenge of how to build models of collaboration amongst industry, government, academics, and general do-gooders to use big data archives to produce insights for the public good. This came to the fore with the struggle (and delay) in finding a way to appropriately share mobile phone data in West Africa during the Ebola epidemic (mobile phone data are likely the best tool for understanding human—and thus Ebola—movement). Companies need to develop efforts to share data for the public good in a fashion that respects individual privacy.

There is not going to be a single solution to this issue, but for starters, we are pushing for a “big data” repository in Boston to allow holders of sensitive big data to share those collections with researchers while keeping them totally secure. The UN has its Global Pulse initiative, setting up collaborative data repositories around the world. Flowminder, based in Sweden, is a nonprofit dedicated to gathering mobile phone data that could help in response to disasters. But these are still small, incipient, and fragile efforts.

The question going forward now is how to build on and strengthen these efforts, while still guarding the privacy of individuals and the proprietary interests of the holders of big data….(More)”

Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism


Stefan Baack at Big Data and Society: “This article shows how activists in the open data movement re-articulate notions of democracy, participation, and journalism by applying practices and values from open source culture to the creation and use of data. Focusing on the Open Knowledge Foundation Germany and drawing from a combination of interviews and content analysis, it argues that this process leads activists to develop new rationalities around datafication that can support the agency of datafied publics. Three modulations of open source are identified: First, by regarding data as a prerequisite for generating knowledge, activists transform the sharing of source code to include the sharing of raw data. Sharing raw data should break the interpretative monopoly of governments and would allow people to make their own interpretation of data about public issues. Second, activists connect this idea to an open and flexible form of representative democracy by applying the open source model of participation to political participation. Third, activists acknowledge that intermediaries are necessary to make raw data accessible to the public. This leads them to an interest in transforming journalism to become an intermediary in this sense. At the same time, they try to act as intermediaries themselves and develop civic technologies to put their ideas into practice. The article concludes with suggesting that the practices and ideas of open data activists are relevant because they illustrate the connection between datafication and open source culture and help to understand how datafication might support the agency of publics and actors outside big government and big business….(More)”