What’s wrong with big data?


James Bridle in the New Humanist: “In a 2008 article in Wired magazine entitled “The End of Theory”, Chris Anderson argued that the vast amounts of data now available to researchers made the traditional scientific process obsolete. No longer would they need to build models of the world and test them against sampled data. Instead, the complexities of huge and totalising datasets would be processed by immense computing clusters to produce truth itself: “With enough data, the numbers speak for themselves.” As an example, Anderson cited Google’s translation algorithms which, with no knowledge of the underlying structures of languages, were capable of inferring the relationship between them using extensive corpora of translated texts. He extended this approach to genomics, neurology and physics, where scientists are increasingly turning to massive computation to make sense of the volumes of information they have gathered about complex systems. In the age of big data, he argued, “Correlation is enough. We can stop looking for models.”
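Anderson’s translation example is easy to see in miniature. Below is a toy sketch (an editorial illustration, not Google’s actual system) that induces word pairings purely from co-occurrence statistics in aligned sentence pairs, with no grammar and no dictionary:

```python
from collections import Counter

# Toy aligned corpus: (English, French) sentence pairs.
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("the dog eats", "le chien mange"),
]

pair_counts, en_counts, fr_counts = Counter(), Counter(), Counter()
for en, fr in corpus:
    en_words, fr_words = en.split(), fr.split()
    en_counts.update(en_words)
    fr_counts.update(fr_words)
    for e in en_words:
        for f in fr_words:
            pair_counts[(e, f)] += 1  # raw co-occurrence, nothing more

def translate(word):
    """Pick the French word most strongly associated with `word`,
    scoring co-occurrence counts normalized by word frequencies."""
    scores = {f: n / (en_counts[word] * fr_counts[f])
              for (e, f), n in pair_counts.items() if e == word}
    return max(scores, key=scores.get)

print(translate("cat"))  # -> "chat"
print(translate("dog"))  # -> "chien"
```

Even this caricature recovers “cat → chat” from four sentence pairs; scale the corpus up by many orders of magnitude and the appeal of letting the numbers speak for themselves becomes obvious.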

This belief in the power of data, of technology untrammelled by petty human worldviews, is the practical cousin of more metaphysical assertions. A belief in the unquestionability of data leads directly to a belief in the truth of data-derived assertions. And if data contains truth, then it will, without moral intervention, produce better outcomes. Speaking at Google’s private London Zeitgeist conference in 2013, Eric Schmidt, Google Chairman, asserted that “if they had had cellphones in Rwanda in 1994, the genocide would not have happened.” Schmidt’s claim was that technological visibility – the rendering of events and actions legible to everyone – would change the character of those actions. Not only is this statement historically inaccurate (there was plenty of evidence available of what was occurring during the genocide from UN officials, US satellite photographs and other sources), it’s also demonstrably untrue. Analysis of unrest in Kenya in 2007, when over 1,000 people were killed in ethnic conflicts, showed that mobile phones not only spread but accelerated the violence. But you don’t need to look to such extreme examples to see how a belief in technological determinism underlies much of our thinking and reasoning about the world.

“Big data” is not merely a business buzzword, but a way of seeing the world. Driven by technology, markets and politics, it has come to determine much of our thinking, but it is flawed and dangerous. It runs counter to our actual findings when we employ such technologies honestly and with the full understanding of their workings and capabilities. This over-reliance on data, which I call “quantified thinking”, has come to undermine our ability to reason meaningfully about the world, and its effects can be seen across multiple domains.

The assertion is hardly new. Writing in the Dialectic of Enlightenment in 1947, Theodor Adorno and Max Horkheimer decried “the present triumph of the factual mentality” – the predecessor to quantified thinking – and succinctly analysed the big data fallacy, set out by Anderson above. “It does not work by images or concepts, by the fortunate insights, but refers to method, the exploitation of others’ work, and capital … What men want to learn from nature is how to use it in order wholly to dominate it and other men. That is the only aim.” What is different in our own time is that we have built a world-spanning network of communication and computation to test this assertion. While it occasionally engenders entirely new forms of behaviour and interaction, the network most often shows to us with startling clarity the relationships and tendencies which have been latent or occluded until now. In the face of the increased standardisation of knowledge, it becomes harder and harder to argue against quantified thinking, because the advances of technology have been conjoined with the scientific method and social progress. But as I hope to show, technology ultimately reveals its limitations….

“Eroom’s law” – Moore’s law backwards – was recently formulated to describe a problem in pharmacology. Drug discovery has been getting more expensive. Since the 1950s the number of drugs approved for use in human patients per billion US dollars spent on research and development has halved every nine years. This problem has long perplexed researchers. According to the principles of technological growth, the trend should be in the opposite direction. In a 2012 paper in Nature entitled “Diagnosing the decline in pharmaceutical R&D efficiency” the authors propose and investigate several possible causes for this. They begin with social and physical influences, such as increased regulation, increased expectations and the exhaustion of easy targets (the “low hanging fruit” problem). Each of these is – with qualifications – disposed of, leaving open the question of the discovery process itself….(More)
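For a sense of the compounding involved: a halving time of nine years implies roughly a hundredfold decline in R&D efficiency between 1950 and 2010. A quick sketch (the 1950 baseline index of 100 is illustrative, not a figure from the paper):

```python
# Eroom's law: drugs approved per billion dollars of R&D spending
# halves every nine years.
HALVING_YEARS = 9

def efficiency(year, base_year=1950, base_index=100.0):
    """Relative R&D efficiency, halving every HALVING_YEARS."""
    return base_index * 0.5 ** ((year - base_year) / HALVING_YEARS)

for year in (1950, 1980, 2010):
    print(year, round(efficiency(year), 2))
# 1950 100.0 / 1980 9.92 / 2010 0.98 -- about a 100-fold
# decline over six decades.
```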

Teaching an Algorithm to Understand Right and Wrong


Greg Satell at Harvard Business Review: “In his Nicomachean Ethics, Aristotle states that it is a fact that “all knowledge and every pursuit aims at some good,” but then continues, “What then do we mean by the good?” That, in essence, encapsulates the ethical dilemma. We all agree that we should be good and just, but it’s much harder to decide what that entails.

Since Aristotle’s time, the questions he raised have been continually discussed and debated. From the works of great philosophers like Kant, Bentham, and Rawls to modern-day cocktail parties and late-night dorm room bull sessions, the issues are endlessly mulled over and argued about but never come to a satisfying conclusion.

Today, as we enter a “cognitive era” of thinking machines, the problem of what should guide our actions is gaining newfound importance. If we find it so difficult to denote the principles by which a person should act justly and wisely, then how are we to encode them within the artificial intelligences we are creating? It is a question that we need to come up with answers for soon.

Designing a Learning Environment

Every parent worries about what influences their children are exposed to. What TV shows are they watching? What video games are they playing? Are they hanging out with the wrong crowd at school? We try not to overly shelter our kids because we want them to learn about the world, but we don’t want to expose them to too much before they have the maturity to process it.

In artificial intelligence, these influences are called a “machine learning corpus.” For example, if you want to teach an algorithm to recognize cats, you expose it to thousands of pictures of cats and things that are not cats. Eventually, it figures out how to tell the difference between, say, a cat and a dog. Much as with human beings, it is through learning from these experiences that algorithms become useful.
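A minimal sketch of that training process, using scikit-learn on stand-in numeric features rather than real images (the features and values below are invented purely for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Stand-in "corpus": each example is [ear_pointiness, snout_length],
# invented features used here in place of real image pixels.
# Label 1 = cat, label 0 = dog.
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1],   # cats
     [0.2, 0.9], [0.3, 0.8], [0.1, 0.7]]   # dogs
y = [1, 1, 1, 0, 0, 0]

# "Exposure" to the corpus: the model fits a boundary between classes.
model = LogisticRegression().fit(X, y)

# A new animal with pointy ears and a short snout is predicted to be a cat.
print(model.predict([[0.85, 0.15]]))  # [1]
```

What the algorithm learns is entirely a function of the corpus it is given, which is exactly why the choice of influences matters.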

However, the process can go horribly awry, as in the case of Microsoft’s Tay, a Twitter bot that the company unleashed on the microblogging platform. In under a day, Tay went from being friendly and casual (“Humans are super cool”) to downright scary (“Hitler was right and I hate Jews”). It was profoundly disturbing.

Francesca Rossi, an AI researcher at IBM, points out that we often encode principles regarding influences into societal norms, such as what age a child needs to be to watch an R-rated movie or whether they should learn evolution in school. “We need to decide to what extent the legal principles that we use to regulate humans can be used for machines,” she told me.

However, in some cases algorithms can alert us to bias in our society that we might not have been aware of, such as when we Google “grandma” and see only white faces. “There is a great potential for machines to alert us to bias,” Rossi notes. “We need to not only train our algorithms but also be open to the possibility that they can teach us about ourselves.”…

Another issue we will have to contend with is deciding not only which ethical principles to encode in artificial intelligences but also how they are coded. As noted above, “Thou shalt not kill” is, for the most part, a strict principle; only in a few rare cases, such as for a Secret Service agent or a soldier, does it become more like a preference that is greatly affected by context….
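One hypothetical way to draw that distinction in code is to treat strict principles as hard constraints that veto an action outright, and context-dependent ones as weighted preferences that merely adjust an action’s score. Everything in the sketch below (rules, weights, actions) is invented for illustration:

```python
# Hard rules veto an action outright; preferences only adjust its score.
HARD_RULES = [
    lambda action, ctx: not action["causes_death"],  # strict, never waived
]

PREFERENCES = [
    # Soft rule: honesty is preferred, with a context-dependent weight.
    lambda action, ctx: -ctx["honesty_weight"] if action["involves_lying"] else 0.0,
]

def choose(actions, ctx):
    # Discard anything violating a hard rule, then rank the rest by
    # base utility plus preference adjustments.
    permitted = [a for a in actions if all(rule(a, ctx) for rule in HARD_RULES)]
    return max(permitted,
               key=lambda a: a["utility"] + sum(p(a, ctx) for p in PREFERENCES))

actions = [
    {"name": "white_lie",   "causes_death": False, "involves_lying": True,  "utility": 1.0},
    {"name": "blunt_truth", "causes_death": False, "involves_lying": False, "utility": 0.8},
]
print(choose(actions, {"honesty_weight": 0.5})["name"])  # blunt_truth
print(choose(actions, {"honesty_weight": 0.1})["name"])  # white_lie
```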

As pervasive as artificial intelligence is set to become in the near future, the responsibility rests with society as a whole. Put simply, we need to take the standards by which artificial intelligences will operate just as seriously as those that govern how our political systems operate and how our children are educated.

It is a responsibility that we cannot shirk….(More)

Federal Privacy Council’s Law Library


Federal Privacy Council: “The Law Library is a compilation of information about and links to select Federal laws related to the creation, collection, use, processing, storage, maintenance, dissemination, disclosure, and disposal of personally identifiable information (PII) by departments and agencies within the Federal Government. The Law Library does not include all laws that are relevant to privacy or the management of PII in the Federal Government.

The Law Library only includes laws applicable to the Federal Government. Although some of the laws included may also be applicable to entities outside of the Federal Government, the information provided on the Law Library pages is strictly limited to the application of those laws to the Federal Government; the information provided does not in any way address the application of any law to the private sector or other non-Federal entities.

The Law Library pages have been prepared by members of the Federal Privacy Council and consist of information from and links to other Federal Government websites. The Federal Privacy Council is not responsible for the content of any third-party website, and links to other websites do not constitute or imply endorsement or recommendation of those sites or the information they provide.

The material in the Law Library is provided for informational purposes only. The information provided may not reflect current legal developments or agency-specific requirements, and it may not be correct or complete. The Federal Privacy Council does not have authority to provide legal advice, to set policies for the Federal Government, or to represent the views of the Federal Government or the views of any agency within the Federal Government; accordingly, the information on this website in no way constitutes policy or legal advice, nor does it in any way reflect Federal Government views or opinions.  Agencies shall consult law, regulation, and policy, including OMB guidance, to understand applicable requirements….(More)”

Open Data Collection (PLOS)


Daniella Lowenberg, Amy Ross, and Emma Ganley at PLOS: “In the spirit of OpenCon and highlighting the state of Open Data, PLOS is proud to release our Open Data Collection. The many values of Open Data are becoming increasingly apparent, and as a result, we are seeing an adoption of Open Data policies across publishers, funders and organizations. Open Data has proven a fantastic tool to help evaluate the replicability of published research, and even politicians are taking a stand in favor of Open Data as a mechanism to advance science rapidly. In March of 2014, PLOS updated our Data Policy to reflect the need for the underlying data to be as open as the paper itself, resulting in complete transparency of the research. Two and a half years later, we have seen over 60,000 published papers with open data sets and an increase in submissions reflecting open data practices and policies….

To create this Open Data Collection, we exhaustively searched for relevant articles published across PLOS that discuss open data in some way. Then, in collaboration with our external advisor, Melissa Haendel, we selected 26 of those articles with the aim of highlighting a broad scope of research articles, guidelines, and commentaries about data sharing, data practices, and data policies from different research fields. Melissa has written an engaging blog post detailing the rubric and reasons behind her selections….(More)”

Make Democracy Great Again: Let’s Try Some ‘Design Thinking’


Ken Carbone in the Huffington Post: “Allow me to begin with the truth. I’ve never studied political science, run for public office nor held a position in government. For the last forty years I’ve led a design agency working with enduring brands across the globe. As with any experienced person in my profession, I have used research, deductive reasoning, logic and “design thinking” to solve complex problems and create opportunities. Great brands that are showing their age turn to our agency to get back on course. In this light, I believe American democracy is a prime target for some retooling….

The present campaign cycle has left many voters wondering how such divisiveness and national embarrassment could be happening in the land of the free and home of the brave. This could be viewed as symptomatic of deeper structural problems in our tradition-bound, 240-year-old democracy. Great brands operate on an “innovate or die” model to ensure success. The continual improvement of how a business operates and adapts to market conditions is a sound and critical practice.

Although the current election frenzy will soon be over, I want to examine three challenges to our election process and propose possible solutions for consideration. I’ll use the same diagnostic thinking I use with major corporations:

Term Limits…

Voting and Voter registration…

Political Campaigns…

In June of this year I attended the annual leadership conference of AIGA, the professional association for design, in Raleigh, NC. A provocative question posed to a select group of designers was “What would you do if you were Secretary of Design?” The responses addressed issues concerning positive social change, education and Veterans Affairs. The audience was full of several hundred trained professionals whose everyday problem-solving methods encourage divergent thinking to explore many solutions (possible or impossible) and then use convergent thinking to select and realize the best resolution. This is the very definition of “design thinking.” That leads to progress….(More)”.

Is Social Media Killing Democracy?


Phil Howard at Culture Digitally: “This is the big year for computational propaganda—using immense data sets to manipulate public opinion over social media.  Both the Brexit referendum and US election have revealed the limits of modern democracy, and social media platforms are currently setting those limits. 

Platforms like Twitter and Facebook now provide a structure for our political lives.  We’ve always relied on many kinds of sources for our political news and information.  Family, friends, news organizations, charismatic politicians certainly predate the internet.  But whereas those are sources of information, social media now provides the structure for political conversation.  And the problem is that these technologies permit too much fake news, encourage our herding instincts, and aren’t expected to provide public goods.

First, social algorithms allow fake news stories from untrustworthy sources to spread like wildfire over networks of family and friends.  …

Second, social media algorithms provide very real structure to what political scientists often call “elective affinity” or “selective exposure”…

The third problem is that technology companies, including Facebook and Twitter, have been given a “moral pass” on the obligations to which we hold journalists and civil society groups….

Facebook has run several experiments now, published in scholarly journals, demonstrating that they have the ability to accurately anticipate and measure social trends.  Whereas journalists and social scientists feel an obligation to openly analyze and discuss public preferences, we do not expect this of Facebook.  The network effects that clearly were unmeasured by pollsters were almost certainly observable to Facebook.  When it comes to news and information about politics, or public preferences on important social questions, Facebook has a moral obligation to share data and prevent computational propaganda.  The Brexit referendum and US election have taught us that Twitter and Facebook are now media companies.  Their engineering decisions are effectively editorial decisions, and we need to expect more openness about how their algorithms work.  And we should expect them to deliberate about their editorial decisions.

There are some ways to fix these problems.  Opaque software algorithms shape what people find in their news feeds.  We’ve all noticed fake news stories, often called clickbait, and while these can be an entertaining part of using the internet, it is bad when they are used to manipulate public opinion.  These algorithms work as “bots” on social media platforms like Twitter, where they were used in both the Brexit and US presidential campaigns to aggressively advance the case for leaving Europe and the case for electing Trump.  Similar algorithms work behind the scenes on Facebook, where they govern what content from your social networks actually gets your attention. 

So the first way to strengthen democratic practices is for academics, journalists, policy makers and the interested public to audit social media algorithms….(More)”.

We All Need Help: “Big Data” and the Mismeasure of Public Administration


Essay by Stephane Lavertu in Public Administration Review: “Rapid advances in our ability to collect, analyze, and disseminate information are transforming public administration. This “big data” revolution presents opportunities for improving the management of public programs, but it also entails some risks. In addition to potentially magnifying well-known problems with public sector performance management—particularly the problem of goal displacement—the widespread dissemination of administrative data and performance information increasingly enables external political actors to peer into and evaluate the administration of public programs. The latter trend is consequential because external actors may have little sense of the validity of performance metrics and little understanding of the policy priorities they capture. The author illustrates these potential problems using recent research on U.S. primary and secondary education and suggests that public administration scholars could help improve governance in the data-rich future by informing the development and dissemination of organizational report cards that better capture the value that public agencies deliver….(More)”.

Results Through Transparency: Does Publicity Lead to Better Procurement?


Paper by Charles Kenny and Ben Crisman: “Governments buy about $9 trillion worth of goods and services a year, and their procurement policies are increasingly subject to international standards and institutional regulation, including the WTO Plurilateral Agreement on Government Procurement, Open Government Partnership commitments and International Financial Institution procurement rules. These standards focus on transparency and open competition as key tools to improve outcomes. While there is some evidence on the impact of competition on prices in government procurement, there is less on the impact of specific procurement rules, including transparency, on competition or procurement outcomes. Using a database of World Bank-financed contracts, we explore the impact on competition of a relatively minor procurement rule governing advertising, using regression discontinuity design and matching methods….(More)”
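The identification strategy, regression discontinuity, compares contracts just above and just below the threshold at which an advertising rule kicks in. A sketch on simulated data (the threshold, variables, and effect size are all invented, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated contracts: a (hypothetical) advertising rule applies above
# an estimated-value threshold; does it shift the number of bidders?
THRESHOLD = 100.0                       # contract value in $000s (invented)
value = rng.uniform(50, 150, 2000)
advertised = value >= THRESHOLD         # rule assignment at the cutoff
bidders = (3 + 0.01 * value             # smooth trend in contract value
           + 1.5 * advertised           # simulated "effect" of advertising
           + rng.normal(0, 1, 2000))

# Local RD estimate: compare mean outcomes in a narrow window on either
# side of the cutoff, where contracts are otherwise comparable.
window = 10.0
below = bidders[(value >= THRESHOLD - window) & (value < THRESHOLD)]
above = bidders[(value >= THRESHOLD) & (value < THRESHOLD + window)]
print(round(above.mean() - below.mean(), 2))  # ~1.6: the 1.5 effect plus trend
```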

Digital Government: Leveraging Innovation to Improve Public Sector Performance and Outcomes for Citizens


Book edited by Svenja Falk, Andrea Römmele, and Michael Silverman: “This book focuses on the implementation of digital strategies in the public sectors in the US, Mexico, Brazil, India and Germany. The case studies presented examine different digital projects by looking at their impact as well as their alignment with their national governments’ digital strategies. The contributors assess the current state of digital government, analyze the contribution of digital technologies in achieving outcomes for citizens, discuss ways to measure digitalization and address the question of how governments oversee the legal and regulatory obligations of information technology. The book argues that most countries formulate good strategies for digital government, but do not effectively prescribe and implement corresponding policies and programs. Showing specific programs that deliver results can help policy makers, knowledge specialists and public-sector researchers to develop best practices for future national strategies….(More)”

Crowd-sourcing pollution control in India


Springwise: “Following orders by the national government to improve the air quality of the New Delhi region by reducing air pollution, the Environment Pollution (Prevention and Control) Authority created the Hawa Badlo app. Designed for citizens to report cases of air pollution, each complaint is sent to the appropriate official for resolution.

Free to use, the app is available for both iOS and Android. Complaints are geo-tagged, and there are two different versions available – one for citizens and one for government officials. Officials must provide photographic evidence to close a case. The app itself produces weekly reports listing the number and status of complaints, along with any actions taken to resolve the problem. Currently focusing on pollution from construction, unpaved roads and the burning of garbage, the team behind the app plans to expand its use to cover other types of pollution as well.
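As a sketch of the kind of record such an app might keep (the field names below are invented, not taken from Hawa Badlo), each complaint carries a geo-tag, a category, and a status that officials can only close with photo evidence:

```python
from dataclasses import dataclass
from collections import Counter
from datetime import date
from typing import Optional

@dataclass
class Complaint:
    lat: float                       # geo-tag
    lon: float
    category: str                    # e.g. "garbage burning", "unpaved road"
    filed_on: date
    status: str = "open"
    evidence_photo: Optional[str] = None

    def close(self, photo_path: str):
        """Officials must attach photographic evidence to close a case."""
        self.evidence_photo = photo_path
        self.status = "closed"

def weekly_report(complaints):
    """Tally complaint statuses, as in the app's weekly summaries."""
    return Counter(c.status for c in complaints)

c1 = Complaint(28.61, 77.21, "garbage burning", date(2016, 11, 1))
c2 = Complaint(28.57, 77.19, "unpaved road", date(2016, 11, 2))
c1.close("evidence/site_photo.jpg")
print(weekly_report([c1, c2]))  # Counter({'closed': 1, 'open': 1})
```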

From providing free wi-fi when the air is clean enough to mapping air-quality in real-time, air pollution solutions are increasingly involving citizens….(More)”