Index: The Data Universe


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes*
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
  • Percent of the digital universe in 2012 created by emerging markets: 36
  • Percent of the digital universe in 2020 predicted to be created by China alone: 21
  • How much information in the digital universe is created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
  • Percent of which enterprises have liability or responsibility for (copyright, privacy, compliance with regulations, etc.): 80
  • Amount included in the Obama Administration’s 2012 Big Data initiative: over $200 million
  • Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
  • Data created per day in 2012: 2.5 quintillion bytes
  • How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
  • How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
  • Projected growth in global data generated per year, as of 2011: 40%
  • Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
  • Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
  • Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
  • Time needed in 2013: one week
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Projected amount of digital information created annually that will either live in or pass through the cloud: 1/3
  • Increase in data collection volume year-over-year in 2012: 400%
  • Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zettabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes
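
The unit definitions in the footnote can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming decimal (power-of-ten) units throughout; the table `BYTES_PER` and the helper `convert` are hypothetical names introduced here for illustration and are not part of the original index.

```python
# Decimal byte units, following the footnote's definitions.
# (Assumption: every unit is a power of ten, not a power of two.)
BYTES_PER = {
    "byte": 1,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two of the decimal byte units above."""
    return amount * BYTES_PER[from_unit] / BYTES_PER[to_unit]


# 1 zettabyte expressed in terabytes: one billion, as the footnote states.
zettabyte_in_terabytes = convert(1, "zettabyte", "terabyte")

# 2.5 exabytes per day expressed in bytes: 2.5 quintillion (2.5 x 10^18).
daily_bytes = convert(2.5, "exabyte", "byte")

# Walmart's 2,560 terabytes per hour expressed in decimal petabytes.
walmart_petabytes = convert(2560, "terabyte", "petabyte")
```

Under these assumptions the two daily figures in the list agree, since 2.5 exabytes and 2.5 quintillion bytes are the same quantity. The Walmart figure comes out at 2.56 decimal petabytes; the index's "2.5 petabytes" matches the binary convention of 1,024 terabytes per petabyte (2,560 / 1,024 = 2.5) rather than the footnote's 1,000.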


How Much Transparency Do We Really Want?


William Galston in the Wall Street Journal: “Transparency is very nearly the opposite of privacy. In the current controversy, it is a demand that the government make public matters it conducts in private and wants to keep private.


The argument for disclosure goes like this: If the government is acting in the name of the people, the people need to know what their government is doing. How else can they judge these activities? Democratic government means accountability to the public, and accountability requires disclosure. History testifies to the link between secrecy and the abuse of public power. Without disclosure, the people will find it difficult to restrain government’s excesses—most importantly, secret activities that could endanger our liberties.
Government transparency has a distinguished history. In 1795, Immanuel Kant propounded what is often called the principle of publicity: Roughly, if you cannot reveal the principle that guides your policy without undermining that policy, then the policy itself is fatally flawed from a moral point of view.
Little more than a century later, in his famous “Fourteen Points” speech about U.S. war aims and the principles that would guide the peace, President Woodrow Wilson called for “Open covenants of peace, openly arrived at, after which there shall be no private international understandings of any kind but diplomacy shall proceed always frankly and in the public view.”…
Yet the relation between collective security and individual liberty is not zero-sum. Because another 9/11-scale terrorist event might well lead to even more intrusive antiterrorism measures, reducing the likelihood of such an event could end up preventing serious infringements on liberty. Up to a point, liberty and security can be mutually reinforcing. But at what point do they become opposed?
This is not a judgment that can be left to experts in the executive branch. Ultimately, the people, acting through their elected representatives, must decide—and it is hard to see how they can do so unless all representatives, not just a select few, have the information they need to participate in such a decision. As we learned in the 1970s, however, public deliberation on intelligence matters is anything but cost-free.”
 

White House Expands Guidance on Promoting Open Data


NextGov: “White House officials have announced expanded technical guidance to help agencies make more data accessible to the public in machine-readable formats.
Following up on President Obama’s May executive order linking the pursuit of open data to economic growth, innovation and government efficiency, two budget and science office spokesmen on Friday published a blog post highlighting new instructions and answers to frequently asked questions.
Nick Sinai, deputy chief technology officer at the Office of Science and Technology Policy, and Dominic Sale, supervisory policy analyst at the Office of Management and Budget, noted that the policy now in place means that all “newly generated government data will be required to be made available in open, machine-readable formats, greatly enhancing their accessibility and usefulness, while ensuring privacy and security.”

Behold: A Digital Bill of Rights for the Internet, by the Internet


Mashable: “The digital rights conversation was thrust into the mainstream spotlight after news of ongoing, widespread mass surveillance programs leaked to the public. Always a hot topic, these revelations sparked a strong online debate among the Internet community.
It also made us here at Mashable reflect on the digital freedoms and protections we feel each user should be guaranteed as a citizen of the Internet. To highlight some of the great conversations taking place about digital rights online, we asked the digital community to collaborate with us on the creation of a crowdsourced Digital Bill of Rights.
After six weeks of public discussions, document updates and changes, as well as incorporating input from digital rights experts, Mashable is pleased to unveil its first-ever Digital Bill of Rights, made for the Internet, by the Internet.”
 

Hackers Called Into Civic Duty


Wall Street Journal: “Cash-strapped cities are turning to an unusual source to improve their online services on the cheap: helpful hackers, who use city data to create tools tracking everything from real-time subway delays to where to get a free flu shot near your home and information about a contentious school-closing plan.
Hackers have been popularly portrayed as giving fits to national-security officials and credit-card companies, but the term also refers to people who like to write their own computer programs and help solve a variety of problems. Recently, hackers have begun working with cities to find ways of building applications, or apps, that make use of data—which gets stripped of personally identifiable information—that municipalities are collecting anyway in the regular course of governance….Last year, Chicago Mayor Rahm Emanuel signed an executive order mandating the city make available all data not protected by privacy laws. Today, the city has nearly 950 data sets publicly available, the most of any U.S. city, according to Code for America, a nonprofit that promotes openness in government.”

Too much information


Ian Leslie in Aeon: “Our instincts for privacy evolved in tribal societies where walls didn’t exist. No wonder we are hopeless oversharers…A few years ago George Loewenstein, professor of behavioural economics at Carnegie Mellon University in Pittsburgh, set out to investigate how people think about the consequences of their privacy choices on the internet. He soon concluded that they don’t….‘Thinking about online privacy doesn’t come naturally to us,’ Loewenstein told me when I spoke to him on the phone. ‘Nothing in our evolution or culture has equipped us to deal with it.’…We might be particularly prone to disclosing private information to a well-designed digital interface, making an unconscious and often unwise association between ease-of-use and safety. …This is not the only way our deeply embedded real-world instincts can backfire online. Take our rather noble instinct for reciprocity: returning a favour. If I reveal personal information to you, you’re more likely to reveal something to me. This works reasonably well when you can see my face and make a judgment about how likely I am to betray your confidence…Giving people more control over their privacy choices won’t solve these deeper problems. Indeed, Loewenstein found evidence for a ‘control paradox’. Just as many people mistakenly think that driving is safer than flying because they feel they have more control over it, so giving people more privacy settings to fiddle with makes them worry less about what they actually divulge.”

Open Economics Principles


“The Open Economics Working Group would like to introduce the Open Economics Principles, a Statement on Openness of Economic Data and Code

Economic research is based on building on, reusing and openly criticising the published body of economic knowledge. Furthermore, empirical economic research and data play a central role for policy-making in many important areas of our economies and societies.
Openness enables and underpins scholarly enquiry and debate, and is crucial in ensuring the reproducibility of economic research and analysis. Thus, for economics to function effectively, and for society to reap the full benefits from economic research, it is therefore essential that economic research results, data and analysis be openly and freely available, wherever possible.

  1. Open by default…
  2. Privacy and confidentiality…
  3. Reward structures and data citation…
  4. Data availability….
  5. Publicly funded data should be open…
  6. Usable and discoverable…
See Reasons and Background: http://openeconomics.net/principles/”

Smart Government and Big, Open Data: The Trickle-Up Effect


Anthony Townsend at the Future Now Blog: “As we grow numb to the daily headlines decrying the unimaginable scope of data being collected from Internet companies by the National Security Agency’s Prism program, it’s worth remembering that governments themselves also produce mountains of data. Tabulations of the most recent U.S. census, conducted in 2010, involved billions of data points and trillions of calculations. Not surprisingly, it is probably safe to assume that the federal government is also the world’s largest spender on database software—its tab with just one company, market-leader Oracle, passed $700 million in 2012 alone. Government data isn’t just big in scope. It is deep in history—governments have been accumulating data for centuries. In 2006, the genealogical research site Ancestry.com imported 600 terabytes of data (about what Facebook collects in a single day!) from the first fifteen U.S. censuses (1790 to 1930).

But the vast majority of data collected by governments never sees the light of day. It sits squirreled away on servers, and is only rarely cross-referenced in ways that private sector companies do all the time to gain insights into what’s actually going on across the country, and emerging problems and opportunities. Yet as governments all around the world have realized, if shared safely with due precautions to protect individual privacy, in the hands of citizens all of this data could be a national civic monument of tremendous economic and social value.”

International Principles on the Application of Human Rights to Communications Surveillance


Final version, 10 July 2013: “As technologies that facilitate State surveillance of communications advance, States are failing to ensure that laws and regulations related to communications surveillance adhere to international human rights and adequately protect the rights to privacy and freedom of expression. This document attempts to explain how international human rights law applies in the current digital environment, particularly in light of the increase in and changes to communications surveillance technologies and techniques. These principles can provide civil society groups, industry, States and others with a framework to evaluate whether current or proposed surveillance laws and practices are consistent with human rights.
These principles are the outcome of a global consultation with civil society groups, industry and international experts in communications surveillance law, policy and technology.”

New Book: Untangling the Web


By Aleks Krotoski: “The World Wide Web is the most revolutionary innovation of our time. In the last decade, it has utterly transformed our lives. But what real effects is it having on our social world? What does it mean to be a modern family when dinner table conversations take place over smartphones? What happens to privacy when we readily share our personal lives with friends and corporations? Are our Facebook updates and Twitterings inspiring revolution or are they just a symptom of our global narcissism? What counts as celebrity, when everyone can have a following or be a paparazzo? And what happens to relationships when love, sex and hate can be mediated by a computer? Social psychologist Aleks Krotoski has spent a decade untangling the effects of the Web on how we work, live and play. In this groundbreaking book, she uncovers how much humanity has – and hasn’t – changed because of our increasingly co-dependent relationship with the computer. In Untangling the Web, she tells the story of how the network became woven in our lives, and what it means to be alive in the age of the Internet.” Blog: http://untanglingtheweb.tumblr.com/