Index: The Data Universe


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes*
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
  • Percent of the digital universe in 2012 created by emerging markets: 36
  • Percent of the digital universe in 2020 predicted to be created by China alone: 21
  • How much information in the digital universe is created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
  • Percent of that information for which enterprises have liability or responsibility (copyright, privacy, compliance with regulations, etc.): 80
  • Amount included in the Obama Administration’s 2012 Big Data initiative: over $200 million
  • Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
  • Data created per day in 2012: 2.5 quintillion bytes
  • How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
  • How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
  • Projected growth in global data generated per year, as of 2011: 40%
  • Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
  • Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
  • Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
  • Time needed in 2013: one week
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Projected amount of digital information created annually that will either live in or pass through the cloud: 1/3
  • Increase in data collection volume year-over-year in 2012: 400%
  • Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zettabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes
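
The unit definitions in the footnote can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original index: the `UNITS` table and the `convert` helper are hypothetical names, and the table assumes the decimal (powers of 1,000) definitions given in the footnote.

```python
from fractions import Fraction

# Decimal byte units, following the footnote's definitions (assumption:
# powers of 1,000 throughout, not the binary 1,024-based variants).
UNITS = {
    "byte": 1,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units, exactly (hypothetical helper)."""
    return Fraction(str(amount)) * UNITS[from_unit] / UNITS[to_unit]


# 1 zettabyte expressed in terabytes: one billion, as the footnote states.
print(convert(1, "zettabyte", "terabyte"))

# 2.5 quintillion bytes per day is the same quantity as 2.5 exabytes per day.
print(convert(2.5, "exabyte", "byte") == 25 * 10**17)

# Walmart's 2,560 terabytes per hour, expressed in decimal petabytes.
print(float(convert(2560, "terabyte", "petabyte")))
```

Under these decimal definitions, 2,560 terabytes comes to 2.56 petabytes. The index's figure of 2.5 petabytes matches a conversion at 1,024 terabytes per petabyte instead, which may be what the original source used.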

The Logic of Connective Action: Digital Media and the Personalization of Contentious Politics


New book by W. Lance Bennett and Alexandra Segerberg: “The Logic of Connective Action explains the rise of a personalized digitally networked politics in which diverse individuals address the common problems of our times such as economic fairness and climate change. Rich case studies from the United States, United Kingdom, and Germany illustrate a theoretical framework for understanding how large-scale connective action is coordinated using inclusive discourses such as “We Are the 99%” that travel easily through social media. In many of these mobilizations, communication operates as an organizational process that may replace or supplement familiar forms of collective action based on organizational resource mobilization, leadership, and collective action framing. In some cases, connective action emerges from crowds that shun leaders, as when Occupy protesters created media networks to channel resources and create loose ties among dispersed physical groups. In other cases, conventional political organizations deploy personalized communication logics to enable large-scale engagement with a variety of political causes. The Logic of Connective Action shows how power is organized in communication-based networks, and what political outcomes may result.”

Is Connectivity A Human Right?


Mark Zuckerberg (Facebook): For almost ten years, Facebook has been on a mission to make the world more open and connected. Today we connect more than 1.15 billion people each month, but as we started thinking about connecting the next 5 billion, we realized something important: the vast majority of people in the world don’t have access to the internet.
Today, only 2.7 billion people are online — a little more than one third of the world. That is growing by less than 9% each year, which is slow considering how early we are in the internet’s development. Even though projections show most people will get smartphones in the next decade, most people still won’t have data access because the cost of data remains much more expensive than the price of a smartphone.
Below, I’ll share a rough proposal for how we can connect the next 5 billion people, and a rough plan to work together as an industry to get there. We’ll discuss how we can make internet access more affordable by making it more efficient to deliver data, how we can use less data by improving the efficiency of the apps we build and how we can help businesses drive internet access by developing a new model to get people online.
I call this a “rough plan” because, like many long term technology projects, we expect the details to evolve. It may be possible to achieve more than we lay out here, but it may also be more challenging than we predict. The specific technical work will evolve as people contribute better ideas, and we welcome all feedback on how to improve this.
Connecting the world is one of the greatest challenges of our generation. This is just one small step toward achieving that goal. I’m excited to work together to make this a reality.

Crowd-Sourcing the Nation: Now a National Effort


Release from the U.S. Department of the Interior, U.S. Geological Survey: “The mapping crowd-sourcing program, known as The National Map Corps (TNMCorps), encourages citizens to collect structures data by adding new features, removing obsolete points, and correcting existing data for The National Map database. Structures being mapped in the project include schools, hospitals, post offices, police stations and other important public buildings.
Since the start of the project in 2012, more than 780 volunteers have made in excess of 13,000 contributions.  In addition to basic editing, a second volunteer peer review process greatly enhances the quality of data provided back to The National Map.  A few months ago, volunteers in 35 states were actively involved.  This final release of states opens up the entire country for volunteer structures enhancement.
To show appreciation of our volunteers’ efforts, The National Map Corps has instituted a recognition program that awards “virtual” badges to volunteers. The badges consist of a series of antique surveying instruments ranging from the Order of the Surveyor’s Chain (25 – 50 points) to the Theodolite Assemblage (2000+ points). Additionally, volunteers are publicly acclaimed (with permission) via Twitter, Facebook and Google+….
Tools on TNMCorps website explain how a volunteer can edit any area, regardless of their familiarity with the selected structures, and becoming a volunteer for TNMCorps is easy; go to The National Map Corps website to learn more and to sign up as a volunteer. If you have access to the Internet and are willing to dedicate some time to editing map data, we hope you will consider participating!”

From Machinery to Mobility: Government and Democracy in a Participative Age


New book by Jeffrey Roy: “The Westminster-stylized model of Parliamentary democratic politics and public service accountability is increasingly out of step with the realities of today’s digitally and socially networked era. This book explores the reconfiguration of democratic and managerial governance within democratic societies due to the advent of technological mobility. More specifically, the traditional public sector prism of organization and accountability, denoted as ‘machinery of government’, is increasingly strained in an era characterized by smart devices, social media, and cloud computing. This book examines the roots and implications of the tensions between machinery and mobility and the sorts of investments and initiatives that have been undertaken by governments around the world as well as their appropriateness and relative impacts. This book also examines the prospects for holistic adaptation of democratic and managerial systems going forward, identifying the most crucial directions and determinants for improving public sector performance in terms of outcomes, accountability, and agility. Accordingly, the ultimate aim of this initiative is to contribute to the formation of intellectual foundations for more systemic reforms of public sector governance in Canada and elsewhere, and to offer forward-looking trajectories for government adaptation in shifting from a traditional prism of ‘machinery’ to new organizational and institutional arrangements better suited for an era of ‘mobility’.”

Defense Against National Vulnerabilities in Public Data


DOD/DARPA Notice (See also Foreign Policy article): “OBJECTIVE: Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
DESCRIPTION: The vulnerabilities to individuals from a data compromise are well known and documented now as “identity theft.” These include regular stories published in the news and research journals documenting the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with voluntary disclosure of personal information, create other potential vulnerabilities for individuals participating heavily in the digital world. The Netflix Challenge in 2009 was launched with the goal of creating better customer pick prediction algorithms for the movie service [1]. An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge [2]. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
Could a modestly funded group deliver nation-state type effects using only public data?…”
The official link for this solicitation is: www.acq.osd.mil/osbp/sbir/solicitations/sbir20133.
 

I Flirt and Tweet. Follow Me at #Socialbot.


in The New York Times: “FROM the earliest days of the Internet, robotic programs, or bots, have been trying to pass themselves off as human. Chatbots greet users when they enter an online chat room, for example, or kick them out when they get obnoxious….

Now come socialbots. These automated charlatans are programmed to tweet and retweet. They have quirks, life histories and the gift of gab. Many of them have built-in databases of current events, so they can piece together phrases that seem relevant to their target audience. They have sleep-wake cycles so their fakery is more convincing, making them less prone to repetitive patterns that flag them as mere programs. Some have even been souped up by so-called persona management software, which makes them seem more real by adding matching Facebook, Reddit or Foursquare accounts, giving them an online footprint over time as they amass friends and like-minded followers.

Researchers say this new breed of bots is being designed not just with greater sophistication but also with grander goals: to sway elections, to influence the stock market, to attack governments, even to flirt with people and one another.

…Socialbots are tapping into an ever-expanding universe of social media. Last year, the number of Twitter accounts topped 500 million. Some researchers estimate that only 35 percent of the average Twitter user’s followers are real people. In fact, more than half of Internet traffic already comes from nonhuman sources like bots or other types of algorithms. Within two years, about 10 percent of the activity occurring on social online networks will be masquerading bots, according to technology researchers….

Much of the social media remains unregulated by campaign finance and transparency laws. So far, the Federal Election Commission has been reluctant to venture into this realm.

But the bots are likely to venture into ours, said Tim Hwang, chief scientist at the Pacific Social Architecting Corporation, which creates bots and technologies that can shape social behavior. “Our vision is that in the near future automatons will eventually be able to rally crowds, open up bank accounts, write letters,” he said, “all through human surrogates.”

Searching Big Data for ‘Digital Smoke Signals’


Steve Lohr in the New York Times: “It is the base camp of the United Nations Global Pulse team — a tiny unit inside an institution known for its sprawling bureaucracy, not its entrepreneurial hustle. Still, the focus is on harnessing technology in new ways — using data from social networks, blogs, cellphones and online commerce to transform economic development and humanitarian aid in poorer nations….

The efforts by Global Pulse and a growing collection of scientists at universities, companies and nonprofit groups have been given the label “Big Data for development.” It is a field of great opportunity and challenge. The goal, the scientists involved agree, is to bring real-time monitoring and prediction to development and aid programs. Projects and policies, they say, can move faster, adapt to changing circumstances and be more effective, helping to lift more communities out of poverty and even save lives.

Research by Global Pulse and other groups, for example, has found that analyzing Twitter messages can give an early warning of a spike in unemployment, price rises and disease. Such “digital smoke signals of distress,” Mr. Kirkpatrick said, usually come months before official statistics — and in many developing countries today, there are no reliable statistics.

Finding the signals requires data, though, and much of the most valuable data is held by private companies, especially mobile phone operators, whose networks carry text messages, digital-cash transactions and location data. So persuading telecommunications operators, and the governments that regulate and sometimes own them, to release some of the data is a top task for the group. To analyze the data, the groups apply tools now most widely used for pinpointing customers with online advertising.”

‘Innovation Network’ Connects Leaders Across Latin America to Share Ideas


National Democratic Institute: “Throughout Latin America, political and civic leaders are under increasing pressure to solve pervasive problems such as poverty, insecurity, corruption and lack of government transparency. Some of that pressure is generated by social media and other new communications tools available to constituents. But new technology is also aiding the response.
Revolutionary developments such as georeferencing and low-cost video conferencing have spawned new ways for political and civic leaders to address some of these problems. Georeferencing, for example, helps combat corruption by making it possible to track the location of individuals, such as government employees, at a given time to ensure they are performing work when and where they say they are.
Leaders are using new technology to push for campaign finance transparency in Colombia, and to improve how political parties in Argentina and Uruguay prepare their members to tackle public policy challenges by using web-based tools for virtual trainings. In Honduras, where it is common for corrupt teachers to claim pay for work in multiple districts, the government is using georeferencing to ensure that these teachers aren’t paid for work they didn’t do.
But despite the innovations, there is little communication among countries in the region, so new methods developed in one country are often unknown in another. To overcome that gap, NDI has supported the creation of Red Innovación (RI), or “Innovation Network,” a virtual online Spanish-language forum where social and political innovators from throughout the region can highlight initiatives, solicit feedback and harvest new ideas to help governments become more responsive, transparent and effective.
Red Innovación uses platforms such as Google Hangout videoconferences to help put political parties and civil society organizations in touch with experts on such topics as how to communicate more effectively, how cyberactivism works and how to use technology to promote transparency.”
 

Smart Government and Big, Open Data: The Trickle-Up Effect


Anthony Townsend at the Future Now Blog: “As we grow numb to the daily headlines decrying the unimaginable scope of data being collected from Internet companies by the National Security Agency’s Prism program, it’s worth remembering that governments themselves also produce mountains of data. Tabulations of the most recent U.S. census, conducted in 2010, involved billions of data points and trillions of calculations. Not surprisingly, it is probably safe to assume that the federal government is also the world’s largest spender on database software—its tab with just one company, market-leader Oracle, passed $700 million in 2012 alone. Government data isn’t just big in scope. It is deep in history—governments have been accumulating data for centuries. In 2006, the genealogical research site Ancestry.com imported 600 terabytes of data (about what Facebook collects in a single day!) from the first fifteen U.S. censuses (1790 to 1930).

But the vast majority of data collected by governments never sees the light of day. It sits squirreled away on servers, and is only rarely cross-referenced in ways that private sector companies do all the time to gain insights into what’s actually going on across the country, and emerging problems and opportunities. Yet as governments all around the world have realized, if shared safely with due precautions to protect individual privacy, in the hands of citizens all of this data could be a national civic monument of tremendous economic and social value.”