big data

Civics for a Digital Age

Curated on September 20, 2013August 3, 2018 by Stefaan Verhulst

Jathan Sadowski in the Atlantic on “Eleven principles for relating to cities that are automated and smart: Over half of the world’s population lives in urban environments, and that number is rapidly growing according to the World Health Organization. Many of us interact with the physical environments of cities on a daily basis: the arteries that move traffic, the grids that energize our lives, the buildings that prevent and direct actions. For many tech companies, though, much of this urban infrastructure is ripe for a digital injection. Cities have been “dumb” for millennia. It’s about time they get “smart” — or so the story goes….
Before accepting the techno-hype as a fait accompli, we should consider the implications such widespread technological changes might have on society, politics, and life in general. Urban scholar and historian Lewis Mumford warned of “megamachines” where people become mere components — like gears and transistors — in a hierarchical, human machine. The proliferation of smart projects requires an updated way of thinking about their possibilities, complications, and effects.
A new book, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, by Anthony Townsend, a research director at the Institute for the Future, provides some groundwork for understanding how these urban projects are occurring and what guiding principles we might use in directing their development. Townsend sets out to sketch a new understanding of “civics,” one that will account for new technologies.
The foundation for his theory speaks to common, worthwhile concerns: “Until now, smart-city visions have been controlling us. What we need is a new social code to bring meaning and to exert control over the technological code of urban operating systems.” It’s easy to feel like technologies — especially urban ones that are, at once, ubiquitous and often unseen to city-dwellers — have undue influence over our lives. Townsend’s civics, which is based on eleven principles, looks to address, prevent, and reverse that techno-power.”

From Crowd-Sourcing Potholes to Community Policing

Curated on September 19, 2013August 3, 2018 by Stefaan Verhulst

New paper by Manik Suri (GovLab): “The tragic Boston Marathon bombing and hair-raising manhunt that ensued was a sobering event. It also served as a reminder that emerging “civic technologies” – platforms and applications that enable citizens to connect and collaborate with each other and with government – are more important today than ever before. As commentators have noted, local police and federal agents utilized a range of technological platforms to tap the “wisdom of the crowd,” relying on thousands of private citizens to develop a “hive mind” that identified two suspects within a record period of time.
In the immediate wake of the devastating attack on April 15th, investigators had few leads. But within twenty-four hours, senior FBI officials, determined to seek “assistance from the public,” called on everyone with information to submit all media, tips, and leads related to the Boston Marathon attack. This unusual request for help yielded thousands of images and videos from local Bostonians, tourists, and private companies through technological channels ranging from telephone calls and emails to Flickr posts and Twitter messages. In mere hours, investigators were able to “crowd-source” a tremendous amount of data – including thousands of images from personal cameras, amateur videos from smart phones, and cell-tower information from private carriers. Combing through data from this massive network of “eyes and ears,” law enforcement officials were quickly able to generate images of two lead suspects – enabling a “modern manhunt” to commence immediately.
Technological innovations have transformed our commercial, political, and social realities. These advances include new approaches to how we generate knowledge, access information, and interact with one another, as well as new pathways for building social movements and catalyzing political change. While a significant body of academic research has focused on the role of technology in transforming electoral politics and social movements, less attention has been paid to how technological innovation can improve the process of governance itself.
A growing number of platforms and applications lie at this intersection of technology and governance, in what might be termed the “civic technology” sector. Broadly speaking, this sector involves the application of new information and communication technologies – ranging from robust social media platforms to state-of-the-art big data analysis systems – to address public policy problems. Civic technologies encompass enterprises that “bring web technologies directly to government, build services on top of government data for citizens, and change the way citizens ask, get, or need services from government.” These technologies have the potential to transform governance by promoting greater transparency in policy-making, increasing government efficiency, and enhancing citizens’ participation in public sector decision-making.“

Three Paradoxes of Big Data

Curated on September 19, 2013August 3, 2018 by Stefaan Verhulst

New Paper by Neil M. Richards and Jonathan H. King in the Stanford Law Review Online: “Big data is all the rage. Its proponents tout the use of sophisticated analytics to mine large data sets for insight as the solution to many of our society’s problems. These big data evangelists insist that data-driven decisionmaking can now give us better predictions in areas ranging from college admissions to dating to hiring to medicine to national security and crime prevention. But much of the rhetoric of big data contains no meaningful analysis of its potential perils, only the promise. We don’t deny that big data holds substantial potential for the future, and that large dataset analysis has important uses today. But we would like to sound a cautionary note and pause to consider big data’s potential more critically. In particular, we want to highlight three paradoxes in the current rhetoric about big data to help move us toward a more complete understanding of the big data picture. First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox. Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox. And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox. Recognizing the paradoxes of big data, which show its perils alongside its potential, will help us to better understand this revolution. It may also allow us to craft solutions to produce a revolution that will be as good as its evangelists predict.”

(Appropriate) Big Data for Climate Resilience?

Curated on September 13, 2013August 15, 2018 by Stefaan Verhulst

Amy Luers at the Stanford Social Innovation Review: “The answer to whether big data can help communities build resilience to climate change is yes—there are huge opportunities, but there are also risks.

Opportunities

Feedback: Strong negative feedback is core to resilience. A simple example is our body’s response to heat stress—sweating, which is a natural feedback to cool down our body. In social systems, feedbacks are also critical for maintaining functions under stress. For example, communication by affected communities after a hurricane provides feedback for how and where organizations and individuals can provide help. While this kind of feedback used to rely completely on traditional communication channels, now crowdsourcing and data mining projects, such as Ushahidi and Twitter Earthquake detector, enable faster and more-targeted relief.
Diversity: Big data is enhancing diversity in a number of ways. Consider public health systems. Health officials are increasingly relying on digital detection methods, such as Google Flu Trends or Flu Near You, to augment and diversify traditional disease surveillance.
Self-Organization: A central characteristic of resilient communities is the ability to self-organize. This characteristic must exist within a community (see the National Research Council Resilience Report), not something you can impose on it. However, social media and related data-mining tools (InfoAmazonia, Healthmap) can enhance situational awareness and facilitate collective action by helping people identify others with common interests, communicate with them, and coordinate efforts.

Risks

Eroding trust: Trust is well established as a core feature of community resilience. Yet the NSA PRISM escapade made it clear that big data projects are raising privacy concerns and possibly eroding trust. And it is not just an issue in government. For example, Target analyzes shopping patterns and can fairly accurately guess if someone in your family is pregnant (which is awkward if they know your daughter is pregnant before you do). When our trust in government, business, and communities weakens, it can decrease a society’s resilience to climate stress.
Mistaking correlation for causation: Data mining seeks meaning in patterns that are completely independent of theory (suggesting to some that theory is dead). This approach can lead to erroneous conclusions when correlation is mistakenly taken for causation. For example, one study demonstrated that data mining techniques could show a strong (however spurious) correlation between the changes in the S&P 500 stock index and butter production in Bangladesh. While interesting, a decision support system based on this correlation would likely prove misleading.
Failing to see the big picture: One of the biggest challenges with big data mining for building climate resilience is its overemphasis on the hyper-local and hyper-now. While this hyper-local, hyper-now information may be critical for business decisions, without a broader understanding of the longer-term and more-systemic dynamism of social and biophysical systems, big data provides no ability to understand future trends or anticipate vulnerabilities. We must not let our obsession with the here and now divert us from slower-changing variables such as declining groundwater, loss of biodiversity, and melting ice caps—all of which may silently define our future. A related challenge is the fact that big data mining tends to overlook the most vulnerable populations. We must not let the lure of the big data microscope on the “well-to-do” populations of the world make us blind to the less well of populations within cities and communities that have more limited access to smart phones and the Internet.”

Open data for accountable governance: Is data literacy the key to citizen engagement?

Curated on September 9, 2013May 29, 2019 by Stefaan Verhulst

Camilla Monckton at UNDP’s Voices of Eurasia blog: “How can technology connect citizens with governments, and how can we foster, harness, and sustain the citizen engagement that is so essential to anti-corruption efforts?
UNDP has worked on a number of projects that use technology to make it easier for citizens to report corruption to authorities:

Serbia’s SMS corruption reporting in the health sector
Montenegro’s ‘be responsible app’
Kosovo’s online corruption reporting site kallxo.com

These projects are showing some promising results, and provide insights into how a more participatory, interactive government could develop.
At the heart of the projects is the ability to use citizen generated data to identify and report problems for governments to address….

Wanted: Citizen experts

As Kenneth Cukier, The Economist’s Data Editor, has discussed, data literacy will become the new computer literacy. Big data is still nascent and it is impossible to predict exactly how it will affect society as a whole. What we do know is that it is here to stay and data literacy will be integral to our lives.
It is essential that we understand how to interact with big data and the possibilities it holds.
Data literacy needs to be integrated into the education system. Educating non-experts to analyze data is critical to enabling broad participation in this new data age.
As technology advances, key government functions become automated, and government data sharing increases, newer ways for citizens to engage will multiply.
Technology changes rapidly, but the human mind and societal habits cannot. After years of closed government and bureaucratic inefficiency, adaptation of a new approach to governance will take time and education.
We need to bring up a generation that sees being involved in government decisions as normal, and that views participatory government as a right, not an ‘innovative’ service extended by governments.

What now?

In the meantime, while data literacy lies in the hands of a few, we must continue to connect those who have the technological skills with citizen experts seeking to change their communities for the better – as has been done in many a Social Innovation Camps recently (in Montenegro, Ukraine and Armenia at Mardamej and Mardamej Relaoded and across the region at Hurilab).
The social innovation camp and hackathon models are an increasingly debated topic (covered by Susannah Vila, David Eaves, Alex Howard and Clay Johnson).
On the whole, evaluations are leading to newer models that focus on greater integration of mentorship to increase sustainability – which I readily support. However, I do have one comment:
Social innovation camps are often criticized for a lack of sustainability – a claim based on the limited number of apps that go beyond the prototype phase. I find a certain sense of irony in this, for isn’t this what innovation is about: Opening oneself up to the risk of failure in the hope of striking something great?
In the words of Vinod Khosla:

“No failure means no risk, which means nothing new.”

As more data is released, the opportunity for new apps and new ways for citizen interaction will multiply and, who knows, someone might come along and transform government just as TripAdvisor transformed the travel industry.”

Public Open Data: The Good, the Bad, the Future

Curated on September 5, 2013May 29, 2019 by Stefaan Verhulst

Camille Crittenden at IDEALAB: “Some of the most powerful tools combine official public data with social media or other citizen input, such as the recent partnership between Yelp and the public health departments in New York and San Francisco for restaurant hygiene inspection ratings. In other contexts, such tools can help uncover and ultimately reduce corruption by making it easier to “follow the money.”
Despite the opportunities offered by “free data,” this trend also raises new challenges and concerns, among them, personal privacy and security. While attention has been devoted to the unsettling power of big data analysis and “predictive analytics” for corporate marketing, similar questions could be asked about the value of public data. Does it contribute to community cohesion that I can find out with a single query how much my neighbors paid for their house or (if employed by public agencies) their salaries? Indeed, some studies suggest that greater transparency leads not to greater trust in government but to resignation and apathy.
Exposing certain law enforcement data also increases the possibility of vigilantism. California law requires the registration and publication of the home addresses of known sex offenders, for instance. Or consider the controversy and online threats that erupted when, shortly after the Newtown tragedy, a newspaper in New York posted an interactive map of gun permit owners in nearby counties.
…Policymakers and officials must still mind the “big data gap.”So what does the future hold for open data? Publishing data is only one part of the information ecosystem. To be useful, tools must be developed for cleaning, sorting, analyzing and visualizing it as well. …
For-profit companies and non-profit watchdog organizations will continue to emerge and expand, building on the foundation of this data flood. Public-private partnerships such as those between San Francisco and Appallicious or Granicus, startups created by Code for America’s Incubator, and non-partisan organizations like the Sunlight Foundation and MapLight rely on public data repositories for their innovative applications and analysis.
Making public data more accessible is an important goal and offers enormous potential to increase civic engagement. To make the most effective and equitable use of this resource for the public good, cities and other government entities should invest in the personnel and equipment — hardware and software — to make it universally accessible. At the same time, Chief Data Officers (or equivalent roles) should also be alert to the often hidden challenges of equity, inclusion, privacy, and security.”

Big Data and Disease Prevention: From Quantified Self to Quantified Communities

Curated on August 27, 2013August 3, 2018 by Stefaan Verhulst

New Paper by Meredith A. Barrett, Olivier Humblet, Robert A. Hiatt, and Nancy E. Adler: “Big data is often discussed in the context of improving medical care, but it also has a less appreciated but equally important role to play in preventing disease. Big data can facilitate action on the modifiable risk factors that contribute to a large fraction of the chronic disease burden, such as physical activity, diet, tobacco use, and exposure to pollution. It can do so by facilitating the discovery of risk factors for disease at population, subpopulation, and individual levels, and by improving the effectiveness of interventions to help people achieve healthier behaviors in healthier environments. In this article, we describe new sources of big data in population health, explore their applications, and present two case studies illustrating how big data can be leveraged for prevention. We also discuss the many implementation obstacles that must be overcome before this vision can become a reality.”

Index: The Data Universe

Curated on August 22, 2013September 6, 2019 by admin

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

How much data exists in the digital universe as of 2012: 2.7 zetabytes*
Increase in the quantity of Internet data from 2005 to 2012: +1,696%
Percent of the world’s data created in the last two years: 90
Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
Percent of the digital universe in 2012 created by emerging markets: 36
Percent of the digital universe in 2020 predicted to be created by China alone: 21
How much information in the digital universe is created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
Percent of which enterprises have liability or responsibility for (copyright, privacy, compliance with regulations, etc.): 80
Amount included in the Obama Administration’s 2-12 Big Data initiative: over $200 million
Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
Data created per day in 2012: 2.5 quintillion bytes
How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
Projected growth in global data generated per year, as of 2011: 40%
Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
Time needed in 2013: one week
The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
Projected amount of digital information created annually that will either live in or pass through the cloud: 1/3
Increase in data collection volume year-over-year in 2012: 400%
Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zetabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes

Sources

Collaboration In Biology's Century

Curated on August 19, 2013August 3, 2018 by Stefaan Verhulst

Todd Sherer, Chief Executive Officer of The Michael J. Fox Foundation for Parkinson’s Research, in Forbes: “he problem is, we all still work in a system that feeds on secrecy and competition. It’s hard enough work just to dream up win/win collaborative structures; getting them off the ground can feel like pushing a boulder up a hill. Yet there is no doubt that the realities of today’s research environment — everything from the accumulation of big data to the ever-shrinking availability of funds — demand new models for collaboration. Call it “collaboration 2.0.”…I share a few recent examples in the hope of increasing the reach of these initiatives, inspiring others like them, and encouraging frank commentary on how they’re working.
Open-Access Data
The successes of collaborations in the traditional sense, coupled with advanced techniques such as genomic sequencing, have yielded masses of data. Consortia of clinical sites around the world are working together to collect and characterize data and biospecimens through standardized methods, leading to ever-larger pools — more like Great Lakes — of data. Study investigators draw their own conclusions, but there is so much more to discover than any individual lab has the bandwidth for….
Crowdsourcing
A great way to grow engagement with resources you’re willing to share? Ask for it. Collaboration 2.0 casts a wide net. We dipped our toe in the crowdsourcing waters earlier this year with our Parkinson’s Data Challenge, which asked anyone interested to download a set of data that had been collected from PD patients and controls using smart phones. …
Cross-Disciplinary Collaboration 2.0
The more we uncover about the interconnectedness and complexity of the human system, the more proof we are gathering that findings and treatments for one disease may provide invaluable insights for others. We’ve seen some really intriguing crosstalk between the Parkinson’s and Alzheimer’s disease research communities recently…
The results should be: More ideas. More discovery. Better health.”

Five myths about big data

Curated on August 17, 2013August 3, 2018 by Stefaan Verhulst

Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,”Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”