(Appropriate) Big Data for Climate Resilience?


Amy Luers at the Stanford Social Innovation Review: “The answer to whether big data can help communities build resilience to climate change is yes—there are huge opportunities, but there are also risks.

Opportunities

  • Feedback: Strong negative feedback is core to resilience. A simple example is our body’s response to heat stress—sweating, which is a natural feedback to cool down our body. In social systems, feedbacks are also critical for maintaining functions under stress. For example, communication by affected communities after a hurricane provides feedback for how and where organizations and individuals can provide help. While this kind of feedback used to rely completely on traditional communication channels, now crowdsourcing and data mining projects, such as Ushahidi and Twitter Earthquake detector, enable faster and more-targeted relief.
  • Diversity: Big data is enhancing diversity in a number of ways. Consider public health systems. Health officials are increasingly relying on digital detection methods, such as Google Flu Trends or Flu Near You, to augment and diversify traditional disease surveillance.
  • Self-Organization: A central characteristic of resilient communities is the ability to self-organize. This characteristic must exist within a community (see the National Research Council Resilience Report); it is not something that can be imposed from outside. However, social media and related data-mining tools (InfoAmazonia, Healthmap) can enhance situational awareness and facilitate collective action by helping people identify others with common interests, communicate with them, and coordinate efforts.

Risks

  • Eroding trust: Trust is well established as a core feature of community resilience. Yet the NSA PRISM escapade made it clear that big data projects are raising privacy concerns and possibly eroding trust. And it is not just an issue in government. For example, Target analyzes shopping patterns and can fairly accurately guess if someone in your family is pregnant (which is awkward if they know your daughter is pregnant before you do). When our trust in government, business, and communities weakens, it can decrease a society’s resilience to climate stress.
  • Mistaking correlation for causation: Data mining seeks meaning in patterns that are completely independent of theory (suggesting to some that theory is dead). This approach can lead to erroneous conclusions when correlation is mistaken for causation. For example, one study demonstrated that data mining techniques could show a strong (however spurious) correlation between changes in the S&P 500 stock index and butter production in Bangladesh. Interesting as that is, a decision support system built on such a correlation would prove misleading; the toy sketch after this list shows how easily an apparent relationship like this can arise.
  • Failing to see the big picture: One of the biggest challenges with big data mining for building climate resilience is its overemphasis on the hyper-local and hyper-now. While this hyper-local, hyper-now information may be critical for business decisions, without a broader understanding of the longer-term and more-systemic dynamism of social and biophysical systems, big data provides no ability to understand future trends or anticipate vulnerabilities. We must not let our obsession with the here and now divert us from slower-changing variables such as declining groundwater, loss of biodiversity, and melting ice caps—all of which may silently define our future. A related challenge is the fact that big data mining tends to overlook the most vulnerable populations. We must not let the lure of the big data microscope on the “well-to-do” populations of the world make us blind to the less well-off populations within cities and communities that have more limited access to smart phones and the Internet.”
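
To make the correlation-versus-causation warning concrete, here is a toy sketch using purely synthetic numbers (not the actual S&P 500 or Bangladeshi butter figures): two series that merely share an upward trend can look almost perfectly correlated until that trend is removed.

```python
# Toy illustration only: all numbers are synthetic, not real market or
# agricultural statistics.
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(1990, 2010)

# Two independent series that happen to share an upward trend.
stock_index = 100 + 15 * (years - years[0]) + rng.normal(0, 20, len(years))
butter_output = 50 + 3 * (years - years[0]) + rng.normal(0, 5, len(years))

# The shared trend produces a very strong Pearson correlation.
r = np.corrcoef(stock_index, butter_output)[0, 1]
print(f"r between the unrelated series: {r:.2f}")       # typically above 0.9

# Correlating year-over-year changes removes the trend, and the apparent
# relationship largely disappears.
r_diff = np.corrcoef(np.diff(stock_index), np.diff(butter_output))[0, 1]
print(f"r of the differenced series:    {r_diff:.2f}")  # close to 0
```

Differencing is only one simple check, but it shows how quickly a seemingly decision-ready correlation can evaporate.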

Open data for accountable governance: Is data literacy the key to citizen engagement?


at UNDP’s Voices of Eurasia blog: “How can technology connect citizens with governments, and how can we foster, harness, and sustain the citizen engagement that is so essential to anti-corruption efforts?
UNDP has worked on a number of projects that use technology to make it easier for citizens to report corruption to authorities:

These projects are showing some promising results, and provide insights into how a more participatory, interactive government could develop.
At the heart of the projects is the ability to use citizen generated data to identify and report problems for governments to address….

Wanted: Citizen experts

As Kenneth Cukier, The Economist’s Data Editor, has discussed, data literacy will become the new computer literacy. Big data is still nascent and it is impossible to predict exactly how it will affect society as a whole. What we do know is that it is here to stay and data literacy will be integral to our lives.
It is essential that we understand how to interact with big data and the possibilities it holds.
Data literacy needs to be integrated into the education system. Educating non-experts to analyze data is critical to enabling broad participation in this new data age.
As technology advances, key government functions become automated, and government data sharing increases, newer ways for citizens to engage will multiply.
Technology changes rapidly, but the human mind and societal habits change slowly. After years of closed government and bureaucratic inefficiency, adopting a new approach to governance will take time and education.
We need to bring up a generation that sees being involved in government decisions as normal, and that views participatory government as a right, not an ‘innovative’ service extended by governments.

What now?

In the meantime, while data literacy lies in the hands of a few, we must continue to connect those who have the technological skills with citizen experts seeking to change their communities for the better – as has been done at many a Social Innovation Camp recently (in Montenegro, Ukraine and Armenia at Mardamej and Mardamej Reloaded and across the region at Hurilab).
The social innovation camp and hackathon models are an increasingly debated topic (covered by Susannah Vila, David Eaves, Alex Howard and Clay Johnson).
On the whole, evaluations are leading to newer models that focus on greater integration of mentorship to increase sustainability – which I readily support. However, I do have one comment:
Social innovation camps are often criticized for a lack of sustainability – a claim based on the limited number of apps that go beyond the prototype phase. I find a certain sense of irony in this, for isn’t this what innovation is about: Opening oneself up to the risk of failure in the hope of striking something great?
In the words of Vinod Khosla:

“No failure means no risk, which means nothing new.”

As more data is released, the opportunity for new apps and new ways for citizen interaction will multiply and, who knows, someone might come along and transform government just as TripAdvisor transformed the travel industry.”

Public Open Data: The Good, the Bad, the Future


at IDEALAB: “Some of the most powerful tools combine official public data with social media or other citizen input, such as the recent partnership between Yelp and the public health departments in New York and San Francisco for restaurant hygiene inspection ratings. In other contexts, such tools can help uncover and ultimately reduce corruption by making it easier to “follow the money.”
Despite the opportunities offered by “free data,” this trend also raises new challenges and concerns, among them, personal privacy and security. While attention has been devoted to the unsettling power of big data analysis and “predictive analytics” for corporate marketing, similar questions could be asked about the value of public data. Does it contribute to community cohesion that I can find out with a single query how much my neighbors paid for their house or (if employed by public agencies) their salaries? Indeed, some studies suggest that greater transparency leads not to greater trust in government but to resignation and apathy.
Exposing certain law enforcement data also increases the possibility of vigilantism. California law requires the registration and publication of the home addresses of known sex offenders, for instance. Or consider the controversy and online threats that erupted when, shortly after the Newtown tragedy, a newspaper in New York posted an interactive map of gun permit owners in nearby counties.
…Policymakers and officials must still mind the “big data gap.” So what does the future hold for open data? Publishing data is only one part of the information ecosystem. To be useful, tools must be developed for cleaning, sorting, analyzing and visualizing it as well. …
For-profit companies and non-profit watchdog organizations will continue to emerge and expand, building on the foundation of this data flood. Public-private partnerships such as those between San Francisco and Appallicious or Granicus, startups created by Code for America’s Incubator, and non-partisan organizations like the Sunlight Foundation and MapLight rely on public data repositories for their innovative applications and analysis.
Making public data more accessible is an important goal and offers enormous potential to increase civic engagement. To make the most effective and equitable use of this resource for the public good, cities and other government entities should invest in the personnel and equipment — hardware and software — to make it universally accessible. At the same time, Chief Data Officers (or equivalent roles) should also be alert to the often hidden challenges of equity, inclusion, privacy, and security.”
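
The kind of tool described at the top of this excerpt, pairing official inspection records with citizen reviews, comes down to joining two datasets on a shared key. Below is a minimal sketch; the file names and column names are hypothetical placeholders, not the actual Yelp or health-department feeds.

```python
# Hypothetical sketch: file names and column names are placeholders, not the
# actual Yelp or health-department data feeds.
import pandas as pd

inspections = pd.read_csv("health_inspections.csv")  # business_id, inspection_score
reviews = pd.read_csv("citizen_reviews.csv")         # business_id, avg_rating, review_count

# Join official records with citizen-generated data on a shared key.
combined = inspections.merge(reviews, on="business_id", how="left")

# Surface places where official scores and citizen sentiment diverge sharply.
flagged = combined[(combined["inspection_score"] < 70) & (combined["avg_rating"] > 4.0)]
print(flagged[["business_id", "inspection_score", "avg_rating"]])
```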

Big Data and Disease Prevention: From Quantified Self to Quantified Communities


New Paper by Meredith A. Barrett, Olivier Humblet, Robert A. Hiatt, and Nancy E. Adler: “Big data is often discussed in the context of improving medical care, but it also has a less appreciated but equally important role to play in preventing disease. Big data can facilitate action on the modifiable risk factors that contribute to a large fraction of the chronic disease burden, such as physical activity, diet, tobacco use, and exposure to pollution. It can do so by facilitating the discovery of risk factors for disease at population, subpopulation, and individual levels, and by improving the effectiveness of interventions to help people achieve healthier behaviors in healthier environments. In this article, we describe new sources of big data in population health, explore their applications, and present two case studies illustrating how big data can be leveraged for prevention. We also discuss the many implementation obstacles that must be overcome before this vision can become a reality.”

Index: The Data Universe


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes*
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
  • Percent of the digital universe in 2012 created by emerging markets: 36
  • Percent of the digital universe in 2020 predicted to be created by China alone: 21
  • Share of information in the digital universe created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
  • Percent of that consumer-generated information for which enterprises have some liability or responsibility (copyright, privacy, compliance with regulations, etc.): 80
  • Amount included in the Obama Administration’s 2012 Big Data initiative: over $200 million
  • Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
  • Data created per day in 2012: 2.5 quintillion bytes
  • How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
  • How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
  • Projected growth in global data generated per year, as of 2011: 40%
  • Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
  • Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
  • Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
  • Time needed in 2013: one week
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Projected amount of digital information created annually that will either live in or pass through the cloud: 1/3
  • Increase in data collection volume year-over-year in 2012: 400%
  • Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zettabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes
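
For readers who want to sanity-check the figures above, the short sketch below encodes the decimal byte units from the footnote and verifies a couple of the entries, for example that 2,560 terabytes is roughly 2.5 petabytes.

```python
# Decimal byte units as defined in the footnote above.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

# Walmart entry: 2,560 terabytes per hour is about 2.5 petabytes.
print(2_560 * UNITS["terabyte"] / UNITS["petabyte"])  # 2.56

# Footnote check: one zettabyte is a billion terabytes.
print(UNITS["zettabyte"] / UNITS["terabyte"])         # 1000000000.0
```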


Collaboration In Biology's Century


Todd Sherer, Chief Executive Officer of The Michael J. Fox Foundation for Parkinson’s Research, in Forbes: “The problem is, we all still work in a system that feeds on secrecy and competition. It’s hard enough work just to dream up win/win collaborative structures; getting them off the ground can feel like pushing a boulder up a hill. Yet there is no doubt that the realities of today’s research environment — everything from the accumulation of big data to the ever-shrinking availability of funds — demand new models for collaboration. Call it “collaboration 2.0.”… I share a few recent examples in the hope of increasing the reach of these initiatives, inspiring others like them, and encouraging frank commentary on how they’re working.
Open-Access Data
The successes of collaborations in the traditional sense, coupled with advanced techniques such as genomic sequencing, have yielded masses of data. Consortia of clinical sites around the world are working together to collect and characterize data and biospecimens through standardized methods, leading to ever-larger pools — more like Great Lakes — of data. Study investigators draw their own conclusions, but there is so much more to discover than any individual lab has the bandwidth for….
Crowdsourcing
A great way to grow engagement with resources you’re willing to share? Ask for it. Collaboration 2.0 casts a wide net. We dipped our toe in the crowdsourcing waters earlier this year with our Parkinson’s Data Challenge, which asked anyone interested to download a set of data that had been collected from PD patients and controls using smart phones. …
Cross-Disciplinary Collaboration 2.0
The more we uncover about the interconnectedness and complexity of the human system, the more proof we are gathering that findings and treatments for one disease may provide invaluable insights for others. We’ve seen some really intriguing crosstalk between the Parkinson’s and Alzheimer’s disease research communities recently…
The results should be: More ideas. More discovery. Better health.”
 
 
 

Five myths about big data


Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
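
Arbesman’s point about fishing for correlations is easy to reproduce. The sketch below uses purely synthetic data: compare enough unrelated variables against a target and some will appear strongly related by chance alone.

```python
# Purely synthetic data: the target and all candidate variables are
# independent random noise.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_variables = 50, 2000

target = rng.normal(size=n_samples)
candidates = rng.normal(size=(n_variables, n_samples))

# Correlate every candidate variable with the (equally random) target.
correlations = np.array([np.corrcoef(v, target)[0, 1] for v in candidates])

strongest = np.abs(correlations).max()
print(f"Strongest |r| among {n_variables} pure-noise variables: {strongest:.2f}")
# Typically lands around 0.4 to 0.5 here, despite there being no real
# relationship at all.
```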

Searching Big Data for ‘Digital Smoke Signals’


Steve Lohr in the New York Times: “It is the base camp of the United Nations Global Pulse team — a tiny unit inside an institution known for its sprawling bureaucracy, not its entrepreneurial hustle. Still, the focus is on harnessing technology in new ways — using data from social networks, blogs, cellphones and online commerce to transform economic development and humanitarian aid in poorer nations….

The efforts by Global Pulse and a growing collection of scientists at universities, companies and nonprofit groups have been given the label “Big Data for development.” It is a field of great opportunity and challenge. The goal, the scientists involved agree, is to bring real-time monitoring and prediction to development and aid programs. Projects and policies, they say, can move faster, adapt to changing circumstances and be more effective, helping to lift more communities out of poverty and even save lives.

Research by Global Pulse and other groups, for example, has found that analyzing Twitter messages can give an early warning of a spike in unemployment, price rises and disease. Such “digital smoke signals of distress,” Mr. Kirkpatrick said, usually come months before official statistics — and in many developing countries today, there are no reliable statistics.

Finding the signals requires data, though, and much of the most valuable data is held by private companies, especially mobile phone operators, whose networks carry text messages, digital-cash transactions and location data. So persuading telecommunications operators, and the governments that regulate and sometimes own them, to release some of the data is a top task for the group. To analyze the data, the groups apply tools now most widely used for pinpointing customers with online advertising.”
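
In its simplest form, the “digital smoke signals” idea amounts to counting distress-related phrases in a message stream and flagging unusual spikes. The sketch below is a rough illustration with hypothetical messages, keywords, and threshold; it is not Global Pulse’s actual pipeline.

```python
# Hypothetical sketch of a keyword-based early-warning signal; the phrases,
# messages, and spike threshold are illustrative only.
from collections import Counter

DISTRESS_TERMS = {"lost my job", "laid off", "can't afford", "food prices"}

def weekly_signal(messages_by_week):
    """Count, per week, the messages that mention any distress term."""
    counts = Counter()
    for week, messages in messages_by_week.items():
        counts[week] = sum(
            any(term in message.lower() for term in DISTRESS_TERMS)
            for message in messages
        )
    return counts

def flag_spikes(counts, factor=1.5):
    """Return weeks whose count exceeds `factor` times the overall average."""
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    return [week for week, count in counts.items() if count > factor * average]

sample = {
    "W01": ["nice weather today", "market was busy"],
    "W02": ["lost my job", "food prices doubled", "can't afford rent"],
}
print(flag_spikes(weekly_signal(sample)))  # ['W02']: 3 mentions vs. an average of 1.5
```

A real system would add language handling, spam filtering, and per-region baselines, but the point stands: the counting itself is done entirely by the machine.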

Big data, crowdsourcing and machine learning tackle Parkinson’s


Successful Workingplace: “Parkinson’s is a very tough disease to fight. People suffering from the disease often have significant tremors that keep them from being able to create accurate records of their daily challenges. Without this information, doctors are unable to fine tune drug dosages and other treatment regimens that can significantly improve the lives of sufferers.
It was a perfect catch-22 situation until recently, when the Michael J. Fox Foundation announced that LIONsolver, a company specializing in machine learning software, was able to differentiate Parkinson’s patients from healthy individuals and to also show the trend in symptoms of the disease over time.
To set up the competition, the Foundation worked with Kaggle, an organization that specializes in crowdsourced big data analysis competitions. Crowdsourcing gets to the heart of very difficult Big Data problems by letting people the world over, from a myriad of backgrounds and with diverse experiences, devote time to personally chosen challenges where they can bring the most value. It’s a genius idea for bringing some of the scarcest resources together with the most intractable problems.”
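
The competition’s core task, separating patients from controls using features derived from smartphone recordings, is at heart a supervised-classification problem. The sketch below illustrates that setup on synthetic placeholder features; it is not the competition dataset or LIONsolver’s actual method.

```python
# Synthetic placeholder data: columns stand in for features such as tremor
# amplitude or voice jitter extracted from phone recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_subjects = 200

features = rng.normal(size=(n_subjects, 4))
labels = rng.integers(0, 2, size=n_subjects)   # 1 = patient, 0 = healthy control
features[labels == 1, 0] += 1.0                # patients get a shifted "tremor" feature

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, features, labels, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {scores.mean():.2f}")
```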
 

Orwell is drowning in data: the volume problem


Dom Shaw in OpenDemocracy: “During World War II, whilst Bletchley Park laboured in the front line of code breaking, the British Government was employing vast numbers of female operatives to monitor and report on telephone, mail and telegraph communications in and out of the country.
The biggest problem, of course, was volume. There was not even the most primitive algorithm to detect key phrases (the kind that later caused such paranoia amongst the sixties and seventies counterculture that a whole generation of drug users adopted a wholly unnecessary set of telephone synonyms for their desired substance), so the army of women stationed in exchanges around the country was driven to report everything and then pass it on up to those whose job it was to analyse such content for significance.
Orwell’s vision of Big Brother’s omniscience was based upon the same model – vast armies of Winston Smiths monitoring data to ensure discipline and control. He saw a culture of betrayal where every citizen was held accountable for their fellow citizens’ political and moral conformity.
Up until the US Government’s Big Data Research and Development Initiative [12] and the NSA development of the Prism programme [13], the fault lines always lay in the technology used to collate or collect and the inefficiency or competing interests of the corporate systems and processes that interpreted the information. Not for the first time, the bureaucracy was the citizen’s best bulwark against intrusion.
Now that the algorithms have become more complex and the technology tilted towards passive surveillance through automation, the volume problem becomes less of an obstacle….
The technology for obtaining this information, and indeed the administration of it, is handled by corporations. The Government, driven by the creed that suggests private companies are better administrators than civil servants, has auctioned off the job to a dozen or more favoured corporate giants who are, as always, beholden not only to their shareholders, but to their patrons within the government itself….
The only problem the state had was managing the scale of the information gleaned from so many people in so many forms. Not any more. The volume problem has been overcome.”