Big Data and Disease Prevention: From Quantified Self to Quantified Communities


New Paper by Meredith A. Barrett, Olivier Humblet, Robert A. Hiatt, and Nancy E. Adler: “Big data is often discussed in the context of improving medical care, but it also has a less appreciated but equally important role to play in preventing disease. Big data can facilitate action on the modifiable risk factors that contribute to a large fraction of the chronic disease burden, such as physical activity, diet, tobacco use, and exposure to pollution. It can do so by facilitating the discovery of risk factors for disease at population, subpopulation, and individual levels, and by improving the effectiveness of interventions to help people achieve healthier behaviors in healthier environments. In this article, we describe new sources of big data in population health, explore their applications, and present two case studies illustrating how big data can be leveraged for prevention. We also discuss the many implementation obstacles that must be overcome before this vision can become a reality.”

A promising phenomenon of open data: A case study of the Chicago open data project


Paper by Maxat Kassen in Government Information Quarterly: “This article presents a case study of the open data project in the Chicago area. The main purpose of the research is to explore the empowering potential of the open data phenomenon at the local level, as a platform for promoting civic engagement projects, and to provide a framework for future research and hypothesis testing. Today the main challenge in realizing any e-government project is the traditional top–down administrative mechanism of implementation, which leaves practically no room for input from members of civil society. In this respect, the author argues that the open data concept, realized at the local level, may provide a real platform for promoting proactive civic engagement. By harnessing the collective wisdom of local communities, their knowledge and their visions of local challenges, governments could react to and meet citizens’ needs in a more productive and cost-efficient manner. Open data-driven projects focused on visualizing environmental issues, mapping utility management, evaluating political lobbying and social benefits, closing the digital divide, etc. are only some examples of such prospects. These projects are perhaps harbingers of a new political reality in which interactions among citizens at the local level will play a more important role than communication between civil society and government, owing to the empowering potential of the open data concept.”

How X Prize Contestants Will Hunt Down The Health Sensors Of The Future


Ariel Schwartz in Co.Exist: “The $10 million Qualcomm Tricorder X Prize asks entrants to perform an incredibly difficult feat: accurately diagnose 15 diseases in 30 patients in three days using only a mobile platform. To do that, competing teams need to have access to sophisticated sensors and related software.
Some of those sensors may be found among the finalists of the $2.25 million Nokia Sensing XCHALLENGE, a set of two consecutive competitions that challenges teams to advance sensing technology for gathering data about human health and the environment. The finalists for the first challenge, announced in early August, are diverse, though they do share one common trait: They’re all lab-on-a-chip technologies. “They’re small enough to be body wearable and programmable, but they use different methods,” says Mark Winter, senior director of the Nokia Sensing XCHALLENGE.”

The Global Database of Events, Language, and Tone (GDELT)


“The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level globally, to make all of this data freely available for open research, and to provide daily updates to create the first “realtime social sciences earth observatory.” Nearly a quarter-billion georeferenced events capture global behavior in more than 300 categories covering 1979 to present with daily updates. GDELT is designed to help support new theories and descriptive understandings of the behaviors and driving forces of global-scale social systems from the micro-level of the individual through the macro-level of the entire planet by offering realtime synthesis of global societal-scale behavior into a rich quantitative database allowing realtime monitoring and analytical exploration of those trends.
GDELT’s goal is to help uncover previously-obscured spatial, temporal, and perceptual evolutionary trends through new forms of analysis of the vast textual repositories that capture global societal activity, from news and social media archives to knowledge repositories.”
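For readers who want to work directly with the raw event files, here is a minimal sketch of summarizing one daily GDELT export in Python. It assumes a locally saved, tab-delimited export with a header row, and that the column names (ActionGeo_CountryCode, AvgTone) follow the GDELT event codebook; treat both the layout and the names as assumptions to verify against the codebook for the release you download.

```python
# Minimal sketch: average the tone of one day's georeferenced GDELT events
# per country. Assumptions (verify against the GDELT codebook): the export is
# tab-delimited, has a header row, and uses the column names below; raw
# exports may ship without headers, in which case pass names=... from the
# codebook to read_csv instead.
import pandas as pd

def daily_tone_by_country(path: str) -> pd.DataFrame:
    events = pd.read_csv(path, sep="\t", usecols=["ActionGeo_CountryCode", "AvgTone"])
    return (events.groupby("ActionGeo_CountryCode")["AvgTone"]
                  .agg(["count", "mean"])
                  .rename(columns={"count": "events", "mean": "avg_tone"})
                  .sort_values("events", ascending=False))

if __name__ == "__main__":
    # "gdelt_daily.tsv" is a placeholder filename for a downloaded daily export.
    print(daily_tone_by_country("gdelt_daily.tsv").head(10))
```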

Index: The Data Universe


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes*
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
  • Percent of the digital universe in 2012 created by emerging markets: 36
  • Percent of the digital universe in 2020 predicted to be created by China alone: 21
  • Share of information in the digital universe created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
  • Percent of that consumer-created information for which enterprises have some liability or responsibility (copyright, privacy, compliance with regulations, etc.): 80
  • Amount included in the Obama Administration’s 2012 Big Data initiative: over $200 million
  • Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
  • Data created per day in 2012: 2.5 quintillion bytes
  • How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
  • How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
  • Projected growth in global data generated per year, as of 2011: 40%
  • Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
  • Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
  • Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
  • Time needed in 2013: one week
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Projected share of digital information created annually that will either live in or pass through the cloud: 1/3
  • Increase in data collection volume year-over-year in 2012: 400%
  • Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zettabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes
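As a quick arithmetic check, the sketch below applies the footnote’s decimal unit definitions to two figures from the index: Walmart’s 2,560 terabytes per hour and the 2.5 exabytes created per day in 2012. The numbers come from the index itself; the script is only illustrative.

```python
# Quick check of the unit definitions in the footnote, applied to two figures
# from the index above (decimal units, as the footnote uses).
GB_PER_TB = 1_000          # 1 terabyte  = 1,000 gigabytes
TB_PER_PB = 1_000          # 1 petabyte  = 1,000 terabytes
TB_PER_EB = 1_000_000      # 1 exabyte   = 1 billion gigabytes = 1,000,000 terabytes
TB_PER_ZB = 1_000_000_000  # 1 zettabyte = 1 billion terabytes

walmart_tb_per_hour = 2_560
print(walmart_tb_per_hour / TB_PER_PB, "petabytes per hour")   # ~2.56, i.e. the 2.5 PB cited

exabytes_per_day_2012 = 2.5
print(exabytes_per_day_2012 * TB_PER_EB * GB_PER_TB, "gigabytes per day")  # 2.5 billion GB
```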

A Modern Approach to Open Data


At the Sunlight Foundation blog: “Last year, a group of us who work daily with open government data — Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself — decided to stop each building the same basic tools over and over, and start building a foundation we could share.
We set up a small home at github.com/unitedstates, and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government — THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO’s FDSys, and others — that everyone needs to report, analyze, or build nearly anything to do with Congress.
Once we centralized this work and started maintaining it publicly, we began getting contributions nearly immediately. People educated us on identifiers, fixed typos, and gathered new data. Chris Wilson built an impressive interactive visualization of the Senate’s budget amendments by extending our collector to find and link the text of amendments.
This is an unusual, and occasionally chaotic, model for an open data project. github.com/unitedstates is a neutral space; GitHub’s permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other’s work, no one is dependent or “downstream” from anyone else. It’s a shared commons in the public domain.
There are a few principles that have helped make the unitedstates project something that’s worth our time:…”
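As an illustration of how this shared commons can be consumed, here is a minimal sketch that pulls legislator data from the project’s GitHub organization. The repository name (congress-legislators), file name (legislators-current.yaml), branch, and field names are taken from the public repositories rather than from the post itself, so treat them as assumptions to check against the current layout.

```python
# Minimal sketch: list current senators from the github.com/unitedstates data
# commons. The repository path, branch, and YAML field names are assumptions
# based on the project's public organization, not details from the post.
import requests
import yaml  # pip install pyyaml

URL = ("https://raw.githubusercontent.com/unitedstates/"
       "congress-legislators/main/legislators-current.yaml")  # branch may be "master" on older clones

def current_senators():
    legislators = yaml.safe_load(requests.get(URL, timeout=30).text)
    for person in legislators:
        latest_term = person["terms"][-1]          # most recent term on record
        if latest_term["type"] == "sen":           # "sen" = Senate, "rep" = House
            yield latest_term["state"], person["name"]["official_full"]

if __name__ == "__main__":
    for state, name in sorted(current_senators()):
        print(f"{state}: {name}")
```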

Is Connectivity A Human Right?


Mark Zuckerberg (Facebook): For almost ten years, Facebook has been on a mission to make the world more open and connected. Today we connect more than 1.15 billion people each month, but as we started thinking about connecting the next 5 billion, we realized something important: the vast majority of people in the world don’t have access to the internet.
Today, only 2.7 billion people are online — a little more than one third of the world. That number is growing by less than 9% each year, which is slow considering how early we are in the internet’s development. Even though projections show most people will get smartphones in the next decade, most of them still won’t have data access because the cost of data remains much higher than the price of a smartphone.
Below, I’ll share a rough proposal for how we can connect the next 5 billion people, and a rough plan to work together as an industry to get there. We’ll discuss how we can make internet access more affordable by making it more efficient to deliver data, how we can use less data by improving the efficiency of the apps we build and how we can help businesses drive internet access by developing a new model to get people online.
I call this a “rough plan” because, like many long term technology projects, we expect the details to evolve. It may be possible to achieve more than we lay out here, but it may also be more challenging than we predict. The specific technical work will evolve as people contribute better ideas, and we welcome all feedback on how to improve this.
Connecting the world is one of the greatest challenges of our generation. This is just one small step toward achieving that goal. I’m excited to work together to make this a reality.

White House Expands Guidance on Promoting Open Data


NextGov: “White House officials have announced expanded technical guidance to help agencies make more data accessible to the public in machine-readable formats.
Following up on President Obama’s May executive order linking the pursuit of open data to economic growth, innovation and government efficiency, two budget and science office spokesmen on Friday published a blog post highlighting new instructions and answers to frequently asked questions.
Nick Sinai, deputy chief technology officer at the Office of Science and Technology Policy, and Dominic Sale, supervisory policy analyst at the Office of Management and Budget, noted that the policy now in place means that all “newly generated government data will be required to be made available in open, machine-readable formats, greatly enhancing their accessibility and usefulness, while ensuring privacy and security.”
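To make the phrase “open, machine-readable formats” concrete, here is an illustrative sketch of a dataset description serialized as JSON rather than locked in a PDF table. The field names are loosely modeled on common open-data catalog entries and are not quoted from the White House guidance.

```python
# Illustrative (not official) example of machine-readable dataset metadata.
# Field names and the example URL are invented for illustration; they are not
# taken from the White House guidance.
import json

dataset_entry = {
    "title": "Example Agency Inspection Results",
    "description": "Monthly inspection outcomes, published as CSV.",
    "modified": "2013-08-01",
    "publisher": "Example Agency",
    "accessLevel": "public",
    "distribution": [
        {"format": "text/csv", "downloadURL": "https://example.gov/data/inspections.csv"}
    ],
}

# A catalog file like this can be parsed by any client, unlike a table in a PDF.
print(json.dumps([dataset_entry], indent=2))
```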

Strengthening Local Capacity for Data-Driven Decisionmaking


A report by the National Neighborhood Indicators Partnership (NNIP): “A large share of public decisions that shape the fundamental character of American life are made at the local level; for example, decisions about controlling crime, maintaining housing quality, targeting social services, revitalizing low-income neighborhoods, allocating health care, and deploying early childhood programs. Enormous benefits would be gained if a much larger share of these decisions were based on sound data and analysis.
In the mid-1990s, a movement began to address the need for data for local decisionmaking. Civic leaders in several cities funded local groups to start assembling neighborhood and address-level data from multiple local agencies. For the first time, it became possible to track changing neighborhood conditions, using a variety of indicators, year by year between censuses. These new data intermediaries pledged to use their data in practical ways to support policymaking and community building and give priority to the interests of distressed neighborhoods. Their theme was “democratizing data,” which in practice meant making the data accessible to residents and community groups (Sawicki and Craig 1996).

The initial groups that took on this work formed the National Neighborhood Indicators Partnership (NNIP) to further develop these capacities and spread them to other cities. By 2012, NNIP partners were established in 37 cities, and similar capacities were in development in a number of others. The Urban Institute (UI) serves as the secretariat for the network. This report documents a strategic planning process undertaken by NNIP in 2012 and early 2013. The network’s leadership and funders re-examined the NNIP model in the context of 15 years of local partner experiences and the dramatic changes in technology and policy approaches that have occurred over that period. The first three sections explain NNIP functions and institutional structures and examine the potential role for NNIP in advancing the community information field in today’s environment.”

Do you want to live in a smart city?


Jane Wakefield from BBC News: “In the future everything in a city, from the electricity grid to the sewer pipes, roads, buildings and cars, will be connected to the network. Buildings will turn off the lights for you, self-driving cars will find you that sought-after parking space, even the rubbish bins will be smart. But how do we get to this smarter future? Who will be monitoring and controlling the sensors that will increasingly be on every building, lamp-post and pipe in the city?…
There is another chapter in the smart city story – and this one is being written by citizens, who are using apps, DIY sensors, smartphones and the web to solve the city problems that matter to them.
Don’t Flush Me is a neat little DIY sensor and app which is single-handedly helping to solve one of New York’s biggest water issues.
Every time there is heavy rain in the city, raw sewage is pumped into the harbour, at a rate of 27 billion gallons each year.
Using an Arduino processor, a sensor that measures water levels in the sewer overflows, and a smartphone app, Don’t Flush Me lets people know when it is ‘safe to flush’.
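The decision logic behind such a sensor-plus-app project can be sketched in a few lines: compare a water-level reading from the sewer overflow against a threshold and report whether it is ‘safe to flush’. The threshold and the stand-in sensor reading below are hypothetical and not taken from the actual project.

```python
# Hypothetical sketch of the "safe to flush" logic described above. The
# threshold value and read_water_level_cm() are invented for illustration;
# the real project's sensor wiring and calibration will differ.
import random

OVERFLOW_THRESHOLD_CM = 80.0  # hypothetical level at which overflows begin

def read_water_level_cm() -> float:
    """Stand-in for the Arduino sensor reading (random demo data here)."""
    return random.uniform(0, 120)

def safe_to_flush(level_cm: float, threshold_cm: float = OVERFLOW_THRESHOLD_CM) -> bool:
    return level_cm < threshold_cm

if __name__ == "__main__":
    level = read_water_level_cm()
    status = "safe to flush" if safe_to_flush(level) else "hold off: overflow risk"
    print(f"water level: {level:.0f} cm -> {status}")
```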
Meanwhile Egg, a community-led sensor network, is alerting people to an often hidden problem in our cities.
Researchers estimate that two million people die each year as a result of air pollution and as cities get more over-crowded, the problem is likely to get worse.
Egg is compiling data about air quality by selling cheap sensors that people put outside their homes, where they collect readings of gases such as nitrogen dioxide (NO2) and carbon monoxide (CO)….
The reality is that most smart city projects are currently pretty small scale – creating tech hubs or green areas of the city, experimenting with smart electricity grids or introducing electric buses or bike-sharing schemes.”