White House Unveils Big Data Projects, Round Two


Information Week: “The White House Office of Science and Technology Policy (OSTP) and Networking and Information Technology R&D program (NITRD) on Tuesday introduced a slew of new big-data collaboration projects aimed at stimulating private-sector interest in federal data. The initiatives, announced at the White House-sponsored “Data to Knowledge to Action” event, are targeted at fields as varied as medical research, geointelligence, economics, and linguistics.
The new projects are a continuation of the Obama Administration’s Big Data Initiative, announced in March 2012, when the first round of big-data projects was presented.
Thomas Kalil, OSTP’s deputy director for technology and innovation, said that “dozens of new partnerships — more than 90 organizations” are pursuing these new collaborative projects, including many of the best-known American technology, pharmaceutical, and research companies.
Among the initiatives, Amazon Web Services (AWS) and NASA have set up the NASA Earth eXchange, or NEX, a collaborative network to provide space-based data about our planet to researchers in Earth science. AWS will host much of NASA’s Earth-observation data as an AWS Public Data Set, making it possible, for instance, to crowdsource research projects.
An estimated 4.4 million jobs will be created between now and 2015 to support big-data projects. Employers, educational institutions, and government agencies are working to build the educational infrastructure to provide students with the skills they need to fill those jobs.
To help train new workers, IBM, for instance, has created a new assessment tool that gives university students feedback on their readiness for number-crunching careers in both the public and private sector. Eight universities that have a big data and analytics curriculum — Fordham, George Washington, Illinois Institute of Technology, University of Massachusetts-Boston, Northwestern, Ohio State, Southern Methodist, and the University of Virginia — will receive the assessment tool.
OSTP is organizing an initiative to create a “weather service” for pandemics, Kalil said, a way to use big data to identify and predict pandemics as early as possible in order to plan and prepare for — and hopefully mitigate — their effects.
The National Institutes of Health (NIH), meanwhile, is undertaking its “Big Data to Knowledge” (BD2K) initiative to develop a range of standards, tools, software, and other approaches to make use of massive amounts of data being generated by the health and medical research community….”
See also:
November 12, 2013 – Fact Sheet: Progress by Federal Agencies: Data to Knowledge to Action
November 12, 2013 – Fact Sheet: New Announcements: Data to Knowledge to Action
November 12, 2013 – Press Release: Data to Knowledge to Action Event

What future do you want? Commission invites votes on what Europe could look like in 2050 to help steer future policy and research planning


European Commission – MEMO: “Vice-President Neelie Kroes, responsible for the Digital Agenda, is inviting people to join a voting and ranking process on 11 visions of what the world could look like in 20-40 years. The Commission is seeking views on living and learning, leisure and working in Europe in 2050, to steer long-term policy or research planning.
The visions have been gathered over the past year through the Futurium, an online debate platform that allows policymakers not only to consult citizens but also to collaborate and “co-create” with them, and at events throughout Europe. Thousands of thinkers – from high-school students to the Erasmus Students Network, from entrepreneurs and internet pioneers to philosophers and university professors – have engaged in a collective inquiry – a means of crowd-sourcing what our future world could look like.
Eleven over-arching themes have been drawn together from more than 200 ideas for the future. From today, everyone is invited to join the debate and offer their ratings and rankings of the various ideas. The results of the feedback will help the European Commission make better decisions about how to fund projects and ideas that both shape the future and get Europe ready for that future….
The Futurium is a foresight project run by DG CONNECT, based on an open-source approach. It develops visions of society, technologies, attitudes and trends in 2040-2050 and uses these, for example, as potential blueprints for future policy choices or EU research and innovation funding priorities.
It is an online platform developed to capture emerging trends and enable interested citizens to co-create compelling visions of the futures that matter to them.

This crowd-sourcing approach provides useful insights on:

  1. vision: where people want to go, and how desirable and likely the visions posted on the platform are;
  2. policy ideas: what should ideally be done to realise the futures; the possible impacts and plausibility of policy ideas;
  3. evidence: scientific and other evidence to support the visions and policy ideas.

….
Connecting policy making to people: in an increasingly connected society, online outreach and engagement is an essential response to the growing demand for participation, helping to capture new ideas and to broaden the legitimacy of the policy making process (IP/10/1296). The Futurium is an early prototype of a more general policy-making model described in the paper “The Futurium—a Foresight Platform for Evidence-Based and Participatory Policymaking”.

The Futurium was developed to lay the groundwork for future policy proposals which could be considered by the European Parliament and the European Commission under their new mandates as of 2014. But the Futurium’s open, flexible architecture makes it easily adaptable to any policy-making context where thinking ahead, stakeholder participation and scientific evidence are needed.”

The GovLab Academy: A Community and Platform for Learning and Teaching Governance Innovations


Press Release: “Today the Governance Lab (The GovLab) launches The GovLab Academy at the Open Government Partnership Annual Meeting in London.
Available at www.thegovlabacademy.org, the Academy is a free online community for those wanting to teach and learn how to solve public problems and improve lives using innovations in governance. A partnership between The GovLab at New York University and MIT Media Lab’s Online Learning Initiative, the site launching today offers curated videos, podcasts, readings and activities designed to enable the purpose-driven learner to deepen his or her practical knowledge at his or her own pace.
The GovLab Academy is funded by a grant from the John S. and James L. Knight Foundation. “The GovLab Academy addresses a growing need among policy makers at all levels – city, federal and global – to leverage advances in technology to govern differently,” says Carol Coletta, Vice President of Community and National Initiatives at the Knight Foundation.  “By connecting the latest technological innovations to a community of willing mentors, the Academy has the potential to catalyze more experimentation in a sector that badly needs it.”
Initial topics include using data to improve policymaking and cover the role of big data, urban analytics, smart disclosure and open data in governance. A second track focuses on online engagement and includes practical strategies for using crowdsourcing to solicit ideas, organize distributed work and gather data. The site features both curated content drawn from a variety of sources and original interviews with innovators from government, civil society, the tech industry, the arts and academia, talking about their work implementing innovations around the world, what worked and what didn’t, to improve real people’s lives.
Beth Noveck, Founder and Director of The GovLab, describes its mission: “The Academy is an experiment in peer production where every teacher is a learner and every learner a teacher. Consistent with The GovLab’s commitment to measuring what works, we want to measure our success by the people contributing as well as consuming content. We invite everyone with ideas, stories, insights and practical wisdom to contribute to what we hope will be a thriving and diverse community for social change”.”

Big Data


Special Report on Big Data by Volta – A newsletter on Science, Technology and Society in Europe: “From locating crime spots to predicting the next outbreak of a contagious disease, Big Data promises benefits for society as well as business. But more means messier. Do policy-makers know how to use this scale of data-driven decision-making in an effective way for their citizens and ensure their privacy?
90% of the world’s data have been created in the last two years. Every minute, more than 100 million new emails are created, 72 hours of new video are uploaded to YouTube and Google processes more than 2 million searches. Nowadays, almost everyone walks around with a small computer in their pocket, uses the internet on a daily basis and shares photos and information with their friends, family and networks. The digital exhaust we leave behind every day contributes to an enormous amount of data produced, and at the same time leaves electronic traces that contain a great deal of personal information….
Until recently, traditional technology and analysis techniques have not been able to handle this quantity and type of data. But recent technological developments have enabled us to collect, store and process data in new ways. There seem to be no limitations, either to the volume of data or to the technology for storing and analyzing it. Big Data can map a driver’s sitting position to identify a car thief, it can use Google searches to predict outbreaks of the H1N1 flu virus, it can data-mine Twitter to predict the price of rice or use mobile phone top-ups to describe unemployment in Asia.
The word ‘data’ means ‘given’ in Latin. It commonly refers to a description of something that can be recorded and analyzed. While there is no clear definition of the concept of ‘Big Data’, it usually refers to the processing of huge amounts and new types of data that have not been possible with traditional tools.

‘The new development is not necessarily that there are so much more data. It’s rather that data is available to us in a new way.’

The notion of Big Data is kind of misleading, argues Robindra Prabhu, a project manager at the Norwegian Board of Technology. “The new development is not necessarily that there are so much more data. It’s rather that data is available to us in a new way. The digitalization of society gives us access to both ‘traditional’, structured data – like the content of a database or register – and unstructured data, for example the content in a text, pictures and videos. Information designed to be read by humans is now also readable by machines. And this development makes a whole new world of  data gathering and analysis available. Big Data is exciting not just because of the amount and variety of data out there, but that we can process data about so much more than before.”

Smart Citizens


FutureEverything: “This publication aims to shift the debate on the future of cities towards the central place of citizens, and of decentralised, open urban infrastructures. It provides a global perspective on how cities can create the policies, structures and tools to engender a more innovative and participatory society. The publication contains a series of 23 short essays representing some of the key voices developing an emerging discourse around Smart Citizens.  Contributors include:

  • Dan Hill, Smart Citizens pioneer and CEO of communications research centre and transdisciplinary studio Fabrica, on why Smart Citizens Make Smart Cities.
  • Anthony Townsend, urban planner, forecaster and author of Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, on the tensions between place-making and city-making, and on the role of mobile technologies in changing the way that people interact with their surroundings.
  • Paul Maltby, Director of the Government Innovation Group and of Open Data and Transparency in the UK Cabinet Office, on how government can support a smarter society.
  • Aditya Dev Sood, Founder and CEO of the Center for Knowledge Societies, presents polarised hypothetical futures for India in 2025, arguing for the use of technology to bridge gaps in social inequality.
  • Adam Greenfield, New York City-based writer and urbanist, on Recuperating the Smart City.

Editors: Drew Hemment, Anthony Townsend
Download Here.

Google’s flu fail shows the problem with big data


Adam Kucharski in The Conversation: “When people talk about ‘big data’, there is an oft-quoted example: a proposed public health tool called Google Flu Trends. It has become something of a pin-up for the big data movement, but it might not be as effective as many claim.
The idea behind big data is that large amounts of information can help us do things which smaller volumes cannot. Google first outlined the Flu Trends approach in a 2008 paper in the journal Nature. Rather than relying on disease surveillance used by the US Centers for Disease Control and Prevention (CDC) – such as visits to doctors and lab tests – the authors suggested it would be possible to predict epidemics through Google searches. When suffering from flu, many Americans will search for information related to their condition….
Between 2003 and 2008, flu epidemics in the US had been strongly seasonal, appearing each winter. However, in 2009, the first cases (as reported by the CDC) started at Easter. Flu Trends had already made its predictions when the CDC data was published, but it turned out that the Google model didn’t match reality. It had substantially underestimated the size of the initial outbreak.
The problem was that Flu Trends could only measure what people search for; it didn’t analyse why they were searching for those words. By removing human input, and letting the raw data do the work, the model had to make its predictions using only search queries from the previous handful of years. Although those 45 terms matched the regular seasonal outbreaks from 2003–8, they didn’t reflect the pandemic that appeared in 2009.
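To make the mechanism concrete, the kind of model at issue is a simple statistical mapping from search activity to illness rates. The sketch below is a minimal illustration, assuming the logit-linear form described in the original Nature paper; the query fractions, coefficients and ILI rates are synthetic, not Google's actual data.

```python
import numpy as np

# Minimal sketch of a Flu-Trends-style model: regress the (logit of the)
# CDC influenza-like-illness rate on the (logit of the) fraction of searches
# matching flu-related terms. All numbers below are synthetic.
rng = np.random.default_rng(0)

def logit(x):
    return np.log(x / (1 - x))

def inv_logit(z):
    return 1 / (1 + np.exp(-z))

# Three years of weekly "query fractions" and the ILI rates they coincided with.
q = rng.uniform(0.001, 0.010, size=156)
p = inv_logit(2.5 + 1.1 * logit(q) + rng.normal(0, 0.1, size=156))

# Ordinary least squares on the logit scale: logit(p) = b0 + b1 * logit(q).
X = np.column_stack([np.ones_like(q), logit(q)])
b0, b1 = np.linalg.lstsq(X, logit(p), rcond=None)[0]

# A prediction for a new week uses the query fraction alone, which is exactly
# why the model breaks when search behaviour stops tracking actual illness.
print(f"predicted ILI rate: {inv_logit(b0 + b1 * logit(0.004)):.2%}")
```

Because the fitted coefficients encode only how searches and illness co-moved in the training years, any shift in search behaviour, such as an out-of-season pandemic or media-driven searching by healthy people, feeds straight through into the prediction errors the article describes.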
Six months after the pandemic started, Google – who now had the benefit of hindsight – updated their model so that it matched the 2009 CDC data. Despite these changes, the updated version of Flu Trends ran into difficulties again last winter, when it overestimated the size of the influenza epidemic in New York State. The incidents in 2009 and 2012 raised the question of how good Flu Trends is at predicting future epidemics, as opposed to merely finding patterns in past data.
In a new analysis, published in the journal PLOS Computational Biology, US researchers report that there are “substantial errors in Google Flu Trends estimates of influenza timing and intensity”. This is based on a comparison of Google Flu Trends predictions and the actual epidemic data at the national, regional and local level between 2003 and 2013.
Even when search behaviour was correlated with influenza cases, the model sometimes misestimated important public health metrics such as peak outbreak size and cumulative cases. The predictions were particularly wide of the mark in 2009 and 2012:

Original and updated Google Flu Trends (GFT) model compared with CDC influenza-like illness (ILI) data. PLOS Computational Biology 9:10

Although they criticised certain aspects of the Flu Trends model, the researchers think that monitoring internet search queries might yet prove valuable, especially if it were linked with other surveillance and prediction methods.
Other researchers have also suggested that other sources of digital data – from Twitter feeds to mobile phone GPS – have the potential to be useful tools for studying epidemics. As well as helping to analyse outbreaks, such methods could allow researchers to analyse human movement and the spread of public health information (or misinformation).
Although much attention has been given to web-based tools, there is another type of big data that is already having a huge impact on disease research. Genome sequencing is enabling researchers to piece together how diseases transmit and where they might come from. Sequence data can even reveal the existence of a new disease variant: earlier this week, researchers announced a new type of dengue fever virus….”

Are We Puppets in a Wired World?


Sue Halpern in The New York Review of Books: “Also not obvious was how the Web would evolve, though its open architecture virtually assured that it would. The original Web, the Web of static homepages, documents laden with “hot links,” and electronic storefronts, segued into Web 2.0, which, by providing the means for people without technical knowledge to easily share information, recast the Internet as a global social forum with sites like Facebook, Twitter, FourSquare, and Instagram.
Once that happened, people began to make aspects of their private lives public, letting others know, for example, when they were shopping at H&M and dining at Olive Garden, letting others know what they thought of the selection at that particular branch of H&M and the waitstaff at that Olive Garden, then modeling their new jeans for all to see and sharing pictures of their antipasti and lobster ravioli—to say nothing of sharing pictures of their girlfriends, babies, and drunken classmates, or chronicling life as a high-paid escort, or worrying about skin lesions or seeking a cure for insomnia or rating professors, and on and on.
The social Web celebrated, rewarded, routinized, and normalized this kind of living out loud, all the while anesthetizing many of its participants. Although they likely knew that these disclosures were funding the new information economy, they didn’t especially care…
The assumption that decisions made by machines that have assessed reams of real-world information are more accurate than those made by people, with their foibles and prejudices, may be correct generally and wrong in the particular; and for those unfortunate souls who might never commit another crime even if the algorithm says they will, there is little recourse. In any case, computers are not “neutral”; algorithms reflect the biases of their creators, which is to say that prediction cedes an awful lot of power to the algorithm creators, who are human after all. Some of the time, too, proprietary algorithms, like the ones used by Google and Twitter and Facebook, are intentionally biased to produce results that benefit the company, not the user, and some of the time algorithms can be gamed. (There is an entire industry devoted to “optimizing” Google searches, for example.)
But the real bias inherent in algorithms is that they are, by nature, reductive. They are intended to sift through complicated, seemingly discrete information and make some sort of sense of it, which is the definition of reductive.”

Open Data and Open Government: Rethinking Telecommunications Policy and Regulation


New paper by Ewan Sutherland: “While attention has been given to the uses of big data by network operators and to the provision of open data by governments, there has been no systematic attempt to re-examine the regulatory systems for telecommunications. The power of public authorities to access the big data held by operators could transform regulation by simplifying proof of bias or discrimination, making operators more susceptible to behavioural remedies, while it could also be used to deliver much finer granularity of decision making. By opening up data held by government and its agencies to enterprises, think tanks and research groups it should be possible to transform market regulation.”

Smart Machines: IBM's Watson and the Era of Cognitive Computing


New book from Columbia Business School Publishing: “We are crossing a new frontier in the evolution of computing and entering the era of cognitive systems. The victory of IBM’s Watson on the television quiz show Jeopardy! revealed how scientists and engineers at IBM and elsewhere are pushing the boundaries of science and technology to create machines that sense, learn, reason, and interact with people in new ways to provide insight and advice.
In Smart Machines, John E. Kelly III, director of IBM Research, and Steve Hamm, a writer at IBM and a former business and technology journalist, introduce the fascinating world of “cognitive systems” to general audiences and provide a window into the future of computing. Cognitive systems promise to penetrate complexity and assist people and organizations in better decision making. They can help doctors evaluate and treat patients, augment the ways we see, anticipate major weather events, and contribute to smarter urban planning. Kelly and Hamm’s comprehensive perspective describes this technology inside and out and explains how it will help us conquer the harnessing and understanding of “big data,” one of the major computing challenges facing businesses and governments in the coming decades. Absorbing and impassioned, their book will inspire governments, academics, and the global tech industry to work together to power this exciting wave in innovation.”
See also Why cognitive systems?

And Data for All: On the Validity and Usefulness of Open Government Data


Paper presented at the 13th International Conference on Knowledge Management and Knowledge Technologies: “Open Government Data (OGD) refers to a relatively young trend of making data that is collected and maintained by state authorities available to the public. Although various Austrian OGD initiatives have been started in the last few years, little is known about the validity and the usefulness of the data offered. Based on the data-set on Vienna’s stock of trees, we address two questions in this paper. First, we examine the quality of the data by validating it against knowledge from a related discipline. It shows that the data-set we used correlates with findings from meteorology. Then, we explore the usefulness and exploitability of OGD by describing a concrete scenario in which this data-set can support citizens in their everyday lives, and by discussing further application areas in which OGD can be beneficial for different stakeholders and even be used commercially.”
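The cross-disciplinary validation step the abstract describes can, at its simplest, be a correlation check between the open data-set and an external reference series. Below is a minimal sketch in Python; the yearly counts and the meteorological series are invented for illustration and do not come from the Vienna tree data-set.

```python
import numpy as np

# Hypothetical illustration of validating an open data-set against knowledge
# from a related discipline: yearly tree plantings (aggregated from an OGD
# release) versus a meteorological reference series for the same years.
# All values below are invented for the sake of the example.
years = np.arange(2003, 2013)
trees_planted = np.array([410, 395, 430, 380, 360, 455, 470, 440, 425, 460])
growing_season_days = np.array([198, 192, 205, 185, 180, 215, 220, 210, 204, 218])

# A strong correlation with the external series lends support to the
# data-set's plausibility; a wildly inconsistent result would flag problems.
r = np.corrcoef(trees_planted, growing_season_days)[0, 1]
print(f"Pearson r over {years[0]}-{years[-1]}: {r:.2f}")
```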