New report from the National Academy of Sciences: “Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data.
Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale–terabytes and petabytes–is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge–from computer science, statistics, machine learning, and application disciplines–that must be brought to bear to make useful inferences from massive data.”
Understanding the impact of releasing and re-using open government data
New Report by the European Public Sector Information Platform: “While open data portals and data re-use tools and applications have proliferated at tremendous speed over the last decade, research and understanding about the impact of opening up public sector information and open government data (OGD hereinafter) has been lagging behind.
Until now, there have been some research efforts to structure the concept of the impact of OGD, suggesting various theories of change, methodologies for measuring them or, in some cases, concrete calculations of the financial benefits that opening government data brings to the table. For instance, the European Commission conducted a study on the pricing of public sector information, which attempted to evaluate the direct and indirect economic impact of opening public data and identified key indicators for monitoring the effects of open data portals. Also, the Open Data Research Network issued a background report in April 2012 suggesting a general framework of key indicators to measure the impact of open data initiatives at both the provision and re-use stages.
Building on the research efforts to date, this report will reflect upon the main types of impact OGD may have and will also present key measuring frameworks for observing the change OGD initiatives may bring about.”
Connecting Grassroots to Government for Disaster Management
New Report by the Commons Lab (Wilson Center): “The growing use of social media and other mass collaboration technologies is opening up new opportunities in disaster management efforts, but is also creating new challenges for policymakers looking to incorporate these tools into existing frameworks, according to our latest report.
The Commons Lab, part of the Wilson Center’s Science & Technology Innovation Program, hosted a September 2012 workshop bringing together emergency responders, crisis mappers, researchers, and software programmers to discuss issues surrounding the adoption of these new technologies.
We are now proud to unveil “Connecting Grassroots to Government for Disaster Management: Workshop Summary,” a report discussing the key findings, policy suggestions, and success stories that emerged during the workshop. The report’s release coincides with the tenth annual Disaster Preparedness Month, sponsored by the Federal Emergency Management Agency in the Department of Homeland Security to help educate the public about preparing for emergencies. The report can be downloaded here.”
Open data for accountable governance: Is data literacy the key to citizen engagement?
Camilla Monckton at UNDP’s Voices of Eurasia blog: “How can technology connect citizens with governments, and how can we foster, harness, and sustain the citizen engagement that is so essential to anti-corruption efforts?
UNDP has worked on a number of projects that use technology to make it easier for citizens to report corruption to authorities:
- Serbia’s SMS corruption reporting in the health sector
- Montenegro’s ‘be responsible app’
- Kosovo’s online corruption reporting site kallxo.com
These projects are showing some promising results, and provide insights into how a more participatory, interactive government could develop.
At the heart of the projects is the ability to use citizen-generated data to identify and report problems for governments to address….
Wanted: Citizen experts
As Kenneth Cukier, The Economist’s Data Editor, has discussed, data literacy will become the new computer literacy. Big data is still nascent and it is impossible to predict exactly how it will affect society as a whole. What we do know is that it is here to stay and data literacy will be integral to our lives.
It is essential that we understand how to interact with big data and the possibilities it holds.
Data literacy needs to be integrated into the education system. Educating non-experts to analyze data is critical to enabling broad participation in this new data age.
As technology advances, key government functions become automated, and government data sharing increases, newer ways for citizens to engage will multiply.
Technology changes rapidly, but the human mind and societal habits cannot change as quickly. After years of closed government and bureaucratic inefficiency, adopting a new approach to governance will take time and education.
We need to bring up a generation that sees being involved in government decisions as normal, and that views participatory government as a right, not an ‘innovative’ service extended by governments.
What now?
In the meantime, while data literacy lies in the hands of a few, we must continue to connect those who have the technological skills with citizen experts seeking to change their communities for the better – as has been done at many Social Innovation Camps recently (in Montenegro, Ukraine and Armenia at Mardamej and Mardamej Reloaded, and across the region at Hurilab).
The social innovation camp and hackathon models are an increasingly debated topic (covered by Susannah Vila, David Eaves, Alex Howard and Clay Johnson).
On the whole, evaluations are leading to newer models that focus on greater integration of mentorship to increase sustainability – which I readily support. However, I do have one comment:
Social innovation camps are often criticized for a lack of sustainability – a claim based on the limited number of apps that go beyond the prototype phase. I find a certain sense of irony in this, for isn’t this what innovation is about: Opening oneself up to the risk of failure in the hope of striking something great?
In the words of Vinod Khosla:
“No failure means no risk, which means nothing new.”
As more data is released, the opportunity for new apps and new ways for citizen interaction will multiply and, who knows, someone might come along and transform government just as TripAdvisor transformed the travel industry.”
Fighting for Reliable Evidence
New book by Judy Gueron and Howard Rolston: “Once primarily used in medical clinical trials, random assignment experimentation is now accepted among social scientists across a broad range of disciplines. The technique has been used in social experiments to evaluate a variety of programs, from microfinance and welfare reform to housing vouchers and teaching methods. How did randomized experiments move beyond medicine and into the social sciences, and can they be used effectively to evaluate complex social problems? Fighting for Reliable Evidence provides an absorbing historical account of the characters and controversies that have propelled the wider use of random assignment in social policy research over the past forty years.
Drawing from their extensive experience evaluating welfare reform programs, noted scholar practitioners Judith M. Gueron and Howard Rolston portray randomized experiments as a vital research tool to assess the impact of social policy. In a random assignment experiment, participants are sorted into either a treatment group that participates in a particular program, or a control group that does not. Because the groups are randomly selected, they do not differ from one another systematically. Therefore any subsequent differences between the groups can be attributed to the influence of the program or policy. The theory is elegant and persuasive, but many scholars worry that such an experiment is too difficult or expensive to implement in the real world. Can a control group be truly insulated from the treatment policy? Would staffers comply with the random allocation of participants? Would the findings matter?”
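The logic is easy to demonstrate with a short simulation. The sketch below is not from the book; the sample size, effect size, and variable names are invented purely for illustration:

```python
import random
import statistics

random.seed(42)

# Toy population: each participant has an unobserved baseline outcome.
baselines = [random.gauss(50, 10) for _ in range(10_000)]

TRUE_EFFECT = 5.0  # hypothetical boost the program gives its participants

treatment_outcomes, control_outcomes = [], []
for baseline in baselines:
    noise = random.gauss(0, 5)
    # Random assignment: a coin flip decides the group, so the two groups
    # do not differ systematically in their baseline characteristics.
    if random.random() < 0.5:
        treatment_outcomes.append(baseline + TRUE_EFFECT + noise)
    else:
        control_outcomes.append(baseline + noise)

# Because assignment was random, the difference in group means estimates
# the program's impact.
estimate = statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated impact: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

With 10,000 simulated participants the estimate lands close to the true effect; shrink the sample and the estimate becomes noisier, which is one reason the cost and difficulty concerns the authors describe are real.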
Public Open Data: The Good, the Bad, the Future
Camille Crittenden at IDEALAB: “Some of the most powerful tools combine official public data with social media or other citizen input, such as the recent partnership between Yelp and the public health departments in New York and San Francisco for restaurant hygiene inspection ratings. In other contexts, such tools can help uncover and ultimately reduce corruption by making it easier to “follow the money.”
Despite the opportunities offered by “free data,” this trend also raises new challenges and concerns, among them, personal privacy and security. While attention has been devoted to the unsettling power of big data analysis and “predictive analytics” for corporate marketing, similar questions could be asked about the value of public data. Does it contribute to community cohesion that I can find out with a single query how much my neighbors paid for their house or (if employed by public agencies) their salaries? Indeed, some studies suggest that greater transparency leads not to greater trust in government but to resignation and apathy.
Exposing certain law enforcement data also increases the possibility of vigilantism. California law requires the registration and publication of the home addresses of known sex offenders, for instance. Or consider the controversy and online threats that erupted when, shortly after the Newtown tragedy, a newspaper in New York posted an interactive map of gun permit owners in nearby counties.
…Policymakers and officials must still mind the “big data gap.” So what does the future hold for open data? Publishing data is only one part of the information ecosystem. To be useful, tools must be developed for cleaning, sorting, analyzing and visualizing it as well. …
For-profit companies and non-profit watchdog organizations will continue to emerge and expand, building on the foundation of this data flood. Public-private partnerships such as those between San Francisco and Appallicious or Granicus, startups created by Code for America’s Incubator, and non-partisan organizations like the Sunlight Foundation and MapLight rely on public data repositories for their innovative applications and analysis.
Making public data more accessible is an important goal and offers enormous potential to increase civic engagement. To make the most effective and equitable use of this resource for the public good, cities and other government entities should invest in the personnel and equipment — hardware and software — to make it universally accessible. At the same time, Chief Data Officers (or equivalent roles) should also be alert to the often hidden challenges of equity, inclusion, privacy, and security.”
Innovating to Improve Disaster Response and Recovery
Todd Park at OSTP blog: “Last week, the White House Office of Science and Technology Policy (OSTP) and the Federal Emergency Management Agency (FEMA) jointly challenged a group of over 80 top innovators from around the country to come up with ways to improve disaster response and recovery efforts. This diverse group of stakeholders, consisting of representatives from Zappos, Airbnb, Marriott International, the Parsons School of Design, AOL/Huffington Post’s Social Impact, The Weather Channel, Twitter, Topix.com, Twilio, New York City, Google and the Red Cross, to name a few, spent an entire day at the White House collaborating on ideas for tools, products, services, programs, and apps that can assist disaster survivors and communities…
During the “Data Jam/Think Tank,” we discussed response and recovery challenges…Below are some of the ideas that were developed throughout the day. In the case of the first two ideas, participants wrote code and created actual working prototypes.
- A real-time communications platform that allows survivors dependent on electricity-powered medical devices to text or call in their needs—such as batteries, medication, or a power generator—and connect those needs with a collaborative transportation network to make real-time deliveries.
- A technical schema that tags all disaster-related information from social media and news sites – enabling municipalities and first responders to better understand all of the invaluable information generated during a disaster and to help identify where they can help (a minimal sketch of what such a schema might capture follows this list).
- A Disaster Relief Innovation Vendor Engine (DRIVE) which aggregates pre-approved vendors for disaster-related needs, including transportation, power, housing, and medical supplies, to make it as easy as possible to find scarce local resources.
- A crowdfunding platform for small businesses and others to receive access to capital to help rebuild after a disaster, including a rating system that encourages rebuilding efforts that improve the community.
- Promoting preparedness through talk shows, working closely with celebrities, musicians, and children to raise awareness.
- A “community power-go-round” that, like a merry-go-round, can be pushed to generate electricity and additional power for battery-charged devices including cell phones or a Wi-Fi network to provide community internet access.
- Aggregating crowdsourced imagery taken and shared through social media sites to help identify where trees have fallen, electrical lines have been toppled, and streets have been obstructed.
- A kid-run local radio station used to educate youth about preparedness for a disaster and activated to support relief efforts during a disaster that allows youth to share their experiences.”
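The tagging schema in the second idea is described only at a high level, and the prototype built at the workshop is not published in the post. Purely as an illustration of what such a schema could capture, here is a minimal sketch with invented field names:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DisasterItem:
    """Hypothetical record for one disaster-related post or news article."""
    source: str                       # e.g. "twitter", "facebook", "news"
    text: str                         # raw post text or headline
    category: str                     # e.g. "power-outage", "road-blocked", "medical-need"
    urgency: str = "unknown"          # "low" | "medium" | "high"
    latitude: Optional[float] = None  # geotag, if available
    longitude: Optional[float] = None
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A municipality's dashboard could then filter the incoming stream by category and urgency.
item = DisasterItem(
    source="twitter",
    text="Tree down across Main St, power lines in the road",
    category="road-blocked",
    urgency="high",
    latitude=40.71,
    longitude=-74.01,
)
print(item)
```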
New! Humanitarian Computing Library
Patrick Meier at iRevolution: “The field of “Humanitarian Computing” applies Human Computing and Machine Computing to address major information-based challenges in the humanitarian space. Human Computing refers to crowdsourcing and microtasking, which is also referred to as crowd computing. In contrast, Machine Computing draws on natural language processing and machine learning, amongst other disciplines. The Next Generation Humanitarian Technologies we are prototyping at QCRI are powered by Humanitarian Computing research and development (R&D).
My QCRI colleagues and I just launched the first ever Humanitarian Computing Library, which is publicly available here. The purpose of this library, or wiki, is to consolidate existing and future research that relates to Humanitarian Computing in order to support the development of next-generation humanitarian tech. The repository currently holds over 500 publications that span topics such as Crisis Management, Trust and Security, Software and Tools, Geographical Analysis and Crowdsourcing. These publications are largely drawn from (but not limited to) peer-reviewed papers submitted at leading conferences around the world. We invite you to add your own research on humanitarian computing to this growing collection of resources.”
How Mechanical Turkers Crowdsourced a Huge Lexicon of Links Between Words and Emotion
The Physics arXiv Blog: “Sentiment analysis on the social web depends on how a person’s state of mind is expressed in words. Now a new database of the links between words and emotions could provide a better foundation for this kind of analysis.

One of the buzzphrases associated with the social web is sentiment analysis. This is the ability to determine a person’s opinion or state of mind by analysing the words they post on Twitter, Facebook or some other medium.
Much has been promised with this method—the ability to measure satisfaction with politicians, movies and products; the ability to better manage customer relations; the ability to create dialogue for emotion-aware games; the ability to measure the flow of emotion in novels; and so on.
The idea is to entirely automate this process—to analyse the firehose of words produced by social websites using advanced data mining techniques to gauge sentiment on a vast scale.
But all this depends on how well we understand the emotion and polarity (whether negative or positive) that people associate with each word or combinations of words.
Today, Saif Mohammad and Peter Turney at the National Research Council Canada in Ottawa unveil a huge database of words and their associated emotions and polarity, which they have assembled quickly and inexpensively using Amazon’s Mechanical Turk crowdsourcing website. They say this crowdsourcing mechanism makes it possible to increase the size and quality of the database quickly and easily….The result is a comprehensive word-emotion lexicon for over 10,000 words or two-word phrases, which they call EmoLex….
The bottom line is that sentiment analysis can only ever be as good as the database on which it relies. With EmoLex, analysts have a new tool for their box of tricks.”
Ref: arxiv.org/abs/1308.6297: Crowdsourcing a Word-Emotion Association Lexicon
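To show how a word-emotion lexicon of this kind is typically used, here is a minimal sketch of lexicon-based polarity and emotion scoring. The tiny inline lexicon is invented for the example and is not an excerpt from EmoLex, which covers more than 10,000 entries with polarity plus eight emotion categories:

```python
import re
from collections import Counter

# Hand-made toy lexicon for illustration only.
LEXICON = {
    "love":     {"polarity": +1, "emotions": {"joy", "trust"}},
    "great":    {"polarity": +1, "emotions": {"joy"}},
    "hate":     {"polarity": -1, "emotions": {"anger", "disgust"}},
    "terrible": {"polarity": -1, "emotions": {"fear", "sadness"}},
}

def score(text):
    """Sum word polarities and tally emotion counts for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    polarity = sum(LEXICON.get(w, {}).get("polarity", 0) for w in words)
    emotions = Counter(e for w in words for e in LEXICON.get(w, {}).get("emotions", ()))
    return polarity, emotions

print(score("I love this movie, the ending was great"))   # polarity +2; joy x2, trust x1
print(score("I hate waiting, the service was terrible"))  # polarity -2; anger, disgust, fear, sadness
```

Real systems layer negation handling, intensifiers, and machine-learned models on top of such lexicons, which is why the analysis can only be as good as the lexicon underneath it.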
From Collaborative Coding to Wedding Invitations: GitHub Is Going Mainstream
Wired: “With 3.4 million users, the five-year-old site is a runaway hit in the hacker community, the go-to place for coders to show off pet projects and crowdsource any improvements. But the company has grander ambitions: It wants to change the way people work. It’s starting with software developers for sure, but maybe one day anyone who edits text in one form or another — lawyers, writers, and civil servants — will do it the GitHub way.
To first-time visitors, GitHub looks like a twisted version of Facebook, built in some alternate universe where YouTube videos and photos of cats have somehow morphed into snippets of code. But many of the underlying concepts are the same. You can “follow” other hackers to see what they’re working on. You can comment on their code — much like you’d do on a Facebook photo. You can even “star” a project to show that you like it, just as you’d “favorite” something on Twitter.
But it’s much more than a social network. People discover new projects and then play around with them, making changes, trying out new ideas. Then, with the push of a button, they merge into something better. You can also “fork” projects. That’s GitHub lingo for when you make a copy of a project so you can then build and modify your own, independent version.
People didn’t just suggest changes to Lee’s Twitter patent license. It was forked 53 times: by Arul, by a computer science student in Portland, by a Belgian bicycle designer. These forks can now evolve and potentially even merge back into Lee’s agreement. The experiment also inspired Fenwick & West, one of Silicon Valley’s top legal firms (and GitHub’s law firm), to post 30 pages of standard documents for startups to GitHub earlier this year.”
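For readers curious about the mechanics behind “forking”, the copy is created with a single call to GitHub’s REST API. Below is a minimal sketch; the token and repository names are placeholders and error handling is kept to a bare minimum:

```python
import requests

TOKEN = "your-personal-access-token"   # placeholder credential
OWNER, REPO = "someuser", "somerepo"   # placeholder repository to fork

headers = {
    "Authorization": f"token {TOKEN}",
    "Accept": "application/vnd.github+json",
}

# Create a fork of OWNER/REPO under the authenticated user's account.
# GitHub performs the fork asynchronously but returns the new repo's metadata right away.
resp = requests.post(f"https://api.github.com/repos/{OWNER}/{REPO}/forks", headers=headers)
resp.raise_for_status()
print("Fork created at:", resp.json()["html_url"])

# List the existing forks of the same repository.
forks = requests.get(f"https://api.github.com/repos/{OWNER}/{REPO}/forks", headers=headers).json()
print(f"{len(forks)} forks, for example:", [f["full_name"] for f in forks[:5]])
```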