Springwise: “With vast amounts of data now publicly available, the answers to many questions lie buried in the numbers, and we already saw a publishing platform helping entrepreneurs visualize government data. For an organization as passionate about civic engagement as mySidewalk, this open data is a treasure trove of compelling stories.
mySidewalk was founded by city planners who recognized the potential force for change contained in local communities. Yet without a compelling reason to get involved, many individuals remain ‘interested bystanders’ — something mySidewalk is determined to change.
Using the latest available data, mySidewalk creates dashboards that are customized for every project to help local public officials make the most informed decisions possible. The dashboards present visualizations of a wide range of socioeconomic and demographic datasets, as well as provide local, regional and national comparisons, all of which help to tell the stories behind the numbers.
It is those stories that mySidewalk believes will provide enough motivation for the ‘interested bystanders’ to get involved. As it says on the mySidewalk website, “Share your ideas. Shape your community.” Organizations of all types have taken notice of the power of data, with businesses using geo-tagging to analyze social media content, and real-time information sharing helping humanitarians in crises….(More)”
Report by Juan Ortiz Freuler: “Following a promising and already well-established trend, in February 2014 the Office of the President of Mexico launched its open data portal (datos.gob.mx). This diagnostic, carried out between July and September of 2015, is designed to brief international donors and stakeholders such as members of the Open Government Partnership Steering Committee. It provides the reader with contextual information to understand the state of supply and demand for open data from the portal, and the specific challenges the Mexican government is facing in its quest to implement the policy. The insights offered through data processing and interviews with key stakeholders indicate the need to promote: i) A sense of ownership of datos.gob.mx by the user community, but particularly by the officials in charge of implementing the policy within each government unit; ii) The development of tools and mechanisms to increase the quality of the data provided through the portal; and iii) Civic hacking of the portal to promote innovation, and a sense of appropriation that would increase the policy’s long-term resilience to partisan and leadership change….(More)”
From crowdsourcing to nudges to open data to participatory budgeting, more open and innovative ways to tackle society’s problems and make public institutions more effective are emerging. Yet little is known about what innovations actually work, when, why, for whom and under what conditions.
And anyone seeking existing research is confronted with sources that are widely dispersed across disciplines, often locked behind paywalls, and hard to search because of the absence of established taxonomies. As the demand to confront problems in new ways grows, so too does the urgency of making learning about governance innovations more accessible.
As part of GovLab’s broader effort to move from “faith-based interventions” toward more “evidence-based interventions,” OGRX curates and makes accessible the most diverse and up-to-date collection of findings on innovating governance. At launch, the site features over 350 publications spanning a diversity of governance innovation areas, including but not limited to:
Visit ogrx.org to explore the latest research findings, submit your own work for inclusion on the platform, and share knowledge with others interested in using science and technology to improve the way we govern. (More)”
Robert Epstein at Quartz: “Because I conduct research on how the Internet affects elections, journalists have lately been asking me about the primaries. Here are the two most common questions I’ve been getting:
Do Google’s search rankings affect how people vote?
How well does Google Trends predict the winner of each primary?
My answer to the first question is: Probably, but no one knows for sure. From research I have been conducting in recent years with Ronald E. Robertson, my associate at the American Institute for Behavioral Research and Technology, on the Search Engine Manipulation Effect (SEME, pronounced “seem”), we know that when higher-ranked search results make one candidate look better than another, an enormous number of votes will be driven toward the higher-ranked candidate—up to 80% of undecided voters in some demographic groups. This is partly because we have all learned to trust high-ranked search results, but it is mainly because we are lazy; search engine users generally click on just the top one or two items.
Because no one actually tracks search rankings, however—they are ephemeral and personalized, after all, which makes them virtually impossible to track—and because no whistleblowers have yet come forward from any of the search engine companies, we cannot know for sure whether search rankings are consistently favoring one candidate or another. This means we also cannot know for sure how search rankings are affecting elections. We know the power they have to do so, but that’s it.
As for the question about Google Trends, for a while I was giving a mindless, common-sense answer: Well, I said, Google Trends tells you about search activity, and if lots more people are searching for “Donald Trump” than for “Ted Cruz” just before a primary, then more people will probably vote for Trump.
When you run the numbers, search activity seems to be a pretty good predictor of voting. On primary day in New Hampshire this year, search traffic on Google Trends was highest for Trump, followed by John Kasich, then Cruz—and so went the vote. But careful studies of the predictive power of search activity have actually gotten mixed results. A 2011 study by researchers at Wellesley College in Massachusetts, for example, found that Google Trends was a poor predictor of the outcomes of the 2008 and 2010 elections.
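To make that “common-sense” test concrete, here is a minimal sketch of the kind of check such studies perform: compare each candidate’s share of pre-primary search interest with their share of the vote and see whether the rankings agree. The figures below are illustrative placeholders rather than real Trends or vote data, and rank correlation is only one of several possible scoring rules.

```python
from scipy.stats import spearmanr

# Illustrative placeholder numbers, not real Google Trends or vote data:
# pre-primary search-interest shares and actual vote shares, in percent.
search_share = {"Trump": 52, "Kasich": 27, "Cruz": 21}
vote_share = {"Trump": 35, "Kasich": 16, "Cruz": 12}

candidates = list(search_share)
predicted_winner = max(candidates, key=search_share.get)
actual_winner = max(candidates, key=vote_share.get)
print(f"Predicted winner: {predicted_winner}; actual winner: {actual_winner}")

# Spearman's rho compares the two rankings, ignoring the very different
# scales of search interest and vote share.
rho, _ = spearmanr([search_share[c] for c in candidates],
                   [vote_share[c] for c in candidates])
print(f"Rank correlation between search interest and vote share: {rho:.2f}")
```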
So much for Trends. But then I got to thinking: Why are we struggling so hard to figure out how to use Trends or tweets or shares to predict elections when Google actually knows exactly how we are going to vote? Impossible, you say? Think again….
This leaves us with two questions, one small and practical and the other big and weird.
The small, practical question is: How is Google using those numbers? Might they be sharing them with their preferred presidential candidate, for example? That is not unlawful, after all, and Google executives have taken a hands-on role in past presidential campaigns. The Wall Street Journal reported, for example, that Eric Schmidt, head of Google at that time, was personally overseeing Barack Obama’s programming team at his campaign headquarters the night before the 2012 election.
And the big, weird question is: Why are we even bothering to vote?
Voting is such a hassle—the parking, the lines, the ID checks. Maybe we should all stay home and just let Google announce the winners….(More)”
Giulio Quaggiotto at Nesta: “Over the past decade we’ve seen an explosion in the amount of data we create, with more being captured about our lives than ever before. As an industry, the public sector creates an enormous amount of information – from census data to tax data to health data. When it comes to the use of that data, however, despite many initiatives trying to promote open and big data for public policy as well as evidence-based policymaking, we feel there is still a long way to go.
Why is that? Data initiatives are often created under the assumption that if data is available, people (whether citizens or governments) will use it. But this hasn’t necessarily proven to be the case, and this approach neglects analysis of power and an understanding of the political dynamics at play around data (particularly when data is seen as an output rather than input).
Many data activities are also informed by the ‘extractive industry’ paradigm: citizens and frontline workers are seen as passive ‘data producers’ who hand over their information for it to be analysed and mined behind closed doors by ‘the experts’.
Given budget constraints facing many local and central governments, even well intentioned initiatives often take an incremental, passive transparency approach (i.e. let’s open the data first then see what happens), or they adopt a ‘supply/demand’ metaphor to data provision and usage…..
As a response to these issues, this blog series will explore the hypothesis that putting the question of citizen and government agency – rather than openness, volume or availability – at the centre of data initiatives has the potential to unleash greater, potentially more disruptive innovation and to focus efforts (ultimately leading to cost savings).
Our argument will be that data innovation initiatives should be informed by the principles that:
People closer to the problem are best positioned to provide additional context to the data and potentially act on solutions (hence the importance of “thick data”).
Citizens are active agents rather than passive providers of ‘digital traces’.
Governments are both users and providers of data.
We should ask at every step of the way how we can empower communities and frontline workers to take better decisions over time, and how we can use data to enhance the decision making of every actor in the system (from government to the private sector, from private citizens to social enterprises) in their role of changing things for the better… (More)
Book edited by Collmann, Jeff, and Matei, Sorin Adam: “This book springs from a multidisciplinary, multi-organizational, and multi-sector conversation about the privacy and ethical implications of research in human affairs using big data. The need to cultivate and enlist the public’s trust in the abilities of particular scientists and scientific institutions constitutes one of this book’s major themes. The advent of the Internet, the mass digitization of research information, and social media brought about, among many other things, the ability to harvest – sometimes implicitly – a wealth of human genomic, biological, behavioral, economic, political, and social data for the purposes of scientific research as well as commerce, government affairs, and social interaction. What type of ethical dilemmas did such changes generate? How should scientists collect, manipulate, and disseminate this information? The effects of this revolution and its ethical implications are wide-ranging.
This book includes the opinions of myriad investigators, practitioners, and stakeholders in big data on human beings who also routinely reflect on the privacy and ethical issues of this phenomenon. Dedicated to the practice of ethical reasoning and reflection in action, the book offers a range of observations, lessons learned, reasoning tools, and suggestions for institutional practice to promote responsible big data research on human affairs. It caters to a broad audience of educators, researchers, and practitioners. Educators can use the volume in courses related to big data handling and processing. Researchers can use it for designing new methods of collecting, processing, and disseminating big data, whether in raw form or as analysis results. Lastly, practitioners can use it to steer future tools or procedures for handling big data. As this topic represents an area of great interest that still remains largely undeveloped, this book is sure to attract significant interest by filling an obvious gap in currently available literature. …(More)”
Nathaniel A. Raymond and Casey S. Harrity at HPN: “This generation of humanitarian actors will be defined by the actions they take in response to the challenges and opportunities of the digital revolution. At this critical moment in the history of humanitarian action, success depends on humanitarians recognising that the use of information communication technologies (ICTs) must become a core competency for humanitarian action. Treated in the past as a boutique sub-area of humanitarian practice, ICTs now play a central role, making the collection, analysis and dissemination of data derived from them and other sources a basic skill required of humanitarians in the twenty-first century. ICT use must now be seen as an essential competence with critical implications for the efficiency and effectiveness of humanitarian response.
Practice in search of a doctrine
ICT use for humanitarian response runs the gamut from satellite imagery to drone deployment; to tablet and smartphone use; to crowd mapping and aggregation of big data. Humanitarian actors applying these technologies include front-line responders in NGOs and the UN but also, increasingly, volunteers and the private sector. The rapid diversification of available technologies as well as the increase in actors utilising them for humanitarian purposes means that the use of these technologies has far outpaced the ethical and technical guidance available to practitioners. Technology adoption by humanitarian actors prior to the creation of standards for how and how not to apply a specific tool has created a largely undiscussed and unaddressed ‘doctrine gap’.
Examples of this gap are, unfortunately, many. One such is the mass collection of personally identifiable cell phone data by humanitarian actors as part of phone surveys and cash transfer programmes. Although initial best practice and lessons learned have been developed for this method of data collection, no common inter-agency standards exist, nor are there comprehensive ethical frameworks for what data should be retained and for how long, and what data should be anonymised or not collected in the first place…(More)”
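One way to make that gap concrete: a standard would have to settle questions as mundane as how a phone number collected for a survey should be stored. The sketch below shows one possible practice, pseudonymising numbers with a keyed hash so records can still be linked across survey rounds without retaining the raw identifier; it is an illustration under assumed key management, not an agreed humanitarian standard.

```python
import hashlib
import hmac
import os

# A secret key held separately from the dataset; anyone holding it could
# recompute pseudonyms from known phone numbers, so it needs its own controls.
KEY = os.environ.get("SURVEY_PSEUDONYM_KEY", "replace-with-a-managed-secret").encode()

def pseudonymise(phone_number: str) -> str:
    """Return a stable pseudonym: same number -> same token, raw number never stored."""
    return hmac.new(KEY, phone_number.encode(), hashlib.sha256).hexdigest()[:16]

# A hypothetical survey record that keeps the link between rounds
# without retaining the respondent's actual phone number.
record = {
    "respondent": pseudonymise("+254700000000"),  # made-up number
    "district": "Example District",
    "eligible_for_cash_transfer": True,
}
print(record)
```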
Report by Phoensight: “With the emergence of increasing computational power, high cloud storage capacity and big data comes an eager anticipation of one of the biggest IT transformations of our society today.
Open data has an instrumental role to play in our digital revolution by creating unprecedented opportunities for governments and businesses to leverage previously unavailable information to strengthen their analytics and decision making for new client experiences. Whilst virtually every business recognises the value of data and the importance of the analytics built on it, the ability to realise the potential for maximising revenue and cost savings is not straightforward. The discovery of valuable insights often involves the acquisition of new data and an understanding of it. As we move towards an increasing supply of open data, technological and other entrepreneurs will look to make better use of government information for improved productivity.
This report uses a data-centric approach to examine the usability of information by considering ways in which open data could better facilitate data-driven innovations and further boost our economy. It assesses the state of open data today and suggests ways in which data providers could supply open data to optimise its use. A number of useful measures of information usability such as accessibility, quantity, quality and openness are presented which together contribute to the Open Data Usability Index (ODUI). For the first time, a comprehensive assessment of open data usability has been developed and is expected to be a critical step in taking the open data agenda to the next level.
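The report does not reproduce its exact formula here, but conceptually an index of this kind can be sketched as a weighted combination of per-portal scores on those dimensions. The dimension names follow the report; the 0-1 scales, equal weights and simple weighted average below are illustrative assumptions, not the ODUI’s actual method.

```python
# Dimension names follow the report; the scales, weights and aggregation
# are illustrative assumptions, not the ODUI's published methodology.
WEIGHTS = {"accessibility": 0.25, "quantity": 0.25, "quality": 0.25, "openness": 0.25}

def usability_index(scores: dict) -> float:
    """Combine per-dimension scores (each assumed to lie in 0-1) into one index."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# A hypothetical national open-data portal scored on each dimension.
example_portal = {"accessibility": 0.8, "quantity": 0.6, "quality": 0.5, "openness": 0.9}
print(f"Composite usability score: {usability_index(example_portal):.2f}")  # -> 0.70
```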
With over two million government datasets assessed against the open data usability framework, and models developed to link entire countries’ datasets to key industry sectors, never before has such an extensive analysis been undertaken. Government open data across Australia, Canada, Singapore, the United Kingdom and the United States reveals that most countries have room for improvement in their information usability. It was found that for 2015 the United Kingdom led the way, followed by Canada, Singapore, the United States and Australia. The global potential of government open data is expected to reach 20 exabytes by 2020, provided governments are able to release as much data as possible within legislative constraints….(More)”
Jesse Dunietz at Nautilus: “…A feverish push for “big data” analysis has swept through biology, linguistics, finance, and every field in between. Although no one can quite agree how to define it, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices.
But there’s a problem: It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus—and the reasons why should give us pause about any research that blindly trusts big data.
In the case of language and culture, big data showed up in a big way in 2011, when Google released its Ngrams tool. Announced with fanfare in the journal Science, Google Ngrams allowed users to search for short phrases in Google’s database of scanned books—about 4 percent of all books ever published!—and see how the frequency of those phrases has shifted over time. The paper’s authors heralded the advent of “culturomics,” the study of culture based on reams of data and, since then, Google Ngrams has been, well, largely an endless source of entertainment—but also a goldmine for linguists, psychologists, and sociologists. They’ve scoured its millions of books to show that, for instance, yes, Americans are becoming more individualistic; that we’re “forgetting our past faster with each passing year”; and that moral ideals are disappearing from our cultural consciousness.
WE’RE LOSING HOPE: An Ngrams chart for the word “hope,” one of many intriguing plots found by xkcd author Randall Munroe. If Ngrams really does reflect our culture, we may be headed for a dark place.
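What such a chart actually plots is simple: for each year, the count of the phrase divided by the total number of words scanned for that year, so that the growth of the corpus itself does not show up as a cultural trend. A minimal sketch of that normalisation, using made-up counts rather than the real Google Books data:

```python
# Made-up yearly counts for "hope" and for the whole corpus; the real
# Ngrams dataset is vastly larger and is not reproduced here.
hope_counts = {1900: 120_000, 1950: 260_000, 2000: 310_000}
total_words = {1900: 2_000_000_000, 1950: 6_500_000_000, 2000: 11_000_000_000}

def relative_frequency(counts, totals):
    """Per-year frequency of a term, normalised by that year's corpus size."""
    return {year: counts[year] / totals[year] for year in counts}

for year, freq in relative_frequency(hope_counts, total_words).items():
    # Raw frequencies are tiny, so report them per million words.
    print(f"{year}: {freq * 1e6:.1f} occurrences of 'hope' per million words")
```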
The problems start with the way the Ngrams corpus was constructed. In a study published last October, three University of Vermont researchers pointed out that, in general, Google Books includes one copy of every book. This makes perfect sense for its original purpose: to expose the contents of those books to Google’s powerful search technology. From the angle of sociological research, though, it makes the corpus dangerously skewed….
Even once you get past the data sources, there’s still the thorny issue of interpretation. Sure, words like “character” and “dignity” might decline over the decades. But does that mean that people care about morality less? Not so fast, cautions Ted Underwood, an English professor at the University of Illinois, Urbana-Champaign. Conceptions of morality at the turn of the last century likely differed sharply from ours, he argues, and “dignity” might have been popular for non-moral reasons. So any conclusions we draw by projecting current associations backward are suspect.
Of course, none of this is news to statisticians and linguists. Data and interpretation are their bread and butter. What’s different about Google Ngrams, though, is the temptation to let the sheer volume of data blind us to the ways we can be misled.
This temptation isn’t unique to Ngrams studies; similar errors undermine all sorts of big data projects. Consider, for instance, the case of Google Flu Trends (GFT). Released in 2008, GFT would count words like “fever” and “cough” in millions of Google search queries, using them to “nowcast” how many people had the flu. With those estimates, public health officials could act two weeks before the Centers for Disease Control could calculate the true numbers from doctors’ reports.
Initially, GFT was claimed to be 97 percent accurate. But as a study out of Northeastern University documents, that accuracy was a fluke. First, GFT completely missed the “swine flu” pandemic in the spring and summer of 2009. (It turned out that GFT was largely predicting winter.) Then, the system began to overestimate flu cases. In fact, it overshot the peak 2013 numbers by a whopping 140 percent. Eventually, Google just retired the program altogether.
So what went wrong? As with Ngrams, people didn’t carefully consider the sources and interpretation of their data. The data source, Google searches, was not a static beast. When Google started auto-completing queries, users started just accepting the suggested keywords, distorting the searches GFT saw. On the interpretation side, GFT’s engineers initially let GFT take the data at face value; almost any search term was treated as a potential flu indicator. With millions of search terms, GFT was practically guaranteed to over-interpret seasonal words like “snow” as evidence of flu.
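That failure mode is easy to reproduce in miniature: screen thousands of query terms purely by how well their weekly frequencies track CDC flu counts, and anything with a winter-shaped seasonal cycle, “snow” included, will pass the screen. The toy example below uses synthetic data and is not the actual GFT pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(104)                      # two years of weekly observations
winter = np.cos(2 * np.pi * weeks / 52)     # peaks once per winter

# Synthetic "true" flu activity plus three kinds of search terms.
flu = 10 + 8 * winter + rng.normal(0, 1, weeks.size)
queries = {
    "flu symptoms": 0.9 * flu + rng.normal(0, 1, weeks.size),   # genuinely related
    "snow": 5 + 6 * winter + rng.normal(0, 1, weeks.size),      # seasonal confound
    "pizza": rng.normal(20, 2, weeks.size),                     # unrelated
}

# Naive screening: keep any term whose correlation with flu looks "strong".
for term, series in queries.items():
    r = np.corrcoef(series, flu)[0, 1]
    print(f"{term!r}: correlation with flu activity = {r:.2f}")

# 'snow' correlates nearly as strongly as 'flu symptoms', so a model selected
# this way is mostly learning "winter" and fails when flu is out of season.
```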
But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. “Shaman’s team tested their model against actual flu activity that had already occurred during the season,” according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data….(More)
Ana Campoy at Quartz: “Mexico City just launched a massive experiment in digital democracy. It is asking its nearly 9 million residents to help draft a new constitution through social media. The crowdsourcing exercise is unprecedented in Mexico—and pretty much everywhere else.
Chilangos, as locals are known, can petition for issues to be included in the constitution through Change.org (link in Spanish), and make their case in person if they gather more than 10,000 signatures. They can also annotate proposals by the constitution drafters via PubPub, an editing platform (Spanish) similar to Google Docs.
The idea, in the words of the mayor, Miguel Angel Mancera, is to “bestow the constitution project (Spanish) with a democratic, progressive, inclusive, civic and plural character.”
There’s a big catch, however. The constitutional assembly—the body that has the final word on the new city’s basic law—is under no obligation to consider any of the citizen input. And then there are the practical difficulties of collecting and summarizing the myriad of views dispersed throughout one of the world’s largest cities.
That makes Mexico City’s public-consultation experiment a big test for the people’s digital power, one being watched around the world.
Fittingly, the idea of crowdsourcing a constitution came about in response to an attempt to limit people power.
For decades, city officials had fought to get out from under the thumb of the federal government, which had the final word on decisions such as who should be the city’s chief of police. This year, finally, they won a legal change that turns the Distrito Federal (federal district), similar to the US’s District of Columbia, into Ciudad de México (Mexico City), a more autonomous entity, more akin to a state. (Confusingly, it’s just part of the larger urban area also colloquially known as Mexico City, which spills into neighboring states.)
However, trying to retain some control, the Mexican congress decided that only 60% of the delegates to the city’s constitutional assembly would be elected by popular vote. The rest will be assigned by the president, congress, and Mancera, the mayor. Mancera is also the only one who can submit a draft constitution to the assembly.
Mancera’s response was to create a committee of some 30 citizens (Spanish), including politicians, human-rights advocates, journalists, and even a Paralympic gold medalist, to write his draft. He also called for the development of mechanisms to gather citizens’ “aspirations, values, and longing for freedom and justice” so they can be incorporated into the final document.
The mechanisms, embedded in an online platform (Spanish) that offers various ways to weigh in, were launched at the end of March and will collect inputs until September 1. The drafting group has until the middle of that month to file its text with the assembly, which has to approve the new constitution by the end of January.
An experiment with few precedents
Mexico City didn’t have a lot of examples to draw on, since not a lot of places have experience with crowdsourcing laws. In the US, a few local lawmakers have used Wiki pages and GitHub to draft bills, says Marilyn Bautista, a lecturer at Stanford Law School who has researched the practice. Iceland—with a population some 27 times smaller than Mexico City’s—famously had its citizens contribute to its constitution with input from social media. The effort failed after the new constitution got stuck in parliament.
In Mexico City, where many citizens already feel left out, the first big hurdle is to convince them it’s worth participating….
Then comes the task of making sense of the cacophony that will likely emerge. Some of the input can be very easily organized—the results of the survey, for example, are being graphed in real time. But there could be thousands of documents and comments on the Change.org petitions and the editing platform.
Ideas are grouped into 18 topics, such as direct democracy, transparency and economic rights. They are prioritized based on the amount of support they’ve garnered and how relevant they are, said Bernardo Rivera, an adviser for the city. Drafters get a weekly delivery of summarized citizen petitions….
An essay about human rights on the PubPub platform. (PubPub)
The most elaborate part of the system is PubPub, an open publishing platform similar to Google Docs, which is based on a project originally developed by MIT’s Media Lab. The drafters are supposed to post essays on how to address constitutional issues, and potentially, the constitution draft itself, once there is one. Only they—or whoever they authorize—will be able to reword the original document.
User comments and edits are recorded on a side panel, with links to the portion of text they refer to. Another screen records every change, so everyone can track which suggestions have made it into the text. Members of the public can also vote comments up or down, or post their own essays….(More).
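A rough sketch of the kind of data model that sits behind such a side panel: each comment is anchored to a character range in a specific revision of the text, carries an up/down vote tally, and revisions are retained so suggestions can be traced into later drafts. The field names below are hypothetical and do not describe PubPub’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    revision_id: int
    text: str      # the full draft text at this revision
    author: str    # only authorised drafters may create new revisions

@dataclass
class Annotation:
    annotation_id: int
    revision_id: int   # the revision whose offsets the anchor refers to
    start: int         # character offsets into that revision's text
    end: int
    comment: str
    author: str
    upvotes: int = 0
    downvotes: int = 0

    def anchored_text(self, revision: Revision) -> str:
        """Return the span of draft text this comment points at."""
        return revision.text[self.start:self.end]

# Usage: a citizen comments on a passage of a drafter's essay and others upvote it.
draft = Revision(1, "Every resident has the right to clean air and water.", "drafter_01")
note = Annotation(1, 1, 32, 52, "Add a right to public transport here.", "citizen_42", upvotes=3)
print(note.anchored_text(draft))   # -> "clean air and water."
```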