Data innovation: where to start? With the road less taken


Giulio Quaggiotto at Nesta: “Over the past decade we’ve seen an explosion in the amount of data we create, with more being captured about our lives than ever before. As an industry, the public sector creates an enormous amount of information – from census data to tax data to health data. When it comes to use of the data however, despite many initiatives trying to promote open and big data for public policy as well as evidence-based policymaking, we feel there is still a long way to go.

Why is that? Data initiatives are often created under the assumption that if data is available, people (whether citizens or governments) will use it. But this hasn’t necessarily proven to be the case, and this approach neglects analysis of power and an understanding of the political dynamics at play around data (particularly when data is seen as an output rather than input).

Many data activities are also informed by the ‘extractive industry’ paradigm: citizens and frontline workers are seen as passive ‘data producers’ who hand over their information for it to be analysed and mined behind closed doors by ‘the experts’.

Given budget constraints facing many local and central governments, even well-intentioned initiatives often take an incremental, passive transparency approach (i.e. let’s open the data first then see what happens), or they adopt a ‘supply/demand’ metaphor to data provision and usage…

As a response to these issues, this blog series will explore the hypothesis that putting the question of citizen and government agency – rather than openness, volume or availability – at the centre of data initiatives has the potential to unleash greater, potentially more disruptive innovation and to focus efforts (ultimately leading to cost savings).

Our argument will be that data innovation initiatives should be informed by the principles that:

  • People closer to the problem are the best positioned to provide additional context to the data and potentially act on solutions (hence the importance of “thick data”).

  • Citizens are active agents rather than passive providers of ‘digital traces’.

  • Governments are both users and providers of data.

  • We should ask at every step of the way how we can empower communities and frontline workers to take better decisions over time, and how we can use data to enhance the decision making of every actor in the system (from government to the private sector, from private citizens to social enterprises) in their role of changing things for the better… (More)”

 

Ethical Reasoning in Big Data


Book edited by Collmann, Jeff, and Matei, Sorin Adam: “This book springs from a multidisciplinary, multi-organizational, and multi-sector conversation about the privacy and ethical implications of research in human affairs using big data. The need to cultivate and enlist the public’s trust in the abilities of particular scientists and scientific institutions constitutes one of this book’s major themes. The advent of the Internet, the mass digitization of research information, and social media brought about, among many other things, the ability to harvest – sometimes implicitly – a wealth of human genomic, biological, behavioral, economic, political, and social data for the purposes of scientific research as well as commerce, government affairs, and social interaction. What type of ethical dilemmas did such changes generate? How should scientists collect, manipulate, and disseminate this information? The effects of this revolution and its ethical implications are wide-ranging.

This book includes the opinions of myriad investigators, practitioners, and stakeholders in big data on human beings who also routinely reflect on the privacy and ethical issues of this phenomenon. Dedicated to the practice of ethical reasoning and reflection in action, the book offers a range of observations, lessons learned, reasoning tools, and suggestions for institutional practice to promote responsible big data research on human affairs. It caters to a broad audience of educators, researchers, and practitioners. Educators can use the volume in courses related to big data handling and processing. Researchers can use it for designing new methods of collecting, processing, and disseminating big data, whether in raw form or as analysis results. Lastly, practitioners can use it to steer future tools or procedures for handling big data. As this topic represents an area of great interest that still remains largely undeveloped, this book is sure to attract significant interest by filling an obvious gap in currently available literature. …(More)”

Addressing the ‘doctrine gap’: professionalising the use of Information Communication Technologies in humanitarian action


Nathaniel A. Raymond and Casey S. Harrity at HPN: “This generation of humanitarian actors will be defined by the actions they take in response to the challenges and opportunities of the digital revolution. At this critical moment in the history of humanitarian action, success depends on humanitarians recognising that the use of information communication technologies (ICTs) must become a core competency for humanitarian action. Treated in the past as a boutique sub-area of humanitarian practice, the central role that they now play has made the collection, analysis and dissemination of data derived from ICTs and other sources a basic skill required of humanitarians in the twenty-first century. ICT use must now be seen as an essential competence with critical implications for the efficiency and effectiveness of humanitarian response.

Practice in search of a doctrine

ICT use for humanitarian response runs the gamut from satellite imagery to drone deployment; to tablet and smartphone use; to crowd mapping and aggregation of big data. Humanitarian actors applying these technologies include front-line responders in NGOs and the UN but also, increasingly, volunteers and the private sector. The rapid diversification of available technologies as well as the increase in actors utilising them for humanitarian purposes means that the use of these technologies has far outpaced the ethical and technical guidance available to practitioners. Technology adoption by humanitarian actors prior to the creation of standards for how and how not to apply a specific tool has created a largely undiscussed and unaddressed ‘doctrine gap’.

Examples of this gap are, unfortunately, many. One such is the mass collection of personally identifiable cell phone data by humanitarian actors as part of phone surveys and cash transfer programmes. Although initial best practice and lessons learned have been developed for this method of data collection, no common inter-agency standards exist, nor are there comprehensive ethical frameworks for what data should be retained and for how long, and what data should be anonymised or not collected in the first place…(More)”

Open Data Supply: Enriching the usability of information


Report by Phoensight: “With the emergence of increasing computational power, high cloud storage capacity and big data comes an eager anticipation of one of the biggest IT transformations of our society today.

Open data has an instrumental role to play in our digital revolution by creating unprecedented opportunities for governments and businesses to leverage off previously unavailable information to strengthen their analytics and decision making for new client experiences. Whilst virtually every business recognises the value of data and the importance of the analytics built on it, the ability to realise the potential for maximising revenue and cost savings is not straightforward. The discovery of valuable insights often involves the acquisition of new data and an understanding of it. As we move towards an increasing supply of open data, technological and other entrepreneurs will look to better utilise government information for improved productivity.

This report uses a data-centric approach to examine the usability of information by considering ways in which open data could better facilitate data-driven innovations and further boost our economy. It assesses the state of open data today and suggests ways in which data providers could supply open data to optimise its use. A number of useful measures of information usability such as accessibility, quantity, quality and openness are presented which together contribute to the Open Data Usability Index (ODUI). For the first time, a comprehensive assessment of open data usability has been developed and is expected to be a critical step in taking the open data agenda to the next level.

With over two million government datasets assessed against the open data usability framework and models developed to link entire country’s datasets to key industry sectors, never before has such an extensive analysis been undertaken. Government open data across Australia, Canada, Singapore, the United Kingdom and the United States reveal that most countries have the capacity for improvements in their information usability. It was found that for 2015 the United Kingdom led the way followed by Canada, Singapore, the United States and Australia. The global potential of government open data is expected to reach 20 exabytes by 2020, provided governments are able to release as much data as possible within legislative constraints….(More)”

How Big Data Creates False Confidence


Jesse Dunietz at Nautilus: “…A feverish push for “big data” analysis has swept through biology, linguistics, finance, and every field in between. Although no one can quite agree how to define it, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices.

But there’s a problem: It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus—and the reasons why should give us pause about any research that blindly trusts big data.

In the case of language and culture, big data showed up in a big way in 2011, when Google released its Ngrams tool. Announced with fanfare in the journal Science, Google Ngrams allowed users to search for short phrases in Google’s database of scanned books (about 4 percent of all books ever published!) and see how the frequency of those phrases has shifted over time. The paper’s authors heralded the advent of “culturomics,” the study of culture based on reams of data and, since then, Google Ngrams has been, well, largely an endless source of entertainment, but also a goldmine for linguists, psychologists, and sociologists. They’ve scoured its millions of books to show that, for instance, yes, Americans are becoming more individualistic; that we’re “forgetting our past faster with each passing year”; and that moral ideals are disappearing from our cultural consciousness.

WE’RE LOSING HOPE: An Ngrams chart for the word “hope,” one of many intriguing plots found by xkcd author Randall Munroe. If Ngrams really does reflect our culture, we may be headed for a dark place.

The problems start with the way the Ngrams corpus was constructed. In a study published last October, three University of Vermont researchers pointed out that, in general, Google Books includes one copy of every book. This makes perfect sense for its original purpose: to expose the contents of those books to Google’s powerful search technology. From the angle of sociological research, though, it makes the corpus dangerously skewed….

Even once you get past the data sources, there’s still the thorny issue of interpretation. Sure, words like “character” and “dignity” might decline over the decades. But does that mean that people care about morality less? Not so fast, cautions Ted Underwood, an English professor at the University of Illinois, Urbana-Champaign. Conceptions of morality at the turn of the last century likely differed sharply from ours, he argues, and “dignity” might have been popular for non-moral reasons. So any conclusions we draw by projecting current associations backward are suspect.

Of course, none of this is news to statisticians and linguists. Data and interpretation are their bread and butter. What’s different about Google Ngrams, though, is the temptation to let the sheer volume of data blind us to the ways we can be misled.

This temptation isn’t unique to Ngrams studies; similar errors undermine all sorts of big data projects. Consider, for instance, the case of Google Flu Trends (GFT). Released in 2008, GFT would count words like “fever” and “cough” in millions of Google search queries, using them to “nowcast” how many people had the flu. With those estimates, public health officials could act two weeks before the Centers for Disease Control could calculate the true numbers from doctors’ reports.

Initially, GFT was claimed to be 97 percent accurate. But as a study out of Northeastern University documents, that accuracy was a fluke. First, GFT completely missed the “swine flu” pandemic in the spring and summer of 2009. (It turned out that GFT was largely predicting winter.) Then, the system began to overestimate flu cases. In fact, it overshot the peak 2013 numbers by a whopping 140 percent. Eventually, Google just retired the program altogether.

So what went wrong? As with Ngrams, people didn’t carefully consider the sources and interpretation of their data. The data source, Google searches, was not a static beast. When Google started auto-completing queries, users started just accepting the suggested keywords, distorting the searches GFT saw. On the interpretation side, GFT’s engineers initially let GFT take the data at face value; almost any search term was treated as a potential flu indicator. With millions of search terms, GFT was practically guaranteed to over-interpret seasonal words like “snow” as evidence of flu.
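The over-interpretation risk described above can be illustrated with a small simulation. This is only a sketch and not GFT's actual method: it assumes purely random "search term" series with no real link to flu, and the function names `pearson` and `best_spurious_correlation` are hypothetical. It shows that the more candidate terms are scanned against a short flu series, the stronger the best chance correlation tends to look.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0


def best_spurious_correlation(n_terms, n_weeks, seed=0):
    """Largest |correlation| between a random 'flu' series and n_terms
    random 'search term' series. Every series is independent noise, so any
    strong correlation found here is spurious by construction."""
    rng = random.Random(seed)
    flu = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_terms):
        term = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(term, flu)))
    return best


if __name__ == "__main__":
    # Assumed sizes for illustration only: 20 weekly observations.
    for n_terms in (5, 50, 5000):
        print(n_terms, round(best_spurious_correlation(n_terms, 20), 2))
```

Because the same seed is reused, each larger scan contains the smaller one, so the best score can only stay the same or grow as more terms are examined. That is the sense in which a system fed millions of terms is "practically guaranteed" to find impressive-looking but meaningless indicators unless candidates are filtered on prior plausibility or validated on held-out data.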

But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. “Shaman’s team tested their model against actual flu activity that had already occurred during the season,” according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data….(More)

Mexico City is crowdsourcing its new constitution using Change.org in a democracy experiment


Ana Campoy at Quartz: “Mexico City just launched a massive experiment in digital democracy. It is asking its nearly 9 million residents to help draft a new constitution through social media. The crowdsourcing exercise is unprecedented in Mexico—and pretty much everywhere else.

Chilangos, as locals are known, can petition for issues to be included in the constitution through Change.org (link in Spanish), and make their case in person if they gather more than 10,000 signatures. They can also annotate proposals by the constitution drafters via PubPub, an editing platform (Spanish) similar to Google Docs.

The idea, in the words of the mayor, Miguel Angel Mancera, is to “bestow the constitution project (Spanish) with a democratic, progressive, inclusive, civic and plural character.”

There’s a big catch, however. The constitutional assembly—the body that has the final word on the new city’s basic law—is under no obligation to consider any of the citizen input. And then there are the practical difficulties of collecting and summarizing the myriad of views dispersed throughout one of the world’s largest cities.

That makes Mexico City’s public-consultation experiment a big test for the people’s digital power, one being watched around the world.

Fittingly, the idea of crowdsourcing a constitution came about in response to an attempt to limit people power. For decades, city officials had fought to get out from under the thumb of the federal government, which had the final word on decisions such as who should be the city’s chief of police. This year, finally, they won a legal change that turns the Distrito Federal (federal district), similar to the US’s District of Columbia, into Ciudad de México (Mexico City), a more autonomous entity, more akin to a state. (Confusingly, it’s just part of the larger urban area also colloquially known as Mexico City, which spills into neighboring states.)

However, trying to retain some control, the Mexican congress decided that only 60% of the delegates to the city’s constitutional assembly would be elected by popular vote. The rest will be assigned by the president, congress, and Mancera, the mayor. Mancera is also the only one who can submit a draft constitution to the assembly.

Mancera’s response was to create a committee of some 30 citizens (Spanish), including politicians, human-rights advocates, journalists, and even a Paralympic gold medalist, to write his draft. He also called for the development of mechanisms to gather citizens’ “aspirations, values, and longing for freedom and justice” so they can be incorporated into the final document.

The mechanisms, embedded in an online platform (Spanish) that offers various ways to weigh in, were launched at the end of March and will collect inputs until September 1. The drafting group has until the middle of that month to file its text with the assembly, which has to approve the new constitution by the end of January.

An experiment with few precedents

Mexico City didn’t have a lot of examples to draw on, since not a lot of places have experience with crowdsourcing laws. In the US, a few local lawmakers have used Wiki pages and GitHub to draft bills, says Marilyn Bautista, a lecturer at Stanford Law School who has researched the practice. Iceland, with a population some 27 times smaller than Mexico City’s, famously had its citizens contribute to its constitution with input from social media. The effort failed after the new constitution got stuck in parliament.

In Mexico City, where many citizens already feel left out, the first big hurdle is to convince them it’s worth participating….

Then comes the task of making sense of the cacophony that will likely emerge. Some of the input can be very easily organized: the results of the survey, for example, are being graphed in real time. But there could be thousands of documents and comments on the Change.org petitions and the editing platform.

Ideas are grouped into 18 topics, such as direct democracy, transparency and economic rights. They are prioritized based on the amount of support they’ve garnered and how relevant they are, said Bernardo Rivera, an adviser for the city. Drafters get a weekly delivery of summarized citizen petitions….

The most elaborate part of the system is PubPub, an open publishing platform similar to Google Docs, which is based on a project originally developed by MIT’s Media Lab. The drafters are supposed to post essays on how to address constitutional issues, and potentially, the constitution draft itself, once there is one. Only they—or whoever they authorize—will be able to reword the original document.

User comments and edits are recorded on a side panel, with links to the portion of text they refer to. Another screen records every change, so everyone can track which suggestions have made it into the text. Members of the public can also vote comments up or down, or post their own essays….(More).

The “Social Side” of Public Policy: Monitoring Online Public Opinion and Its Mobilization During the Policy Cycle


Andrea Ceron and Fedra Negri in Policy & Internet: “This article addresses the potential role played by social media analysis in promoting interaction between politicians, bureaucrats, and citizens. We show that in a “Big Data” world, the comments posted online by social media users can profitably be used to extract meaningful information, which can support the action of policymakers along the policy cycle. We analyze Twitter data through the technique of Supervised Aggregated Sentiment Analysis. We develop two case studies related to the “jobs act” labor market reform and the “#labuonascuola” school reform, both formulated and implemented by the Italian Renzi cabinet in 2014–15. Our results demonstrate that social media data can help policymakers to rate the available policy alternatives according to citizens’ preferences during the formulation phase of a public policy; can help them to monitor citizens’ opinions during the implementation phase; and capture stakeholders’ mobilization and de-mobilization processes. We argue that, although social media analysis cannot replace other research methods, it provides a fast and cheap stream of information that can supplement traditional analyses, enhancing responsiveness and institutional learning….(More)”

The Open Data Barometer (3rd edition)


The Open Data Barometer: “Once the preserve of academics and statisticians, data has become a development cause embraced by everyone from grassroots activists to the UN Secretary-General. There’s now a clear understanding that we need robust data to drive democracy and development — and a lot of it.

Last year, the world agreed the Sustainable Development Goals (SDGs) — seventeen global commitments that set an ambitious agenda to end poverty, fight inequality and tackle climate change by 2030. Recognising that good data is essential to the success of the SDGs, the Global Partnership for Sustainable Development Data and the International Open Data Charter were launched as the SDGs were unveiled. These alliances mean the “data revolution” now has over 100 champions willing to fight for it. Meanwhile, Africa adopted the African Data Consensus — a roadmap to improving data standards and availability in a region that has notoriously struggled to capture even basic information such as birth registration.

But while much has been made of the need for bigger and better data to power the SDGs, this year’s Barometer follows the lead set by the International Open Data Charter by focusing on how much of this data will be openly available to the public.

Open data is essential to building accountable and effective institutions, and to ensuring public access to information — both goals of SDG 16. It is also essential for meaningful monitoring of progress on all 169 SDG targets. Yet the promise and possibilities offered by opening up data to journalists, human rights defenders, parliamentarians, and citizens at large go far beyond even these….

At a glance, here are this year’s key findings on the state of open data around the world:

    • Open data is entering the mainstream. The majority of the countries in the survey (55%) now have an open data initiative in place and a national data catalogue providing access to datasets available for re-use. Moreover, new open data initiatives are getting underway or are promised for the near future in a number of countries, including Ecuador, Jamaica, St. Lucia, Nepal, Thailand, Botswana, Ethiopia, Nigeria, Rwanda and Uganda. Demand is high: civil society and the tech community are using government data in 93% of countries surveyed, even in countries where that data is not yet fully open.
    • Despite this, there’s been little to no progress on the number of truly open datasets around the world. Even with the rapid spread of open government data plans and policies, too much critical data remains locked in government filing cabinets. For example, only two countries publish acceptably detailed open public spending data. Of all 1,380 government datasets surveyed, almost 90% are still closed, roughly the same as in the last edition of the Open Data Barometer (when only 130 out of 1,290 datasets, or 10%, were open). What is more, much of the approximately 10% of data that meets the open definition is of poor quality, making it difficult for potential data users to access, process and work with it effectively.
    • “Open-washing” is jeopardising progress. Many governments have advertised their open data policies as a way to burnish their democratic and transparent credentials. But open data, while extremely important, is just one component of a responsive and accountable government. Open data initiatives cannot be effective if not supported by a culture of openness where citizens are encouraged to ask questions and engage, and supported by a legal framework. Disturbingly, in this edition we saw a backslide on freedom of information, transparency, accountability, and privacy indicators in some countries. Until all these factors are in place, open data cannot be a true SDG accelerator.
    • Implementation and resourcing are the weakest links. Progress on the Barometer’s implementation and impact indicators has stalled or even gone into reverse in some cases. Open data can result in net savings for the public purse, but getting individual ministries to allocate the budget and staff needed to publish their data is often an uphill battle, and investment in building user capacity (both inside and outside of government) is scarce. Open data is not yet entrenched in law or policy, and the legal frameworks supporting most open data initiatives are weak. This is a symptom of the tendency of governments to view open data as a fad or experiment with little to no long-term strategy behind its implementation. This results in haphazard implementation, weak demand and limited impact.
    • The gap between data haves and have-nots needs urgent attention. Twenty-six of the top 30 countries in the ranking are high-income countries. Half of open datasets in our study are found in just the top 10 OECD countries, while almost none are in African countries. As the UN pointed out last year, such gaps could create “a whole new inequality frontier” if allowed to persist. Open data champions in several developing countries have launched fledgling initiatives, but too often those good open data intentions are not adequately resourced, resulting in weak momentum and limited success.
    • Governments at the top of the Barometer are being challenged by a new generation of open data adopters. Traditional open data stalwarts such as the USA and UK have seen their rate of progress on open data slow, signalling that new political will and momentum may be needed as more difficult elements of open data are tackled. Fortunately, a new generation of open data adopters, including France, Canada, Mexico, Uruguay, South Korea and the Philippines, are starting to challenge the ranking leaders and are adopting a leadership attitude in their respective regions. The International Open Data Charter could be an important vehicle to sustain and increase momentum in challenger countries, while also stimulating renewed energy in traditional open data leaders….(More)”

Accountable Algorithms


Paper by Joshua A. Kroll et al: “Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.

The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.

We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.

The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.

The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.

Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability….(More)”
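The idea that rules can be "declared prior to use and verified afterwards" without being disclosed up front can be sketched with a simple hash commitment. The excerpt does not say which techniques the paper relies on, so this is an illustrative stand-in under stated assumptions rather than the authors' method: the function names `commit` and `verify` and the example rule text are hypothetical, and a real deployment would need a carefully reviewed protocol.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(rules: bytes) -> Tuple[str, bytes]:
    """Return (digest, nonce). The digest is published before any decision
    is made; the nonce and the rules are kept private until audit time."""
    nonce = secrets.token_bytes(32)  # random salt so the rules cannot be guessed from the digest
    digest = hashlib.sha256(nonce + rules).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, rules: bytes) -> bool:
    """Check that the revealed rules and nonce match the published digest."""
    expected = hashlib.sha256(nonce + rules).hexdigest()
    return hmac.compare_digest(expected, digest)


if __name__ == "__main__":
    # Hypothetical decision rule, for illustration only.
    rules = b"select applicants by uniform random draw over all valid entries"
    published_digest, private_nonce = commit(rules)

    # Later, an auditor receives the rules and nonce and checks them.
    print(verify(published_digest, private_nonce, rules))
    print(verify(published_digest, private_nonce, rules + b" (altered)"))
```

The design point is that the decision-maker is bound to the rules at announcement time, since changing them afterwards would change the digest, while the rules themselves stay hidden until an overseer asks for them. This addresses only the "announced in advance" half of procedural regularity; showing that the committed rules were actually applied in each case requires further machinery that this sketch does not attempt.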

Smart City and Smart Government: Synonymous or Complementary?


Paper by Leonidas G. Anthopoulos and Christopher G. Reddick: “Smart City is an emerging and multidisciplinary domain. It has been recently defined as innovation, not necessarily but mainly through information and communications technologies (ICT), which enhance urban life in terms of people, living, economy, mobility and governance. Smart government is also an emerging topic, which attracts increasing attention from scholars who work in public administration, political and information sciences. There is no widely accepted definition for smart government, but it appears to be the next step of e-government with the use of technology and innovation by governments for better performance. However, it is not clear whether these two terms co-exist or concern different domains. The aim of this paper is to investigate the term smart government and to clarify its meaning in relationship to the smart city. In this respect this paper performed a comprehensive literature review analysis and concluded that smart government is shown not to be synonymous with smart city. Our findings show that smart city has a dimension of smart government, and smart government uses smart city as an area of practice. The authors conclude that smart city is complementary, part of a larger smart government movement….(More)”