Big Data, Big Questions


Special Issue by the International Journal of Communication on Big Data, Big Questions:

Critiquing Big Data: Politics, Ethics, Epistemology | Special Section Introduction
Kate Crawford, Mary L. Gray, Kate Miltner 10 pgs.
The Big Data Divide
Mark Andrejevic 17 pgs.
Metaphors of Big Data
Cornelius Puschmann, Jean Burgess 20 pgs.
Advertising, Big Data and the Clearance of the Public Realm: Marketers’ New Approaches to the Content Subsidy
Nick Couldry, Joseph Turow 17 pgs.
A Dozen Ways to Get Lost in Translation: Inherent Challenges in Large Scale Data Sets
Lawrence Busch 18 pgs.
Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data
Kevin Driscoll, Shawn Walker 20 pgs.
Living on Fumes: Digital Footprints, Data Fumes, and the Limitations of Spatial Big Data
Jim Thatcher 19 pgs.
This One Does Not Go Up To 11: The Quantified Self Movement as an Alternative Big Data Practice
Dawn Nafus, Jamie Sherman 11 pgs.
The Theory/Data Thing
Geoffrey C. Bowker 5 pgs.

Let's amplify California's collective intelligence


Gavin Newsom and Ken Goldberg at the SFGate: “Although the results of last week’s primary election are still being certified, we already know that voter turnout was among the lowest in California’s history. Pundits will rant about the “cynical electorate” and wag a finger at disengaged voters shirking their democratic duties, but we see the low turnout as a symptom of broader forces that affect how people and government interact.
The methods used to find out what citizens think and believe are limited to elections, opinion polls, surveys and focus groups. These methods may produce valuable information, but they are costly, infrequent and often conducted at the convenience of government or special interests.
We believe that new technology has the potential to increase public engagement by tapping the collective intelligence of Californians every day, not just on election day.
While most politicians already use e-mail and social media, these channels are easily dominated by extreme views and tend to regurgitate material from mass media outlets.
We’re exploring an alternative.
The California Report Card is a mobile-friendly web-based platform that streamlines and organizes public input for the benefit of policymakers and elected officials. The report card allows participants to assign letter grades to key issues and to suggest new ideas for consideration; public officials then can use that information to inform their decisions.
In an experimental version of the report card released earlier this year, residents from all 58 counties assigned more than 20,000 grades to the state of California and also suggested issues they feel deserve priority at the state level. As one participant noted: “This platform allows us to have our voices heard. The ability to review and grade what others suggest is important. It enables elected officials to hear directly how Californians feel.”
Initial data confirm that Californians approve of our state’s rollout of Obamacare, but are very concerned about the future of our schools and universities.
There was also a surprise. California Report Card suggestions for top state priorities revealed consistently strong interest and support for more attention to disaster preparedness. Issues related to this topic were graded as highly important by a broad cross section of participants across the state. In response, we’re testing new versions of the report card that can focus on topics related to wildfires and earthquakes.
The report card is part of an ongoing collaboration between the CITRIS Data and Democracy Initiative at UC Berkeley and the Office of the Lieutenant Governor to explore how technology can improve public communication and bring the government closer to the people. Our hunch is that engineering concepts can be adapted for public policy to rapidly identify real insights from constituents and resist gaming by special interests.
You don’t have to wait for the next election to have your voice heard by officials in Sacramento. The California Report Card is now accessible from cell phones, desktop and tablet computers. We encourage you to contribute your own ideas to amplify California’s collective intelligence. It’s easy: just click “participate” on this website: CaliforniaReportCard.org”

How to Make Government Data Sites Better


Flowing Data: “Accessing government data from the source is frustrating. If you’ve done it, or at least tried to, you know the pain that is oddly formatted files, search that doesn’t work, and annotation that tells you nothing about the data in front of you.
The most frustrating part of the process is knowing how useful the data could be if only it were shared more simply. Unfortunately, ease-of-use is rarely the case, and we spend more time formatting and inspecting the data than we do actually putting it to use. Shouldn’t it be the other way around?
It’s this painstaking process that draws so much ire. It’s hard not to complain.
Maybe the people in charge of these sites just don’t know what’s going on. Or maybe they’re so overwhelmed by suck that they don’t know where to start. Or they’re unknowingly infected by the that-is-how-we’ve-always-done-it bug.
Whatever it may be, I need to think out loud about how to improve these sites. Empty complaints don’t help.
I use the Centers for Disease Control and Prevention as the test subject, but most of the things covered should easily generalize to other government sites (and non-government ones too). And I choose CDC not because they’re the worst but because they publish a lot of data that is of immediate and direct use to the general public.
I approach this from the point of view of someone who uses government data, beyond pulling a single data point from a spreadsheet. I’m also going to put on my Captain Obvious hat, because what seems obvious to some is apparently a black box to others.
Provide a useable data format
Sometimes it feels like government data is available in every format except the one that data users want. The worst one was when I downloaded a 2GB file, and upon unzipping it, I discovered it was an EXE file.
Data in PDF format is a kick in the face for people looking for CSV files. There might be ways to get the data out from PDFs, but it’s still a pain when you have more than a handful of files….
Useable data format is the most important, and if there’s just one thing you change, make it this.
(Raw data is fine too)
It’s rare to find raw government data, so it’s like striking gold when it actually happens. I realize you run into issues with data privacy, quality, missing data, etc. For these data sources, I appreciate the estimates with standard errors. However, the less aggregated (the more raw) you can provide, the better.
CSV for that too, please.
Never mind the fancy sharing tools
Not all government data is wedged into PDF files, and some of it is accessible via export tools that let you subset and lay out your data exactly how you want it. The problem is that in an effort to please everyone, you end up with a tool shown on the left….
Tell people where to get the data
Get the things above done, and your government data site is exponentially better than it was before, but let’s keep going.
The navigation process to get to a dataset is incredibly convoluted, which makes it hard to find data and difficult to return to it….
Show visual previews
I’m all for visualization integrated with the data search tools. It always sucks when I spend time formatting data only to find that it wasn’t worth my time. Census Reporter is a fine example of how this might work.
That said, visual tools plus an upgrade to the previously mentioned things is a big undertaking, especially if you’re going to do it right. So I’m perfectly fine if you skip this step to focus your resources on data that’s easier to use and download. Leave the visualizing and analysis to us.
Decide what’s important, archive the rest
So much cruft. So many old documents. Broken links. Create an archive and highlight what people come to your site for.
Wrapping up
There’s plenty more stuff to update, especially once you start to work with the details, but this should be a good place to start. It’s a lot easier to point out what you can do to improve government data sharing than it is to actually do it of course. There are so many people, policies, and oh yes, politics, that it can be hard to change.”
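
To make the point about usable formats concrete, here is a minimal sketch (not from the original post) of how little code a CSV table needs before it can be put to use. It relies only on Python's standard library; the column names and numbers are invented for illustration and are not CDC data.

```python
import csv
import io

# Hypothetical sample; the columns and numbers are made up for illustration.
SAMPLE_CSV = """state,year,cases
CA,2012,120
CA,2013,95
NY,2012,80
NY,2013,110
"""

def total_by_state(csv_text):
    """Sum the 'cases' column for each state in a CSV string."""
    totals = {}
    # DictReader uses the header row as keys, so no manual parsing is needed.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["state"]] = totals.get(row["state"], 0) + int(row["cases"])
    return totals

print(total_by_state(SAMPLE_CSV))
```

The same table locked inside a PDF or an EXE would need extraction tooling before a loop like this could even start, which is the frustration described above.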

Open Government Will Reshape Latin America


Alejandro Guerrero at Medium: “When people think of the place for innovations, they typically think of innovation being spurred by large firms and small startups based in the US. And particularly in that narrow stretch of land and water called Silicon Valley.
However, the flux of innovation taking place in the intersection between technology and government is phenomenal and emerging everywhere. From the marble hallways of parliaments everywhere —including Latin America’s legislative houses— to office hubs of tech-savvy non-profits full of enthusiastic social changers —also including Latin American startups— a driving force is starting to challenge our conception of how government and citizens can and should interact. And few people are discussing or analyzing these developments.
Open Government in Latin America
The potential for Open Government to improve government’s decision-making and performance is huge. And it is particularly immense in middle income countries such as the ones in Latin America, where the combination of growing incomes, more sophisticated citizens’ demands, and broken public services is generating a large bottom-up pressure and requesting more creative solutions from governments to meet the enormous social needs, while cutting down corruption and improving governance.
It is unsurprising that citizens from all over Latin America are increasingly taking the streets and demanding better public services and more transparent institutions.
While these protests are necessarily short-lived and unarticulated (a product of growing frustration with government), they are a symptom with deeper causes that won’t go away easily. These protests will most likely come back with increasing frequency, and the unresolved frustration may eventually transmute into political platforms with more radical ideas to challenge the status quo.
Behind the scenes, governments across the region still face enormous weaknesses in public management; ill-prepared and underpaid public officials carry on with their duties as the platonic idea of a demotivated workforce; and the opportunities for corruption, waste, and nepotism are plenty. The growing segment of more affluent citizens simply opts out of government and resorts to private alternatives, thus exacerbating inequalities in what is already the most unequal region in the world. The crumbling middle classes and the poor can only resort to voicing their complaints. And they are increasingly doing so.
And here is where open government initiatives might play a transformative role, disrupting the way governments make decisions and work while empowering citizens in the process.
The preconditions for OpenGov are almost here
In Latin America, connectivity rates are growing fast (reaching 61% in 2013 for the Americas as a whole), close to 90% of the population owns a cellphone, and access to higher levels of education keeps growing (as an example, the latest PISA report indicates that Mexico went from 58% in 2003 to 70% high-schoolers in 2012). The social conditions for a stronger role of citizens in government are increasingly there.
Moreover, most Latin American countries passed transparency laws during the 2000s, creating the enabling environment for open government initiatives to flourish. It is thus unsurprising that the next generation of young government bureaucrats, on average more internet-savvy and better educated than its predecessors, is taking over and embracing innovations in government. And they are finding echo (and suppliers of ideas and apps!) among local startups and civil society groups, while also being courted by large tech corporations (think of Google or Microsoft) behind succulent government contracts associated with this form of “doing good”.
This is an emerging galaxy of social innovators, technologically-savvy bureaucrats, and engaged citizens providing a large crowd-sourcing community and an opportunity to test different approaches. And the underlying tectonic shifts are pushing governments towards that direction. For a sampler, check out the latest developments for Brazil, Argentina, Peru, Mexico, Colombia, Paraguay, Chile, Panama, Costa Rica, Guatemala, Honduras, Dominican Republic, Uruguay and (why not?) my own country, which I will include in the review often for the surprisingly limited progress of open government in this OECD member, which shares similar institutions and challenges with Latin America.

A Road Full of Promise…and Obstacles

Most of the progress in Latin America is quite recent, and the real impact is still often more limited once you abandon the halls of the Digital Government directorates and secretarías or if you look beyond the typical government data portal. The resistance to change is as human as laughing, but it is particularly intense among the public sector side of human beings. Politics also typically plays an enormous role in resisting transparency and open government, and in a context of weak institutions and pervasive corruption, the temptation to politically block or water down open data/open government projects is just too high. Selective release of data (if any) is too frequent, government agencies often act as silos by not sharing information with other government departments, and irrational fears by policy-makers combined with adoption barriers (well explained here) all contribute to deter the progress of the open government promise in Latin America…”

Special Issue on Innovation through Open Data


A Review of the State-of-the-Art and an Emerging Research Agenda in the Journal of Theoretical and Applied Electronic Commerce Research:

  • Going Beyond Open Data: Challenges and Motivations for Smart Disclosure in Ethical Consumption (Djoko Sigit Sayogo, Jing Zhang, Theresa A. Pardo, Giri K. Tayi, Jana Hrdinova, David F. Andersen and Luis Felipe Luna-Reyes)
  • Shaping Local Open Data Initiatives: Politics and Implications (Josefin Lassinantti, Birgitta Bergvall-Kåreborn and Anna Ståhlbröst)
  • A State-of-the-Art Analysis of the Current Public Data Landscape from a Functional, Semantic and Technical Perspective (Michael Petychakis, Olga Vasileiou, Charilaos Georgis, Spiros Mouzakitis and John Psarras)
  • Using a Method and Tool for Hybrid Ontology Engineering: an Evaluation in the Flemish Research Information Space (Christophe Debruyne and Pieter De Leenheer)
  • A Metrics-Driven Approach for Quality Assessment of Linked Open Data (Behshid Behkamal, Mohsen Kahani, Ebrahim Bagheri and Zoran Jeremic)
  • Open Government Data Implementation Evaluation (Peter Parycek, Johann Höchtl and Michael Ginner)
  • Data-Driven Innovation through Open Government Data (Thorhildur Jetzek, Michel Avital and Niels Bjorn-Andersen)

Technological Innovations and Future Shifts in International Politics


Paper by Askar Akaev and Vladimir Pantin in International Studies Quarterly: “How are large technological changes and important shifts in international politics interconnected? It is shown in the article that primary technological innovations, which take place in each Kondratieff cycle, change the balance of power between the leading states and cause shifts in international politics. In the beginning of the twenty-first century, the genesis and initial development of the cluster of new technologies takes place in periods of crisis and depression. Therefore, the authors forecast that the period 2013–2020 will be marked by the advancement of important technological innovations and massive geopolitical shifts in many regions of the world.”

Estonian plan for 'data embassies' overseas to back up government databases


Graeme Burton in Computing: “Estonia is planning to open “data embassies” overseas to back up government databases and to operate government “in the cloud”.
The aim is partly to improve efficiency, but it is driven largely by fear of invasion and occupation, Jaan Priisalu, the director general of the Estonian Information System Authority, told Sky News.
He said: “We are planning to actually operate our government in the cloud. It’s clear also how it helps to protect the country, the territory. Usually when you are the military planner and you are planning the occupation of the territory, then one of the rules is suppress the existing institutions.
“And if you are not able to do it, it means that this political price of occupying the country will simply rise for planners.”
Part of the rationale for the plan, he continued, was fear of attack from Russia in particular, which has been heightened following the occupation of Crimea, formerly in Ukraine.
“It’s quite clear that you can have problems with your neighbours. And our biggest neighbour is Russia, and nowadays it’s quite aggressive. This is clear.”
The plan is to back up critical government databases outside of Estonia so that affairs of state can be conducted in the cloud, even if the country is invaded. It would also have the benefit of keeping government information out of invaders’ hands – provided it can keep its government cloud secure.
According to Sky News, the UK is already in advanced talks about hosting the Estonian government databases and may make the UK the first of Estonia’s data embassies.
Having wrested independence from the Soviet Union in 1991, Estonia has experienced frequent tension with its much bigger neighbour. For example, after the relocation of the “Bronze Soldier of Tallinn” and the exhumation of the soldiers buried in a square in the centre of the capital to a military cemetery in April 2007, the country was subject to a prolonged cyber-attack sourced to Russia.
Russian hacker “Sp0Raw” said that the most efficient of the online attacks on Estonia could not have been carried out without the approval of Russian authorities and added that the hackers seemed to act under “recommendations” from parties in government. However, claims by Estonia that the Russian government was directly involved in the attacks were “empty words, not supported by technical data”.
Mike Witt, deputy director of the US Computer Emergency Response Team (CERT), suggested that the distributed denial-of-service (DDOS) attacks, while crippling to the Estonian government at the time, were not significant in scale from a technical standpoint. However, the Estonian government was forced to shut down many of its online operations in response.
At the same time, the Estonian government has been accused of implementing anti-Russian laws and discriminating against its large ethnic Russian population.
Last week, the Estonian government unveiled a plan to allow anyone in the world to apply for “digital citizenship” of the country, enabling them to use Estonian online services, open bank accounts, and start companies without having to physically reside in the country.”

How open data can help shape the way we analyse electoral behaviour


Harvey Lewis (Deloitte), Ulrich Atz, Gianfranco Cecconi, Tom Heath (ODI) in The Guardian: “Even after the local council elections in England and Northern Ireland on 22 May, which coincided with polling for the European Parliament, the next 12 months remain a busy time for the democratic process in the UK.
In September, the people of Scotland make their choice in a referendum on the future of the Union. Finally, the first fixed-term parliament in Westminster comes to an end with a general election in all areas of Great Britain and Northern Ireland in May 2015.
To ensure that as many people as possible are eligible and able to vote, the government is launching an ambitious programme of Individual Electoral Registration (IER) this summer. This will mean that the traditional, paper-based approach to household registration will shift to a tailored and largely digital process more in keeping with the data-driven demands of the twenty-first century.
Under IER, citizens will need to provide ‘identifying information’, such as date of birth or national insurance number, when applying to register.

Ballots: stuck in the past?

However, despite the government’s attempts through IER to improve the veracity of information captured prior to ballots being posted, little has changed in terms of the vision for capturing, distributing and analysing digital data from election day itself.

Indeed, paper is still the chosen medium for data collection.
Digitising elections is fraught with difficulty, though. In the US, for example, the introduction of new voting machines created much controversy even though they are capable of providing ‘near-perfect’ ballot data.
The UK’s democratic process is not completely blind, though. Numerous opinion surveys are conducted both before and after polling, including the long-running British Election Study, to understand the shifting attitudes of a representative cross-section of the electorate.
But if the government does not retain in sufficient geographic detail digital information on the number of people who vote, then how can it learn what is necessary to reverse the long-running decline in turnout?

The effects of lack of data

To add to the debate around democratic engagement, a joint research team, with data scientists from Deloitte and the Open Data Institute (ODI), have been attempting to understand what makes voters tick.
Our research has been hampered by a significant lack of relevant data describing voter behaviour at electoral ward level, as well as difficulties in matching what little data is available to other open data sources, such as demographic data from the 2011 Census.
Even though individual ballot papers are collected and verified for counting the number of votes per candidate – the primary aim of elections, after all – the only recent elections for which aggregate turnout statistics have been published at ward level are the 2012 local council elections in England and Wales. In these elections, approximately 3,000 wards from a total of over 8,000 voted.
Data published by the Electoral Commission for the 2013 local council elections in England and Wales purports to be at ward level but is, in fact, for ‘county electoral divisions’, as explained by the Office for National Statistics.
Moreover, important factors related to the accessibility of polling stations – such as the distance from main population centres – could not be assessed because the location of polling stations remains the responsibility of individual local authorities – and only eight of these have so far published their data as open data.
Given these fundamental limitations, drawing any robust conclusions is difficult. Nevertheless, our research shows the potential for forecasting electoral turnout with relatively few census variables, the most significant of which are age and the size of the electorate in each ward.

What role can open data play?

The limited results described above provide a tantalising glimpse into a possible future scenario: where open data provides a deeper and more granular understanding of electoral behaviour.
On the back of more sophisticated analyses, policies for improving democratic engagement – particularly among young people – have the potential to become focused and evidence-driven.
And, although the data captured on election day will always remain primarily for the use of electing the public’s preferred candidate, an important secondary consideration is aggregating and publishing data that can be used more widely.
This may have been prohibitively expensive or too complex in the past but as storage and processing costs continue to fall, and the appetite for such knowledge grows, there is a compelling business case.
The benefits of this future scenario potentially include:

  • tailoring awareness and marketing campaigns to wards and other segments of the electorate most likely to respond positively and subsequently turn out to vote
  • increasing the efficiency with which European, general and local elections are held in the UK
  • improving transparency around the electoral process and stimulating increased democratic engagement
  • enhancing links to the Government’s other significant data collection activities, including the Census.

Achieving these benefits requires commitment to electoral data being collected and published in a systematic fashion at least at ward level. This would link work currently undertaken by the Electoral Commission, the ONS, Plymouth University’s Election Centre, the British Election Study and the more than 400 local authorities across the UK.”
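
The matching problem the authors describe, linking ward-level turnout to census data, can be sketched in a few lines. This is an illustrative sketch only, not the research team's method: the ward codes, field names, and figures are hypothetical, and real ward identifiers and boundaries are considerably messier.

```python
# Hypothetical ward-level records; codes and figures are invented for illustration.
turnout = {
    "W001": {"electorate": 5200, "votes": 1820},
    "W002": {"electorate": 7400, "votes": 2220},
    "W003": {"electorate": 3100, "votes": 1240},
}
census = {
    "W001": {"median_age": 41},
    "W002": {"median_age": 33},
    "W004": {"median_age": 47},
}

def link_wards(turnout, census):
    """Join turnout and census records on ward code; report codes that fail to match."""
    linked = {}
    # Only wards present in both sources can be analysed together.
    for code in sorted(turnout.keys() & census.keys()):
        t = turnout[code]
        linked[code] = {
            "turnout_rate": t["votes"] / t["electorate"],
            "electorate": t["electorate"],
            "median_age": census[code]["median_age"],
        }
    # Wards that appear in exactly one source are lost to the analysis.
    unmatched = sorted(turnout.keys() ^ census.keys())
    return linked, unmatched

linked, unmatched = link_wards(turnout, census)
print(linked)
print(unmatched)
```

Every code in the unmatched list is a ward that drops out of any turnout model, which is why systematic publication at ward level matters for the kind of forecasting described above.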

How to treat government like an open source project


Ben Balter in OpenSource.com: “Open government is great. At least, it was a few election cycles ago. FOIA requests, open data, seeing how your government works—it’s arguably brought light to a lot of not-so-great practices, and in many cases, has spurred citizen-centric innovation not otherwise imagined before the information’s release.
It used to be that sharing information was really, really hard. Open government wasn’t even a possibility a few hundred years ago. Throughout the history of communication tools (be it the printing press, fax machine, or floppy disks), new tools have generally done three things: lowered the cost to transmit information, increased who that information could be made available to, and increased how quickly that information could be distributed. But printing presses and fax machines have two limitations: they are one-way and asynchronous. They let you more easily request, and eventually see, how the sausage was made, but don’t let you actually take part in the sausage-making. You may be able to see what’s wrong, but you don’t have the chance to make it better. By the time you find out, it’s already too late.
As technology allows us to communicate with greater frequency and greater fidelity, we have the chance to make our government not only transparent, but truly collaborative.

So, how do we encourage policy makers and bureaucrats to move from open government to collaborative government, to learn open source’s lessons about openness and collaboration at scale?
For one, we geeks can help to create a culture of transparency and openness within government by driving up the demand side of the equation. Be vocal, demand data, expect to see process, and once released, help build lightweight apps. Show potential change agents in government that their efforts will be rewarded.
Second, it’s a matter of tooling. We’ve got great tools out there (things like Git that can track who made what change when, and open standards like CSV or JSON that don’t require proprietary software), but by and large they’re a foreign concept in government, at least among those empowered to make change. Command line interfaces with black background and green text can be intimidating to government bureaucrats used to desktop publishing tools. Make it easier for government to do the right thing and choose open standards over proprietary tooling.
Last, be a good open source ambassador. Help your home city or state get involved with open source. Encourage them to take their first step (be it consuming open source, publishing, or collaborating with the public), and teach them what it means to do things in the open. And when they do push code outside the firewall, above all, be supportive. We’re in this together.
As technology makes it easier to work together, geeks can help make our government not just open, but in fact collaborative. Government is the world’s largest and longest running open source project (bugs, trolls, and all). It’s time we start treating it like one.”

Citizen participation and technology


ICTlogy: “The recent, rapid rise in the use of digital technology is changing relationships between citizens, organizations and public institutions, and expanding political participation. But while technology has the potential to amplify citizens’ voices, it must be accompanied by clear political goals and other factors to increase their clout.
Those are among the conclusions of a new NDI study, “Citizen Participation and Technology,” that examines the role digital technologies – such as social media, interactive websites and SMS systems – play in increasing citizen participation and fostering accountability in government. The study was driven by the recognition that better insights are needed into the relationship between new technologies, citizen participation programs and the outcomes they aim to achieve.
Using case studies from countries such as Burma, Mexico and Uganda, the study explores whether the use of technology in citizen participation programs amplifies citizen voices and increases government responsiveness and accountability, and whether the use of digital technology increases the political clout of citizens.
The research shows that while more people are using technology—such as social media for mobile organizing, and interactive websites and text messaging systems that enable direct communication between constituents and elected officials or crowdsourcing election day experiences— the type and quality of their political participation, and therefore its impact on democratization, varies. It also suggests that, in order to leverage technology’s potential, there is a need to focus on non-technological areas such as political organizing, leadership skills and political analysis.
For example, the “2% and More Women in Politics” coalition led by Mexico’s National Institute for Women (INMUJERES) used a social media campaign and an online petition to call successfully for reforms that would allocate two percent of political party funding for women’s leadership training. Technology helped the activists reach a wider audience, but women from the different political parties who made up the coalition might not have come together without NDI’s role as a neutral convener.
The study, which was conducted with support from the National Endowment for Democracy, provides an overview of NDI’s approach to citizen participation, and examines how the integration of technologies affects its programs in order to inform the work of NDI, other democracy assistance practitioners, donors, and civic groups.

Observations:

Key findings:

  1. Technology can be used to readily create spaces and opportunities for citizens to express their voices, but making these voices politically stronger and the spaces more meaningful is a harder challenge that is political and not technological in nature.
  2. Technology that was used to purposefully connect citizens’ groups and amplify their voices had more political impact.
  3. There is a scarcity of data on specific demographic groups’ use of, and barriers to, technology for political participation. Programs seeking to close the digital divide as an instrument of narrowing the political divide should be informed by more research into barriers to access to both politics and technology.
  4. There is a blurring of the meaning between the technologies of open government data and the politics of open government that clouds program strategies and implementation.
  5. Attempts to simply crowdsource public inputs will not result in users self-organizing into politically influential groups, since citizens lack the opportunities to develop leadership, unity, and commitment around a shared vision necessary for meaningful collective action.
  6. Political will and the technical capacity to engage citizens in policy making, or to provide accurate data on government performance, are lacking in many emerging democracies. Technology may have changed institutions’ ability to respond to citizen demands but its mere presence has not fundamentally changed actual government responsiveness.”