The Free 'Big Data' Sources Everyone Should Know


Bernard Marr at LinkedIn Pulse: “…The moves by companies and governments to put large amounts of information into the public domain have made large volumes of data accessible to everyone….here’s my rundown of some of the best free big data sources available today.

Data.gov

The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime. To check it out, click here.

US Census Bureau

A wealth of information on the lives of US citizens covering population data, geographic data and education. To check it out, click here.

European Union Open Data Portal

As the above, but based on data from European Union institutions. To check it out, click here.

Data.gov.uk

Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950. To check it out, click here.

The CIA World Factbook

Information on history, population, economy, government, infrastructure and military of 267 countries. To check it out, click here.

Healthdata.gov

125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics. To check it out, click here.

NHS Health and Social Care Information Centre

Health data sets from the UK National Health Service. To check it out, click here.

Amazon Web Services public datasets

Huge resource of public data, including the 1000 Genomes Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth. To check it out, click here.

Facebook Graph

Although much of the information on users’ Facebook profile is private, a lot isn’t – Facebook provide the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work). To check it out, click here.

Gapminder

Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world. To check it out, click here.

Google Trends

Statistics on search volume (as a proportion of total search) for any given term, since 2004. To check it out, click here.

Google Finance

40 years’ worth of stock market data, updated in real time. To check it out, click here.

Google Books Ngrams

Search and analyze the full text of any of the millions of books digitised as part of the Google Books project. To check it out, click here.

National Climatic Data Center

Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data. To check it out, click here.

DBPedia

Wikipedia is comprised of millions of pieces of data, structured and unstructured on every subject under the sun. DBPedia is an ambitious project to catalogue and create a public, freely distributable database allowing anyone to analyze this data. To check it out, click here.

Topsy

Free, comprehensive social media data is hard to come by – after all their data is what generates profits for the big players (Facebook, Twitter etc) so they don’t want to give it away. However Topsy provides a searchable database of public tweets going back to 2006 as well as several tools to analyze the conversations. To check it out, click here.

Likebutton

Mines Facebook’s public data – globally and from your own network – to give an overview of what people “Like” at the moment. To check it out, click here.

New York Times

Searchable, indexed archive of news articles going back to 1851. To check it out, click here.

Freebase

A community-compiled database of structured data about people, places and things, with over 45 million entries. To check it out, click here.

Million Song Data Set

Metadata on over a million songs and pieces of music. Part of Amazon Web Services. To check it out, click here.”
See also Bernard Marr’s blog at Big Data Guru

Lab Rats


Clare Dwyer Hogg at the Long and Short:  “Do you remember how you were feeling between 11 and 18 January, 2012? If you’re a Facebook user, you can scroll back and have a look. Your status updates might show you feeling a little bit down, or cheery. All perfectly natural, maybe. But if you were one of 689,003 unwitting users selected for an experiment to determine whether emotions are contagious, then maybe not. The report on its findings was published in March this year: “Experimental evidence of massive-scale emotional contagion through social networks”. How did Facebook do it? Very subtly, by adjusting the algorithm of selected users’ news feeds. One half had a reduced chance of being exposed to positive updates, the other had a more upbeat newsfeed. Would users be more inclined to feel positive or negative themselves, depending on which group they were in? Yes. The authors of the report found – by extracting the posts of the people they were experimenting on – that, indeed, emotional states can be transferred to others, “leading people to experience the same emotions without their awareness”.

It was legal (see Facebook’s Data Use Policy). Ethical? The answer to that lies in the shadows. A one-off? Not likely. When revealed last summer, the Facebook example created headlines around the world – and another story quickly followed. On 28 July, Christian Rudder, a Harvard math graduate and one of the founders of the internet dating site OkCupid, wrote a blog post titled “We Experiment on Human Beings!”. In it, he outlined a number of experiments they performed on their users, one of which was to tell people who were “bad matches” (only 30 per cent compatible, according to their algorithm) that they were actually “exceptionally good for each other” (which usually requires a 90 per cent match). OkCupid wanted to see if mere suggestion would inspire people to like each other (answer: yes). It was a technological placebo. The experiment found that the power of suggestion works – but so does the bona fide OkCupid algorithm. Outraged debates ensued, with Rudder defensive. “This is the only way to find this stuff out,” he said, in one heated radio interview. “If you guys have an alternative to the scientific method, I’m all ears.”…

The debate, says Mark Earls, should primarily be about civic responsibility, even before the ethical concerns. Earls is a towering figure in the world of advertising and communication, and his book Herd: How to Change Mass Behaviour By Harnessing our True Nature, was a gamechanger in how people in the industry thought about what drives us to make decisions. That was a decade ago, before Facebook, and it’s increasingly clear that his theories were prescient.

He kept an eye on the Facebook experiment furore, and was, he says, heavily against the whole concept. “They’re supporting the private space between people, their contacts and their social media life,” he says. “And then they abused it.”…”

4 Tech Trends Changing How Cities Operate


at Governing: “Louis Brandeis famously characterized states as laboratories for democracy, but cities could be called labs for innovation or new practices….When Government Technology magazine (produced by Governing’s parent company, e.Republic, Inc.) published its annual Digital Cities Survey, the results provided an interesting look at how local governments are using technology to improve how they deliver services, increase production and streamline operations…the survey also showed four technology trends changing how local government operates and serves its citizens:

1. Open Data

…Big cities were the first to open up their data and gained national attention for their transparency. New York City, which passed an open data law in 2012, leads all cities with more than 1,300 data sets open to the public; Chicago started opening up data to the public in 2010 following an executive order and is second among cities with more than 600; and San Francisco, which was the first major city to open the doors to transparency in 2009, had the highest score from the U.S. Open Data Census for the quality of its open data.
But the survey shows that a growing number of mid-sized jurisdictions are now getting involved, too. Tacoma, Wash., has a portal with 40 data sets that show how the city is spending tax dollars on public works, economic development, transportation and public safety. Ann Arbor, Mich., has a financial transparency tool that reveals what the city is spending on a daily basis, in some cases….

2. ‘Stat’ Programs and Data Analytics

…First, the so-called “stat” programs are proliferating. Started by the New York Police Department in the 1980s, CompStat was a management technique that merged data with staff feedback to drive better performance by police officers and precinct captains. Its success led to many imitations over the years and, as the digital survey shows, stat programs continue to grow in importance. For example, Louisville has used its “LouieStat” program to cut the city’s bill for unscheduled employee overtime by $23 million as well as to spot weaknesses in performance.
Second, cities are increasing their use of data analytics to measure and improve performance. Denver, Jacksonville, Fla., and Phoenix have launched programs that sift through data sets to find patterns that can lead to better governance decisions. Los Angeles has combined transparency with analytics to create an online system that tracks performance for the city’s economy, service delivery, public safety and government operations that the public can view. Robert J. O’Neill Jr., executive director of the International City/County Management Association, said that both of these tech-driven performance trends “enable real-time decision-making.” He argued that public leaders who grasp the significance of these new tools can deliver government services that today’s constituents expect.

3. Online Citizen Engagement

…Avondale, Ariz., population 78,822, is engaging citizens with a mobile app and an online forum that solicits ideas that other residents can vote up or down.
In Westminster, Colo., population 110,945, a similar forum allows citizens to vote online about community ideas and gives rewards to users who engage with the online forum on a regular basis (free passes to a local driving range or fitness program). Cities are promoting more engagement activities to combat a decline in public trust in government. The days when a public meeting alone could provide citizen engagement are over in today’s technology-dominated world. That’s why social media tools, online surveys and even e-commerce rewards programs are popping up in cities around the country to create high-value interaction with their citizens.

4. Geographic Information Systems

… Cities now use them to analyze financial decisions to increase performance, support public safety, improve public transit, run social service activities and, increasingly, engage citizens about their city’s governance.
Augusta, Ga., won an award for its well-designed and easy-to-use transit maps. Sugar Land, Texas, uses GIS to support economic development and, as part of its citizen engagement efforts, to highlight its capital improvement projects. GIS is now used citywide by 92 percent of the survey respondents. That’s significant because GIS has long been considered a specialized (and expensive) technology primarily for city planning and environmental projects….”

The Global Open Data Index 2014


Open Knowledge Foundation: “The Global Open Data Index ranks countries based on the availability and accessibility of information in ten key areas, including government spending, election results, transport timetables, and pollution levels.
The UK tops the 2014 Index retaining its pole position with an overall score of 96%, closely followed by Denmark and then France at number 3 up from 12th last year. Finland comes in 4th while Australia and New Zealand share the 5th place. Impressive results were seen from India at #10 (up from #27) and Latin American countries like Colombia and Uruguay who came in joint 12th.
Sierra Leone, Mali, Haiti and Guinea rank lowest of the countries assessed, but there are many countries where the governments are less open but that were not assessed because of lack of openness or a sufficiently engaged civil society.
Overall, whilst there is meaningful improvement in the number of open datasets (from 87 to 105), the percentage of open datasets across all the surveyed countries remained low at only 11%.
Even amongst the leaders on open government data there is still room for improvement: the US and Germany, for example, do not provide a consolidated, open register of corporations. There was also a disappointing degree of openness around the details of government spending with most countries either failing to provide information at all or limiting the information available – only two countries out of 97 (the UK and Greece) got full marks here. This is noteworthy as in a period of sluggish growth and continuing austerity in many countries, giving citizens and businesses free and open access to this sort of data would seem to be an effective means of saving money and improving government efficiency.
Explore the Global Open Data Index 2014 for yourself!”

Introducing Hatch: Tell Stories With Purpose


Jay Geneske at the Rockefeller Foundation: “Stories with purpose don’t just materialize—they’re strategically planned, they’re creatively crafted, and designed to achieve measurable outcomes.
Using the landscape report Digital Storytelling for Social Impact as our guide, we’ve rolled up our sleeves with our lead grantee, Hattaway Communications, and dozens of experts and leaders to come up with a tool that we think will be game-changing for the social impact sector.
We’ve named it Hatch.
Hatch is a concierge that connects you to a suite of tools and a growing community of storytellers to help you leverage your stories to drive social impact.
Each of Hatch’s five sections is designed to help you craft, curate and share impactful stories. As you build your storytelling profile, Hatch will suggest tools, case studies and resources customized to your needs. These recommendations will always be saved to your profile so you can access them later.

Here’s just a sampling of what you’ll find:

How to Make the Case to Invest in Story
Taming the Measurement Monster
Your CEO as Master Storyteller
The 40/60 Content Rule: Less Time Writing, More Time Sharing
What Makes a Story Great
Case studies from UNICEF, The Gates Foundation, charity: water, and Greenpeace.
Tips like Nonprofit Photography Ethics, Recruiting Volunteers on LinkedIn, and Using Tumblr to Collect and Share Stories.
Guides for use and measuring impact of platforms like Facebook, Medium, Twitter, and Instagram….”

How to Fingerprint a City


Frank Jacobs at BigThink: “Thanks to Big Data, a new “Science of Cities” is emerging. Urban processes that until now could only be perceived subjectively can finally be quantified. Case in point: two French scientists have developed a mathematical formula to ‘fingerprint’ cities.
Take a good, close look at your fingertips. The pattern of grooves and ridges on your skin there is yours alone. Equally unique is the warp and weft of urban road networks. No two cities’ street grids are exactly alike. Some are famously distinct. The forensic urbanist in all of us can probably recognise a blind map of New York, London and a few other global metropolises.
Rémi Louf and Marc Barthelemy examined the street patterns of 131 cities around the world. Not to learn them by heart and impress their fellow scientists at the Institut de Physique Théorique near Paris – although that would be a neat parlor trick. They wanted to see if it would be possible to classify them into distinct types. The title of their paper, A Typology of Street Patterns, is a bit of a giveaway: the answer is Yes.
Before we get to the How, let’s hear them explain the Why:

“[Street and road] networks can be thought as a simplified schematic view of cities, which captures a large part of their structure and organization and contain a large amount of information about underlying and universal mechanisms at play in their formation and evolution. Extracting common patterns between cities is a way towards the identification of these underlying mechanisms. At stake is the question of the processes behind the so-called ‘organic’ patterns – which grow in response to local constraints – and whether they are preferable to the planned patterns which are designed under large scale constraints”.

There have been attempts before to classify urban networks, but the results have always been colored by the subjectivity of what Louf and Barthelemy call the ‘Space Syntax Community’. That’s all changed now: Big Data – in this case, the mass digitization of street maps – makes it possible to extract common patterns from street grids in an objective manner, as dispassionately as the study of tree leaves according to their venation. …
Read their entire paper here.

How do we improve open data for police accountability?


at the Sunlight Foundation: “This is a challenging time for people who worry about the fairness of American governmental institutions. In quick succession, grand juries declined to indict two police officers accused of killing black men. In the case of Ferguson, Mo. officer Darren Wilson’s killing of Michael Brown, the grand jury’s decision appeared to center on uncertainty about whether Wilson’s action was legal and whether he killed under threat. In the case of New York City police officer Daniel Pantaleo’s killing of Eric Garner, however, a bystander recorded and made public a video of the police officer causing Garner’s death through an illegal chokehold. In Pantaleo’s case, the availability of video data has made the question about institutional fairness even more urgent, as people can see for themselves the context in which the officer exercised power. The data has given us a common set of facts to use in judging police behavior.
We grant law enforcement and corrections departments the right to exercise more physical power over the public than we do to any other part of our government. But do we generally have the data we need to evaluate how they’re using it?….
The time to find good solutions to these problems is now. Responding to widespread frustration, President Obama has just announced a three-part initiative to “strengthen community policing”: an increased focus on transparency and oversight for federal-to-local transfers of military equipment, a proposal to provide matching funding to local police departments to buy body cameras, and a “Task Force on 21st Century Policing” that will make recommendations for how to implement community-oriented policing practices.
While each element of Obama’s initiative corresponds to a distinct set of concerns about policing, one element they share in common is the need to increase access to information about police work. Each of the three approaches will rely on mechanisms to increase the flow of public information about what police officers are doing in their official roles and how they are doing it. How are police officers going about fulfilling their responsibility to ensure public safety? Are they working in ways that appropriately respect individual rights? Are they responsive to public concerns, when concerns are raised?
By encouraging the collection and publication of more data about how government is working, Obama’s initiative has the potential to support precisely the kind of increase in data availability that can transform public outcomes. When applied with the intent to improve transparency and accountability and to increase public engagement, open data — and the civic tech that uses this data — can bridge the often too-large gap between the public and government.
However, because Obama’s initiatives depend on the effective collection, publication, and communication of information, open data advocates have a particular contribution to make. It’s important to think about what lessons we can apply from our experiences with open data — and with data collected and used for police accountability — in order to ensure that this initiative has the greatest possible impact. As an open data and open government community, can we make recommendations that can help improve the data we’re collecting for police transparency and accountability?
I’m going to begin a list, but it’s just a beginning – I am certain that you have many more recommendations to make. I’ll categorize them first by Obama’s “Strengthening Community Policing” initiatives and then keep thinking about what additional data is needed. Please think along with me about what kind of datasets we will need, what potential issues with data availability and quality we’re likely to see, what kind of laws may need to be changed to improve access to the data necessary for police accountability, then make your recommendations in the Google Doc embedded at the end of this post. If you’ve seen any great projects which improve police transparency and accountability, be sure to share those as well….”

Five public participation books from 2014 you should take the time to read


at Bang The Table: “Every year dozens of books are published on the subject of community engagement, civic engagement, public engagement or public participation (depending on your fancy). None of us has time to read them all, so how to choose?
I’ve compiled a short and eclectic list here that spans the breadth of issues that public participation practitioners and their public sector managers are likely to be thinking about: legal, organisational culture, bringing joy back into citizen engagement, thoughtful living and thoughtful engagement, and DIY citizenship (and what that means for the public sector).
Blocking Public Participation: The use of strategic litigation to silence political expression
written by Byron M Sheldrick, published by Wilfrid Laurier University Press
The blurb…

Strategic litigation against public participation (SLAPP) involves lawsuits brought by individuals, corporations, groups, or politicians to curtail political activism and expression. An increasingly large part of the political landscape in Canada, they are often launched against those protesting, boycotting, or participating in some form of political activism. A common feature of SLAPPs is that their intention is rarely to win the case or secure a remedy; rather, the suit is brought to create a chill on political expression….
Making Policy Public: Participatory Bureaucracy in American Democracy
written by Susan L. Moffitt, published by Cambridge University Press
The blurb…
This book challenges the conventional wisdom that government bureaucrats inevitably seek secrecy and demonstrates how and when participatory bureaucracy manages the enduring tension between bureaucratic administration and democratic accountability….
Making Democracy Fun: How Game Design Can Empower Citizens and Transform Politics
written by Josh A. Lerner, published by MIT
The blurb…

Anyone who has ever been to a public hearing or community meeting would agree that participatory democracy can be boring. Hours of repetitive presentations, alternatingly alarmist or complacent, for or against, accompanied by constant heckling, often with no clear outcome or decision….

What Would Socrates Do?: Self-Examination, Civic Engagement, and the Politics of Philosophy

written by Joel Alden Schlosser, published by Cambridge University Press
The blurb…
Socrates continues to be an extremely influential force to this day; his work is featured prominently in the work of contemporary thinkers ranging from Hannah Arendt and Leo Strauss, to Michel Foucault and Jacques Rancière….
DIY Citizenship: Critical Making and Social Media
edited by Matt Ratto & Megan Boler, published by MIT
The blurb…
Today, DIY — do-it-yourself — describes more than self-taught carpentry. Social media enables DIY citizens to organize and protest in new ways (as in Egypt’s “Twitter revolution” of 2011) and to re-purpose corporate content (or create new user-generated content) in order to offer political counter-narratives….”

Pricey privacy: Framing the economy of information in the digital age


Paper by Federica Fornaciari in First Monday: “As new information technologies become ubiquitous, individuals are often prompted to rethink disclosure. Available media narratives may influence one’s understanding of the benefits and costs related to sharing personal information. This study, guided by frame theory, undertakes a Critical Discourse Analysis (CDA) of media discourse developed to discuss the privacy concerns related to the corporate collection and trade of personal information. The aim is to investigate the frames (the central organizing ideas) used in the media to discuss such an important aspect of the economics of personal data. The CDA explored 130 articles published in the New York Times between 2000 and 2012. Findings reveal that the articles utilized four frames: confusion and lack of transparency, justification and private interests, law and self-regulation, and commodification of information. Articles used episodic framing often discussing specific instances of infringements rather than broader thematic accounts. Media coverage tended to frame personal information as a commodity that may be traded, rather than as a fundamental value.”

Macon Money: A serious game for civic engagement


Wilson Center Commons Lab: “In 2011, residents of Macon, Georgia received over $65,000 in free local currency—with a catch.
This money was locked in bonds redeemable for an unknown value between $10 and $100. Prior to circulation, each bond was cut in half. Residents of Macon wishing to “cash” their bonds were required to first find the missing half, held by an unknown community member.
These were the rules for Macon Money, a real-world game created by Area/Code Inc. in collaboration with several community partners. Benjamin Stokes was brought on board by the Knight Foundation as an advisor and researcher for the game. Stokes describes real-world games as activities where “playing the game is congruent with making impact in the world; making progress in the game, also does something in the real world.”  Macon Money was designed to foster civic engagement through a number of means.
First, the two halves of each bond were intentionally distributed in neighborhoods on opposite ends of Macon, or in neighborhoods characterized by different socio-economic status. This “game mechanic” forced residents who would not normally interact to collaborate towards a common goal.  Bond holders found each other through a designated website, social media platforms including Facebook and Twitter, and even serendipitous face-to-face interaction.
Bonds were redeemable for Macon Money, a currency that could only be spent at local businesses (which were reimbursed with U.S. currency). This ensured continuing engagement with the Macon community, and in some cases continuing engagement between players. Macon Money was also designed to foster community identity through the visual design of the currency itself. Macon dollars depicted symbols of communal value, such as a picture of Otis Redding, a native of the town.
While the game Macon Money is over, researchers continue to analyze how the game helped foster civic engagement within a local community. Most recently, Stokes described these impacts during a talk at American University co-sponsored by The American University Game Lab, the Serious Games Initiative at the Woodrow Wilson International Center for Scholars, the AU Library, and the American University Center for Media and Social Impact. A video for the talk was recently posted here:…”