OSTP Blog: “Strengthening our Nation’s resilience to disasters is a shared responsibility, with all community members contributing their unique skills and perspectives. Whether you’re a data steward who can unlock information and foster a culture of open data, an innovator who can help address disaster preparedness challenges, or a volunteer ready to join the “Innovation for Disasters” movement, we are excited for you to visit the new disasters.data.gov site, launching today.
First previewed at the White House Innovation for Disaster Response and Recovery Initiative Demo Day, disasters.data.gov is designed to be a public resource to foster collaboration and the continual improvement of disaster-related open data, free tools, and new ways to empower first responders, survivors, and government officials with the information needed in the wake of a disaster.
A screenshot from the new disasters.data.gov web portal.
Today, the Administration is unveiling the first in a series of Innovator Challenges that highlight pressing needs from the disaster preparedness community. The inaugural Innovator Challenge focuses on a need identified from firsthand experience of local emergency management, responders, survivors, and Federal departments and agencies. The challenge asks innovators across the nation: “How might we leverage real-time sensors, open data, social media, and other tools to help reduce the number of fatalities from flooding?”
In addition to this first Innovator Challenge, here are some highlights from disasters.data.gov:….(More)”
The Free 'Big Data' Sources Everyone Should Know
Bernard Marr at Linkedin Pulse: “…The moves by companies and governments to put large amounts of information into the public domain have made large volumes of data accessible to everyone….here’s my rundown of some of the best free big data sources available today.
Data.gov
The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime. To check it out, click here.
US Census Bureau
A wealth of information on the lives of US citizens covering population data, geographic data and education. To check it out, click here.
European Union Open Data Portal
As the above, but based on data from European Union institutions. To check it out, click here.
Data.gov.uk
Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950. To check it out, click here.
The CIA World Factbook
Information on history, population, economy, government, infrastructure and military of 267 countries. To check it out, click here.
Healthdata.gov
125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics. To check it out, click here.
NHS Health and Social Care Information Centre
Health data sets from the UK National Health Service. To check it out, click here.
Amazon Web Services public datasets
Huge resource of public data, including the 1000 Genomes Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth. To check it out, click here.
Facebook Graph
Although much of the information on users’ Facebook profile is private, a lot isn’t – Facebook provide the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work). To check it out, click here.
Gapminder
Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world. To check it out, click here.
Google Trends
Statistics on search volume (as a proportion of total search) for any given term, since 2004. To check it out, click here.
Google Finance
40 years’ worth of stock market data, updated in real time. To check it out, click here.
Google Books Ngrams
Search and analyze the full text of any of the millions of books digitised as part of the Google Books project. To check it out, click here.
National Climatic Data Center
Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data. To check it out, click here.
DBPedia
Wikipedia is comprised of millions of pieces of data, structured and unstructured on every subject under the sun. DBPedia is an ambitious project to catalogue and create a public, freely distributable database allowing anyone to analyze this data. To check it out, click here.
Topsy
Free, comprehensive social media data is hard to come by – after all, their data is what generates profits for the big players (Facebook, Twitter, etc.), so they don’t want to give it away. However, Topsy provides a searchable database of public tweets going back to 2006 as well as several tools to analyze the conversations. To check it out, click here.
Likebutton
Mines Facebook’s public data – globally and from your own network – to give an overview of what people “Like” at the moment. To check it out, click here.
New York Times
Searchable, indexed archive of news articles going back to 1851. To check it out, click here.
Freebase
A community-compiled database of structured data about people, places and things, with over 45 million entries. To check it out, click here.
Million Song Data Set
Metadata on over a million songs and pieces of music. Part of Amazon Web Services. To check it out, click here.”
See also Bernard Marr’s blog at Big Data Guru
Lab Rats
Clare Dwyer Hogg at the Long and Short: “Do you remember how you were feeling between 11 and 18 January, 2012? If you’re a Facebook user, you can scroll back and have a look. Your status updates might show you feeling a little bit down, or cheery. All perfectly natural, maybe. But if you were one of 689,003 unwitting users selected for an experiment to determine whether emotions are contagious, then maybe not. The report on its findings was published in March this year: “Experimental evidence of massive-scale emotional contagion through social networks”. How did Facebook do it? Very subtly, by adjusting the algorithm of selected users’ news feeds. One half had a reduced chance of being exposed to positive updates, the other had a more upbeat newsfeed. Would users be more inclined to feel positive or negative themselves, depending on which group they were in? Yes. The authors of the report found – by extracting the posts of the people they were experimenting on – that, indeed, emotional states can be transferred to others, “leading people to experience the same emotions without their awareness”.
It was legal (see Facebook’s Data Use Policy). Ethical? The answer to that lies in the shadows. A one-off? Not likely. When revealed last summer, the Facebook example created headlines around the world – and another story quickly followed. On 28 July, Christian Rudder, a Harvard math graduate and one of the founders of the internet dating site OkCupid, wrote a blog post titled “We Experiment on Human Beings!”. In it, he outlined a number of experiments they performed on their users, one of which was to tell people who were “bad matches” (only 30 per cent compatible, according to their algorithm) that they were actually “exceptionally good for each other” (which usually requires a 90 per cent match). OkCupid wanted to see if mere suggestion would inspire people to like each other (answer: yes). It was a technological placebo. The experiment found that the power of suggestion works – but so does the bona fide OkCupid algorithm. Outraged debates ensued, with Rudder defensive. “This is the only way to find this stuff out,” he said, in one heated radio interview. “If you guys have an alternative to the scientific method, I’m all ears.”…
The debate, says Mark Earls, should primarily be about civic responsibility, even before the ethical concerns. Earls is a towering figure in the world of advertising and communication, and his book Herd: How to Change Mass Behaviour By Harnessing our True Nature, was a gamechanger in how people in the industry thought about what drives us to make decisions. That was a decade ago, before Facebook, and it’s increasingly clear that his theories were prescient.
He kept an eye on the Facebook experiment furore, and was, he says, heavily against the whole concept. “They’re supporting the private space between people, their contacts and their social media life,” he says. “And then they abused it.”…”
Transparency is not just television
Beth Noveck at the Sunlight Foundation Blog: “In 2009, Larry Lessig published a headline-grabbing piece in the New Republic entitled “Against Transparency,” arguing that the “naked transparency movement” might inspire disgust in, rather than reform of, our political system. In their recent Brookings Institution paper, “Why Critics of Transparency Are Wrong,” Gary Bass, Danielle Brian and Norm Eisen rightly critique the latest generation of naysayers who contend that transparency has contributed to our political ills, and that efforts to reduce transparency will cause government to work better. The problem with suggesting that transparency is the root of extreme partisan gridlock, the absence of dealmaking, and the lowest rates of trust by the American people in Congress, however, is that there is no real transparency in that institution.
If there is any shortcoming in Bass, Brian and Eisen’s robust defense of transparency, it is that they should be even tougher in rapping these other writers across the knuckles, including for some critics’ unsophisticated equation of television cameras in the chamber with transparency.
Even in our media-savvy age, televising congressional deliberations (if you can call them that) – what we might call political transparency – surely contributes too little to policy transparency. It lays bare the spectacle but does not provide access to the kinds of information disclosures about the workings of Congress in a way that also affords people an opportunity to participate in changing those workings and that Bass, Brian and Eisen also support. Done right, transparency provides an empirical foundation for developing solutions together. If the Brits can have a 21st Century parliament initiative, we are long overdue for a 21st century Congress….”
Designing a Citizen Science and Crowdsourcing Toolkit for the Federal Government
“In the 2013 Second Open Government National Action Plan, President Obama called on Federal agencies to harness the ingenuity of the public by accelerating and scaling the use of open innovation methods, such as citizen science and crowdsourcing, to help address a wide range of scientific and societal problems.
Citizen science is a form of open collaboration in which members of the public participate in the scientific process, including identifying research questions, collecting and analyzing data, interpreting results, and solving problems. Crowdsourcing is a process in which individuals or organizations submit an open call for voluntary contributions from a large group of unknown individuals (“the crowd”) or, in some cases, a bounded group of trusted individuals or experts.
Citizen science and crowdsourcing are powerful tools that can help Federal agencies:
- Advance and accelerate scientific research through group discovery and co-creation of knowledge. For instance, engaging the public in data collection can provide information at resolutions that would be difficult for Federal agencies to obtain due to time, geographic, or resource constraints.
- Increase science literacy and provide students with skills needed to excel in science, technology, engineering, and math (STEM). Volunteers in citizen science or crowdsourcing projects gain hands-on experience doing real science, and take that learning outside of the classroom setting.
- Improve delivery of government services with significantly lower resource investments.
- Connect citizens to the missions of Federal agencies by promoting a spirit of open government and volunteerism.
To enable effective and appropriate use of these new approaches, the Open Government National Action Plan specifically commits the Federal government to “convene an interagency group to develop an Open Innovation Toolkit for Federal agencies that will include best practices, training, policies, and guidance on authorities related to open innovation, including approaches such as incentive prizes, crowdsourcing, and citizen science.”
On November 21, 2014, the Office of Science and Technology Policy (OSTP) kicked off development of the Toolkit with a human-centered design workshop. Human-centered design is a multi-stage process that requires product designers to engage with different stakeholders in creating, iteratively testing, and refining their product designs. The workshop was planned and executed in partnership with the Office of Personnel Management’s human-centered design practice known as “The Lab” and the Federal Community of Practice on Crowdsourcing and Citizen Science (FCPCCS), a growing network of more than 100 employees from more than 20 Federal agencies….
The Toolkit will help further the culture of innovation, learning, sharing, and doing in the Federal citizen science and crowdsourcing community: indeed, the development of the Toolkit is a collaborative and community-building activity in and of itself.
The following successful Federal projects illustrate the variety of possible citizen science and crowdsourcing applications:
- The Citizen Archivist Dashboard (NARA) coordinates crowdsourced archival record tagging and document transcription. Recently, more than 170,000 volunteers indexed 132 million names of the 1940 Census in only five months, which NARA could not have done alone.
- Through Measuring Broadband America (FCC), 2 million volunteers collected and provided the FCC with data on their Internet speeds, data that FCC used to create a National Broadband Map revealing digital divides.
- In 2014, Nature’s Notebook (USGS, NSF) volunteers recorded more than 1 million observations on plants and animals that scientists use to analyze environmental change.
- Did You Feel It? (USGS) has enabled more than 3 million people worldwide to share their experiences during and immediately after earthquakes. This information facilitates rapid damage assessments and scientific research, particularly in areas without dense sensor networks.
- The mPING (NOAA) mobile app has collected more than 600,000 ground-based observations that help verify weather models.
- USAID anonymized and opened its loan guarantee data to volunteer mappers. Volunteers mapped 10,000 data points in only 16 hours, compared to the 60 hours officials expected.
- The Air Sensor Toolbox (EPA), together with training workshops, scientific partners, technology evaluations, and a scientific instrumentation loan program, empowers communities to monitor and report local air pollution.
In early 2015, OSTP, in partnership with the Challenges and Prizes Community of Practice, will convene Federal practitioners to develop the other half of the Open Innovation Toolkit for prizes and challenges. Stay tuned!”
Learn from the losers
Tim Harford in the Financial Times: “Kickended is important. It reminds us that the world is biased in systematic ways…I’m sure I’m not the only person to ponder launching an exciting project on Kickstarter before settling back to count the money. Dean Augustin may have had the same idea back in 2011; he sought $12,000 to produce a documentary about John F Kennedy. Jonathan Reiter’s “BizzFit” looked to raise $35,000 to create an algorithmic matching service for employers and employees. This October, two brothers in Syracuse, New York, launched a Kickstarter campaign in the hope of being paid $400 to film themselves terrifying their neighbours at Halloween. These disparate campaigns have one thing in common: they received not a single penny of support. Not one of these people was able to persuade friends, colleagues or even their parents to kick in so much as a cent.
My inspiration for these tales of Kickstarter failure is Silvio Lorusso, an artist and designer based in Venice. Lorusso’s website, Kickended, searches Kickstarter for all the projects that have received absolutely no funding. (There are plenty: about 10 per cent of Kickstarter projects go nowhere at all, and only 40 per cent raise enough money to hit their funding targets.)
Kickended performs an important service. It reminds us that what we see around us is not representative of the world; it is biased in systematic ways. Normally, when we talk of bias we think of a conscious ideological slant. But many biases are simple and unconscious. I have never read a media report or blog post about a typical, representative Kickstarter campaign – but I heard a lot about the Pebble watch, the Coolest cooler and potato salad. If I didn’t know better, I might form unrealistic expectations about what running a Kickstarter campaign might achieve.
This isn’t just about Kickstarter. Such bias is everywhere. Most of the books people read are bestsellers – but most books are not bestsellers. And most book projects do not become books at all. There’s a similar story to tell about music, films and business ventures in general.
. . .
In 1943, the American statistician Abraham Wald was asked to advise the US air force on how to reinforce their planes. Only a limited weight of armour plating was feasible, and the proposal on the table was to reinforce the wings, the centre of the fuselage, and the tail. Why? Because bombers were returning from missions riddled with bullet holes in those areas.
Wald explained that this would be a mistake. What the air force had discovered was that when planes were hit in the wings, tail or central fuselage, they made it home. Where, asked Wald, were the planes that had been hit in other areas? They never returned. Wald suggested reinforcing the planes wherever the surviving planes had been unscathed instead.
It’s natural to look at life’s winners – often they become winners in the first place because they’re interesting to look at. That’s why Kickended gives us an important lesson. If we don’t look at life’s losers too, we may end up putting our time, money, attention or even armour plating in entirely the wrong place.”
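Wald's argument can be checked with a small simulation. The sketch below is not Wald's actual analysis: it assumes a hypothetical model in which each plane takes one hit in a random section and survives with a probability that depends on where it was hit. The section names and survival probabilities are invented for illustration.

```python
import random

# Hypothetical model: each plane takes one hit in a random section, and the
# chance of surviving that hit depends on the section. The probabilities are
# invented for illustration only.
SURVIVAL_IF_HIT = {
    "wings": 0.9,
    "fuselage": 0.9,
    "tail": 0.9,
    "engine": 0.2,
    "cockpit": 0.2,
}


def simulate(n_planes, seed=0):
    """Return (hits on all planes, hits on returning planes) per section."""
    rng = random.Random(seed)
    sections = list(SURVIVAL_IF_HIT)
    all_hits = dict.fromkeys(sections, 0)
    returned_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        section = rng.choice(sections)
        all_hits[section] += 1
        # Only planes that survive the hit are ever seen by the analysts.
        if rng.random() < SURVIVAL_IF_HIT[section]:
            returned_hits[section] += 1
    return all_hits, returned_hits


all_hits, returned_hits = simulate(10_000)
for section in SURVIVAL_IF_HIT:
    print(f"{section:9s} all={all_hits[section]:5d} returned={returned_hits[section]:5d}")
```

Under these assumptions, hits are spread roughly evenly across all planes, yet the returning planes show far more damage to the wings, fuselage and tail than to the engine or cockpit. Counting bullet holes only on the survivors points at exactly the sections that matter least.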
White House: Help Shape Public Participation
the White House Blog: “Public participation — where citizens help shape and implement government programs — is a foundation of open, transparent, and engaging government services. From emergency management and regulatory development to science and education, better and more meaningful engagement with those who use public services can measurably improve government for everyone.
A team across the government is now working side-by-side with civil society organizations to deliver the first U.S. Public Participation Playbook, dedicated to providing best practices for how agencies can better design public participation programs, and suggested performance metrics for evaluating their effectiveness.
Developing a U.S. Public Participation Playbook has been an open government priority, and was included in both the first and second U.S. Open Government National Action Plans as part of the United States effort to increase public integrity in government programs. This resource reflects the commitment of the government and civic partners to measurably improve participation programs, and is designed using the same inclusive principles that it champions.
More than 30 Federal leaders from across diverse missions in public service have collaborated on draft best practices, or “plays,” led by the General Services Administration’s inter-agency SocialGov Community. The playbook is not limited to digital participation, and is designed to address needs from the full spectrum of public participation programs.
The plays are structured to provide best practices, tangible examples, and suggested performance metrics for government activities that already exist or are under development. Categories covered by the plays include encouraging community development and outreach, empowering participants through public/private partnerships, using data to drive decisions, and designing for inclusiveness and accessibility.
In developing this new resource, the team has been reaching out to more than a dozen civil society organizations and stakeholders, asking them to contribute as the Playbook is created. The team would like your input as well! Over the next month, contribute your ideas to the playbook using Madison, an easy-to-use, open source platform that allows for accountable review of each contribution.
Through this process, the team will work together to ensure that the Playbook reflects the best ideas and examples for agencies to use in developing and implementing their programs with public participation in mind. This resource will be a living document, and stakeholders from inside or outside of government should continually offer new insights — whether new plays, the latest case studies, or the most current performance metrics — to the playbook.
We look forward to seeing the public participate in the creation and evolution of the Public Participation Playbook!”
New Tool in Fighting Corruption: Open Data
Martin Tisne at Omidyar Network: “Yesterday in Brisbane, the G20 threw its weight behind open data by featuring it prominently in the G20 Anti-Corruption working action plan. Specifically, the action plan calls for effort in three related areas:
Open data describes information that is not simply public, but that has been published in a manner that makes it easy to access and easy to compare and connect with other information.
This matters for anti-corruption – if you are a journalist or a civil society activist investigating bribery and corruption, those connections are everything. They tell you that an anonymous person (e.g. ‘Mr Smith’) who owns an obscure company registered in a tax haven is linked to another company that has been illegally exporting timber from a neighboring country. That the said Mr. Smith is also the son-in-law of the mining minister of yet another country, who herself has been accused of embezzling mining revenues. As we have written elsewhere on this blog, investigative journalists, prosecution authorities, and civil society groups all need access to this linked data for their work.
The action plan also links open data to the wider G20 agenda, citing its impact on the ability of businesses to make better investment decisions. You can find the full detail here….”
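To make "easy to compare and connect" concrete, here is a minimal sketch of following one name across three small datasets. Everything in it is an assumption for illustration: the records, the field names, and the `connections` helper are invented to mirror the hypothetical "Mr Smith" example and describe no real person, company, or registry.

```python
# All records below are invented for illustration; they mirror the
# hypothetical "Mr Smith" example and describe no real person or company.
company_owners = [
    {"company": "Obscure Holdings", "owner": "Mr Smith"},
]
company_links = [
    {"company": "Obscure Holdings", "linked_to": "Timber Exports Co"},
]
family_ties = [
    {"person": "Mr Smith", "relation": "son-in-law", "of": "Mining Minister"},
]


def connections(person):
    """Collect facts that connect to `person` across the three datasets."""
    facts = []
    owned = [r["company"] for r in company_owners if r["owner"] == person]
    for company in owned:
        facts.append(f"{person} owns {company}")
        # Follow the company into the second dataset.
        for link in company_links:
            if link["company"] == company:
                facts.append(f"{company} is linked to {link['linked_to']}")
    # Follow the person into the third dataset.
    for tie in family_ties:
        if tie["person"] == person:
            facts.append(f"{person} is {tie['relation']} of {tie['of']}")
    return facts


for fact in connections("Mr Smith"):
    print(fact)
```

The joins only work because each dataset uses the same spelling for the person and the company. Data that is public but published with inconsistent names or in non-comparable formats would not connect this way, which is the distinction the excerpt draws between public data and open data.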
Giving Americans Easier Access to Their Own Data
the White House Blog: “…One of the newest My Data efforts is the IRS tool, Get Transcript. Launched in 2014, Get Transcript allows taxpayers to securely view, print, and download a PDF record of the last three years of their IRS tax account. Get Transcript has produced over 17 million so-called tax transcripts, reducing phone, mail, or in-person requests by approximately 40% from last year. Secure access to your own tax data makes it easier to demonstrate your income with prospective lenders and employers, or help with tax preparation. What was a paper-based transcript process which took multiple days has been made instantaneous and easy for the American taxpayer.
The IRS is an agency that serves virtually every American, and runs one of the nation’s largest customer service operations. To give an idea of the size and scope of responsibilities, the Internal Revenue Service:
- receives over 80 million phone calls per year, mostly from people eager to hear the status of their refund, understand a notice, make a payment, or update their account;
- sends out nearly 200 million paper notices annually; and
- receives over 50 million unique visitors to its website each month during filing season.
Meeting this demand from citizens is a challenge with limited staff and resources. Nonetheless, the IRS is committed to improving service to citizens across all of its channels – whether it’s by phone, walk-ins, or especially its digital services.
Building on the initial success of Get Transcript, there are more exciting improvements to IRS services in the pipeline. For instance, millions of taxpayers contact the IRS every year to ask about their tax status, whether their filing was received, if their refund was processed, or if their payment posted. In the future, taxpayers will be able to answer these types of questions independently by signing in to a mobile-friendly, personalized online account to conduct transactions and see all of their tax information in one place. Users will be able to view account history and balance, make payments or see payment status, or even authorize their tax preparer to view or make changes to their tax return. This will also include the ability to download personal tax information in an easy to use and machine-readable format so that taxpayers can share with trusted recipients if desired….”
OpenUp Corporate Data while Protecting Privacy
Much of the data generated by these devices is today controlled by corporations. These companies are in effect “owners” of terabytes of data and metadata. Companies use this data to aggregate, analyze, and track individual preferences, provide more targeted consumer experiences, and add value to the corporate bottom line.
At the same time, even as we witness a rapid “datafication” of the global economy, access to data is emerging as an increasingly critical issue, essential to addressing many of our most important social, economic, and political challenges. While the rise of the Open Data movement has opened up over a million datasets around the world, much of this openness is limited to government (and, to a lesser extent, scientific) data. Access to corporate data remains extremely limited. This is a lost opportunity. If corporate data—in the form of Web clicks, tweets, online purchases, sensor data, call data records, etc.—were made available in a de-identified and aggregated manner, researchers, public interest organizations, and third parties would gain greater insights on patterns and trends that could help inform better policies and lead to greater public good (including combatting Ebola).
Corporate data sharing holds tremendous promise. But its potential—and limitations—are also poorly understood. In what follows, we share early findings of our efforts to map this emerging open data frontier, along with a set of reflections on how to safeguard privacy and other citizen and consumer rights while sharing. Understanding the practice of shared corporate data—and assessing the associated risks—is an essential step in increasing access to socially valuable data held by businesses today. This is a challenge certainly worth exploring during the forthcoming OpenUp conference!
Understanding and classifying current corporate data sharing practices
Corporate data sharing remains very much a fledgling field. There has been little rigorous analysis of different ways or impacts of sharing. Nonetheless, our initial mapping of the landscape suggests there have been six main categories of activity—i.e., ways of sharing—to date:…
Assessing risks of corporate data sharing
Although the shared corporate data offers several benefits for researchers, public interest organizations, and other companies, there do exist risks, especially regarding personally identifiable information (PII). When aggregated, PII can serve to help understand trends and broad demographic patterns. But if PII is inadequately scrubbed and aggregated data is linked to specific individuals, this can lead to identity theft, discrimination, profiling, and other violations of individual freedom. It can also lead to significant legal ramifications for corporate data providers….”
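One simple way to picture sharing data "in a de-identified and aggregated manner" is to publish only group counts and to suppress groups that are too small. The sketch below is an illustration under assumptions, not a method described in the excerpt: the records, field names, and `min_group_size` threshold are invented, and a size threshold alone does not guarantee anonymity.

```python
from collections import Counter

# Invented records for illustration; "name" stands in for any direct identifier.
records = [
    {"name": "A", "region": "North", "purchase": "phone"},
    {"name": "B", "region": "North", "purchase": "phone"},
    {"name": "C", "region": "North", "purchase": "phone"},
    {"name": "D", "region": "South", "purchase": "phone"},
    {"name": "E", "region": "South", "purchase": "laptop"},
]


def aggregate(rows, keys, min_group_size):
    """Count rows per combination of `keys`, dropping small groups.

    Direct identifiers are never copied into the output, and any group
    smaller than `min_group_size` is suppressed because a count of one or
    two can point back to a specific individual.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {group: n for group, n in counts.items() if n >= min_group_size}


print(aggregate(records, keys=("region", "purchase"), min_group_size=3))
```

With these records, only the North/phone group is large enough to publish; the two single-person groups in the South are withheld. The trade-off is visible even at this scale: a higher threshold protects individuals better but leaves less data for researchers.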