Lab Rats


Clare Dwyer Hogg at the Long and Short: “Do you remember how you were feeling between 11 and 18 January 2012? If you’re a Facebook user, you can scroll back and have a look. Your status updates might show you feeling a little bit down, or cheery. All perfectly natural, maybe. But if you were one of 689,003 unwitting users selected for an experiment to determine whether emotions are contagious, then maybe not. The report on its findings was published in March this year: “Experimental evidence of massive-scale emotional contagion through social networks”. How did Facebook do it? Very subtly, by adjusting the algorithm of selected users’ news feeds. One half had a reduced chance of being exposed to positive updates; the other had a more upbeat news feed. Would users be more inclined to feel positive or negative themselves, depending on which group they were in? Yes. The authors of the report found – by extracting the posts of the people they were experimenting on – that, indeed, emotional states can be transferred to others, “leading people to experience the same emotions without their awareness”.

It was legal (see Facebook’s Data Use Policy). Ethical? The answer to that lies in the shadows. A one-off? Not likely. When revealed last summer, the Facebook example created headlines around the world – and another story quickly followed. On 28 July, Christian Rudder, a Harvard math graduate and one of the founders of the internet dating site OkCupid, wrote a blog post titled “We Experiment on Human Beings!”. In it, he outlined a number of experiments they performed on their users, one of which was to tell people who were “bad matches” (only 30 per cent compatible, according to their algorithm) that they were actually “exceptionally good for each other” (which usually requires a 90 per cent match). OkCupid wanted to see if mere suggestion would inspire people to like each other (answer: yes). It was a technological placebo. The experiment found that the power of suggestion works – but so does the bona fide OkCupid algorithm. Outraged debates ensued, with Rudder on the defensive. “This is the only way to find this stuff out,” he said, in one heated radio interview. “If you guys have an alternative to the scientific method, I’m all ears.”…

The debate, says Mark Earls, should primarily be about civic responsibility, even before the ethical concerns. Earls is a towering figure in the world of advertising and communication, and his book Herd: How to Change Mass Behaviour by Harnessing Our True Nature was a game-changer in how people in the industry thought about what drives us to make decisions. That was a decade ago, before Facebook, and it’s increasingly clear that his theories were prescient.

He kept an eye on the Facebook experiment furore, and was, he says, heavily against the whole concept. “They’re supporting the private space between people, their contacts and their social media life,” he says. “And then they abused it.”…”
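The mechanism described above – nudging a feed’s emotional mix rather than fabricating content – can be sketched in a few lines of code. The sketch below is a hypothetical illustration, not Facebook’s implementation: the toy word-list classifier (the published study used LIWC-style word counts) and the omission rate are our assumptions.

```python
import random

def classify_sentiment(post: str) -> str:
    """Toy stand-in for a sentiment classifier (the study used LIWC word counts)."""
    positive = {"great", "happy", "love", "cheery"}
    negative = {"sad", "awful", "hate", "down"}
    words = set(post.lower().split())
    if words & positive:
        return "positive"
    if words & negative:
        return "negative"
    return "neutral"

def filter_feed(posts, suppress="positive", omission_rate=0.3, seed=42):
    """Probabilistically omit posts of one sentiment, as in the experiment's two arms."""
    rng = random.Random(seed)
    return [p for p in posts
            if classify_sentiment(p) != suppress or rng.random() > omission_rate]

feed = ["What a great day", "Feeling sad today", "Lunch was fine"]
print(filter_feed(feed, suppress="positive"))  # the "reduced positivity" arm
```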

Transparency is not just television


Beth Noveck at the Sunlight Foundation Blog: “In 2009, Larry Lessig published a headline-grabbing piece in the New Republic entitled “Against Transparency,” arguing that the “naked transparency movement” might inspire disgust in, rather than reform of, our political system. In their recent Brookings Institution paper, “Why Critics of Transparency Are Wrong,” Gary Bass, Danielle Brian and Norm Eisen rightly critique the latest generation of naysayers who contend that transparency has contributed to our political ills, and that efforts to reduce transparency will cause government to work better. The problem with suggesting that transparency is the root of extreme partisan gridlock, the absence of dealmaking, and the lowest rates of trust by the American people in Congress, however, is that there is no real transparency in that institution.
If there is any shortcoming in Bass, Brian and Eisen’s robust defense of transparency, it is that they should be even tougher in rapping these other writers across the knuckles, including for some critics’ unsophisticated equation of television cameras in the chamber with transparency.
Even in our media-savvy age, televising congressional deliberations (if you can call them that) – what we might call political transparency – surely contributes too little to policy transparency. It lays bare the spectacle but does not provide the kinds of information disclosures about the workings of Congress that would afford people an opportunity to participate in changing those workings – disclosures that Bass, Brian and Eisen also support. Done right, transparency provides an empirical foundation for developing solutions together. If the Brits can have a 21st Century parliament initiative, we are long overdue for a 21st century Congress….”

Designing a Citizen Science and Crowdsourcing Toolkit for the Federal Government


Jenn Gustetic, Lea Shanley, Jay Benforado, and Arianne Miller at the White House Blog: “In the 2013 Second Open Government National Action Plan, President Obama called on Federal agencies to harness the ingenuity of the public by accelerating and scaling the use of open innovation methods, such as citizen science and crowdsourcing, to help address a wide range of scientific and societal problems.
Citizen science is a form of open collaboration in which members of the public participate in the scientific process, including identifying research questions, collecting and analyzing data, interpreting results, and solving problems. Crowdsourcing is a process in which individuals or organizations submit an open call for voluntary contributions from a large group of unknown individuals (“the crowd”) or, in some cases, a bounded group of trusted individuals or experts.
Citizen science and crowdsourcing are powerful tools that can help Federal agencies:

  • Advance and accelerate scientific research through group discovery and co-creation of knowledge. For instance, engaging the public in data collection can provide information at resolutions that would be difficult for Federal agencies to obtain due to time, geographic, or resource constraints.
  • Increase science literacy and provide students with skills needed to excel in science, technology, engineering, and math (STEM). Volunteers in citizen science or crowdsourcing projects gain hands-on experience doing real science, and take that learning outside of the classroom setting.
  • Improve delivery of government services with significantly lower resource investments.
  • Connect citizens to the missions of Federal agencies by promoting a spirit of open government and volunteerism.

To enable effective and appropriate use of these new approaches, the Open Government National Action Plan specifically commits the Federal government to “convene an interagency group to develop an Open Innovation Toolkit for Federal agencies that will include best practices, training, policies, and guidance on authorities related to open innovation, including approaches such as incentive prizes, crowdsourcing, and citizen science.”
On November 21, 2014, the Office of Science and Technology Policy (OSTP) kicked off development of the Toolkit with a human-centered design workshop. Human-centered design is a multi-stage process that requires product designers to engage with different stakeholders in creating, iteratively testing, and refining their product designs. The workshop was planned and executed in partnership with the Office of Personnel Management’s human-centered design practice known as “The Lab” and the Federal Community of Practice on Crowdsourcing and Citizen Science (FCPCCS), a growing network of more than 100 employees from more than 20 Federal agencies….
The Toolkit will help further the culture of innovation, learning, sharing, and doing in the Federal citizen science and crowdsourcing community: indeed, the development of the Toolkit is a collaborative and community-building activity in and of itself.
The following successful Federal projects illustrate the variety of possible citizen science and crowdsourcing applications:

  • The Citizen Archivist Dashboard (NARA) coordinates crowdsourced archival record tagging and document transcription. Recently, more than 170,000 volunteers indexed 132 million names of the 1940 Census in only five months, which NARA could not have done alone.
  • Through Measuring Broadband America (FCC), 2 million volunteers collected and provided the FCC with data on their Internet speeds, data that the FCC used to create a National Broadband Map revealing digital divides.
  • In 2014, Nature’s Notebook (USGS, NSF) volunteers recorded more than 1 million observations on plants and animals that scientists use to analyze environmental change.
  • Did You Feel It? (USGS) has enabled more than 3 million people worldwide to share their experiences during and immediately after earthquakes. This information facilitates rapid damage assessments and scientific research, particularly in areas without dense sensor networks.
  • The mPING (NOAA) mobile app has collected more than 600,000 ground-based observations that help verify weather models.
  • USAID anonymized and opened its loan guarantee data to volunteer mappers. Volunteers mapped 10,000 data points in only 16 hours, compared to the 60 hours officials expected.
  • The Air Sensor Toolbox (EPA), together with training workshops, scientific partners, technology evaluations, and a scientific instrumentation loan program, empowers communities to monitor and report local air pollution.

In early 2015, OSTP, in partnership with the Challenges and Prizes Community of Practice, will convene Federal practitioners to develop the other half of the Open Innovation Toolkit for prizes and challenges. Stay tuned!”

Learn from the losers


Tim Harford in the Financial Times: “Kickended is important. It reminds us that the world is biased in systematic ways…I’m sure I’m not the only person to ponder launching an exciting project on Kickstarter before settling back to count the money. Dean Augustin may have had the same idea back in 2011; he sought $12,000 to produce a documentary about John F Kennedy. Jonathan Reiter’s “BizzFit” looked to raise $35,000 to create an algorithmic matching service for employers and employees. This October, two brothers in Syracuse, New York, launched a Kickstarter campaign in the hope of being paid $400 to film themselves terrifying their neighbours at Halloween. These disparate campaigns have one thing in common: they received not a single penny of support. Not one of these people was able to persuade friends, colleagues or even their parents to kick in so much as a cent.
My inspiration for these tales of Kickstarter failure is Silvio Lorusso, an artist and designer based in Venice. Lorusso’s website, Kickended, searches Kickstarter for all the projects that have received absolutely no funding. (There are plenty: about 10 per cent of Kickstarter projects go nowhere at all, and only 40 per cent raise enough money to hit their funding targets.)
Kickended performs an important service. It reminds us that what we see around us is not representative of the world; it is biased in systematic ways. Normally, when we talk of bias we think of a conscious ideological slant. But many biases are simple and unconscious. I have never read a media report or blog post about a typical, representative Kickstarter campaign – but I heard a lot about the Pebble watch, the Coolest cooler and potato salad. If I didn’t know better, I might form unrealistic expectations about what running a Kickstarter campaign might achieve.
This isn’t just about Kickstarter. Such bias is everywhere. Most of the books people read are bestsellers – but most books are not bestsellers. And most book projects do not become books at all. There’s a similar story to tell about music, films and business ventures in general.
. . .
In 1943, the American statistician Abraham Wald was asked to advise the US air force on how to reinforce their planes. Only a limited weight of armour plating was feasible, and the proposal on the table was to reinforce the wings, the centre of the fuselage, and the tail. Why? Because bombers were returning from missions riddled with bullet holes in those areas.
Wald explained that this would be a mistake. What the air force had discovered was that when planes were hit in the wings, tail or central fuselage, they made it home. Where, asked Wald, were the planes that had been hit in other areas? They never returned. Wald suggested reinforcing the planes wherever the surviving planes had been unscathed instead.
It’s natural to look at life’s winners – often they become winners in the first place because they’re interesting to look at. That’s why Kickended gives us an important lesson. If we don’t look at life’s losers too, we may end up putting our time, money, attention or even armour plating in entirely the wrong place.”
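Harford’s point about survivorship bias is easy to verify numerically. Here is a small simulation of our own (not from the FT piece): damage lands uniformly across a plane’s areas, but hits to some areas are far more likely to down the plane, so the surviving fleet shows few holes exactly where hits are deadliest. The areas and loss probabilities are illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)
AREAS = ["wings", "tail", "fuselage", "engine", "cockpit"]
# Assumed probability that a hit in each area brings the plane down.
P_LOST = {"wings": 0.05, "tail": 0.05, "fuselage": 0.05, "engine": 0.7, "cockpit": 0.6}

all_hits, survivor_hits = Counter(), Counter()
for _ in range(10_000):
    hit = random.choice(AREAS)          # damage is uniform across areas
    all_hits[hit] += 1
    if random.random() > P_LOST[hit]:   # we only observe planes that return
        survivor_hits[hit] += 1

for area in AREAS:
    print(f"{area:9s} hits overall: {all_hits[area]:5d}   seen on survivors: {survivor_hits[area]:5d}")
# Engine and cockpit hits are common overall but rare among returners:
# armouring only where survivors show damage protects the wrong places.
```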

White House: Help Shape Public Participation


Corinna Zarek and Justin Herman at the White House Blog: “Public participation — where citizens help shape and implement government programs — is a foundation of open, transparent, and engaging government services. From emergency management and regulatory development to science and education, better and more meaningful engagement with those who use public services can measurably improve government for everyone.
A team across the government is now working side-by-side with civil society organizations to deliver the first U.S. Public Participation Playbook, dedicated to providing best practices for how agencies can better design public participation programs, and suggested performance metrics for evaluating their effectiveness.
Developing a U.S. Public Participation Playbook has been an open government priority, and was included in both the first and second U.S. Open Government National Action Plans as part of the United States effort to increase public integrity in government programs. This resource reflects the commitment of the government and civic partners to measurably improve participation programs, and is designed using the same inclusive principles that it champions.
More than 30 Federal leaders from across diverse missions in public service have collaborated on draft best practices, or “plays,” led by the General Services Administration’s inter-agency SocialGov Community. The playbook is not limited to digital participation, and is designed to address needs from the full spectrum of public participation programs.
The plays are structured to provide best practices, tangible examples, and suggested performance metrics for government activities that already exist or are under development. Categories covered by the plays include encouraging community development and outreach, empowering participants through public/private partnerships, using data to drive decisions, and designing for inclusiveness and accessibility.
In developing this new resource, the team has been reaching out to more than a dozen civil society organizations and stakeholders, asking them to contribute as the Playbook is created. The team would like your input as well! Over the next month, contribute your ideas to the playbook using Madison, an easy-to-use, open source platform that allows for accountable review of each contribution.
Through this process, the team will work together to ensure that the Playbook reflects the best ideas and examples for agencies to use in developing and implementing their programs with public participation in mind. This resource will be a living document, and stakeholders from inside or outside of government should continually offer new insights — whether new plays, the latest case studies, or the most current performance metrics — to the playbook.
We look forward to seeing the public participate in the creation and evolution of the Public Participation Playbook!”

New Tool in Fighting Corruption: Open Data


Martin Tisne at Omidyar Network: “Yesterday in Brisbane, the G20 threw its weight behind open data by featuring it prominently in the G20 Anti-Corruption Working Group’s action plan. Specifically, the action plan calls for effort in three related areas:

(1)   Prepare a G20 compendium of good practices and lessons learned on open data and its application in the fight against corruption
(2)   Prepare G20 Open Data Principles, including identifying areas or sectors where their application is particularly useful
(3)   Complete self‑assessments of G20 country open data frameworks and initiatives

Open data describes information that is not simply public, but that has been published in a manner that makes it easy to access and easy to compare and connect with other information.
This matters for anti-corruption – if you are a journalist or a civil society activist investigating bribery and corruption, those connections are everything. They tell you that an anonymous person (e.g. ‘Mr Smith’) who owns an obscure company registered in a tax haven is linked to another company that has been illegally exporting timber from a neighboring country. That the said Mr Smith is also the son-in-law of the mining minister of yet another country, who herself has been accused of embezzling mining revenues. As we have written elsewhere on this blog, investigative journalists, prosecution authorities, and civil society groups all need access to this linked data for their work.
The action plan also links open data to the wider G20 agenda, citing its impact on the ability of businesses to make better investment decisions. You can find the full detail here….”
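To see why those connections matter technically, consider a toy sketch of the record linkage investigators perform once registries are published as structured open data. Every record below is invented, and real investigations need fuzzy matching on names, dates of birth, and addresses rather than exact string equality.

```python
# Invented records standing in for three separately published open datasets.
company_registry = [
    {"company": "Obscure Holdings Ltd", "owner": "Mr Smith", "jurisdiction": "tax haven"},
]
export_violations = [
    {"company": "Timber Exports SA", "owner": "Mr Smith", "activity": "illegal timber export"},
]
family_ties = [
    {"person": "Mr Smith", "relation": "son-in-law", "of": "the mining minister"},
]

# Join the datasets on the shared owner field to surface the hidden network.
for c in company_registry:
    for e in (e for e in export_violations if e["owner"] == c["owner"]):
        print(f"{c['owner']} owns {c['company']} ({c['jurisdiction']}) "
              f"and is linked to {e['company']}: {e['activity']}")
    for t in (t for t in family_ties if t["person"] == c["owner"]):
        print(f"{c['owner']} is {t['relation']} of {t['of']}")
```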

Giving Americans Easier Access to Their Own Data


Nick Sinai and Rajive Mathur at the White House Blog: “…One of the newest My Data efforts is the IRS tool, Get Transcript. Launched in 2014, Get Transcript allows taxpayers to securely view, print, and download a PDF record of the last three years of their IRS tax account. Get Transcript has produced over 17 million so-called tax transcripts, reducing phone, mail, or in-person requests by approximately 40% from last year. Secure access to your own tax data makes it easier to demonstrate your income to prospective lenders and employers, or to get help with tax preparation. What was a paper-based transcript process that took multiple days has been made instantaneous and easy for the American taxpayer.
The IRS is an agency that serves virtually every American, and runs one of the nation’s largest customer service operations. To give an idea of the size and scope of responsibilities, the Internal Revenue Service:

  • receives over 80 million phone calls per year, mostly from people eager to hear the status of their refund, understand a notice, make a payment, or update their account;
  • sends out nearly 200 million paper notices annually; and
  • receives over 50 million unique visitors to its website each month during filing season.

Meeting this demand from citizens is a challenge with limited staff and resources. Nonetheless, the IRS is committed to improving service to citizens across all of its channels – whether it’s by phone, walk-ins, or especially its digital services.
Building on the initial success of Get Transcript, there are more exciting improvements to IRS services in the pipeline. For instance, millions of taxpayers contact the IRS every year to ask about their tax status, whether their filing was received, if their refund was processed, or if their payment posted. In the future, taxpayers will be able to answer these types of questions independently by signing in to a mobile-friendly, personalized online account to conduct transactions and see all of their tax information in one place. Users will be able to view account history and balance, make payments or see payment status, or even authorize their tax preparer to view or make changes to their tax return. This will also include the ability to download personal tax information in an easy to use and machine-readable format so that taxpayers can share with trusted recipients if desired….”

OpenUp Corporate Data while Protecting Privacy


Article by Stefaan G. Verhulst and David Sangokoya (The GovLab) for the OpenUp? Blog: “Consider a few numbers: By the end of 2014, the number of mobile phone subscriptions worldwide is expected to reach 7 billion, nearly equal to the world’s population. More than 1.82 billion people communicate on some form of social network, and almost 14 billion sensor-laden everyday objects (trucks, health monitors, GPS devices, refrigerators, etc.) are now connected and communicating over the Internet, creating a steady stream of real-time, machine-generated data.
Much of the data generated by these devices is today controlled by corporations. These companies are in effect “owners” of terabytes of data and metadata. Companies use this data to aggregate, analyze, and track individual preferences, provide more targeted consumer experiences, and add value to the corporate bottom line.
At the same time, even as we witness a rapid “datafication” of the global economy, access to data is emerging as an increasingly critical issue, essential to addressing many of our most important social, economic, and political challenges. While the rise of the Open Data movement has opened up over a million datasets around the world, much of this openness is limited to government (and, to a lesser extent, scientific) data. Access to corporate data remains extremely limited. This is a lost opportunity. If corporate data—in the form of Web clicks, tweets, online purchases, sensor data, call data records, etc.—were made available in a de-identified and aggregated manner, researchers, public interest organizations, and third parties would gain greater insights on patterns and trends that could help inform better policies and lead to greater public good (including combatting Ebola).
Corporate data sharing holds tremendous promise. But its potential—and limitations—are also poorly understood. In what follows, we share early findings of our efforts to map this emerging open data frontier, along with a set of reflections on how to safeguard privacy and other citizen and consumer rights while sharing. Understanding the practice of shared corporate data—and assessing the associated risks—is an essential step in increasing access to socially valuable data held by businesses today. This is a challenge certainly worth exploring during the forthcoming OpenUp conference!
Understanding and classifying current corporate data sharing practices
Corporate data sharing remains very much a fledgling field. There has been little rigorous analysis of different ways or impacts of sharing. Nonetheless, our initial mapping of the landscape suggests there have been six main categories of activity—i.e., ways of sharing—to date:…
Assessing risks of corporate data sharing
Although shared corporate data offers several benefits for researchers, public interest organizations, and other companies, risks do exist, especially regarding personally identifiable information (PII). When aggregated, PII can serve to help understand trends and broad demographic patterns. But if PII is inadequately scrubbed and aggregated data is linked to specific individuals, this can lead to identity theft, discrimination, profiling, and other violations of individual freedom. It can also lead to significant legal ramifications for corporate data providers….”
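One concrete reading of “de-identified and aggregated,” sketched below in the spirit of k-anonymity: drop direct identifiers, generalize quasi-identifiers such as age and location, and suppress any cell smaller than k, since small cells are what enable re-identification. The records, fields, and threshold are all assumptions for illustration.

```python
from collections import defaultdict

K = 2  # suppression threshold; real releases typically use larger k

# Invented subscriber records; "name" is a direct identifier,
# "age" and "zip" are quasi-identifiers, "calls" is the measurement.
records = [
    {"name": "Alice", "age": 34, "zip": "10023", "calls": 120},
    {"name": "Bob",   "age": 37, "zip": "10025", "calls": 80},
    {"name": "Carol", "age": 52, "zip": "94107", "calls": 200},
]

groups = defaultdict(list)
for r in records:
    # Drop the name; generalize age to a decade and ZIP to a 3-digit prefix.
    key = (r["age"] // 10 * 10, r["zip"][:3])
    groups[key].append(r["calls"])

for (decade, zip3), calls in groups.items():
    if len(calls) < K:
        continue  # suppress small cells rather than risk re-identification
    print(f"ages {decade}-{decade + 9}, zip {zip3}xx: n={len(calls)}, "
          f"avg calls={sum(calls) / len(calls):.1f}")
# Carol's singleton cell (50s, 941xx) is suppressed; Alice and Bob
# appear only as an aggregate.
```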

The New Thing in Google Flu Trends Is Traditional Data


In the New York Times: “Google is giving its Flu Trends service an overhaul — “a brand new engine,” as it announced in a blog post on Friday.

The new thing is actually traditional data from the Centers for Disease Control and Prevention that is being integrated into the Google flu-tracking model. The goal is greater accuracy, after the Google service had been criticized for consistently overestimating flu outbreaks in recent years.

The main critique came in an analysis done by four quantitative social scientists, published earlier this year in an article in Science magazine, “The Parable of Google Flu: Traps in Big Data Analysis.” The researchers found that the most accurate flu predictor was a data mash-up that combined Google Flu Trends, which monitored flu-related search terms, with the official C.D.C. reports from doctors on influenza-like illness.

The Google Flu Trends team is heeding that advice. In the blog post, Christian Stefansen, a Google senior software engineer, wrote: “We’re launching a new Flu Trends model in the United States that — like many of the best performing methods in the literature — takes official CDC flu data into account as the flu season progresses.”

Google’s flu-tracking service has had its ups and downs. Its triumph came in 2009, when it gave an advance signal of the severity of the H1N1 outbreak, two weeks or so ahead of official statistics. In a 2009 article in Nature explaining how Google Flu Trends worked, the company’s researchers did, as the Friday post notes, say that the Google service was not intended to replace official flu surveillance methods and that it was susceptible to “false alerts” — anything that might prompt a surge in flu-related search queries.

Yet those caveats came a couple of pages into the Nature article. And Google Flu Trends became a symbol of the superiority of the new, big data approach — computer algorithms mining data trails for collective intelligence in real time. To enthusiasts, it seemed so superior to the antiquated method of collecting health data that involved doctors talking to patients, inspecting them and filing reports.

But Google’s flu service greatly overestimated the number of cases in the United States in the 2012-13 flu season — a well-known miss — and, according to the research published this year, has persistently overstated flu cases over the years. In the Science article, the social scientists called it “big data hubris.”
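The fix the researchers recommended — blending the noisy, real-time search signal with lagged official data — can be sketched as a simple regression. Everything below is synthetic: the “flu” series, the noise, the two-week reporting lag, and the in-sample fit are our assumptions, a stand-in for rather than a reproduction of Google’s revised model.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 52
true_flu = 5 + 3 * np.sin(np.linspace(0, 2 * np.pi, weeks)) ** 2
search_index = true_flu + rng.normal(0, 1.0, weeks)  # real-time but noisy
cdc_lagged = np.roll(true_flu, 2)                    # accurate but 2 weeks late

# Ordinary least squares on [search signal, lagged CDC data, intercept];
# a real model would fit on past seasons, not in-sample.
X = np.column_stack([search_index, cdc_lagged, np.ones(weeks)])
coef, *_ = np.linalg.lstsq(X, true_flu, rcond=None)
pred = X @ coef

print("search-only mean abs error:", np.abs(search_index - true_flu).mean().round(2))
print("combined    mean abs error:", np.abs(pred - true_flu).mean().round(2))
```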

Governing the Smart, Connected City


Blog by Susan Crawford at HBR: “As politics at the federal level becomes increasingly corrosive and polarized, with trust in Congress and the President at historic lows, Americans still celebrate their cities. And cities are where the action is when it comes to using technology to thicken the mesh of civic goods — more and more cities are using data to animate and inform interactions between government and citizens to improve wellbeing.
Every day, I learn about some new civic improvement that will become possible when we can assume the presence of ubiquitous, cheap, and unlimited data connectivity in cities. Some of these are made possible by the proliferation of smartphones; others rely on the increasing number of internet-connected sensors embedded in the built environment. In both cases, the constant is data. (My new book, The Responsive City, written with co-author Stephen Goldsmith, tells stories from Chicago, Boston, New York City and elsewhere about recent developments along these lines.)
For example, with open fiber networks in place, sending video messages will become as accessible and routine as sending email is now. Take a look at rhinobird.tv, a free, lightweight, open-source video service that works in browsers (no special download needed) and allows anyone to create a hashtag-driven “channel” for particular events and places. A debate or protest could be viewed from a thousand perspectives. Elected officials and public employees could easily hold streaming, virtual town hall meetings.
Given all that video and all those livestreams, we’ll need curation and aggregation to make sense of the flow. That’s why visualization norms, still in their infancy, will become a greater part of literacy. When the Internet Archive attempted late last year to “map” 400,000 hours of television news against worldwide locations, it came up with pulsing blobs of attention. Although visionary Kevin Kelly has been talking about data visualization as a new form of literacy for years, city governments still struggle with presenting complex and changing information in standard, easy-to-consume ways.
Plenar.io is one attempt to resolve this. It’s a platform developed by former Chicago Chief Data Officer Brett Goldstein that allows public datasets to be combined and mapped with easy-to-see relationships among weather and crime, for example, on a single city block. (A sample question anyone can ask of Plenar.io: “Tell me the story of 700 Howard Street in San Francisco.”) Right now, Plenar.io’s visual norm is a map, but it’s easy to imagine other forms of presentation that could become standard. All the city has to do is open up its widely varying datasets…”
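A minimal sketch of the underlying idea: index heterogeneous open datasets on a shared spatial key, so that a single query returns everything known about one place. The grid-snapping rule and the records are invented for illustration; this is not Plenar.io’s actual API.

```python
from collections import defaultdict

def block_key(lat: float, lon: float) -> tuple:
    """Snap coordinates to a coarse grid standing in for a city block."""
    return (round(lat, 3), round(lon, 3))

# Invented rows standing in for two separately published city datasets.
crime = [{"lat": 37.7870, "lon": -122.4010, "desc": "theft report"}]
weather = [{"lat": 37.7872, "lon": -122.4008, "desc": "rain, 14C"}]

index = defaultdict(list)
for name, dataset in [("crime", crime), ("weather", weather)]:
    for row in dataset:
        index[block_key(row["lat"], row["lon"])].append((name, row["desc"]))

# "Tell me the story of this block":
for source, desc in index[block_key(37.787, -122.401)]:
    print(f"{source}: {desc}")
```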