Unleashing the Power of Data to Serve the American People


Memorandum: Unleashing the Power of Data to Serve the American People
To: The American People
From: Dr. DJ Patil, Deputy U.S. CTO for Data Policy and Chief Data Scientist

….While there is a rich history of companies using data to their competitive advantage, the disproportionate beneficiaries of big data and data science have been Internet technologies like social media, search, and e-commerce. Yet transformative uses of data in other spheres are just around the corner. Precision medicine and other forms of smarter health care delivery, individualized education, and the “Internet of Things” (which refers to devices like cars or thermostats communicating with each other using embedded sensors linked through wired and wireless networks) are just a few of the ways in which innovative data science applications will transform our future.

The Obama administration has embraced the use of data to improve the operation of the U.S. government and the interactions that people have with it. On May 9, 2013, President Obama signed Executive Order 13642, which made open and machine-readable data the new default for government information. Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the government, helping make troves of valuable data — data that taxpayers have already paid for — easily accessible to anyone. In fact, I used data made available by the National Oceanic and Atmospheric Administration to improve numerical methods of weather forecasting as part of my doctoral work. So I know firsthand just how valuable this data can be — it helped get me through school!

Given the substantial benefits that responsibly and creatively deployed data can provide to us and our nation, it is essential that we work together to push the frontiers of data science. Given the importance this Administration has placed on data, along with the momentum that has been created, now is a unique time to establish a legacy of data supporting the public good. That is why, after a long time in the private sector, I am returning to the federal government as the Deputy Chief Technology Officer for Data Policy and Chief Data Scientist.

Organizations are increasingly realizing that in order to maximize their benefit from data, they require dedicated leadership with the relevant skills. Many corporations, local governments, federal agencies, and others have already created such a role, which is usually called the Chief Data Officer (CDO) or the Chief Data Scientist (CDS). The role of an organization’s CDO or CDS is to help their organization acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

The Role of the First-Ever U.S. Chief Data Scientist

Similarly, my role as the U.S. CDS will be to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.

So what specifically am I here to do? As I start, I plan to focus on these four activities:

…(More)”

'From Atoms to Bits': A Visual History of American Ideas


in The Atlantic: “A new paper employs a simple technique—counting words in patent texts—to trace the history of American invention, from chemistry to computers….in a new paper, Mikko Packalen at the University of Waterloo and Jay Bhattacharya of Stanford University devised a brilliant way to address this question empirically. In short, they counted words in patent texts.

In a series of papers studying the history of American innovation, Packalen and Bhattacharya indexed every one-word, two-word, and three-word phrase that appeared in more than 4 million patent texts in the last 175 years. To focus their search on truly new concepts, they recorded the year those phrases first appeared in a patent. Finally, they ranked each concept’s popularity based on how many times it reappeared in later patents. Essentially, they trawled the billion-word literature of patents to document the birth-year and the lifespan of American concepts, from “plastic” to “world wide web” and “instant messaging.”
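The method is simple enough to sketch in a few lines. Below is a minimal, hypothetical Python version of the idea: extract every one-, two-, and three-word phrase from each patent, record the year each phrase first appears, and rank phrases by how often they recur in later patents. The toy data and function names are illustrative assumptions, not the authors’ actual pipeline.

```python
from collections import defaultdict

def ngrams(text, max_n=3):
    """Yield every 1-, 2-, and 3-word phrase in a patent text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def index_patents(patents):
    """patents: iterable of (year, text) pairs.
    Returns each phrase's first-appearance year and its reuse count."""
    first_seen = {}                  # phrase -> earliest year it appears
    reuse_count = defaultdict(int)   # phrase -> appearances after its birth year
    for year, text in sorted(patents):     # walk patents chronologically
        for phrase in set(ngrams(text)):   # count each phrase once per patent
            if phrase not in first_seen:
                first_seen[phrase] = year
            else:
                reuse_count[phrase] += 1
    return first_seen, reuse_count

# Toy example (invented data, for illustration only):
patents = [
    (1989, "amplification by polymerase chain reaction"),
    (1991, "improved polymerase chain reaction assay"),
    (1995, "polymerase chain reaction device"),
]
first_seen, reuse = index_patents(patents)
print(first_seen["polymerase chain reaction"])  # -> 1989
print(reuse["polymerase chain reaction"])       # -> 2
```

At the scale of 4 million patent texts the real indexing work is far harder, but the logic is the same.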

Here are the 20 most popular sequences of words in each decade from the 1840s to the 2000s. You can see polymerase chain reactions in the middle of the 1980s stack. Since the timeline, as it appears in the paper, is too wide to be visible on this article page, I’ve chopped it up and inserted the color code both above and below the timeline….

Another theme of Packalen and Bhattacharya’s research is that innovation has become more collaborative. Indeed, computers have not only taken over the world of inventions, but they have also changed the geography of innovation, Bhattacharya said. Larger cities have historically held an innovative advantage, because (the theory goes) their density of smarties speeds up debate on the merits of new ideas, which are often born raw and poorly understood. But the researchers found that in the last few decades, larger cities are no more likely to produce new ideas in patents than smaller ones, where researchers can just as easily connect online with their co-authors. “Perhaps due to the Internet, the advantage of larger cities appears to be eroding,” Packalen wrote in an email….(More)”

Citizen Science: Catch, Click and Submit Contest


Wilson Commons Lab: “The inaugural Catch, Click and Submit Contest begins on Feb 21st in honor of National Invasive Species Awareness Week, which runs Feb 22nd through the 28th. The contest, which calls on anglers to photograph and report non-native fish species caught during the derby, will award prizes in categories such as “Most Unusual Catch” and “Most Species”.  Submissions from the contest will aid researchers in developing a better understanding of the distribution of fish species throughout Florida waterways.
By engaging the existing angler community, the contest hopes to increase public awareness of the potential impacts that arise from non-native fish species. “The Catch, Click and Submit Contest offers anglers the opportunity to assist natural resource managers in finding nonnative species by doing what they enjoy – fishing!” said biologist Kelly Gestring. “The early detection of a new, nonnative species could provide a better opportunity to control or even eradicate a population.” The hope is that participants will choose to target non-native fish for consumption in the future, helping to control invasive populations…(More).”

The Internet’s hidden science factory


Jenny Marder at PBS Newshour: “….Marshall is a worker for Amazon’s Mechanical Turk, an online job forum where “requesters” post jobs, and an army of crowdsourced workers complete them, earning fantastically small fees for each task. The work has been called microlabor, and the jobs, known as Human Intelligence Tasks, or HITs, range wildly. Some are tedious: transcribing interviews or cropping photos. Some are funny: prank calling someone’s buddy (that’s worth $1) or writing the title to a pornographic movie based on a collection of dirty screen grabs (6 cents). And others are downright bizarre. One task, for example, asked workers to strap live fish to their chests and upload the photos. That paid $5 — a lot by Mechanical Turk standards….
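For a flavor of how requesters drive the platform programmatically, posting a HIT boils down to a single API call. Here is a hedged sketch using Amazon’s boto3 client against the Mechanical Turk sandbox; the question form, reward, and timing values are illustrative assumptions rather than details from the article.

```python
import boto3

# Hypothetical example of posting a HIT via the MTurk requester sandbox.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# A minimal free-text question form (schema is MTurk's standard QuestionForm).
question_xml = """<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>transcription</QuestionIdentifier>
    <QuestionContent><Text>Transcribe the attached audio clip.</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Transcribe a short audio clip",
    Description="Listen to a 30-second clip and type what you hear.",
    Keywords="transcription, audio",
    Reward="0.05",                    # fees are strings, in USD
    MaxAssignments=3,                 # three workers per clip
    LifetimeInSeconds=86400,          # visible to workers for one day
    AssignmentDurationInSeconds=600,  # ten minutes to complete
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```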
These aren’t obscure studies that Turkers are feeding. They span dozens of fields of research, including social, cognitive and clinical psychology, economics, political science and medicine. They teach us about human behavior. They deal in subjects like energy conservation, adolescent alcohol use, managing money and developing effective teaching methods.


….In 2010, the researcher Joseph Henrich and his team published a paper showing that an American undergraduate was about 4,000 times more likely than an average American to be the subject of a research study.
But that output pales in comparison to Mechanical Turk workers. The typical “Turker” completes more studies in a week than the typical undergraduate completes in a lifetime. That’s according to research by Rand, who surveyed both groups. Among those he surveyed, he found that the median traditional lab subject had completed 15 total academic studies — an average of one per week. The median Turker, on the other hand, had completed 300 total academic studies — an average of 20 per week….(More)”

Can Selfies Save Nutrition Science?


Trevor Butterworth at Stats.org: “You may have never heard of the Energy Balance Working Group, but this collection of 45 experts on nutrition, exercise, biochemistry, and other related disciplines has collectively thrown a “House-like” wrench into the research literature on everything from obesity to cancer and heart disease. Gregory House, the fictional and fantastically brilliant physician played by Hugh Laurie in the eponymous TV show, frequently found his patients wanting in the court of self-reported truth: “I’ve found that when you want to know the truth about someone, that someone is probably the last person you should ask.”
This is more or less what the Energy Balance Working Group has concluded in an “expert report” recently published in the International Journal of Obesity. If you want to know the truth about how much someone eats and exercises, that someone is probably the last person you should ask….The problem is that self-reporting is a cheap and convenient source of data for research, while more accurate alternatives are either expensive and challenging or, as yet, more promise than reality (see sidebar)….
“There are at least two categories of solutions on the horizon. In one category, there are wearable monitoring devices that can collect objective, real-time data. Examples in the works or in use include photographic food diaries, records of chewing and swallowing behavior, and evaluating the time and intensity of movement using accelerometers and GPS, among others. It is important to note that there are still challenges converting these measurements into reliable estimates of energy intake and expenditure, but work is ongoing…” (David Allison, Distinguished Professor and Quetelet Endowed Professor of Public Health, University of Alabama at Birmingham)…(More)
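To make those conversion challenges concrete: a common first step with raw wrist-worn accelerometer data is collapsing the three axes into a movement summary such as ENMO (Euclidean norm minus one g), which only then feeds models of energy expenditure. What follows is a minimal sketch, assuming 3-axis samples in units of g; calibration and the modeling step are precisely the hard parts the experts flag.

```python
import numpy as np

def enmo(samples_g):
    """ENMO: Euclidean norm of each 3-axis sample minus 1 g (gravity),
    with negative values clipped to zero. A crude movement summary,
    not an energy-expenditure estimate by itself."""
    norms = np.linalg.norm(samples_g, axis=1)
    return np.clip(norms - 1.0, 0.0, None)

# Toy data: stillness (~1 g total) vs. vigorous movement
still = np.array([[0.0, 0.0, 1.0], [0.01, 0.0, 0.99]])
moving = np.array([[0.5, 0.8, 1.2], [0.9, 0.3, 1.4]])
print(enmo(still).mean())   # ~0: device at rest
print(enmo(moving).mean())  # clearly > 0: device in motion
```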

The Future of Open and How To Stop It


Blogpost by Steve Song: “In 2008, Jonathan Zittrain wrote a book called The Future of the Internet and How To Stop It. In it he argued that the runaway success of the Internet is also the cause of it being undermined, that vested interests were in the process of locking down the potential for innovation by creating walled gardens.  He wrote that book because he loved the Internet and the potential it represents and was concerned about it going down a path that would diminish its potential.  It is in that spirit that I borrow his title to talk about the open movement.  By the term open movement, I am referring broadly to the group of initiatives inspired by the success of Open Source software that led to initiatives as varied as the Creative Commons, Open Data, Open Science, Open Access, Open Corporates, Open Government, the list goes on.   I write this because I love open initiatives but I fear that openness is in danger of becoming its own enemy as it becomes an orthodoxy difficult to question.
In June of last year, I wrote an article called The Morality of Openness which attempted to unpack my complicated feelings about openness.  Towards the end of the essay, I wondered whether the word trust might not be a more important word than open for our current world.  I am now convinced of this.  Which is not to say that I have stopped believing in openness; I believe openness is a means to an end, not the endgame.  Trust is the endgame.  Higher trust environments, whether in families or corporations or economies, tend to be both more effective and happier.  There is no similar body of evidence for openness, and yet open practices can be a critical element on the road to trust. Equally, when mis-applied, openness can achieve the opposite….
Openness can be a means of building trust.  Ironically though, if openness as behaviour is mandated, it stops building trust.  Listen to Nobel Laureate Vernon Smith talk about why that happens.  What Smith argues (building on the work of an earlier Smith, Adam Smith’s Theory of Moral Sentiments) is that intent matters.  That as human beings, we signal our intentions to each other with our behaviour and that influences how others behave.  When intention is removed by regulating or enforcing good behaviour, that signal is lost as well.
I watched this happen nearly ten years ago in South Africa when the government decided to embrace the success of Open Source software and make it mandatory for government departments to use Open Source software.  No one did.  It is choosing to share that makes open initiatives work.  When you remove choice, you don’t inspire others to share and you don’t build trust.  Looking at the problem from the perspective of trust rather than from the perspective of open makes this problem much easier to see.
Lateral thinker Jerry Michalski gave a great talk last year entitled What If We Trusted You? in which he talked about how the architecture of systems either builds or destroys trust.  He gives a great example of Wikipedia as an open, trust-enabling architecture.  We don’t often think about what a giant leap of trust Wikipedia makes in allowing anyone to edit it and what an enormous achievement it became…(More).”

Untangling Complexity


Essay by Jacqueline Wallace at civicquarterly.com: “The next phase of the digital revolution will be defined by products and services that facilitate shared understanding, allowing concerted participation around complex issues. In working to show the way, civic designers will need to call upon the powers of systems research, design research, social science, and open data….
Creating next-gen civic applications will require designers to embody a systems-based approach to civic participation, marrying systems-based research, user-centered design, social science, and data. This article chronicles my own experience leveraging these tools to facilitate shared understanding amongst my community vis-à-vis the Kinder Morgan pipeline….
I believe the contention around the pipeline evinces a bigger problem in our civic sphere: While individual issues such as the Kinder Morgan pipeline continue to absorb a great deal of energy from citizens, user-centered designers must use their unique skillset to address these issues more broadly. Because we’re not only failing our fellow citizens; we’re failing our representatives as well. In the words of digital strategist and civic design advocate Mike Connery: “there has been almost zero investment in giving our representatives the tools they need to understand feedback from citizens.”
The opportunity is two-fold. As previously stated, there is a need to develop tools that support citizens and representatives to understand and engage with complex social issues. There is also a need to develop processes, processes that cultivate our abilities to understand policy development in order to more efficiently spend our tax dollars. Writing about a recent project that used social science methods to analyze the long-term success of a welfare project, NPR correspondent Shankar Vedantam said, “it really makes no sense that marketers selling toys have better data on what works and what doesn’t than policy makers who are spending billions and billions of dollars.”…(More)

Training the next generation of public leaders


Thanks to the generous support of the Knight Foundation, this term the Governance Lab Academy – a training program designed to promote civic engagement and innovation – is launching a series of online coaching programs.
Geared to teams and individuals inside and outside of government who are planning to undertake a new project, or trying to figure out how to make an existing project even more effective and scalable, these programs are designed to help participants working in civic engagement and innovation develop effective projects from idea to implementation.
Convened by leading experts in their fields, coaching programs meet exclusively online once a week for four weeks or every other week for eight weeks. They include frequent and constructive feedback, customized and original learning materials, peer-to-peer support, mentoring by topic experts and individualized coaching from those with policy, technology, and domain expertise.
There is no charge to participants but each program is limited to 8-10 project teams or individuals.
You can see the current roster of programs below and check out the website for more information (including FAQs), to sign up and to suggest a new program.

Faculty includes: 

  • Brian Behlendorf, Managing Director at Mithril Capital Management and Co-Founder of Apache
  • Alexandra Clare, Founder of Iraq Re:Coded
  • Brian Forde, Former Senior Advisor to the U.S. CTO, White House Office of Science and Technology Policy
  • Francois Grey, Coordinator of the Citizen Cyberscience Centre, Geneva
  • Gavin Hayman, Executive Director of the Open Contracting Partnership
  • Clay Johnson, CEO of The Department for Better Technology and Former Presidential Innovation Fellow
  • Benjamin Kallos, New York City Council Member and Chair of the Committee on Governmental Operations of the New York City Council
  • Karim Lakhani, Lumry Family Associate Professor of Business Administration at the Harvard Business School
  • Amen Ra Mashariki, Chief Analytics Officer of New York City
  • Geoff Mulgan, Chief Executive of NESTA
  • Miriam Nisbet, Former Director of the Office of Government Information Services
  • Beth Noveck, Founder and CEO of The GovLab
  • Tiago Peixoto, Open Government Specialist at The World Bank
  • Arnaud Sahuguet, Chief Technology Officer of The GovLab
  • Joeri van den Steenhoven, Director of MaRS Solutions Lab
  • Stefaan Verhulst, Co-Founder and Chief Research and Development Officer of The GovLab

Ebola: Call for more sharing of scientific data


at the BBC: “The devastation left by the Ebola virus in west Africa raises many questions for science, policy and international development. One issue that has yet to receive widespread media attention is the handling of genetic data on the virus. By studying its code, scientists can trace how Ebola leapt across borders, and how, like all viruses, it is constantly evolving and changing.

Yet, researchers have been privately complaining for months about the scarcity of genetic information about the virus that is entering the public domain….

At the heart of the issue is the scientific process. The main way scientists are rewarded for their work is through the quality and number of research papers they publish.
Data is only revealed for scrutiny by the wider scientific community when the research is published, which can be a lengthy process….
Dr Emma Thomson of the MRC-University of Glasgow Centre for Virus Research says all journals publishing papers on Ebola must insist all data is released, as a collaborative approach could save lives.
“[Sharing data] at the time of publication is really important – these days most people do it but not always, and journals often insist (but not always),” she told me.
“A lot of Ebola sequencing has happened but the data hasn’t always been uploaded.
“It’s an international emergency so people need to get the data out there to allow it to be analysed in different ways by different labs.”
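As a small illustration of what “getting the data out there” enables: once sequences are deposited in a public database such as GenBank, any lab can pull and analyse them in a few lines of code. What follows is a hedged sketch using Biopython’s Entrez interface; the search term and the email placeholder are assumptions for illustration.

```python
from Bio import Entrez, SeqIO

# NCBI asks for a contact email with every Entrez request
Entrez.email = "your.name@example.org"  # placeholder

# Search GenBank for Ebola virus genome records (term is illustrative)
search = Entrez.read(Entrez.esearch(
    db="nucleotide",
    term="Zaire ebolavirus[Organism] AND complete genome",
    retmax=5,
))

# Fetch the matching records in FASTA format and print basic info
handle = Entrez.efetch(
    db="nucleotide",
    id=search["IdList"],
    rettype="fasta",
    retmode="text",
)
for record in SeqIO.parse(handle, "fasta"):
    print(record.id, len(record.seq))
handle.close()
```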
In the old days of the public-private race to decode the first human genome, the mood was one of making data accessible to all for the good of science and society.
Genetic science and public attitudes have moved on, but in the case of Ebola, some are saying it may be time for a rethink.
As Prof Paul Hunter, professor of health protection at the University of East Anglia, put it: “It would be tragic if, during a crisis like this, data was not being adequately shared with the public health community.
“The rapid sharing of data could help enable more rapid control of the outbreak.”…(More)”

Innovation Labs: Leveraging Openness for Radical Innovation?


Paper by Gryszkiewicz, Lidia and Lykourentzou, Ioanna and Toivonen, Tuukka: “A growing range of public, private and civic organisations, from Unicef through Nesta to Tesco, now run units known as ‘innovation labs’. The hopeful assumption they share is that labs, by building on openness among other features, can generate promising solutions to grand challenges of the future. Despite their seeming proliferation and popularisation, the underlying innovation paradigm embodied by labs has so far received scant academic attention. This is a missed opportunity, because innovation labs are potentially fruitful vehicles for leveraging openness for radical innovation. Indeed, they not only strive to span organisational, sectoral and geographical boundaries by bringing a variety of uncommon actors together to embrace radical ideas and out-of-the box thinking, but they also aim to apply the concept of openness throughout the innovation process, including the experimentation and development phases. While the phenomenon of labs clearly forms part of a broader trend towards openness, it seems to transcend traditional conceptualisations of open innovation (Chesbrough, 2006), open strategy (Whittington et al., 2011), open science (David, 1998) or open government (Janssen et al., 2012). What are innovation labs about, how do they differ from other innovation efforts and how do they embrace openness to create breakthrough innovations? This short exploratory paper is an introduction to a larger empirical study aiming to answer these questions….(More).”