Unleashing the Power of Data to Serve the American People

Memorandum: Unleashing the Power of Data to Serve the American People
To: The American People
From: Dr. DJ Patil, Deputy U.S. CTO for Data Policy and Chief Data Scientist

….While there is a rich history of companies using data to their competitive advantage, the disproportionate beneficiaries of big data and data science have been Internet technologies like social media, search, and e-commerce. Yet transformative uses of data in other spheres are just around the corner. Precision medicine and other forms of smarter health care delivery, individualized education, and the “Internet of Things” (which refers to devices like cars or thermostats communicating with each other using embedded sensors linked through wired and wireless networks) are just a few of the ways in which innovative data science applications will transform our future.

The Obama administration has embraced the use of data to improve the operation of the U.S. government and the interactions that people have with it. On May 9, 2013, President Obama signed Executive Order 13642, which made open and machine-readable data the new default for government information. Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the government, helping make troves of valuable data — data that taxpayers have already paid for — easily accessible to anyone. In fact, I used data made available by the National Oceanic and Atmospheric Administration to improve numerical methods of weather forecasting as part of my doctoral work. So I know firsthand just how valuable this data can be — it helped get me through school!

Given the substantial benefits that responsibly and creatively deployed data can provide to us and our nation, it is essential that we work together to push the frontiers of data science. Given the importance this Administration has placed on data, along with the momentum that has been created, now is a unique time to establish a legacy of data supporting the public good. That is why, after a long time in the private sector, I am returning to the federal government as the Deputy Chief Technology Officer for Data Policy and Chief Data Scientist.

Organizations are increasingly realizing that in order to maximize their benefit from data, they require dedicated leadership with the relevant skills. Many corporations, local governments, federal agencies, and others have already created such a role, which is usually called the Chief Data Officer (CDO) or the Chief Data Scientist (CDS). The role of an organization’s CDO or CDS is to help their organization acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

The Role of the First-Ever U.S. Chief Data Scientist

Similarly, my role as the U.S. CDS will be to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.

So what specifically am I here to do? As I start, I plan to focus on these four activities:


Medical Wikis Dedicated to Clinical Practice: A Systematic Review

New paper by Alexandre Brulet et al.: “Wikis may give clinician communities the opportunity to build knowledge relevant to their practice. The only previous study reviewing a set of health-related wikis, without specification of purpose or audience, globally showed a poor reliability…. Our aim was to review medical wiki websites dedicated to clinical practices…. Among 25 wikis included, 11 aimed at building an encyclopedia, five a textbook, three lessons, two oncology protocols, one a single article, and three at reporting clinical cases. Sixteen wikis were specialized with specific themes or disciplines. Fifteen wikis were using MediaWiki software as-is, three were hosted by online wiki farms, and seven were purpose-built. Except for one MediaWiki-based site, only purpose-built platforms managed detailed user disclosures…. The 25 medical wikis we studied present various limitations in their format, management, and collaborative features. Professional medical wikis may be improved by using clinical cases, developing more detailed transparency and editorial policies, and involving postgraduate and continuing medical education learners….(More)”

A lot of private-sector data is also used for public good

Josh New in Computerworld: “As the private sector continues to invest in data-driven innovation, the capacity for society to benefit from this data collection grows as well. Much has been said about how the private sector is using the data it collects to improve corporate bottom lines, but positive stories about how that data contributes to the greater public good are largely unknown.
This is unfortunate, because data collected by the private sector is being used in a variety of important ways, including to advance medical research, to help students make better academic decisions and to provide government agencies and nonprofits with actionable insights. However, overzealous actions by government to restrict the collection and use of data by the private sector are likely to have a chilling effect on such data-driven innovation.
Companies are working to advance medical research with data sharing. Personal genetics company 23andMe, which offers its customers inexpensive DNA test kits, has obtained consent from three-fourths of its 800,000 customers to donate their genetic information for research purposes. 23andMe has partnered with pharmaceutical companies, such as Genentech and Pfizer, to advance genomics research by providing scientists with the data needed to develop new treatments for diseases like Crohn’s and Parkinson’s. The company has also worked with researchers to leverage its network of customers to recruit patients for clinical trials more effectively than through previous protocols.
Private-sector data is also helping students make more informed decisions about education. With the cost of attending college rising, data that helps make this investment worthwhile is incredibly valuable. The social networking company LinkedIn has built tools that provide prospective college students with valuable information about their potential career path, field of study and choice of school. By analyzing the education tracks and careers of its users, LinkedIn can offer students critical data-driven insights into how to make the best out of the enormous and costly decision to go to college. Through LinkedIn’s higher-education tools, students now have an unprecedented resource to develop data-supported education and career plans….(More)”

Innovate or Stagnate

Mohammed bin Rashid Al Maktoum at Project Syndicate: “Companies, like people, grow old. They start life small and eager to survive, fueled by youthful energy and fresh ideas. They compete, expand, mature, and eventually, with few exceptions, fade into obscurity. The same is true of governments: they, too, can lose the hunger and ambition of youth and allow themselves to become complacent….

The key to corporations’ rejuvenation, civilizations’ evolution, and human development in general is simple: innovation. I am always amazed when governments think they are an exception to this rule. Innovation in government is not an intellectual luxury, a topic confined to seminars and panel discussions, or a matter only of administrative reforms. It is the recipe for human survival and development, the fuel for constant progress, and the blueprint for a country’s rise.

The first key to business-like innovation in government is a focus on skills. Top-tier companies continuously invest in their employees to provide them with the right skills for the marketplace. Governments must do the same, by constantly upgrading skills and nurturing innovation – among their own employees, across key sectors of the economy, and at the foundations of the education system. Governments that fail to equip new generations with the skills needed to become leaders for their time are condemning them to be led by other, more innovative societies….

The second key to transforming governments into engines of innovation is to shift the balance of investment toward intangibles, as in the private sector. Whereas more than 80% of the value of the Standard & Poor’s 500 consisted of tangible assets 40 years ago, today that ratio is reversed: more than 80% of the largest companies’ value is intangible – the knowledge and skills of their employees and the intellectual property embedded in their products.

Governments, too, should think strategically about shifting their spending away from tangible infrastructure like roads and buildings, and toward intangibles like education and research and development…. (More)”.

U.S. Public Participation Playbook

“The U.S. Public Participation Playbook is a resource for government managers to effectively evaluate and build better services through public participation using best practices and performance metrics.
Public participation—where citizens help shape and implement government programs—is a foundation of open, transparent, and engaging government services. From emergency management, town hall discussions and regulatory development to science and education, better engagement with those who use public services can measurably improve those services for everyone.
Developing a U.S. Public Participation Playbook is an open government priority included in both the first and second U.S. Open Government National Action Plans as part of the United States effort to increase public integrity in government programs. This resource reflects the commitment of the government and civic partners to measurably improve participation programs, and is designed using the same inclusive principles that it champions.
How is the playbook structured?

We needed to create a resource that combines best practices and suggested performance metrics for public servants to use to evaluate and build better services — to meet this need, based on discussions with federal managers and stakeholders, we identified five main categories that should be addressed in all programs, whether digital or offline. Within each category we identified 12 unifying plays to start with, each including a checklist to consider, resources and training. We then provide suggested performance metrics for each main category.
This is only the beginning, however, and we hope the plays will quickly expand and enrich. The U.S. Public Participation Playbook was not just designed for a more open government — it was designed collaboratively through a more open government…(More)”

With a Few Bits of Data, Researchers Identify ‘Anonymous’ People

in the New York Times: “Even when real names and other personal information are stripped from big data sets, it is often possible to use just a few pieces of the information to identify a specific person, according to a study to be published Friday in the journal Science.

In the study, titled “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata,” a group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, amount charged and name of the store.

Although the information had been “anonymized” by removing personal details like names and account numbers, the uniqueness of people’s behavior made it easy to single them out.

In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, researchers calculated. And that uniqueness of behavior — or “unicity,” as the researchers termed it — combined with publicly available information, like Instagram or Twitter posts, could make it possible to reidentify people’s records by name.

“The message is that we ought to rethink and reformulate the way we think about data protection,” said Yves-Alexandre de Montjoye, a graduate student in computational privacy at the M.I.T. Media Lab who was the lead author of the study. “The old model of anonymity doesn’t seem to be the right model when we are talking about large-scale metadata.”

The analysis of large data sets containing details on people’s behavior holds great potential to improve public health, city planning and education.

But the study calls into question the standard methods many companies, hospitals and government agencies currently use to anonymize their records. It may also give ammunition to some technologists and privacy advocates who have challenged the consumer-tracking processes used by advertising software and analytics companies to tailor ads to so-called anonymous users online….(More).”
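
To make the “unicity” idea concrete, here is a minimal sketch of how such an estimate might be computed, assuming each person's record is reduced to a set of (day, store) points. The function name `unicity`, the toy data, and the random sampling scheme are illustrative assumptions, not the study's actual code or data.

```python
import random
from collections import defaultdict


def unicity(traces, k, trials_per_user=20, seed=0):
    """Estimate the share of draws in which k known points single out one person.

    traces: dict mapping a user id to a set of (day, store) tuples.
    """
    rng = random.Random(seed)

    # Index every point to the set of users whose trace contains it.
    index = defaultdict(set)
    for user, points in traces.items():
        for point in points:
            index[point].add(user)

    unique = 0
    total = 0
    for user, points in traces.items():
        if len(points) < k:
            continue  # too few points to draw k of them
        ordered = sorted(points)
        for _ in range(trials_per_user):
            sample = rng.sample(ordered, k)
            # Users whose traces contain every sampled point.
            candidates = set.intersection(*(index[p] for p in sample))
            unique += candidates == {user}
            total += 1
    return unique / total if total else 0.0


# Hypothetical toy data: three shoppers with partly overlapping purchases.
traces = {
    "a": {(1, "bakery"), (2, "cafe"), (3, "gym")},
    "b": {(1, "bakery"), (2, "cafe"), (4, "cinema")},
    "c": {(1, "bakery"), (5, "market"), (6, "gym")},
}

print(unicity(traces, k=3))  # 1.0: all three points together single out each shopper
print(unicity(traces, k=1))  # lower, because single points such as the bakery visit are shared
```

In this toy example, one known purchase is often shared by several people, while three known purchases always isolate a single record, which is the pattern the researchers describe at much larger scale.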

Study: Complaining on Twitter correlates with heart disease risks

at ArsTechnica: “Tweets prove better regional heart disease predictor than many classic factors. This week, a study was released by researchers at the University of Pennsylvania that found a surprising correlation when studying two kinds of maps: those that mapped the county-level frequency of cardiac disease, and those that mapped the emotional state of an area’s Twitter posts.
In all, researchers sifted through over 826 million tweets, made available by Twitter’s research-friendly “garden hose” server access, then narrowed those down to roughly 146 million tweets that had been posted with geolocation data from over 1,300 counties (each county needed to have at least 50,000 tweets to sift through to qualify). The team then measured an individual county’s expected “health” level based on frequency of certain phrases, using dictionaries that had been put through scrutiny over their application to emotional states. Negative statements about health, jobs, and attractiveness—along with a bump in curse words—would put a county in the “risk” camp, while words like “opportunities,” “overcome,” and “weekend” added more points to a county’s “protective” rating.
Not only did this measure correlate strongly with age-adjusted heart disease rate data, it turned out to be a more efficient predictor of higher or lower disease likelihood than “ten classical predictors” combined, including education, obesity, and smoking. Twitter beat that data by a rate of 42 percent to 36 percent…. Psychological Science, 2014. DOI: 10.1177/0956797614557867 ….(More)”
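
The county-level scoring described in the excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `county_scores` and the risk words are hypothetical, the protective words are the three quoted in the article, and the study itself relied on vetted emotion dictionaries rather than tiny word lists like these.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the study's dictionaries.
RISK_WORDS = {"hate", "tired", "bored"}
# The three "protective" examples quoted in the article.
PROTECTIVE_WORDS = {"opportunities", "overcome", "weekend"}


def county_scores(tweets, min_tweets=50_000):
    """Return {county: (risk_rate, protective_rate)} for qualifying counties.

    tweets: iterable of (county, text) pairs.
    min_tweets: counties with fewer tweets are dropped, as in the article.
    """
    tweet_counts = Counter()
    word_counts = Counter()
    risk = Counter()
    protective = Counter()

    for county, text in tweets:
        tweet_counts[county] += 1
        for token in re.findall(r"[a-z']+", text.lower()):
            word_counts[county] += 1
            if token in RISK_WORDS:
                risk[county] += 1
            elif token in PROTECTIVE_WORDS:
                protective[county] += 1

    # Rates are word frequencies, so large and small counties are comparable.
    return {
        county: (risk[county] / word_counts[county],
                 protective[county] / word_counts[county])
        for county, n in tweet_counts.items()
        if n >= min_tweets and word_counts[county]
    }


# Hypothetical sample; the threshold is lowered so the toy data qualifies.
sample = [
    ("A", "I hate Mondays"),
    ("A", "Weekend at last"),
    ("B", "tired"),
]
print(county_scores(sample, min_tweets=2))  # only county A meets the threshold
```

The resulting per-county rates would then be compared against heart disease data; that correlation step is not shown here.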

At Universities, a Push for Data-Driven Career Services

at The New York Times: “Officials at the University of California, San Diego, had sparse information on the career success of their graduates until they set up a branded page for the university on LinkedIn a couple of years ago.

“Back then, we had records on 125,000 alumni, but we had good employment information on less than 10,000 of them,” recalled Armin Afsahi, who oversees alumni relations as the university’s associate vice chancellor for advancement. “Aside from Qualcomm, which is in our back yard, we didn’t know who employed our alumni.”

Within three months of setting up the university page, LinkedIn connections surfaced information on 92,000 alumni, Mr. Afsahi said.

“The old models of alumni relations don’t work,” Mr. Afsahi said. “We have to be a data-driven, intelligence-oriented organization to create the engagement and value” that students and alumni expect.

In an article on Sunday, I profiled two analytics start-ups, EverTrue and Graduway, which aim to help colleges and universities identify their best prospective donors or student mentors by scanning their graduates’ social networking activities. Each start-up taps into LinkedIn profiles of alumni — albeit in different ways — to help institutions of higher education stay up-to-date with their graduates’ contact information and careers.

Since 2013, however, LinkedIn has offered its own proprietary service, called University Pages, where schools can create hubs for alumni outreach and networking. About 25,000 institutions of higher learning around the world now have official university pages on the site…(More).”

Mapping the Nation: Building a More Resilient Future

New book from Esri: “The fifth book in Esri’s Mapping the Nation series, Mapping the Nation: Building a More Resilient Future is a collection of geographic information system (GIS) maps that illustrate how federal government agencies rely on GIS analysis to build stronger, more resilient communities and help make the world a better place.
The print version of the book includes 118 full-color maps produced by more than 50 federal government agencies, including the US Forest Service, US Department of Defense, US Department of Education, and the Bureau of Ocean Energy Management. The digital version of Mapping the Nation offers enhanced and interactive maps and videos showcasing four start-up companies that are using ArcGIS technology in partnership with Esri and the government.
The maps depict how federal employees and officials use GIS to evaluate, plan, and respond to social, economic, and environmental concerns at local, regional, national, and global levels. Topics such as green government, economic recovery and sustainability, and climate protection show how government agencies use GIS to facilitate initiatives, improve transparency, and deliver strong business models…
Mapping and Apping the Nation 2015, an interactive digital adaptation of the printed map book, is available free of charge from the Esri Books app on Apple iTunes and the Google Play store.”

Would Athenian-style democracy work in the UK today?

Paul Cartledge at the BBC, in the context of BBC Democracy Day: “…The -kratia component of demo-kratia was derived from kratos, which meant unambiguously and unambivalently power or strength. Demos, the other component, meant “people” – but which people, precisely?
At one extreme it could be taken to mean all the people – that is, all the politically empowered people, the adult male citizenry as a whole. At the other ideological pole, it referred to only a section of the citizen people, the largest, namely the majority of poor citizens – those who had to work for a living and might be in greater or less penury.
Against these masses were counterposed the elite citizens – the (more or less) wealthy Few. For them, and it may well have been they who coined the word demokratia, the demos in the class sense meant the great unwashed, the stupid, ignorant, uneducated majority.
So, depending where you stood on the social spectrum, demokratia was either Abe Lincoln’s government of, by and for the people, or the dictatorship of the proletariat. This complicates, at least, any thought-experiment such as the one I’m about to conduct here.
However, what really stands in the way is a more symbolic than pragmatic objection – education, education, education.
For all that we have a formal and universally compulsory educational system, we are not educated either formally or informally to be citizens in the strong, active and participatory senses. The ancient Athenians lacked any sort of formal educational system whatsoever – though somehow or other most of them learned to read and write and count.
On the other hand, what they did possess in spades was an abundance of communal institutions, both formal and informal, both peaceful and warlike, both sacred and secular, whereby ideas of democratic citizenship could be disseminated, inculcated, internalised, and above all practised universally.
Annual, monthly and daily religious festivals. Annual drama festivals that were also themselves religious. Multiple experiences of direct participation in politics at both the local (village, parish, ward) and the “national” levels. And fighting as and for the Athenians both on land and at sea, against enemies both Greek and non-Greek (especially Persian).
Formal Athenian democratic politics, moreover, drew no such modern distinctions between the executive, legislative and judicial branches or functions of government as are enshrined in modern democratic constitutions. One ruled, as a democratic citizen, in all relevant branches equally. A trial for alleged impiety was properly speaking a political trial, as Socrates discovered to his cost.
In short, ancient Athenian democracy was very far from our liberal democracy. I don’t think I need to bang on about its conscientious exclusion of the female half of the citizenry, or its basis in a radical form of dehumanised personal slavery.
So why should we even think of wanting to apply any lesson or precedent drawn from it to our democracy today or in the future? One very good reason is the so-called “democratic deficit”, the attenuation or etiolation of what it means to be, or function fully as, a democratic citizen….(More)”