The big medical data miss: challenges in establishing an open medical resource


Eric J. Topol in Nature: “I call for an international open medical resource to provide a database for every individual’s genomic, metabolomic, microbiomic, epigenomic and clinical information. This resource is needed in order to facilitate genetic diagnoses and transform medical care.”

“We are each, in effect, one-person clinical trials”

Laurie Becklund was a noted journalist who died in February 2015 at age 66 from breast cancer. Soon thereafter, the Los Angeles Times published her op-ed entitled “As I lay dying” (Ref. 1). She lamented, “We are each, in effect, one-person clinical trials. Yet the knowledge generated from those trials will die with us because there is no comprehensive database of metastatic breast cancer patients, their characteristics and what treatments did and didn’t help them”. She went on to assert that, in the era of big data, the lack of such a resource is “criminal”, and she is absolutely right….

Around the same time as this important op-ed, the MIT Technology Review published its issue entitled “10 Breakthrough Technologies 2015”, and on the list was the “Internet of DNA” (Ref. 2). While we are often reminded that the world we live in is becoming the “Internet of Things”, I have not seen this terminology applied to DNA before. The article on the “Internet of DNA” decried, “the unfolding calamity in genomics is that a great deal of life-saving information, though already collected, is inaccessible”. It called for a global network of millions of genomes and cited the Matchmaker Exchange as a frontrunner. For this international initiative, a growing number of research and clinical teams have come together to pool and exchange phenotypic and genotypic data for individual patients, in order to share this information and assist in the molecular diagnosis of rare diseases….

An Internet of DNA — or what I have referred to as a massive, open, online medicine resource (MOOM) — would help to quickly identify the genetic cause of a disorder (Ref. 4) and, in the process, provide precious guidance on prevention for families who are currently left in the lurch as to their risk of suddenly dying.

So why aren’t such MOOMs being assembled? ….

There has also been much discussion of the privacy concern that patients might be unwilling to participate in a massive medical information resource. However, multiple global consumer surveys have shown that more than 80% of individuals are ready to share their medical data provided that the data are anonymized and their privacy maximally assured (Ref. 4). Indeed, just 24 hours into Apple’s ResearchKit initiative, a smartphone-based medical research programme, tens of thousands of patients with Parkinson disease, asthma or heart disease had signed on. Some individuals are even willing to be “open source” — that is, to make their genetic and clinical data fully available with free access online, without any assurance of privacy. This willingness is exemplified by the participants in the recently launched Open Humans initiative. Along with the Personal Genome Project, Go Viral and American Gut have joined this initiative. Still, studies suggest that most individuals would agree to be medical research participants only if their identities were not attainable. Unfortunately, to date, little has been done to protect individual medical privacy, an area in which there are both promising new technological approaches to data protection (Ref. 4) and a need for additional governmental legislation.

This leaves us with perhaps the major obstacle that is holding back the development of MOOMs — researchers. Even with big, team-science research projects pulling together hundreds of investigators and institutions throughout the world, such as the Global Alliance for Genomics and Health (GA4GH), the data obtained clinically are just as Laurie Becklund asserted in her op-ed — “one-person clinical trials” (Ref. 1). While undertaking the construction of a MOOM is a huge endeavour, there is little motivation for researchers to take on this task, as it currently offers no academic credit and has no funding source. But the transformative potential of MOOMs to improve medical care is extraordinary. Rather than having the knowledge die with each of us, the time has come to take down the walls of academic medical centres and health-care systems around the world, and create a global medical knowledge resource that leverages each individual’s information to help one another…(More)”

Bloomberg Philanthropies Launches $42 Million “What Works Cities” Initiative


Press Release: “Today, Bloomberg Philanthropies announced the launch of the What Works Cities initiative, a $42 million program to help 100 mid-sized cities better use data and evidence. What Works Cities is the latest initiative from Bloomberg Philanthropies’ Government Innovation portfolio which promotes public sector innovation and spreads effective ideas amongst cities.

Through partners, Bloomberg Philanthropies will help mayors and local leaders use data and evidence to engage the public, make government more effective and improve people’s lives. U.S. cities with populations between 100,000 and 1 million people are invited to apply.

“While cities are working to meet new challenges with limited resources, they have access to more data than ever – and they are increasingly using it to improve people’s lives,” said Michael R. Bloomberg. “We’ll help them build on their progress, and help even more cities take steps to put data to work. What works? That’s a question that every city leader should ask – and we want to help them find answers.”

The $42 million effort is the nation’s most comprehensive philanthropic initiative to help accelerate the ability of local leaders to use data and evidence to improve the lives of their residents. What Works Cities will provide mayors with robust technical assistance, expertise, and peer-to-peer learning opportunities that will help them enhance their use of data and evidence to improve services and solve problems for their communities. The program will help cities:

1. Create sustainable open data programs and policies that promote transparency and robust citizen engagement;

2. Better incorporate data into budget, operational, and policy decision making;

3. Conduct low-cost, rapid evaluations that allow cities to continually improve programs; and

4. Focus funding on approaches that deliver results for citizens.

Across the initiative, Bloomberg Philanthropies will document how cities currently use data and evidence in decision making, and how this unique program of support helps them advance. Over time, the initiative will also launch a benchmark system which will collect standardized, comparable data so that cities can understand their performance relative to peers.

In cities across the country, mayors are increasingly relying on data and evidence to deliver better results for city residents. For example, New Orleans’ City Hall used data to reduce blighted residences by 10,000 and increased the number of homes brought into compliance by 62% in 2 years. The City’s “BlightStat” program has put New Orleans, once behind in efforts to revitalize abandoned and decaying properties, at the forefront of national efforts.

In New York City and other jurisdictions, open data from transit agencies has led to the creation of hundreds of apps that residents now use to get around town, choose where to live based on commuting times, provide key transit information to the visually impaired, and more. And Louisville has asked volunteers to attach GPS trackers to their asthma inhalers to see where they have the hardest time breathing. The city is now using that data to better target the sources of air pollution….

To learn more and apply to be a What Works City, visit www.WhatWorksCities.org.”

A New Source of Data for Public Health Surveillance: Facebook Likes


Paper by Steven Gittelman et al in the Journal of Medical Internet Research: “The development of the Internet and the explosion of social media have provided many new opportunities for health surveillance. The use of the Internet for personal health and participatory health research has exploded, largely due to the availability of online resources and health care information technology applications [1-8]. These online developments, plus a demand for more timely, widely available, and cost-effective data, have led to new ways epidemiological data are collected, such as digital disease surveillance and Internet surveys [8-25]. Over the past 2 decades, Internet technology has been used to identify disease outbreaks, track the spread of infectious disease, monitor self-care practices among those with chronic conditions, and to assess, respond to, and evaluate natural and man-made disasters at a population level [6,8,11,12,14,15,17,22,26-28]. Use of these modern communication tools for public health surveillance has proven to be less costly and more timely than traditional population surveillance modes (eg, mail surveys, telephone surveys, and face-to-face household surveys).

The Internet has spawned several sources of big data, such as Facebook [29], Twitter [30], Instagram [31], Tumblr [32], Google [33], and Amazon [34]. These online communication channels and marketplaces provide a wealth of passively collected data that may be mined for purposes of public health, such as sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs. Moreover, researchers have demonstrated that these digital data sources can be used to predict otherwise unavailable information, such as sociodemographic characteristics among anonymous Internet users [35-38]. For example, Goel et al [36] found no difference by demographic characteristics in the usage of social media and email. However, the frequency with which individuals accessed the Web for news, health care, and research was a predictor of gender, race/ethnicity, and educational attainment, potentially providing useful targeting information based on ethnicity and income [36]. Integrating these big data sources into the practice of public health surveillance is vital to move the field of epidemiology into the 21st century, as called for in the 2012 US “Big Data Research and Development Initiative” [19,39].

Understanding how big data can be used to predict lifestyle behavior and health-related data is a step toward the use of these electronic data sources for epidemiologic needs…(More)”

Americans’ Views on Open Government Data


The upshot has been the appearance of a variety of “open data” and “open government” initiatives throughout the United States that try to use data as a lever to improve government performance and foster warmer citizen attitudes toward government.

This report is based on the first national survey that seeks to benchmark public sentiment about the government initiatives that use data to cultivate the public square. The survey, conducted by Pew Research Center in association with the John S. and James L. Knight Foundation, captures public views at the emergent moment when new technology tools and techniques are being used to disseminate and capitalize on government data and specifically looks at:

  • People’s level of awareness of government efforts to share data
  • Whether these efforts translate into people using data to track government performance
  • If people think government data initiatives have made, or have the potential to make, government perform better or improve accountability
  • The more routine kinds of government-citizen online interactions, such as renewing licenses or searching for the hours of public facilities.

The results cover all three levels of government in America — federal, state and local — and show that government data initiatives are in their early stages in the minds of most Americans. Generally, people are optimistic that these initiatives can make government more accountable, even though many are less sure that open data will improve government performance. And government does touch people online, as evidenced by high levels of use of the internet for routine information applications. But most Americans have yet to delve too deeply into government data and its possibilities to closely monitor government performance.

Among the survey’s main findings:

As open data and open government initiatives get underway, most Americans are still largely engaged in “e-Gov 1.0” online activities, with far fewer attuned to “Data-Gov 2.0” initiatives that involve agencies sharing data online for public use….

Only minorities of Americans say they pay a lot of attention to how governments share data with the public, and relatively few say they are aware of examples where government has done a good (or bad) job sharing data. Fewer than one quarter use government data to monitor how government performs in several different domains….

Americans have mixed hopes about government data initiatives. People see the potential in these initiatives as a force to improve government accountability. However, the jury is still out for many Americans as to whether government data initiatives will improve government performance….

People’s baseline level of trust in government strongly shapes how they view the possible impact of open data and open government initiatives on how government functions…

Americans’ perspectives on trusting government are shaped strongly by partisan affiliation, which in turn makes a difference in attitudes about the impacts of government data initiatives…

Americans are for the most part comfortable with government sharing online data about their communities, although they sound cautionary notes when the data hits close to home…

Smartphone users have embraced information-gathering using mobile apps that rely on government data to function, but not many see a strong link between the underlying government data and economic value…

…(More)”

How Digital Transparency Became a Force of Nature


Daniel C. Dennett and Deb Roy in Scientific American: “More than half a billion years ago a spectacularly creative burst of biological innovation called the Cambrian explosion occurred. In a geologic “instant” of several million years, organisms developed strikingly new body shapes, new organs, and new predation strategies and defenses against them. Evolutionary biologists disagree about what triggered this prodigious wave of novelty, but a particularly compelling hypothesis, advanced by University of Oxford zoologist Andrew Parker, is that light was the trigger. Parker proposes that around 543 million years ago, the chemistry of the shallow oceans and the atmosphere suddenly changed to become much more transparent. At the time, all animal life was confined to the oceans, and as soon as the daylight flooded in, eyesight became the best trick in the sea. As eyes rapidly evolved, so did the behaviors and equipment that responded to them.

Whereas before all perception was proximal — by contact or by sensed differences in chemical concentration or pressure waves — now animals could identify and track things at a distance. Predators could home in on their prey; prey could see the predators coming and take evasive action. Locomotion is a slow and stupid business until you have eyes to guide you, and eyes are useless if you cannot engage in locomotion, so perception and action evolved together in an arms race. This arms race drove much of the basic diversification of the tree of life we have today.

Parker’s hypothesis about the Cambrian explosion provides an excellent parallel for understanding a new, seemingly unrelated phenomenon: the spread of digital technology. Although advances in communications technology have transformed our world many times in the past — the invention of writing signaled the end of prehistory; the printing press sent waves of change through all the major institutions of society — digital technology could have a greater impact than anything that has come before. It will enhance the powers of some individuals and organizations while subverting the powers of others, creating both opportunities and risks that could scarcely have been imagined a generation ago.

Through social media, the Internet has put global-scale communications tools in the hands of individuals. A wild new frontier has burst open. Services such as YouTube, Facebook, Twitter, Tumblr, Instagram, WhatsApp and SnapChat generate new media on a par with the telephone or television — and the speed with which these media are emerging is truly disruptive. It took decades for engineers to develop and deploy telephone and television networks, so organizations had some time to adapt. Today a social-media service can be developed in weeks, and hundreds of millions of people can be using it within months. This intense pace of innovation gives organizations no time to adapt to one medium before the arrival of the next.

The tremendous change in our world triggered by this media inundation can be summed up in a word: transparency. We can now see further, faster, and more cheaply and easily than ever before — and we can be seen. And you and I can see that everyone can see what we see, in a recursive hall of mirrors of mutual knowledge that both enables and hobbles. The age-old game of hide-and-seek that has shaped all life on the planet has suddenly shifted its playing field, its equipment and its rules. The players who cannot adjust will not last long.

The impact on our organizations and institutions will be profound. Governments, armies, churches, universities, banks and companies all evolved to thrive in a relatively murky epistemological environment, in which most knowledge was local, secrets were easily kept, and individuals were, if not blind, myopic. When these organizations suddenly find themselves exposed to daylight, they quickly discover that they can no longer rely on old methods; they must respond to the new transparency or go extinct. Just as a living cell needs an effective membrane to protect its internal machinery from the vicissitudes of the outside world, so human organizations need a protective interface between their internal affairs and the public world, and the old interfaces are losing their effectiveness….(More at Medium)”

21st-Century Public Servants: Using Prizes and Challenges to Spur Innovation


Jenn Gustetic at the Open Government Initiative Blog: “Thousands of Federal employees across the government are using a variety of modern tools and techniques to deliver services more effectively and efficiently, and to solve problems that relate to the missions of their Agencies. These 21st-century public servants are accomplishing meaningful results by applying new tools and techniques to their programs and projects, such as prizes and challenges, citizen science and crowdsourcing, open data, and human-centered design.

Prizes and challenges have been a particularly popular tool at Federal agencies. With 397 prizes and challenges posted on challenge.gov since September 2010, there are hundreds of examples of the many different ways these tools can be designed for a variety of goals. For example:

  • NASA’s Mars Balance Mass Challenge: When NASA’s Curiosity rover plummeted through the Martian atmosphere and came to rest on the surface of Mars in 2012, about 300 kilograms of solid tungsten mass had to be jettisoned to ensure the spacecraft was in a safe orientation for landing. In an effort to seek creative concepts for small science and technology payloads that could potentially replace a portion of such jettisoned mass on future missions, NASA released the Mars Balance Mass Challenge. In only two months, over 200 concepts were submitted by over 2,100 individuals from 43 different countries for NASA to review. Proposed concepts ranged from small drones and 3D printers to radiation detectors and pre-positioning supplies for future human missions to the planet’s surface. NASA awarded the $20,000 prize to Ted Ground of Rising Star, Texas for his idea to use the jettisoned payload to investigate the Mars atmosphere in a way similar to how NASA uses sounding rockets to study Earth’s atmosphere. This was the first time Ted worked with NASA, and NASA was impressed by the novelty and elegance of his proposal: a proposal that NASA likely would not have received through a traditional contract or grant because individuals, as opposed to organizations, are generally not eligible to participate in those types of competitions.
  • National Institutes of Health (NIH) Breast Cancer Startup Challenge (BCSC): The primary goals of the BCSC were to accelerate the process of bringing emerging breast cancer technologies to market, and to stimulate the creation of start-up businesses around nine federally conceived and owned inventions, and one invention from an Avon Foundation for Women portfolio grantee.  While NIH has the capacity to enable collaborative research or to license technology to existing businesses, many technologies are at an early stage and are ideally suited for licensing by startup companies to further develop them into commercial products. This challenge established 11 new startups that have the potential to create new jobs and help promising NIH cancer inventions support the fight against breast cancer. The BCSC turned the traditional business plan competition model on its head to create a new channel to license inventions by crowdsourcing talent to create new startups.

These two examples of challenges are very different, in terms of their purpose and the process used to design and implement them. The success they have demonstrated shouldn’t be taken for granted. It takes access to resources (both information and people), mentoring, and practical experience to both understand how to identify opportunities for innovation tools, like prizes and challenges, to use them to achieve a desired outcome….

Last month, the Challenge.gov program at the General Services Administration (GSA), the Office of Personnel Management (OPM)’s Innovation Lab, the White House Office of Science and Technology Policy (OSTP), and a core team of Federal leaders in the prize-practitioner community began collaborating with the Federal Community of Practice for Challenges and Prizes to develop the other half of the open innovation toolkit, the prizes and challenges toolkit. In developing this toolkit, OSTP and GSA are thinking not only about the information and process resources that would be helpful to empower 21st-century public servants using these tools, but also how we help connect these people to one another to add another meaningful layer to the learning environment…..

Creating an inventory of skills and knowledge across the 600-person (and growing!) Federal community of practice in prizes and challenges will likely be an important resource in support of a useful toolkit. Prize design and implementation can involve tricky questions, such as:

  • Do I have the authority to conduct a prize or challenge?
  • How should I approach problem definition and prize design?
  • Can agencies own solutions that come out of challenges?
  • How should I engage the public in developing a prize concept or rules?
  • What types of incentives work best to motivate participation in challenges?
  • What legal requirements apply to my prize competition?
  • Can non-Federal employees be included as judges for my prizes?
  • How objective do the judging criteria need to be?
  • Can I partner to conduct a challenge? What’s the right agreement to use in a partnership?
  • Who can win prize money and who is eligible to compete? …(More)

Chinese air quality and social media


David C. Roberts at Quartz: “Every year, outdoor air pollution kills more people worldwide than malaria and HIV combined. People in China, particularly in its largest cities, are some of the most affected, since the country’s rapid economic growth has come at the cost of air quality. This issue remained largely unaddressed until the US embassy in Beijing began to tweet out air quality data in 2008, providing a remarkable demonstration of the transformative power of democratizing data. The tweets sparked an energetic environmental movement that forced China’s leaders to acknowledge the massive scale of the problem and begin to take measures to combat it.

The initiative to publicize air quality data was subsequently expanded to US consulates in several major Chinese cities, providing a wealth of new scientific data.  I recently worked with Federico San Martini and Christa Hasenkopf (both atmospheric scientists at the US State Department who are involved in this program) to analyze this data…(More)”

Thinking Ahead – Essays on Big Data, Digital Revolution, and Participatory Market Society


New book by Dirk Helbing: “The rapidly progressing digital revolution is now touching the foundations of the governance of societal structures. Humans are on the verge of evolving from consumers to prosumers, and old, entrenched theories – in particular sociological and economic ones – are falling prey to these rapid developments. The original assumptions on which they are based are being questioned. Each year we produce as much data as in the entire human history – can we possibly create a global crystal ball to predict our future and to optimally govern our world? Do we need wide-scale surveillance to understand and manage the increasingly complex systems we are constructing, or would bottom-up approaches such as self-regulating systems be a better solution to creating a more innovative, more successful, more resilient, and ultimately happier society? Working at the interface of complexity theory, quantitative sociology and Big Data-driven risk and knowledge management, the author advocates the establishment of new participatory systems in our digital society to enhance coordination, reduce conflict and, above all, reduce the “tragedies of the commons,” resulting from the methods now used in political, economic and management decision-making….(More)”

Modern Methods for Sentiment Analysis


Review by Michael Czerny: “Sentiment analysis is a common application of Natural Language Processing (NLP) methodologies, particularly classification, whose goal is to extract the emotional content in text. In this way, sentiment analysis can be seen as a method to quantify qualitative data with some sentiment score. While sentiment is largely subjective, sentiment quantification has enjoyed many useful implementations, such as businesses gaining understanding about consumer reactions to a product, or detecting hateful speech in online comments.

The simplest form of sentiment analysis is to use a dictionary of good and bad words. Each word in a sentence has a score, typically +1 for positive sentiment and -1 for negative. Then, we simply add up the scores of all the words in the sentence to get a final sentiment total. Clearly, this has many limitations, the most important being that it neglects context and surrounding words. For example, in our simple model the phrase “not good” would score 0 (neutral), given that “not” has a score of -1 and “good” a score of +1. A human would likely classify “not good” as negative, despite the presence of “good”.
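The dictionary method described above can be sketched in a few lines of Python. The word lists here are made up for illustration; a real system would use an established sentiment lexicon.

```python
# Minimal lexicon-based sentiment scorer. The word sets are illustrative
# stand-ins, not a real sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "sad", "not"}

def lexicon_score(text):
    """Sum +1 for each positive word and -1 for each negative word."""
    score = 0
    for word in text.lower().split():
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

print(lexicon_score("great food but terrible service"))  # -> 0
print(lexicon_score("not good"))  # -> 0, the context problem described above
```

As the second example shows, the scorer assigns “not good” a neutral score of 0, reproducing exactly the failure mode discussed in the text.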

Another common method is to treat a text as a “bag of words”. We treat each text as a 1 by N vector, where N is the size of our vocabulary. Each column is a word, and the value is the number of times that word appears. For example, the phrase “bag of bag of words” might be encoded as [2, 2, 1]. This could then be fed into a machine learning algorithm for classification, such as logistic regression or SVM, to predict sentiment on unseen data. Note that this requires data with known sentiment to train on in a supervised fashion. While this is an improvement over the previous method, it still ignores context, and the size of the data increases with the size of the vocabulary.
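The bag-of-words encoding can be demonstrated with plain Python (in practice one would typically use a library vectorizer such as scikit-learn’s CountVectorizer). This sketch builds a vocabulary in first-seen order and reproduces the [2, 2, 1] encoding from the text.

```python
# Bag-of-words encoding over a first-seen vocabulary (stdlib only).
def build_vocab(texts):
    """Collect unique words across texts, in order of first appearance."""
    vocab = []
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab.append(word)
    return vocab

def vectorize(text, vocab):
    """Return a 1-by-N count vector for the text over the given vocabulary."""
    counts = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            counts[vocab.index(word)] += 1
    return counts

vocab = build_vocab(["bag of bag of words"])
print(vocab)                                    # -> ['bag', 'of', 'words']
print(vectorize("bag of bag of words", vocab))  # -> [2, 2, 1]
```

The resulting count vectors could then be fed to any supervised classifier, as the text describes; note how the vector length is tied to the vocabulary size, which is the scaling problem mentioned above.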

Word2Vec and Doc2Vec

Recently, Google developed a method called Word2Vec that captures the context of words while at the same time reducing the size of the data. Word2Vec is actually two different methods: Continuous Bag of Words (CBOW) and Skip-gram. In the CBOW method, the goal is to predict a word given the surrounding words. Skip-gram is the converse: we want to predict a window of words given a single word (see Figure 1). Both methods use artificial neural networks as their classification algorithm. Initially, each word in the vocabulary is a random N-dimensional vector. During training, the algorithm learns the optimal vector for each word using the CBOW or Skip-gram method….(More)
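The difference between the two training objectives can be made concrete by generating the (context, target) pairs each method trains on. This is only a sketch of the pair-generation step with a symmetric window; a full Word2Vec implementation (e.g. gensim’s) also trains the neural network that turns these pairs into word vectors.

```python
# Generate training pairs for CBOW and skip-gram from a token list,
# using a symmetric context window around each position.
def cbow_pairs(tokens, window=2):
    """CBOW: predict the centre word from its surrounding words."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each surrounding word from the centre word."""
    pairs = []
    for i, centre in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((centre, ctx))
    return pairs

tokens = "the quick brown fox".split()
print(cbow_pairs(tokens, window=1))
# -> [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#     (['quick', 'fox'], 'brown'), (['brown'], 'fox')]
```

Note that skip-gram emits one pair per (centre, context-word) combination, so it produces more training examples than CBOW from the same corpus, which is one reason the two methods behave differently on rare words.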

The Rule of History


Jill Lepore about Magna Carta, the Bill of Rights, and the hold of time in The New Yorker: “…Magna Carta has been taken as foundational to the rule of law, chiefly because in it King John promised that he would stop throwing people into dungeons whenever he wished, a provision that lies behind what is now known as due process of law and is understood not as a promise made by a king but as a right possessed by the people. Due process is a bulwark against injustice, but it wasn’t put in place in 1215; it is a wall built stone by stone, defended, and attacked, year after year. Much of the rest of Magna Carta, weathered by time and for centuries forgotten, has long since crumbled, an abandoned castle, a romantic ruin.

Magna Carta is written in Latin. The King and the barons spoke French. “Par les denz Dieu!” the King liked to swear, invoking the teeth of God. The peasants, who were illiterate, spoke English. Most of the charter concerns feudal financial arrangements (socage, burgage, and scutage), obsolete measures and descriptions of land and of husbandry (wapentakes and wainages), and obscure instruments for the seizure and inheritance of estates (disseisin and mort d’ancestor). “Men who live outside the forest are not henceforth to come before our justices of the forest through the common summonses, unless they are in a plea,” one article begins.

Magna Carta’s importance has often been overstated, and its meaning distorted. “The significance of King John’s promise has been anything but constant,” U.S. Supreme Court Justice John Paul Stevens aptly wrote, in 1992. It also has a very different legacy in the United States than it does in the United Kingdom, where only four of its original sixty-some provisions are still on the books. In 2012, three New Hampshire Republicans introduced into the state legislature a bill that required that “all members of the general court proposing bills and resolutions addressing individual rights or liberties shall include a direct quote from the Magna Carta which sets forth the article from which the individual right or liberty is derived.” For American originalists, in particular, Magna Carta has a special lastingness. “It is with us every day,” Justice Antonin Scalia said in a speech at a Federalist Society gathering last fall.

Much has been written of the rule of law, less of the rule of history. Magna Carta, an agreement between the King and his barons, was also meant to bind the past to the present, though perhaps not in quite the way it’s turned out. That’s how history always turns out: not the way it was meant to. In preparation for its anniversary, Magna Carta acquired a Twitter username: @MagnaCarta800th….(More)”