Science Isn’t Broken


Christie Aschwanden at FiveThirtyEight: “Yet even in the face of overwhelming evidence, it’s hard to let go of a cherished idea, especially one a scientist has built a career on developing. And so, as anyone who’s ever tried to correct a falsehood on the Internet knows, the truth doesn’t always win, at least not initially, because we process new evidence through the lens of what we already believe. Confirmation bias can blind us to the facts; we are quick to make up our minds and slow to change them in the face of new evidence.

A few years ago, Ioannidis and some colleagues searched the scientific literature for references to two well-known epidemiological studies suggesting that vitamin E supplements might protect against cardiovascular disease. These studies were followed by several large randomized clinical trials that showed no benefit from vitamin E and one meta-analysis finding that at high doses, vitamin E actually increased the risk of death.

Despite the contradictory evidence from more rigorous trials, the first studies continued to be cited and defended in the literature. Shaky claims about beta carotene’s ability to reduce cancer risk and estrogen’s role in staving off dementia also persisted, even after they’d been overturned by more definitive studies. Once an idea becomes fixed, it’s difficult to remove from the conventional wisdom.

Sometimes scientific ideas persist beyond the evidence because the stories we tell about them feel true and confirm what we already believe. It’s natural to think about possible explanations for scientific results — this is how we put them in context and ascertain how plausible they are. The problem comes when we fall so in love with these explanations that we reject the evidence refuting them.

The media is often accused of hyping studies, but scientists are prone to overstating their results too.

Take, for instance, the breakfast study. Published in 2013, it examined whether breakfast eaters weigh less than those who skip the morning meal and if breakfast could protect against obesity. Obesity researcher Andrew Brown and his colleagues found that despite more than 90 mentions of this hypothesis in published media and journals, the evidence for breakfast’s effect on body weight was tenuous and circumstantial. Yet researchers in the field seemed blind to these shortcomings, overstating the evidence and using causative language to describe associations between breakfast and obesity. The human brain is primed to find causality even where it doesn’t exist, and scientists are not immune.

As a society, our stories about how science works are also prone to error. The standard way of thinking about the scientific method is: ask a question, do a study, get an answer. But this notion is vastly oversimplified. A more common path to truth looks like this: ask a question, do a study, get a partial or ambiguous answer, then do another study, and then do another to keep testing potential hypotheses and homing in on a more complete answer. Human fallibilities send the scientific process hurtling in fits, starts and misdirections instead of in a straight line from question to truth.

Media accounts of science tend to gloss over the nuance, and it’s easy to understand why. For one thing, reporters and editors who cover science don’t always have training on how to interpret studies. And headlines that read “weak, unreplicated study finds tenuous link between certain vegetables and cancer risk” don’t fly off the newsstands or bring in the clicks as fast as ones that scream “foods that fight cancer!”

People often joke about the herky-jerky nature of science and health headlines in the media — coffee is good for you one day, bad the next — but that back and forth embodies exactly what the scientific process is all about. It’s hard to measure the impact of diet on health, Nosek told me. “That variation [in results] occurs because science is hard.” Isolating how coffee affects health requires lots of studies and lots of evidence, and only over time and in the course of many, many studies does the evidence start to narrow to a conclusion that’s defensible. “The variation in findings should not be seen as a threat,” Nosek said. “It means that scientists are working on a hard problem.”

The scientific method is the most rigorous path to knowledge, but it’s also messy and tough. Science deserves respect exactly because it is difficult — not because it gets everything correct on the first try. The uncertainty inherent in science doesn’t mean that we can’t use it to make important policies or decisions. It just means that we should remain cautious and adopt a mindset that’s open to changing course if new data arises. We should make the best decisions we can with the current evidence and take care not to lose sight of its strength and degree of certainty. It’s no accident that every good paper includes the phrase “more study is needed” — there is always more to learn….(More)”

Review Federal Agencies on Yelp…and Maybe Get a Response


Yelp Official Blog: “We are excited to announce that Yelp has concluded an agreement with the federal government that will allow federal agencies and offices to claim their Yelp pages, read and respond to reviews, and incorporate that feedback into service improvements.

We encourage Yelpers to review any of the thousands of agency field offices, TSA checkpoints, national parks, Social Security Administration offices, landmarks and other places already listed on Yelp if you have good or bad feedback to share about your experiences. Not only is it helpful to others who are looking for information on these services, but you can actually make an impact by sharing your feedback directly with the source.

It’s clear Washington is eager to engage with people directly through social media. Earlier this year a group of 46 lawmakers called for the creation of a “Yelp for Government” in order to boost transparency and accountability, and Representative Ron Kind reiterated this call in a letter to the General Services Administration (GSA). Luckily for them, there’s no need to create a new platform now that government agencies can engage directly on Yelp.

As this agreement is fully implemented in the weeks and months ahead, we’re excited to help the federal government more directly interact with and respond to the needs of citizens and to further empower the millions of Americans who use Yelp every day.

In addition to working with the federal government, last week we announced our partnership with ProPublica to incorporate health care statistics and consumer opinion survey data onto the Yelp business pages of more than 25,000 medical treatment facilities. We’ve also partnered with local governments in expanding the LIVES open data standard to show restaurant health scores on Yelp….(More)”

Can big databases be kept both anonymous and useful?


The Economist: “….The anonymisation of a data record typically means the removal from it of personally identifiable information. Names, obviously. But also phone numbers, addresses and various intimate details like dates of birth. Such a record is then deemed safe for release to researchers, and even to the public, to make of it what they will. Many people volunteer information, for example to medical trials, on the understanding that this will happen.
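
To make “removal of personally identifiable information” concrete, here is a minimal sketch of the idea; the record and field names are invented for illustration:

```python
# A minimal sketch of "anonymisation by removal": direct identifiers are
# dropped before a record is released. The record and field names are
# invented for illustration.
DIRECT_IDENTIFIERS = {"name", "phone", "address", "date_of_birth"}

def strip_identifiers(record: dict) -> dict:
    """Return a copy of a record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "address": "12 Elm St",
    "date_of_birth": "1971-07-04",
    "zip_code": "02138",        # quasi-identifiers like these survive...
    "sex": "F",                 # ...and are what linkage attacks exploit
    "diagnosis": "hypertension",
}
print(strip_identifiers(patient))
# {'zip_code': '02138', 'sex': 'F', 'diagnosis': 'hypertension'}
```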

But the ability to compare databases threatens to make a mockery of such protections. Participants in genomics projects, promised anonymity in exchange for their DNA, have been identified by simple comparison with electoral rolls and other publicly available information. The health records of a governor of Massachusetts were plucked from a database, again supposedly anonymous, of state-employee hospital visits using the same trick. Reporters sifting through a public database of web searches were able to correlate them in order to track down one, rather embarrassed, woman who had been idly searching for single men. And so on.
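
These are all linkage attacks: the quasi-identifiers that survive anonymisation (a postcode, a birth date, a sex) are simply joined against a public register. A toy sketch, again with entirely invented data:

```python
# A toy linkage attack: quasi-identifiers that survive anonymisation are
# joined against a public register. All data here is invented.
anonymised_visits = [
    {"zip_code": "02138", "birth_date": "1945-07-31", "sex": "M",
     "diagnosis": "cardiac arrhythmia"},
]
electoral_roll = [
    {"name": "J. Smith", "zip_code": "02138",
     "birth_date": "1945-07-31", "sex": "M"},
]

QUASI_IDS = ("zip_code", "birth_date", "sex")
for visit in anonymised_visits:
    for voter in electoral_roll:
        if all(visit[q] == voter[q] for q in QUASI_IDS):
            # a unique match re-identifies the "anonymous" record
            print(f"{voter['name']} -> {visit['diagnosis']}")
```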

Each of these headline-generating stories creates a demand for more controls. But that, in turn, deals a blow to the idea of open data—that the electronic “data exhaust” people exhale more or less every time they do anything in the modern world is actually useful stuff which, were it freely available for analysis, might make that world a better place.

Of cake, and eating it

Modern cars, for example, record in their computers much about how, when and where the vehicle has been used. Comparing the records of many vehicles, says Viktor Mayer-Schönberger of the Oxford Internet Institute, could provide a solid basis for, say, spotting dangerous stretches of road. Similarly, opening up health records, particularly in a country like Britain, which has a national health service, and cross-fertilising them with other personal data, might help reveal the multifarious causes of diseases like Alzheimer’s.

This is a true dilemma. People want both perfect privacy and all the benefits of openness. But they cannot have both. The stripping of a few details as the only means of assuring anonymity, in a world choked with data exhaust, cannot work. Poorly anonymised data are only part of the problem. What may be worse is that there is no standard for anonymisation. Every American state, for example, has its own prescription for what constitutes an adequate standard.

Worse still, devising a comprehensive standard may be impossible. Paul Ohm of Georgetown University, in Washington, DC, thinks that this is partly because the availability of new data constantly shifts the goalposts. “If we could pick an industry standard today, it would be obsolete in short order,” he says. Some data, such as those about medical conditions, are more sensitive than others. Some data sets provide great precision in time or place, others merely a year or a postcode. Each set presents its own dangers and requirements.

Fortunately, there are a few easy fixes. Thanks in part to the headlines, many now agree that public release of anonymised data is a bad move. Data could instead be released piecemeal, or kept in-house and accessible by researchers through a question-and-answer mechanism. Or some users could be granted access to raw data, but only in strictly controlled conditions.
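
One widely studied way to build such a question-and-answer mechanism is differential privacy, which returns aggregate answers with calibrated noise rather than raw rows. A minimal sketch, with an illustrative epsilon rather than a vetted parameter choice:

```python
import random

def private_count(records, predicate, epsilon=0.1):
    """Answer a counting query with Laplace noise of scale 1/epsilon.

    A count changes by at most 1 when any one person's record is added
    or removed, so this noise masks any individual's presence.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Researchers learn aggregates without ever seeing raw rows:
records = [{"condition": "X"}] * 130 + [{"condition": "Y"}] * 70
print(private_count(records, lambda r: r["condition"] == "X"))  # ~130, plus noise
```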

All these approaches, though, are anathema to the open-data movement, because they limit the scope of studies. “If we’re making it so hard to share that only a few have access,” says Tim Althoff, a data scientist at Stanford University, “that has profound implications for science, for people being able to replicate and advance your work.”

Purely legal approaches might mitigate that. Data might come with what have been called “downstream contractual obligations”, outlining what can be done with a given data set and holding any onward recipients to the same standards. One perhaps draconian idea, suggested by Daniel Barth-Jones, an epidemiologist at Columbia University, in New York, is to make it illegal even to attempt re-identification….(More).”

President Obama Signs Executive Order Making Presidential Innovation Fellows Program Permanent


White House Press Release: “My hope is this continues to encourage a culture of public service among our innovators, and tech entrepreneurs, so that we can keep building a government that’s as modern, as innovative, and as engaging as our incredible tech sector is.  To all the Fellows who’ve served so far – thank you.  I encourage all Americans with bold ideas to apply.  And I can’t wait to see what those future classes will accomplish on behalf of the American people.” –- President Barack Obama

Today, President Obama signed an executive order that makes the Presidential Innovation Fellows Program a permanent part of the Federal government going forward. The program brings executives, entrepreneurs, technologists, and other innovators into government, and teams them up with Federal employees to improve programs that serve more than 150 million Americans.

The Presidential Innovation Fellows Program is built on four key principles:

  • Recruit the best our nation has to offer: Fellows include entrepreneurs, startup founders, and innovators with experience at large technology companies and startups, each of whom leverages their proven skills and technical expertise to create huge value for the public.
  • Partner with innovators inside government: Working as teams, the Presidential Innovation Fellows and their partners across the government create products and services that are responsive, user-friendly, and help to improve the way the Federal government interacts with the American people.
  • Deploy proven private sector strategies: Fellows leverage best practices from the private sector to deliver better, more effective programs and policies across the Federal government.
  • Focus on some of the Nation’s biggest and most pressing challenges: Projects focus on topics such as improving access to education, fueling job creation and the economy, and expanding the public’s ability to access their personal health data.

Additional Details on Today’s Announcements

The Executive Order formally establishes the Presidential Innovation Fellows Program within the General Services Administration (GSA), where it will continue to serve departments and agencies throughout the Executive Branch. The Presidential Innovation Fellows Program will be administered by a Director and guided by a newly-established Advisory Board. The Director will outline steps for the selection, hiring, and deployment of Fellows within government….

Fellows have partnered with leaders at more than 25 government agencies, delivering impressive results in months, not years, driving extraordinary work and innovative solutions in areas such as health care; open data and data science; crowd-sourcing initiatives; education; veterans affairs; jobs and the economy; and disaster response and recovery. Examples of projects include:

Open Data

When government acts as a platform, entrepreneurs, startups, and the private sector can build value-added services and tools on top of federal datasets supported by federal policies. Taking this approach, Fellows and agency stakeholders have supported the creation of new products and services focused on education, health, the environment, and social justice. As a result of their efforts and the agencies they have worked with:….

Jobs and the Economy

Fellows continue to work on solutions that will give the government better access to innovative tools and services. This is also helping small and medium-sized companies create jobs and compete for Federal government contracts….

Digital Government

The Presidential Innovation Fellows Program is a part of the Administration’s strategy to create lasting change across the Federal Government by improving how it uses technology. The Fellows played a part in launching 18F within the General Services Administration (GSA) and the U.S. Digital Service (USDS) team within the Office of Management and Budget….

Supporting Our Veterans

  • …Built a one-stop shop for finding employment opportunities. The Veterans Employment Center was developed by a team of Fellows working with the Department of Veterans Affairs in connection with the First Lady’s Joining Forces Initiative and the Department of Labor. This is the first interagency website connecting Veterans, transitioning Servicemembers, and their spouses to meaningful employment opportunities. The portal has resulted in cost savings of over $27 million to the Department of Veterans Affairs.

Education

  • …More than 1,900 superintendents pledged to more effectively leverage education technology in their schools. Fellows working at the Department of Education helped develop the idea of Future Ready, which later informed the creation of the Future Ready District Pledge. The Future Ready District Pledge is designed to set out a roadmap to achieve successful personalized digital learning for every student and to commit districts to move as quickly as possible towards our shared vision of preparing students for success. Following the President’s announcement of this effort in 2014, more than 1,900 superintendents have signed this pledge, representing 14 million students.

Health and Patient Care

  • More than 150 million Americans are able to access their health records online. Multiple rounds of Fellows have worked with the Department of Health and Human Services (HHS) and the Department of Veterans Affairs (VA) to expand the reach of the Blue Button Initiative. As a result, patients are able to access their electronic health records to make more informed decisions about their own health care. The Blue Button Initiative has received more than 600 commitments from organizations to advance health information access efforts across the country and has expanded into other efforts that support health care system interoperability….

Disaster Response and Recovery

  • Communities are piloting crowdsourcing tools to assess damage after disasters. Fellows developed the GeoQ platform with FEMA and the National Geospatial-Intelligence Agency that crowdsources photos of disaster-affected areas to assess damage over large regions.  This information helps the Federal government better allocate critical response and recovery efforts following a disaster and allows local governments to use geospatial information in their communities…. (More)

How to predict rising home prices, neighborhood change and gentrification


Emily Badger in the Washington Post: “…In neighborhoods like this undergoing rapid change, there’s a deep gulf between what we can see — someone is trying to build something — and what we know about what’s really happening. How big will that apartment be? When is it supposed to be finished? And, because I know you’re wondering: What are they planning to do about parking? This information, which can be gleaned from the magnificent treasure that is government building permits, often publicly exists. But it’s never really been democratized. A group of tech companies and pilot cities is trying to do that now in ways that could have some fascinating implications. Imagine if you had a location-aware app that could call up the details of a construction site as easily as Redfin can show you the nearest for-sale home….

So Zillow, Accela and several other partners and local governments including Tampa, San Diego and Chattanooga have developed a common standard all cities can use to publish data about building and construction permits. The concept has important precedent: Google helped coax cities to standardize their transit data so you can track bus and train routes on Google Maps. Yelp has tried to do the same with municipal restaurant inspection data so you can see health scores when you’re scouting dinner.
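
The payoff of a shared schema is that one parser can serve every city’s feed. A sketch of the idea; the field names below are hypothetical, not quoted from the standard itself:

```python
# A sketch of why a shared schema matters: every city exports the same
# columns, so a single parser serves them all. Field names here are
# hypothetical, not quoted from the actual standard.
import csv, io

PERMIT_FIELDS = ["permit_num", "description", "issued_date",
                 "status", "address", "latitude", "longitude"]

tampa_feed = """permit_num,description,issued_date,status,address,latitude,longitude
BLD-2015-0042,New 40-unit apartment building,2015-08-01,issued,500 Main St,27.95,-82.46
"""

reader = csv.DictReader(io.StringIO(tampa_feed))
assert reader.fieldnames == PERMIT_FIELDS   # same columns, any city
for permit in reader:
    print(permit["status"], "-", permit["description"])
```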

Building permit data similarly has the potential to change how consumers, researchers and cities themselves understand the built world around us. Imagine, to give another example, if an app revealed that the loud construction site in your neighbor’s back yard had no permits attached to it. What if you could click one link and tell the city that, speeding up the bureaucracy around illegal construction? …(More)”

How understanding the ‘shape’ of data could change our world


Gurjeet Singh at the WEF: “We live in an extraordinary time. The capacity to generate and to store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems – from disease and climate change, to healthcare and customer understanding.

The magnitude of the opportunity is defined by the magnitude of the data that is created – and it is astonishing….

Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses, and how they automate learning through machine intelligence.

At its heart, it is a mathematical problem. For any dataset the total number of possible hypotheses/queries is exponential in the size of the data. Exponential functions are difficult enough for humans to comprehend; however, to further complicate matters, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.
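
Composing those two exponentials is what gives the problem its teeth. As a back-of-the-envelope sketch (illustrative notation, not a formula from the article):

```latex
% Back-of-the-envelope only: Q(n) is the rough size of the query space
% for n attributes, and n itself doubles every period tau.
\[
  Q(n) \sim 2^{n},
  \qquad
  n(t) = n_0 \, 2^{t/\tau}
  \;\;\Longrightarrow\;\;
  Q\bigl(n(t)\bigr) \sim 2^{\,n_0 \, 2^{t/\tau}}.
\]
```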


What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we stick with the approaches that have served us over time – iteratively asking questions of the data until we get the right answer – we will lose our chance to grasp this generational opportunity.

There are not, and never will be, enough data scientists in the world to succeed with this approach. Nor can we arm enough citizen data scientists with new software to succeed with it. Software that makes question-asking or hypothesis development more accessible or more efficient misses the central problem: its users will only fall further behind as new data becomes available each millisecond.

To truly unlock the value that lies within our data we need to turn our attention to the data itself, setting aside the questions for later. This, too, turns out to be a mathematical problem. Data, it turns out, has shape. That shape has meaning. The shape of data tells you everything you need to know about your data, from its most obvious features to its deepest secrets.

We understand that regression produces lines.

We know that customer segmentation produces groups.

We know that economic growth and interest rates have a cyclical nature (diseases like malaria have this shape too).

By knowing the shape and where we are in the shape, we vastly improve our understanding of where we are, where we have been and, perhaps more importantly, what might happen next. In understanding the shape of data we understand every feature of the dataset, immediately grasping what is important in the data, thus dramatically reducing the number of questions to ask and accelerating the discovery process.
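
As a toy illustration of why shape matters (a sketch of the idea only, not the author’s actual methodology), three datasets with very different shapes can look nearly identical through a single summary statistic:

```python
# A toy illustration of "data has shape": a line, two groups, and a cycle
# can all hide behind similar one-number summaries. A sketch of the idea
# only, not the author's actual methodology.
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)

line   = np.c_[t, 2 * t + rng.normal(0, 0.3, 300)]        # regression: a line
groups = np.r_[rng.normal(0, 0.3, (150, 2)),              # segmentation: clusters
               rng.normal(5, 0.3, (150, 2))]
cycle  = np.c_[np.cos(t), np.sin(t)] + rng.normal(0, 0.05, (300, 2))

for name, data in [("line", line), ("groups", groups), ("cycle", cycle)]:
    r = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
    print(f"{name}: correlation = {r:+.2f}")
# The line and the two groups both show correlation near +1; the cycle
# shows none. Only the shape distinguishes what is actually going on.
```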

By changing our thinking – and starting with the shape of the data, not a series of questions (which very often come with significant biases) – we can extract knowledge from these rapidly growing, massive and complex datasets.

The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases. The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations. The knowledge that is obscured by web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn, define the optimal way to interact.

This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind….(More)

IBM using Watson to build a “SIRI for Cities”


FastCompany: “A new app that incorporates IBM’s Watson cognitive computing platform is like Siri for ordering city services.

IBM said today that the city of Surrey, in British Columbia, Canada, has rolled out the new app, which leverages Watson’s sophisticated language and data analysis system to let residents use natural language to make requests, such as finding out why their trash wasn’t picked up or how to find a lost cat.

Watson is best known as the computer system that autonomously vanquished the world’s best Jeopardy players during a highly publicized competition in 2011. In the years since, IBM has applied the system to a wide range of computing problems in industries like health care, banking, retail, and education. The system is based on Watson’s ability to understand natural language queries and to analyze huge data sets.

Recently, Watson rolled out a tool designed to help people detect the tone in their writing.

Surrey worked with the developer Purple Forge to build the new city services app, which will be combined with the city’s existing “My Surrey” mobile and web tools. IBM said that residents can ask a wide range of questions on devices like smartphones, laptops, or even Apple Watches. Big Blue said Surrey’s app is the first time Watson has been utilized in a “citizen services” app.

The tool offers a series of frequently asked questions, but also allows residents in the city of nearly half a million to come up with their own. IBM said Surrey officials are hopeful that the app will help them be more responsive to residents’ concerns.

Among the services users can ask about are those provided by Surrey’s police and fire departments, animal control, parking enforcement, trash pickup, and others….(More)”

Algorithms and Bias


Q. and A. With Cynthia Dwork in the New York Times: “Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
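
A toy version of that admissions example, on entirely synthetic data (a sketch of the failure mode, not any real system), shows how cleanly the bias transfers:

```python
# A toy version of the admissions example: entirely synthetic data, a
# sketch of the failure mode rather than any real system.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)             # 0 = group A, 1 = group B
score = rng.normal(70, 10, n)             # identically distributed in both groups
# The historical committee admitted on score, minus a penalty for group B:
admitted = (score - 8 * group + rng.normal(0, 3, n)) > 70

model = LogisticRegression().fit(np.c_[score, group], admitted)

for g, label in [(0, "group A"), (1, "group B")]:
    p = model.predict_proba([[72.0, g]])[0, 1]
    print(f"identical score, {label}: P(admit) = {p:.2f}")
# The model faithfully reproduces the historical penalty against group B.
```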

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination….
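
A small synthetic sketch of the underlying point (not the paper’s algorithm): even just measuring whether two groups are treated differently requires using the sensitive attribute.

```python
# A sketch of the point with synthetic numbers (not the paper's actual
# algorithm): even auditing for disparate treatment requires using the
# sensitive attribute.
def positive_rate(decisions, groups, g):
    outcomes = [d for d, grp in zip(decisions, groups) if grp == g]
    return sum(outcomes) / len(outcomes)

decisions = [1, 0, 1, 1, 0, 0, 1, 0]    # a classifier's yes/no outputs
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]

for g in ("A", "B"):
    print(g, positive_rate(decisions, groups, g))
# A 0.75 vs B 0.25 -- a gap that is invisible if the attribute is hidden.
```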

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research. …(More)”

ENGAGE: Building and Harnessing Networks for Social Impact


Faizal Karmali and Claudia Juech at the Rockefeller Foundation: “Have you heard of ‘X’ organization? They’re doing interesting work that you should know about. You might even want to work together.”

Words like these abound between individuals at conferences, at industry events, in email, and, all too often, trapped in the minds of those who see the potential in connecting the dots. Bridging individuals, organizations, or ideas is fulfilling because these connections often result in value for everyone, sometimes immediately, but often over the long term. While many of us can think of that extraordinary network connector in our personal or professional circles, if asked to identify an organization that plays a similar role at scale, across multiple sectors, we may be hard-pressed to name more than a few—let alone understand how they do it well….

In an effort to capture and codify the growing breadth of knowledge and experience around leveraging networks for social impact, the Monitor Institute, a part of Deloitte Consulting, with support from The Rockefeller Foundation, has produced ENGAGE: How Funders Can Support and Leverage Networks for Social Impact — an online guide that offers a series of frameworks, tools, insights, and stories to help funders explore the critical questions around using networks as part of their grantmaking strategy—particularly as a means of accelerating impact….

ENGAGE draws on the experience and knowledge of over 40 leaders and practitioners in the field who are using networks to create change; digs into the deep pool of writing on the topic; and mines the significant experience in working with networks that is resident in both Monitor Institute and The Rockefeller Foundation. The result is an aggregation and synthesis of some of the leading thinking in both the theory and practice of engaging with networks as a grantmaker.

Compelling examples on how the Foundation leverages the power of networks can be seen in the creation of formal network institutions like the Global Impact Investing Network (GIIN) and the Joint Learning Network for Universal Health Coverage, but also through more targeted and time-bound network engagement activities, such as enabling greater connectivity among grantees and unleashing the power of technology to surface innovation from loosely curated crowds.

Building and harnessing networks is more an art than a science. It is our hope that ENGAGE will enable grantmakers and other network practitioners to be more deliberate and thoughtful about how and when a network can help accelerate their work…. (More)

Yelp’s Consumer Protection Initiative: ProPublica Partnership Brings Medical Info to Yelp


Yelp Official Blog: “…exists to empower and protect consumers, and we’re continually focused on how we can enhance our service while enhancing the ability for consumers to make smart transactional decisions along the way.

A few years ago, we partnered with local governments to launch the LIVES open data standard. Now, millions of consumers find restaurant inspection scores when that information is most relevant: while they’re in the middle of making a dining decision (instead of when they’re signing the check). Studies have shown that displaying this information more prominently has a positive impact.

Today we’re excited to announce we’ve joined forces with ProPublica to incorporate health care statistics and consumer opinion survey data onto the Yelp business pages of more than 25,000 medical treatment facilities. Read more in today’s Washington Post story.

We couldn’t be more excited to partner with ProPublica, the Pulitzer Prize winning non-profit newsroom that produces investigative journalism in the public interest.

The information is compiled by ProPublica from their own research and the Centers for Medicare and Medicaid Services (CMS) for 4,600 hospitals, 15,000 nursing homes, and 6,300 dialysis clinics in the US and will be updated quarterly. Hover text on the business page will explain the statistics, which include number of serious deficiencies and fines per nursing home and emergency room wait times for hospitals. For example, West Kendall Baptist Hospital has better than average doctor communication and an average 33 minute ER wait time, Beachside Nursing Center currently has no deficiencies, and San Mateo Dialysis Center has a better than average patient survival rate.

Now the millions of consumers who use Yelp to find and evaluate everything from restaurants to retail will have even more information at their fingertips when they are in the midst of the most critical life decisions, like which hospital to choose for a sick child or which nursing home will provide the best care for aging parents….(More)