David Bornstein in the New York Times: “…For all the attention it’s getting inside the administration, evidence-based policy-making seems unlikely to become a headline grabber; it lacks emotional appeal. But it does have intellectual heft. And one group that has been doing creative work to give the message broader appeal is Results for America, which has produced useful teaching aids under the banner “Moneyball for Government,” building on the popularity of the book and movie about Billy Beane’s Oakland A’s, and the rise of data-driven decision making in major league baseball.
Results for America works closely with leaders across political parties and social sectors to build awareness about evidence-based policy making — drawing attention to key areas where government could dramatically improve people’s lives by augmenting well-tested models. They are also chronicling efforts by local governments around the country to show how an emerging group of “Geek Cities,” including Baltimore, Denver, Miami, New York, Providence and San Antonio, are using data and evidence to drive improvements in various areas of social policy like education, youth development and employment.
“It seems like common sense to use evidence about what works to get better results,” said Michele Jolin, Results for America’s managing partner. “How could anyone be against it? But the way our system is set up, there are so many loud voices pushing to have dollars spent and policy shaped in the way that works for them. There has been no organized constituency for things that work.”
“The debate in Washington is usually about the quantity of resources,” said David Medina, a partner in Results for America. “We’re trying to bring it back to talking about quality.”
Not everyone will find this change appealing. “When you have a longstanding social service policy, there’s going to be a network of [people and groups] who are organized to keep that money flowing regardless of whether evidence suggests it’s warranted,” said Daniel Stid. “People in social services don’t like to think they’re behaving like other organized interests — like dairy farmers or mortgage brokers — but it leads to tremendous inertia in public policy.”
Beyond the politics, there are practical obstacles to overcome, too. Federal agencies lack sufficient budgets for evaluation or a common definition for what constitutes rigorous evidence. (Any lobbyist can walk into a legislator’s office and claim to have solid data to support an argument.) Up-to-date evidence also needs to be packaged in accessible ways and made available on a timely basis, so it can be used to improve programs, rather than to threaten them. Governments need to build regular evaluations into everything they do — not just conduct big, expensive studies every 10 years or so.
That means developing new ways to conduct quick and inexpensive randomized studies using data that is readily available, said Haskins, who is investigating this approach. “We should be running 10,000 evaluations a year, like they do in medicine.” That’s the only way to produce the rapid trial-and-error learning needed to drive iterative program improvements, he added. (I reported on a similar effort being undertaken by the Coalition for Evidence-Based Policy.)
Results for America has developed a scorecard to rank federal departments on how prepared they are to produce or incorporate evidence in their programs. It looks at whether a department has an office and a leader with the authority and budget to evaluate its programs. It asks: Does it make its data accessible to the public? Does it compile standards about what works and share them widely? Does it spend at least 1 percent of its budget evaluating its programs? And — most important — does it incorporate evidence in its big grant programs? For now, the Department of Education gets the top score.
The stakes are high. In 2011, for example, the Obama administration launched a process to reform Head Start, doing things like spreading best practices and forcing the worst programs to improve or lose their funding. This February, for the third time, the government released a list of Head Start providers (103 out of about 1,600) who will have to recompete for federal funding because of performance problems. That list represents tens of thousands of preschoolers, many of whom are missing out on the education they need to succeed in kindergarten — and life.
Improving flagship programs like Head Start is not just vital for the families they serve; it’s also vital to restoring trust in government. “I am a card-carrying member of the Republican Party and I want us to be governed well,” said Robert Shea, who pushed for better program evaluations as associate director of the Office of Management and Budget during the Bush administration, and continues to focus on this issue as chairman of the National Academy of Public Administration. “This is the most promising thing I know of to get us closer to that goal.”
“This idea has the prospect of uniting Democrats and Republicans,” said Haskins. “But it will involve a broad cultural change. It has to get down to the program administrators, board members and local staff throughout the country — so they know that evaluation is crucial to their operations.”
“There’s a deep mistrust of government and a belief that problems can’t be solved,” said Michele Jolin. “This movement will lead to better outcomes — and it will help people regain confidence in their public officials by creating a more effective, more credible way for policy choices to be made.”
Passage Of The DATA Act Is A Major Advance In Government Transparency
Op-ed by Hudson Hollister in Forbes: “Even as the debate over official secrecy grows on Capitol Hill, basic information about our government’s spending remains hidden in plain sight.
Information that is technically public — federal finance, awards, and expenditures — is effectively locked within a disconnected disclosure system that relies on outdated paper-based technology. Budgets, grants, contracts, and disbursements are reported manually and separately, using forms and spreadsheets. Researchers seeking insights into federal spending must invest time and resources crafting data sets out of these documents. Without common data standards across all government spending, analyses of cross-agency spending trends require endless conversions of apples to oranges.
For a nation whose tech industry leads the world, there is no reason to allow this antiquated system to persist.
That’s why we’re excited to welcome Thursday’s unanimous Senate approval of the Digital Accountability and Transparency Act — known as the DATA Act.
The DATA Act will mandate government-wide standards for federal spending data. It will also require agencies to publish this information online, fully searchable and open to everyone.
Watchdogs and transparency advocates from across the political spectrum have endorsed the DATA Act because all Americans will benefit from clear, accessible information about how their tax dollars are being spent.
It is darkly appropriate that the only organized opposition to this bill took place behind closed doors. In January, Senate sponsors Mark Warner (D-VA) and Rob Portman (R-OH) rejected amendments offered privately by the White House Office of Management and Budget. These nonpublic proposals would have gutted the DATA Act’s key data standards requirement. But Warner and Portman went public with their opposition, and Republicans and Democrats agreed to keep a strong standards mandate.
We now await swift action by the House of Representatives to pass this bill and put it on the President’s desk.
The tech industry is already delivering the technology and expertise that will use federal spending data, once it is open and standardized, to solve problems.
If the DATA Act is fully enforced, citizens will be able to track government spending on a particular contractor or from a particular program, payment by payment. Agencies will be able to deploy sophisticated Big Data analytics to illuminate, and eliminate, waste and fraud. And states and universities will be able to automate their complex federal grant reporting tasks, freeing up more tax dollars for their intended use. Our industry can perform these tasks — as soon as we get the data.
Chairman Earl Devaney’s Recovery Accountability and Transparency Board proved this is possible. Starting in 2009, the Recovery Board applied data standards to track stimulus spending. Our members’ software used that data to help inspectors general prevent and recover over $100 million in spending on suspicious grantees and contractors. The DATA Act applies that approach across the whole of government spending.
Congress is now poised to pass this landmark legislative mandate to transform spending from disconnected documents into open data. Next, the executive branch must implement that mandate.
So our Coalition’s work continues. We will press the Treasury Department and the White House to adopt robust, durable, and nonproprietary data standards for federal spending.
And we won’t stop with spending transparency. The American people deserve access to open data across all areas of government activity — financial regulatory reporting, legislative actions, judicial filings, and much more….”
The Open Data 500: Putting Research Into Action
The GovLab Blog: “On April 8, the GovLab made two significant announcements. At an open data event in Washington, DC, I was pleased to announce the official launch of the Open Data 500, our study of 500 companies that use open government data as a key business resource. We also announced that the GovLab is now planning a series of Open Data Roundtables to bring together government agencies with the businesses that use their data – and that five federal agencies have agreed to participate. Video of the event, which was hosted by the Center for Data Innovation, is available online.
The Open Data 500, funded by the John S. and James L. Knight Foundation, is the first comprehensive study of U.S.-based companies that rely on open government data. Our website at OpenData500.com includes searchable, sortable information on 500 of these companies. Our data about them comes from responses to a survey we’ve sent to all the companies (190 have responded) and what we’ve been able to learn from research using public information. Anyone can now explore this website, read about specific companies or groups of companies, or download our data to analyze it. The website features an interactive tool on the home page, the Open Data Compass, that shows the connections between government agencies and different categories of companies visually.
We began work on the Open Data 500 study last fall with three goals. First, we wanted to collect information that will ultimately help calculate the economic value of open data – an important question for policymakers and others. Second, we wanted to present examples of open data companies to inspire others to use this important government resource in new ways. And third – and perhaps most important – we hope that our work will be a first step in creating a dialogue between the government agencies that provide open data and the companies that use it.
That dialogue is critically important to make government open data more accessible and useful. While open government data is a huge potential resource, and federal agencies are working to make it more available, it’s too often trapped in legacy systems that make the data difficult to find and to use. To solve this problem, we plan to connect agencies to their clients in the business community and help them work together to find and liberate the most valuable datasets.
We now plan to convene and facilitate a series of Open Data Roundtables – a new approach to bringing businesses and government agencies together. In these Roundtables, which will be informed by the Open Data 500 study, companies and the agencies that provide their data will come together in structured, results-oriented meetings that we will facilitate. We hope to help figure out what can be done to make the most valuable datasets more available and usable quickly.
We’ve been gratified by the immediate positive response to our plan from several federal agencies. The Department of Commerce has committed to help plan and participate in the first of our Roundtables, now being scheduled for May. By the time we announced our launch on April 8, the Departments of Labor, Transportation, and Treasury had also signed up. And at the end of the launch event, the Deputy Chief Information Officer of the USDA publicly committed her agency to participate as well…”
Historic release of data delivers unprecedented transparency on the medical services physicians provide and how much they are paid
Jonathan Blum, Principal Deputy Administrator, Centers for Medicare & Medicaid Services: “Today the Centers for Medicare & Medicaid Services (CMS) took a major step forward in making Medicare data more transparent and accessible, while maintaining the privacy of beneficiaries, by announcing the release of new data on medical services and procedures furnished to Medicare fee-for-service beneficiaries by physicians and other healthcare professionals (http://www.cms.gov/newsroom/newsroom-center.html). For too long, the only information on physicians readily available to consumers was physician name, address and phone number. This data will, for the first time, provide a better picture of how physicians practice in the Medicare program.
This new data set includes over nine million rows of data on more than 880,000 physicians and other healthcare professionals in all 50 states, DC and Puerto Rico providing care to Medicare beneficiaries in 2012. The data set presents key information on the provision of services by physicians and how much they are paid for those services, and is organized by provider (National Provider Identifier or NPI), type of service (Healthcare Common Procedure Coding System, or HCPCS) code, and whether the service was performed in a facility or office setting. This public data set includes the number of services, average submitted charges, average allowed amount, average Medicare payment, and a count of unique beneficiaries treated. CMS takes beneficiary privacy very seriously and we will protect patient-identifiable information by redacting any data in cases where it includes fewer than 11 beneficiaries.
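To make the shape of this file concrete, here is a minimal exploratory sketch of how a researcher might work with such a download. The filename and column names (npi, hcpcs_code, line_srvc_cnt, bene_unique_cnt, average_medicare_payment_amt) are illustrative assumptions, not the official field names in the CMS release.

```python
# A minimal exploratory sketch; the filename and column names are assumed,
# not the official CMS field names.
import pandas as pd

df = pd.read_csv("medicare_physician_utilization_2012.csv")  # hypothetical extract

# CMS redacts rows covering fewer than 11 beneficiaries, so a downloaded
# extract should contain no such rows; verify that assumption.
assert (df["bene_unique_cnt"] >= 11).all(), "unexpected small-cell rows"

# Estimate total Medicare payments per service code (HCPCS): average payment
# times service count, summed across providers, then ranked.
df["total_payment"] = df["average_medicare_payment_amt"] * df["line_srvc_cnt"]
top_services = (
    df.groupby("hcpcs_code")["total_payment"]
      .sum()
      .sort_values(ascending=False)
      .head(10)
)
print(top_services)
```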
Previously, CMS could not release this information due to a permanent injunction issued by a court in 1979. However, in May 2013, the court vacated this injunction, causing a series of events that has led CMS to be able to make this information available for the first time.
Data to Fuel Research and Innovation
In addition to the public data release, CMS is making slight modifications to the process to request CMS data for research purposes. This will allow researchers to conduct important research at the physician level. As with the public release of information described above, CMS will continue to prohibit the release of patient-identifiable information. For more information about CMS’s disclosures to researchers, please contact the Research Data Assistance Center (ResDAC) at http://www.resdac.org/.
Unprecedented Data Access
This data release follows other CMS efforts to make more data available to the public. Since 2010, the agency has released an unprecedented amount of aggregated data in machine-readable form, with much of it available at http://www.healthdata.gov. These data range from previously unpublished statistics on Medicare spending, utilization, and quality at the state, hospital referral region, and county level, to detailed information on the quality performance of hospitals, nursing homes, and other providers.
In May 2013, CMS released information on the average charges for the 100 most common inpatient services at more than 3,000 hospitals nationwide (http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Inpatient.html).
In June 2013, CMS released average charges for 30 selected outpatient procedures (http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Outpatient.html).
We will continue to work toward harnessing the power of data to promote quality and value, and improve the health of our seniors and persons with disabilities.”
The Data Mining Techniques That Reveal Our Planet's Cultural Links and Boundaries
Emerging Technology From the arXiv: “The habits and behaviors that define a culture are complex and fascinating. But measuring them is a difficult task. What’s more, understanding the way cultures change from one part of the world to another is a task laden with challenges.
The gold standard in this area of science is known as the World Values Survey, a global network of social scientists studying values and their impact on social and political life. Between 1981 and 2008, this survey conducted over 250,000 interviews in 87 societies. That’s a significant amount of data and the work has continued since then. This work is hugely valuable but it is also challenging, time-consuming and expensive.
Today, Thiago Silva at the Universidade Federal de Minas Gerais in Brazil and a few buddies reveal another way to collect data that could revolutionize the study of global culture. These guys study cultural differences around the world using data generated by check-ins on the location-based social network, Foursquare.
That allows these researchers to gather huge amounts of data, cheaply and easily in a short period of time. “Our one-week dataset has a population of users of the same order of magnitude of the number of interviews performed in [the World Values Survey] in almost three decades,” they say.
Food and drink are fundamental aspects of society and so the behaviors and habits associated with them are important indicators. The basic question that Silva and co attempt to answer is: what are your eating and drinking habits? And how do these differ from those of a typical individual in another part of the world, such as Japan, Malaysia, or Brazil?
Foursquare is ideally set up to explore this question. Users “check in” by indicating when they have reached a particular location that might be related to eating and drinking but also to other activities such as entertainment, sport and so on.
Silva and co are interested only in the food and drink preferences of individuals and, in particular, in the way these preferences change according to time of day and geographical location.
So their basic approach is to compare a large number of individual preferences from different parts of the world and see how closely they match or how they differ.
Because Foursquare does not share its data, Silva and co downloaded almost five million tweets containing Foursquare check-ins: URLs pointing to pages on the Foursquare website with information about each venue. They discarded check-ins that were unrelated to food or drink.
That left them with some 280,000 check-ins related to drink from 160,000 individuals; over 400,000 check-ins related to fast food from 230,000 people; and some 400,000 check-ins relating to ordinary restaurant food or what Silva and co call slow food.
They then divide each of these classes into subcategories. For example, the drink class has 21 subcategories such as brewery, karaoke bar, pub, and so on. The slow food class has 53 subcategories such as Chinese restaurant, Steakhouse, Greek restaurant, and so on.
Each check-in gives the time and geographical location, which allows the team to compare behaviors from all over the world. They compare, for example, eating and drinking times in different countries both during the week and at the weekend. They compare the choices of restaurants, fast food habits and drinking habits by continent and country. They even compare eating and drinking habits in New York, London, and Tokyo.
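The authors’ actual similarity measures and category weightings are more involved than this, but a toy sketch conveys the basic idea: turn each country’s check-ins into a normalized vector of venue subcategories and compare countries by cosine similarity. The check-in records below are invented for illustration; this is not the paper’s code.

```python
# Toy illustration of comparing countries by their food-and-drink check-ins;
# the records and country labels are made up for the example.
from collections import Counter
from math import sqrt

# Hypothetical check-in records: (country, venue subcategory)
checkins = [
    ("Brazil", "steakhouse"), ("Brazil", "pub"), ("Brazil", "pub"),
    ("France", "pub"), ("France", "wine bar"), ("France", "bakery"),
    ("England", "pub"), ("England", "pub"), ("England", "fast food"),
]

def preference_vector(country):
    """Fraction of a country's check-ins falling in each subcategory."""
    counts = Counter(sub for c, sub in checkins if c == country)
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse preference vectors."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

for a, b in [("Brazil", "France"), ("France", "England")]:
    print(a, "vs", b, round(cosine(preference_vector(a), preference_vector(b)), 3))
```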
The results are a fascinating insight into humanity’s differing habits. Many places have similar behaviors (Malaysia and Singapore, for example, or Argentina and Chile), which is just as expected given the similarities between these places.
But other resemblances are more unexpected. A comparison of drinking habits shows greater similarity between Brazil and France, separated by the Atlantic Ocean, than between France and England, separated only by the English Channel…
They point out only two major differences between their clusters and those found by the World Values Survey. The first is that no Islamic cluster appears in the Foursquare data. Countries such as Turkey are similar to Russia, while Indonesia seems related to Malaysia and Singapore.
The second is that the U.S. and Mexico make up their own individual cluster in the Foursquare data, whereas the World Values Survey places them in the “English-speaking” and “Latin American” clusters respectively.
That’s exciting data mining work that has the potential to revolutionize the way sociologists and anthropologists study human culture around the world. Expect to hear more about it.
Ref: http://arxiv.org/abs/1404.1009: You Are What You Eat (and Drink): Identifying Cultural Boundaries By Analyzing Food & Drink Habits In Foursquare”.
'Hackathons' Aim to Solve Health Care's Ills
Amy Dockser Marcus in the Wall Street Journal: “Hackathons, the high-octane, all-night problem-solving sessions popularized by the software-coding community, are making their way into the more traditional world of health care. At Massachusetts Institute of Technology, a recent event called Hacking Medicine’s Grand Hackfest attracted more than 450 people to work for one weekend on possible solutions to problems involving diabetes, rare diseases, global health and information technology used at hospitals.
Health institutions such as New York-Presbyterian Hospital and Brigham and Women’s Hospital in Boston have held hackathons. MIT, meantime, has co-sponsored health hackathons in India, Spain and Uganda.
Hackathons of all kinds are increasingly popular. Intel Corp. recently bought a group that organizes them. Companies hoping to spark creative thinking sponsor them. And student-run hackathons have turned into intercollegiate competitions.
But in health care, where change typically comes much more slowly than in Silicon Valley, they represent a cultural shift. To solve a problem, scientists and doctors can spend years painstakingly running experiments, gathering data, applying for grants and publishing results. So the idea of an event where people give two-minute pitches describing a problem, then join a team of strangers to come up with a solution in the course of one weekend is radical.
“We are not trying to replace the medical culture with Facebook culture,” said Elliot Cohen, who wore a hoodie over a button-down dress shirt at the MIT event in March and helped start MIT Hacking Medicine while at business school. “But we want to try to blend them more.”
Mr. Cohen co-founded and is chief technology officer at PillPack, a pharmacy that sends customers personalized packages of their medications, a company that started at a hackathon.
At MIT’s health-hack, physicians, researchers, students and a smattering of people wearing Google Glass sprawled on the floor of MIT’s Media Lab and at tables with a view of the Boston skyline. At one table, a group of college students, laptops plastered with stickers, pulled juice boxes and snacks out of backpacks, trash piling up next to them as they feverishly wrote code.
Nupur Garg, an emergency-room physician and one of the eventual winners, finished her hospital shift at 2 a.m. Saturday in New York, drove to Boston and arrived at MIT in time to pitch the need for a way to capture images of patients’ ears and throats that can be shared with specialists to help make diagnoses. She and her team immediately started working on a prototype for the device, testing early versions on anyone who stopped by their table.
Dr. Garg and teammate Nancy Liang, who runs a company that makes Web apps for 3-D printers, caught a few hours of sleep in a dorm room Saturday night. They came up with the idea for their product’s name—MedSnap—later that night while watching students use cellphone cameras to send SnapChats to one another. “There was no time to conduct surveys on what was the best name,” said Ms. Liang. “Many ideas happen after midnight.”
Winning teams in each category won $1,000, as well as access to the hackathon’s sponsors for advice and pilot projects.
Yet even supporters say hackathons can’t solve medicine’s challenges overnight. Harlan Krumholz, a professor at Yale School of Medicine who ran a months-long trial that found telemonitoring didn’t reduce hospitalizations or deaths of cardiology patients, said he supports the problem-solving ethos of hackathons. But he added that “improvements require a long-term commitment, not just a weekend.”
Ned McCague, a data scientist at Blue Cross Blue Shield of Massachusetts, served as a mentor at the hackathon. He said he wasn’t representing his employer, but he used his professional experiences to push groups to think about the potential customer. “They have a good idea and are excited about it, but they haven’t thought about who is paying for it,” he said.
Zen Chu, a senior lecturer in health-care innovation and entrepreneur-in-residence at MIT, and one of the founders of Hacking Medicine, said more than a dozen startups conceived since the first hackathon, in 2011, are still in operation. Some received venture-capital funding.
The upsides of hackathons were made clear to Sharon Moalem, a physician who studies rare diseases. He had spent years developing a mobile app that can take pictures of faces to help diagnose rare genetic conditions, but was stumped on how to give the images a standard size scale to make comparisons. At the hackathon, Dr. Moalem said he was approached by an MIT student who suggested sticking a coin on the subjects’ forehead. Since quarters have a standard measurement, it “creates a scale,” said Dr. Moalem.
Dr. Moalem said he had never considered such a simple, elegant solution. The team went on to write code to help standardize facial measurements based on the dimensions of a coin and a credit card.
“Sometimes when you are too close to something, you stop seeing solutions, you only see problems,” Dr. Moalem said. “I needed to step outside my own silo.”
Book Review: 'The Rule of Nobody' by Philip K. Howard
Stuart Taylor Jr in the Wall Street Journal: “Amid the liberal-conservative ideological clash that paralyzes our government, it’s always refreshing to encounter the views of Philip K. Howard, whose ideology is common sense spiked with a sense of urgency. In “The Rule of Nobody,” Mr. Howard shows how federal, state and local laws and regulations have programmed officials of both parties to follow rules so detailed, rigid and, often, obsolete as to leave little room for human judgment. He argues passionately that we will never solve our social problems until we abandon what he calls a misguided legal philosophy of seeking to put government on regulatory autopilot. He also predicts that our legal-governmental structure is “headed toward a stall and then a frightening plummet toward insolvency and political chaos.”
Mr. Howard, a big-firm lawyer who heads the nonpartisan government-reform coalition Common Good, is no conventional deregulator. But he warns that the “cumulative complexity” of the dense rulebooks that prescribe “every nuance of how law is implemented” leaves good officials without the freedom to do what makes sense on the ground. Stripped of the authority that they should have, he adds, officials have little accountability for bad results. More broadly, he argues that the very structure of our democracy is so clogged by deep thickets of dysfunctional law that it will only get worse unless conservatives and liberals alike cast off their distrust of human discretion.
The rulebooks should be “radically simplified,” Mr. Howard says, on matters ranging from enforcing school discipline to protecting nursing-home residents, from operating safe soup kitchens to building the nation’s infrastructure: Projects now often require multi-year, 5,000-page environmental impact statements before anything can begin to be constructed. Unduly detailed rules should be replaced by general principles, he says, that take their meaning from society’s norms and values and embrace the need for official discretion and responsibility.
Mr. Howard serves up a rich menu of anecdotes, including both the small-scale activities of a neighborhood and the vast administrative structures that govern national life. After a tree fell into a stream and caused flooding during a winter storm, Franklin Township, N.J., was barred from pulling the tree out until it had spent 12 days and $12,000 for the permits and engineering work that a state environmental rule required for altering any natural condition in a “C-1 stream.” The “Volcker Rule,” designed to prevent banks from using federally insured deposits to speculate in securities, was shaped by five federal agencies and countless banking lobbyists into 963 “almost unintelligible” pages. In New York City, “disciplining a student potentially requires 66 separate steps, including several levels of potential appeals”; meanwhile, civil-service rules make it virtually impossible to terminate thousands of incompetent employees. Children’s lemonade stands in several states have been closed down for lack of a vendor’s license.

Conservatives as well as liberals like detailed rules—complete with tedious forms, endless studies and wasteful legal hearings—because they don’t trust each other with discretion. Corporations like them because they provide not only certainty but also “a barrier to entry for potential competitors,” by raising the cost of doing business to prohibitive levels for small businesses with fresh ideas and other new entrants to markets. Public employees like them because detailed rules “absolve them of responsibility.” And, adds Mr. Howard, “lawsuits [have] exploded in this rules-based regime,” shifting legal power to “self-interested plaintiffs’ lawyers,” who have learned that they “could sue for the moon and extract settlements even in cases (as with some asbestos claims) that were fraudulent.”
So habituated have we become to such stuff, Mr. Howard says, that government’s “self-inflicted ineptitude is accepted as a state of nature, as if spending an average of eight years on environmental reviews—which should be a national scandal—were an unavoidable mountain range.” Common-sensical laws would place outer boundaries on acceptable conduct based on reasonable norms that are “far better at preventing abuse of power than today’s regulatory minefield.”
As Mr. Howard notes, his book is part of a centuries-old rules-versus-principles debate. The philosophers and writers whom he quotes approvingly include Aristotle, James Madison, Isaiah Berlin and Roscoe Pound, a prominent Harvard law professor and dean who condemned “mechanical jurisprudence” and championed broad official discretion. Berlin, for his part, warned against “monstrous bureaucratic machines, built in accordance with the rules that ignore the teeming variety of the living world, the untidy and asymmetrical inner lives of men, and crush them into conformity.” Mr. Howard juxtaposes today’s roughly 100 million words of federal law and regulations with Madison’s warning that laws should not be “so voluminous that they cannot be read, or so incoherent that they cannot be understood.”…
Let’s get geeks into government
Gillian Tett in the Financial Times: “Fifteen years ago, Brett Goldstein seemed to be just another tech entrepreneur. He was working as IT director of OpenTable, then a start-up website for restaurant bookings. The company was thriving – and subsequently did a very successful initial public offering. Life looked very sweet for Goldstein. But when the World Trade Center was attacked in 2001, Goldstein had a moment of epiphany. “I spent seven years working in a startup but, directly after 9/11, I knew I didn’t want my whole story to be about how I helped people make restaurant reservations. I wanted to work in public service, to give something back,” he recalls – not just by throwing cash into a charity tin, but by doing public service. So he swerved: in 2006, he attended the Chicago police academy and then worked for a year as a cop in one of the city’s toughest neighbourhoods. Later he pulled the disparate parts of his life together and used his number-crunching skills to build the first predictive data system for the Chicago police (and one of the first in any western police force), to indicate where crime was likely to break out.
This was such a success that Goldstein was asked by Rahm Emanuel, the city’s mayor, to create predictive data systems for the wider Chicago government. The fruits of this effort – which include a website known as “WindyGrid” – went live a couple of years ago, to considerable acclaim inside the techie scene.
This tale might seem unremarkable. We are all used to hearing politicians, business leaders and management consultants declare that the computing revolution is transforming our lives. And as my colleague Tim Harford pointed out in these pages last week, the idea of using big data is now wildly fashionable in the business and academic worlds….
In America when top bankers become rich, they often want to “give back” by having a second career in public service: just think of all those Wall Street financiers who have popped up at the US Treasury in recent years. But hoodie-wearing geeks do not usually do the same. Sure, there are some former techie business leaders who are indirectly helping government. Steve Case, a co-founder of AOL, has supported White House projects to boost entrepreneurship and combat joblessness. Tech entrepreneurs also make huge donations to philanthropy. Facebook’s Mark Zuckerberg, for example, has given funds to Newark education. And the whizz-kids have also occasionally been summoned by the White House in times of crisis. When there was a disastrous launch of the government’s healthcare website late last year, the Obama administration enlisted the help of some of the techies who had been involved with the president’s election campaign.
But what you do not see is many tech entrepreneurs doing what Goldstein did: deciding to spend a few years in public service, as a government employee. There aren’t many Zuckerberg types striding along the corridors of federal or local government.
. . .
It is not difficult to work out why. To most young entrepreneurs, the idea of working in a state bureaucracy sounds like utter hell. But if there was ever a time when it might make sense for more techies to give back by doing stints of public service, that moment is now. The civilian public sector badly needs savvier tech skills (just look at the disaster of that healthcare website for evidence of this). And as the sector’s founders become wealthier and more powerful, they need to show that they remain connected to society as a whole. It would be smart political sense.
So I applaud what Goldstein has done. I also welcome that he is now trying to persuade his peers to do the same, and that places such as the University of Chicago (where he teaches) and New York University are trying to get more young techies to think about working for government in between doing those dazzling IPOs. “It is important to see more tech entrepreneurs in public service. I am always encouraging people I know to do a ‘stint in government’. I tell them that giving back cannot just be about giving money; we need people from the tech world to actually work in government,” Goldstein says.
But what is really needed is for more technology CEOs and leaders to get involved by actively talking about the value of public service – or even encouraging their employees to interrupt their private-sector careers with the occasional spell as a government employee (even if it is not in a sector quite as challenging as the police). Who knows? Maybe it could be Sheryl Sandberg’s next big campaigning mission. After all, if she does ever jump back to Washington, that could have a powerful demonstration effect for techie women and men. And shake DC a little too.”
Politics and the Internet
Edited book by William H. Dutton (Routledge – 2014 – 1,888 pages): “It is commonplace to observe that the Internet—and the dizzying technologies and applications which it continues to spawn—has revolutionized human communications. But, while the medium’s impact has apparently been immense, the nature of its political implications remains highly contested. To give but a few examples, the impact of networked individuals and institutions has prompted serious scholarly debates in political science and related disciplines on: the evolution of ‘e-government’ and ‘e-politics’ (especially after recent US presidential campaigns); electronic voting and other citizen participation; activism; privacy and surveillance; and the regulation and governance of cyberspace.
As research in and around politics and the Internet flourishes as never before, this new four-volume collection from Routledge’s acclaimed Critical Concepts in Political Science series meets the need for an authoritative reference work to make sense of a rapidly growing—and ever more complex—corpus of literature. Edited by William H. Dutton, Director of the Oxford Internet Institute (OII), the collection gathers foundational and canonical work, together with innovative and cutting-edge applications and interventions.
With a full index and comprehensive bibliographies, together with a new introduction by the editor, which places the collected material in its historical and intellectual context, Politics and the Internet is an essential work of reference. The collection will be particularly useful as a database allowing scattered and often fugitive material to be easily located. It will also be welcomed as a crucial tool permitting rapid access to less familiar—and sometimes overlooked—texts. For researchers, students, practitioners, and policy-makers, it is a vital one-stop research and pedagogic resource.”
Eight (No, Nine!) Problems With Big Data
Gary Marcus and Ernest Davis in the New York Times: “BIG data is suddenly everywhere. Everyone seems to be collecting it, analyzing it, making money from it and celebrating (or fearing) its powers. Whether we’re talking about analyzing zillions of Google search queries to predict flu outbreaks, or zillions of phone records to detect signs of terrorist activity, or zillions of airline stats to find the best time to buy plane tickets, big data is on the case. By combining the power of modern computing with the plentiful data of the digital era, it promises to solve virtually any problem — crime, public health, the evolution of grammar, the perils of dating — just by crunching the numbers.
Or so its champions allege. “In the next two decades,” the journalist Patrick Tucker writes in the latest big data manifesto, “The Naked Future,” “we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference.” Statistical correlations have never sounded so good.
Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. “Jeopardy!” champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.
The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.
Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement. Molecular biologists, for example, would very much like to be able to infer the three-dimensional structure of proteins from their underlying DNA sequence, and scientists working on the problem use big data as one tool among many. But no scientist thinks you can solve this problem by crunching data alone, no matter how powerful the statistical analysis; you will always need to start with an analysis that relies on an understanding of physics and biochemistry.
Third, many tools that are based on big data can be easily gamed. For example, big data programs for grading student essays often rely on measures like sentence length and word sophistication, which are found to correlate well with the scores given by human graders. But once students figure out how such a program works, they start writing long sentences and using obscure words, rather than learning how to actually formulate and write clear, coherent text. Even Google’s celebrated search engine, rightly seen as a big data success story, is not immune to “Google bombing” and “spamdexing,” wily techniques for artificially elevating website search placement.
Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem. Consider Google Flu Trends, once the poster child for big data. In 2009, Google reported — to considerable fanfare — that by analyzing flu-related search queries, it had been able to detect the spread of the flu as accurately as, and more quickly than, the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter; for the last two years it has made more bad predictions than good ones.
As a recent article in the journal Science explained, one major contributing cause of the failures of Google Flu Trends may have been that the Google search engine itself constantly changes, such that patterns in data collected at one time do not necessarily apply to data collected at another time. As the statistician Kaiser Fung has noted, collections of big data that rely on web hits often merge data that was collected in different ways and with different purposes — sometimes to ill effect. It can be risky to draw conclusions from data sets of this kind.
A fifth concern might be called the echo-chamber effect, which also stems from the fact that much of big data comes from the web. Whenever the source of information for a big data analysis is itself a product of big data, opportunities for vicious cycles abound. Consider translation programs like Google Translate, which draw on many pairs of parallel texts from different languages — for example, the same Wikipedia entry in two different languages — to discern the patterns of translation between those languages. This is a perfectly reasonable strategy, except for the fact that with some of the less common languages, many of the Wikipedia articles themselves may have been written using Google Translate. In those cases, any initial errors in Google Translate infect Wikipedia, which is fed back into Google Translate, reinforcing the error.
A sixth worry is the risk of too many correlations. If you look 100 times for correlations between two variables, you risk finding, purely by chance, about five bogus correlations that appear statistically significant — even though there is no actual meaningful connection between the variables. Absent careful supervision, the magnitudes of big data can greatly amplify such errors.
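The arithmetic behind that figure is just the conventional 5 percent false-positive threshold applied 100 times over. A quick simulation (a sketch using NumPy and SciPy) makes the point concrete: correlating 100 pairs of deliberately unrelated random variables still yields a handful of “significant” results.

```python
# Simulate the multiple-comparisons problem: 100 tests on independent noise
# should produce roughly five "significant" correlations at p < 0.05.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(100):
    x = rng.normal(size=200)
    y = rng.normal(size=200)  # generated independently of x
    _, p = pearsonr(x, y)
    if p < 0.05:
        false_positives += 1

print(false_positives, "spurious 'significant' correlations out of 100 tests")
```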
Seventh, big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions. In the past few months, for instance, there have been two separate attempts to rank people in terms of their “historical importance” or “cultural contributions,” based on data drawn from Wikipedia. One is the book “Who’s Bigger? Where Historical Figures Really Rank,” by the computer scientist Steven Skiena and the engineer Charles Ward. The other is an M.I.T. Media Lab project called Pantheon.
Both efforts get many things right — Jesus, Lincoln and Shakespeare were surely important people — but both also make some egregious errors. “Who’s Bigger?” claims that Francis Scott Key was the 19th most important poet in history; Pantheon has claimed that Nostradamus was the 20th most important writer in history, well ahead of Jane Austen (78th) and George Eliot (380th). Worse, both projects suggest a misleading degree of scientific precision with evaluations that are inherently vague, or even meaningless. Big data can reduce anything to a single number, but you shouldn’t be fooled by the appearance of exactitude.
FINALLY, big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like “in a row”). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.
To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as “dumbed-down escapist fare” that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate “dumbed-down escapist fare” into German and then back into English: out comes the incoherent “scaled-flight fare.” That is a long way from what Mr. Lowe intended — and from big data’s aspirations for translation.
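As a minimal sketch of what a trigram-based method actually counts, the snippet below slides a three-word window over a toy corpus; a phrase as unusual as “dumbed-down escapist fare” simply never appears, so the model has no statistics for it. The corpus and helper function are invented for illustration.

```python
# Count three-word sequences (trigrams) in a toy corpus; rare trigrams
# never appear, which is exactly the coverage problem described above.
from collections import Counter

def trigrams(text):
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

corpus = "reliable statistical information can be compiled about common trigrams"
counts = Counter(trigrams(corpus))

novel = ("dumbed-down", "escapist", "fare")
print(counts.get(novel, 0))  # 0: the model has never seen this trigram
```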
Wait, we almost forgot one last problem: the hype….