Politics and the New Machine


Jill Lepore in The New Yorker on “What the turn from polls to data science means for democracy”: “…The modern public-opinion poll has been around since the Great Depression, when the response rate—the number of people who take a survey as a percentage of those who were asked—was more than ninety. The participation rate—the number of people who take a survey as a percentage of the population—is far lower. Election pollsters sample only a minuscule portion of the electorate, not uncommonly something on the order of a couple of thousand people out of the more than two hundred million Americans who are eligible to vote. The promise of this work is that the sample is exquisitely representative. But the lower the response rate the harder and more expensive it becomes to realize that promise, which requires both calling many more people and trying to correct for “non-response bias” by giving greater weight to the answers of people from demographic groups that are less likely to respond. Pollster.com’s Mark Blumenthal has recalled how, in the nineteen-eighties, when the response rate at the firm where he was working had fallen to about sixty per cent, people in his office said, “What will happen when it’s only twenty? We won’t be able to be in business!” A typical response rate is now in the single digits.
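
The weighting Lepore describes—up-weighting respondents from groups that are less likely to answer—can be shown with a toy calculation. A minimal sketch, with invented demographic shares and support figures:

```python
# Toy non-response weighting: if one group makes up 30% of the electorate
# but only 10% of respondents, each of its respondents is up-weighted 3x.
# All numbers are invented for illustration.
population_share = {"young": 0.30, "old": 0.70}  # known from census data
sample_share     = {"young": 0.10, "old": 0.90}  # who actually responded
support          = {"young": 0.60, "old": 0.40}  # candidate support by group

# Unweighted estimate reflects the skewed sample...
unweighted = sum(sample_share[g] * support[g] for g in support)

# ...while weighting by population_share / sample_share corrects the skew.
weights = {g: population_share[g] / sample_share[g] for g in support}
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)

print(f"unweighted: {unweighted:.2f}")  # 0.42
print(f"weighted:   {weighted:.2f}")    # 0.46
```

The lower the response rate, the larger these correction factors become—and the more the estimate leans on the assumption that the few who do respond resemble the many who don’t.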

Meanwhile, polls are wielding greater influence over American elections than ever….

Still, data science can’t solve the biggest problem with polling, because that problem is neither methodological nor technological. It’s political. Pollsters rose to prominence by claiming that measuring public opinion is good for democracy. But what if it’s bad?

A “poll” used to mean the top of your head. Ophelia says of Polonius, “His beard as white as snow: All flaxen was his poll.” When voting involved assembling (all in favor of Smith stand here, all in favor of Jones over there), counting votes required counting heads; that is, counting polls. Eventually, a “poll” came to mean the count itself. By the nineteenth century, to vote was to go “to the polls,” where, more and more, voting was done on paper. Ballots were often printed in newspapers: you’d cut one out and bring it with you. With the turn to the secret ballot, beginning in the eighteen-eighties, the government began supplying the ballots, but newspapers kept printing them; they’d use them to conduct their own polls, called “straw polls.” Before the election, you’d cut out your ballot and mail it to the newspaper, which would make a prediction. Political parties conducted straw polls, too. That’s one of the ways the political machine worked….

Ever since Gallup, two things have been called polls: surveys of opinions and forecasts of election results. (Plenty of other surveys, of course, don’t measure opinions but instead concern status and behavior: Do you own a house? Have you seen a doctor in the past month?) It’s not a bad idea to reserve the term “polls” for the kind meant to produce election forecasts. When Gallup started out, he was skeptical about using a survey to forecast an election: “Such a test is by no means perfect, because a preelection survey must not only measure public opinion in respect to candidates but must also predict just what groups of people will actually take the trouble to cast their ballots.” Also, he didn’t think that predicting elections constituted a public good: “While such forecasts provide an interesting and legitimate activity, they probably serve no great social purpose.” Then why do it? Gallup conducted polls only to prove the accuracy of his surveys, there being no other way to demonstrate it. The polls themselves, he thought, were pointless…

If public-opinion polling is the child of a strained marriage between the press and the academy, data science is the child of a rocky marriage between the academy and Silicon Valley. The term “data science” was coined in 1960, one year after the Democratic National Committee hired Simulmatics Corporation, a company founded by Ithiel de Sola Pool, a political scientist from M.I.T., to provide strategic analysis in advance of the upcoming Presidential election. Pool and his team collected punch cards from pollsters who had archived more than sixty polls from the elections of 1952, 1954, 1956, 1958, and 1960, representing more than a hundred thousand interviews, and fed them into a UNIVAC. They then sorted voters into four hundred and eighty possible types (for example, “Eastern, metropolitan, lower-income, white, Catholic, female Democrat”) and sorted issues into fifty-two clusters (for example, foreign aid). Simulmatics’ first task, completed just before the Democratic National Convention, was a study of “the Negro vote in the North.” Its report, which is thought to have influenced the civil-rights paragraphs added to the Party’s platform, concluded that between 1954 and 1956 “a small but significant shift to the Republicans occurred among Northern Negroes, which cost the Democrats about 1 per cent of the total votes in 8 key states.” After the nominating convention, the D.N.C. commissioned Simulmatics to prepare three more reports, including one that involved running simulations about different ways in which Kennedy might discuss his Catholicism….
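
Simulmatics’ four hundred and eighty types are, in effect, the cells of a cross-tabulation of demographic attributes. A sketch of how such a typology can be generated; the excerpt gives only one example type, so the attribute lists below are assumptions chosen to multiply out to 480:

```python
from itertools import product

# Illustrative attribute lists (assumptions): 4 * 3 * 2 * 5 * 2 * 2 = 480
# cells, matching the number of voter types Simulmatics used.
regions    = ["Eastern", "Midwest", "Southern", "Western"]
localities = ["metropolitan", "town", "rural"]
incomes    = ["lower-income", "upper-income"]
groups     = ["white Protestant", "white Catholic", "white Jewish",
              "Negro", "other"]  # period terminology from the source
sexes      = ["male", "female"]
parties    = ["Democrat", "Republican"]

voter_types = [", ".join(t) for t in
               product(regions, localities, incomes, groups, sexes, parties)]

print(len(voter_types))  # 480
print(voter_types[0])    # Eastern, metropolitan, lower-income, white Protestant, male, Democrat
```

Each archived interview could then be binned into one of these cells, and each cell’s likely response to an issue cluster estimated from the poll data it contained.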

Data science may well turn out to be as flawed as public-opinion polling. But a stage in the development of any new tool is to imagine that you’ve perfected it, in order to ponder its consequences. I asked Hilton to suppose that there existed a flawless tool for measuring public opinion, accurately and instantly, a tool available to voters and politicians alike. Imagine that you’re a member of Congress, I said, and you’re about to head into the House to vote on an act—let’s call it the Smeadwell-Nutley Act. As you do, you use an app called iThePublic to learn the opinions of your constituents. You oppose Smeadwell-Nutley; your constituents are seventy-nine per cent in favor of it. Your constituents will instantly know how you’ve voted, and many have set up an account with Crowdpac to make automatic campaign donations. If you vote against the proposed legislation, your constituents will stop giving money to your reëlection campaign. If, contrary to your convictions but in line with your iThePublic, you vote for Smeadwell-Nutley, would that be democracy? …(More)”

 

How Satellite Data and Artificial Intelligence could help us understand poverty better


Maya Craig at Fast Company: “Governments and development organizations currently measure poverty levels by conducting door-to-door surveys. The new partnership will test the use of AI to supplement these surveys and increase the accuracy of poverty data. Orbital said its AI software will analyze satellite images to see if characteristics such as building height and rooftop material can effectively indicate wealth.
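
Orbital’s model itself isn’t described beyond those example features, but the general approach—regressing survey-measured wealth on features extracted from imagery—can be sketched. A hedged illustration with synthetic data standing in for both the image features and the door-to-door survey labels:

```python
# Hedged sketch: predict a survey-derived wealth index from image-derived
# features such as building height and rooftop material. The upstream step
# (extracting features from raw satellite pixels) is omitted, and all
# values below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500  # surveyed grid cells
building_height = rng.uniform(3, 30, n)   # metres
metal_roof_frac = rng.uniform(0, 1, n)    # share of metal rooftops
road_density    = rng.uniform(0, 5, n)    # km of road per sq km

X = np.column_stack([building_height, metal_roof_frac, road_density])
# Synthetic ground truth standing in for household survey results
wealth = 0.5 * building_height + 2.0 * road_density + rng.normal(0, 2, n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
print("CV R^2:", cross_val_score(model, X, wealth, cv=5).mean())
```

If the cross-validated fit against held-out survey cells is strong, the model can score unsurveyed areas from imagery alone—the supplementation the pilot is testing.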

The pilot study will be conducted in Sri Lanka. If successful, the World Bank hopes to scale it worldwide. A recent study conducted by the organization found that more than 50 countries lack legitimate poverty estimates, which limits the ability of the development community to support the world’s poorest populations.

“Data deprivation is a serious issue, especially in many of the countries where we need it most,” says David Newhouse, senior economist at the World Bank. “This technology has the potential to help us get that data more frequently and at a finer level of detail than is currently possible.”

The announcement is the latest in an emerging industry of AI analysis of satellite photos. A growing number of investors and entrepreneurs are betting that the convergence of these fields will have far-reaching impacts on business, policy, resource management and disaster response.

Wall Street’s biggest hedge-fund businesses have begun using the technology to improve investment strategies. The Pew Charitable Trusts employs the method to monitor oceans for illegal fishing activities. And startups like San Francisco-based Mavrx use similar analytics to optimize crop harvests.

The commercial earth-imaging satellite market, valued at $2.7 billion in 2014, is predicted to grow by 14% each year through the decade, according to a recent report.
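
Taking the 2014 base and reading “through the decade” as 2020 (an assumption), the implied trajectory is straightforward compound growth:

```python
# Implied market size at 14% compound annual growth from a $2.7B 2014 base.
base, rate = 2.7, 0.14
for year in range(2014, 2021):
    print(year, round(base * (1 + rate) ** (year - 2014), 2))
# 2020 -> roughly $5.9 billion
```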

As recently as two years ago, there were just four commercial earth imaging satellites operated in the U.S., and government contracts accounted for about 70% of imagery sales. By 2020, there will be hundreds of private-sector “smallsats” in orbit capturing imagery that will be easily accessible online. Companies like Skybox Imaging and Planet Labs have the first of these smallsats already active, with plans for more.

The images generated by these companies will be among the world’s largest data sets. And recent breakthroughs in AI research have made it possible to analyze these images to inform decision-making…(More)”

Push, Pull, and Spill: A Transdisciplinary Case Study in Municipal Open Government


Paper by Jan Whittington et al: “Cities hold considerable information, including details about the daily lives of residents and employees, maps of critical infrastructure, and records of the officials’ internal deliberations. Cities are beginning to realize that this data has economic and other value: If done wisely, the responsible release of city information can also release greater efficiency and innovation in the public and private sector. New services are cropping up that leverage open city data to great effect.

Meanwhile, activist groups and individual residents are placing increasing pressure on state and local government to be more transparent and accountable, even as others sound an alarm over the privacy issues that inevitably attend greater data promiscuity. This takes the form of political pressure to release more information, as well as increased requests for information under the many public records acts across the country.

The result of these forces is that cities are beginning to open their data as never before. It turns out there is surprisingly little research to date into the important and growing area of municipal open data. This article is among the first sustained, cross-disciplinary assessments of an open municipal government system. We are a team of researchers in law, computer science, information science, and urban studies. We have worked hand-in-hand with the City of Seattle, Washington for the better part of a year to understand its current procedures from each disciplinary perspective. Based on this empirical work, we generate a set of recommendations to help the city manage risk latent in opening its data….(More)”

Using Crowdsourcing to Track the Next Viral Disease Outbreak


The Takeaway: “Last year’s Ebola outbreak in West Africa killed more than 11,000 people. The pandemic may be diminished, but public health officials think that another major outbreak of infectious disease is fast-approaching, and they’re busy preparing for it.

Boston public radio station WGBH recently partnered with The GroundTruth Project and NOVA Next on a series called “Next Outbreak.” As part of the series, they reported on an innovative global online monitoring system called HealthMap, which uses the power of the internet and crowdsourcing to detect and track emerging infectious diseases, and also more common ailments like the flu.

Researchers at Boston Children’s Hospital are the ones behind HealthMap, and they use it to tap into tens of thousands of sources of online data, including social media, news reports, and blogs to curate information about outbreaks. Dr. John Brownstein, chief innovation officer at Boston Children’s Hospital and co-founder of HealthMap, says that smarter data collection can help to quickly detect and track emerging infectious diseases, fatal or not.

“Traditional public health is really slowed down by the communication process: People get sick, they’re seen by healthcare providers, they get laboratory confirmed, information flows up the channels to state and local health [agencies], national governments, and then to places like the WHO,” says Dr. Brownstein. “Each one of those stages can take days, weeks, or even months, and that’s the problem if you’re thinking about a virus that can spread around the world in a matter of days.”
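
HealthMap’s real pipeline is far more sophisticated than any short example, but the core idea—scanning open sources for disease mentions rather than waiting for official reports—can be sketched. A minimal, hedged version with invented feed items:

```python
# Minimal sketch of source-scanning for disease mentions; the term list,
# feed format, and matching are simplifications of what HealthMap does.
import re

DISEASE_TERMS = {"ebola", "cholera", "measles", "influenza", "dengue"}

def extract_mentions(articles):
    """Yield (disease, source, date) for items mentioning a tracked disease."""
    for article in articles:
        words = set(re.findall(r"[a-z]+", article["text"].lower()))
        for disease in DISEASE_TERMS & words:
            yield disease, article["source"], article["date"]

feed = [
    {"source": "local news", "date": "2015-10-01",
     "text": "Health officials report a cholera cluster after flooding."},
    {"source": "blog", "date": "2015-10-02",
     "text": "School closures as influenza cases rise sharply."},
]

for mention in extract_mentions(feed):
    print(mention)
```

Signals like these can surface days or weeks before laboratory confirmation works its way up the official channels Dr. Brownstein describes.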

The HealthMap team looks at a variety of communication channels to undo the existing hierarchy of health information.

“We make everyone a stakeholder when it comes to data about outbreaks, including consumers,” says Dr. Brownstein. “There are a suite of different tools that public health officials have at their disposal. What we’re trying to do is think about how to communicate and empower individuals to really understand what the risks are, what the true information is about a disease event, and what they can do to protect themselves and their families. It’s all about trying to demystify outbreaks.”

In addition to the map itself, the HealthMap team has a number of interactive tools that individuals can both use and contribute to. Dr. Brownstein hopes these resources will enable the public to care more about disease outbreaks that may be happening around them—it’s a way to put the “public” back in “public health,” he says.

“We have an app called Outbreaks Near Me that allows people to know about what disease outbreaks are happening in their neighborhood,” Dr. Brownstein says. “Flu Near You is an app that people use to self-report symptoms; Vaccine Finder is a tool that allows people to know what vaccines are available to them and their community.”

In addition to developing its own apps, the HealthMap team has partnered with existing tech firms like Uber to spread the word about public health.

“We worked closely with Uber last year and actually put nurses in Uber cars and delivered vaccines to people,” Dr. Brownstein says. “The closest vaccine location might still be only a block away for people, but people are still hesitant to get it done.”…(More)”

How smartphones are solving one of China’s biggest mysteries


Ana Swanson at the Washington Post: “For decades, China has been engaged in a building boom of a scale that is hard to wrap your mind around. In the last three decades, 260 million people have moved from the countryside to Chinese cities — equivalent to around 80 percent of the population of the U.S. To make room for all of those people, the size of China’s built-up urban areas nearly quintupled between 1984 and 2010.

Much of that development has benefited people’s lives, but some has not. In a breathless rush to boost growth and development, some urban areas have built vast, unused real estate projects — China’s infamous “ghost cities.” These eerie, shining developments are complete except for one thing: people to live in them.

China’s ghost cities have sparked a lot of debate over the last few years. Some argue that the developments are evidence of the waste in top-down planning, or the result of too much cheap funding for businesses. Some blame the lack of other good places for average people to invest their money, or the desire of local officials to make a quick buck — land sales generate a lot of revenue for China’s local governments.

Others say the idea of ghost cities has been overblown. They espouse a “build it and they will come” philosophy, pointing out that, with time, some ghost cities fill up and turn into vibrant communities.

It’s been hard to evaluate these claims, since most of the research on ghost cities has been anecdotal. Even the most rigorous research methods leave a lot to be desired — for example, investment research firms sending poor junior employees out to remote locations to count how many lights are turned on in buildings at night.

Now new research from Baidu, one of China’s biggest technology companies, provides one of the first systematic looks at Chinese ghost cities. Researchers from Baidu’s Big Data Lab and Peking University in Beijing used the kind of location data gathered by mobile phones and GPS receivers to track how people moved in and out of suspected ghost cities, in real time and on a national scale, over a period of six months. You can see the interactive project here.

Google has been blocked in China for years, and Baidu dominates the market in terms of search, mobile maps and other offerings. That gave the researchers a huge database to work with — 770 million users, a hefty chunk of China’s 1.36 billion people.

To identify potential ghost cities, the researchers created an algorithm that identifies urban areas with a relatively sparse population. They define a ghost city as an urban region with a population of fewer than 5,000 people per square kilometer – about half the density recommended by the Chinese Ministry of Housing and Urban-Rural Development….(More)”
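
That density criterion is simple enough to sketch. A minimal version, assuming location pings have already been aggregated into grid cells; the input format and the user-to-population scaling factor are invented:

```python
# Hedged sketch of the study's criterion: flag urban grid cells whose
# estimated population density is below 5,000 people per square kilometer.
GHOST_THRESHOLD = 5000  # people per sq km, per the study's definition

def flag_ghost_cells(cell_users, cell_area_km2, users_to_pop=1.77):
    """cell_users: {cell_id: distinct users observed living in that cell}.
    users_to_pop scales observed users to total population (invented)."""
    return {cell: users * users_to_pop / cell_area_km2 < GHOST_THRESHOLD
            for cell, users in cell_users.items()}

counts = {"new_district_A": 900, "old_town_B": 4200}  # invented data
print(flag_ghost_cells(counts, cell_area_km2=1.0))
# {'new_district_A': True, 'old_town_B': False}
```

The advantage over counting lit windows at night is scale: the same rule can be applied to every urban cell in the country, continuously and in real time.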

Mobile data: Made to measure


Neil Savage in Nature: “For decades, doctors around the world have been using a simple test to measure the cardiovascular health of patients. They ask them to walk on a hard, flat surface and see how much distance they cover in six minutes. This test has been used to predict the survival rates of lung transplant candidates, to measure the progression of muscular dystrophy, and to assess overall cardiovascular fitness.

The walk test has been studied in many trials, but even the biggest rarely top a thousand participants. Yet when Euan Ashley launched a cardiovascular study in March 2015, he collected test results from 6,000 people in the first two weeks. “That’s a remarkable number,” says Ashley, a geneticist who heads Stanford University’s Center for Inherited Cardiovascular Disease. “We’re used to dealing with a few hundred patients, if we’re lucky.”

Numbers on that scale, he hopes, will tell him a lot more about the relationship between physical activity and heart health. The reason they can be achieved is that millions of people now have smartphones and fitness trackers with sensors that can record all sorts of physical activity. Health researchers are studying such devices to figure out what sort of data they can collect, how reliable those data are, and what they might learn when they analyse measurements of all sorts of day-to-day activities from many tens of thousands of people and apply big-data algorithms to the readings.

By July, more than 40,000 people in the United States had signed up to participate in Ashley’s study, which uses an iPhone application called MyHeart Counts. He expects the numbers to surge as the app becomes more widely available around the world. The study — designed by scientists, approved by institutional review boards, and requiring informed consent — asks participants to answer questions about their health and risk factors, and to use their phone’s motion sensors to collect data about their activities for seven days. They also do a six-minute walk test, and the phone measures the distance they cover. If their own doctors have ordered blood tests, users can enter information such as cholesterol or glucose measurements. Every three months, the app checks back to update their data.

Physicians know that physical activity is a strong predictor of long-term heart health, Ashley says. But it is less clear what kind of activity is best, or whether different groups of people do better with different types of exercise. MyHeart Counts may open a window on such questions. “We can start to look at subgroups and find differences,” he says.

It is the volume of the data that makes such studies possible. In traditional studies, there may not be enough data to find statistically significant results for such subgroups. And rare events may not occur in the smaller samples, or may produce a signal so weak that it is lost in statistical noise. Big data can overcome those problems, and if the data set is big enough, small errors can be smoothed out. “You can take pretty noisy data, but if you have enough of it, you can find a signal,” Ashley says….(More)”.
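
That last point is standard statistics: the standard error of an average shrinks in proportion to 1/√n, so a real but small effect that is invisible in a few hundred patients can be unmistakable in forty thousand. A toy demonstration with simulated walk-test distances (all numbers invented):

```python
# A small true difference in six-minute-walk distance, buried in large
# person-to-person variability, surfaces only at large sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_diff = 5.0  # metres: small real effect in some subgroup
sigma = 80.0     # metres: large individual variability ("noise")

for n in (300, 40_000):
    a = rng.normal(600, sigma, n)              # baseline group
    b = rng.normal(600 + true_diff, sigma, n)  # slightly better group
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:>6}: p = {p:.4f}")
# Typically: n=300 yields p well above 0.05; n=40,000 yields p far below it.
```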

Teaching Open Data for Social Movements: a Research Strategy


Alan Freihof Tygel and Maria Luiza Machado Campo at the Journal of Community Informatics: “Since the year 2009, the release of public government data in open formats has been configured as one of the main actions taken by national states in order to respond to demands for transparency and participation by the civil society. The United States and the United Kingdom were pioneers, and today over 46 countries have their own Open Government Data Portal, many of them fostered by the Open Government Partnership (OGP), an international agreement aimed at stimulating transparency.

The premise of these open data portals is that, by making data publicly available in re-usable formats, society would take care of building applications and services, and gain value from this data (Huijboom & Broek, 2011). According to the same authors, the discourse around open data policies also includes increasing democratic control and participation and strengthening law enforcement.

Several recent works argue that the impact of open data policies, especially the release of open data portals, is still difficult to assess (Davies & Bawa, 2012; Huijboom & Broek, 2011; Zuiderwijk, Janssen, Choenni, Meijer, & Alibaks, 2012). One important consideration is that “The gap between the promise and reality of OGD [Open Government Data] re-use cannot be addressed by technological solutions alone” (Davies, 2012). Therefore, sociotechnical approaches (Mumford, 1987) are mandatory.

The targeted users of open government data lie over a wide range that includes journalists, non-governmental organizations (NGO), civil society organizations (CSO), enterprises, researchers and ordinary citizens who want to audit governments’ actions. Among them, the focus of our research is on social (or grassroots) movements. These are groups of organized citizens at local, national or international level who drive some political action, normally placing themselves in opposition to the established power relations and claiming rights for oppressed groups.

A literature definition gives a social movement as “collective social actions with a socio-political and cultural approach, which enable distinct forms of organizing the population and expressing their demands” (Gohn, 2011).

Social movements have been using data in their repertoire of actions with several motivations (as can be seen in Table 1 and Listing 1). In our experience, an overview of several cases where social movements use open data reveals two main motivations: a better understanding of reality and a more solid basis for their claims. Additionally, in some cases data produced by the social movements was used to build a counter-hegemonic discourse based on data. An interesting example is the Citizen Public Debt Audit Movement, which takes place in Brazil. This movement, which is part of an international network, claims that “significant amounts registered as public debt do not correspond to money collected through loans to the country” (Fattorelli, 2011), and thus the origins of this debt should be proven. According to the movement, in 2014, 45% of Brazil’s federal spending went to debt service.

Recently, a number of works tried to develop comparison schemes between open data strategies (Atz, Heath, & Fawcet, 2015; Caplan et al., 2014; Ubaldi, 2013; Zuiderwijk & Janssen, 2014). Huijboom & Broek (2011) listed four categories of instruments applied by the countries to implement their open data policies:

  • voluntary approaches, such as general recommendations,
  • economic instruments,
  • legislation and control, and
  • education and training.

One of the conclusions is that the latter was used to a lesser extent than the others.

Social movements, in general, are composed of people with little experience of informatics, either because of a lack of opportunities or of interest. Although using data is recognized as important for a social movement’s objectives, the lack of training still hinders its wider use.

In order to address this issue, an open data course for social movements was designed. Besides building a strategy on open data education, the course also aims to be a research strategy to understand three aspects:

  • the motivations of social movements for using open data;
  • the impediments that block a wider and better use; and
  • possible actions to be taken to enhance the use of open data by social movements….(More)”

Smarter Government For Social Impact: A New Mindset For Better Outcomes


Report by Drive Impact: “From Kentucky to Arkansas to New York, government leaders across the United States are leveraging data, technology, and a heightened focus on outcomes to deliver social impact with modern solutions. In Louisville, Kentucky, “smart” asthma inhalers track where attacks happen citywide and feed this data into a government dashboard, helping policymakers identify hot spots to improve air quality and better treat patients. Policy leaders in New York and Texas are reforming Medicaid with “value-based payments” that reward doctors for performing preventive procedures that protect against costly tests and treatments down the road. In Arkansas, a digital government platform called Gov2Go connects citizens with a personalized console that sends reminders to file paperwork, renew registrations, and seek out other relevant government services.

What all of these initiatives share is a smarter approach to policymaking: an operating belief that government can and should reward the best policies and programs by paying for the best outcomes and using the best data and technology to identify solutions that can transform service delivery and strengthen citizens’ connection to government. These transformational policies are smarter government, and America needs more of it. Smarter government uses an outcomes mindset to embrace cutting-edge data and technology, make better funding choices, learn from policy failures and successes, act on new knowledge about what works, and align clear goals with the right incentives to achieve them. Americans need a smarter, outcomes-focused government for the twenty-first century—one that can identify and address systemic barriers to effective service delivery and seek out and promote innovative solutions to our greatest social challenges….(More)”

When Lobbyists Write Legislation, This Data Mining Tool Traces The Paper Trail


FastCoExist: “Most kids learn the grade school civics lesson about how a bill becomes a law. What those lessons usually neglect to show is how legislation today is often birthed on a lobbyist’s desk.

But even for expert researchers, journalists, and government transparency groups, tracing a bill’s lineage isn’t easy—especially at the state level. Last year alone, there were 70,000 state bills introduced in 50 states. It would take one person five weeks to even read them all. Groups that do track state legislation usually focus narrowly on a single topic, such as abortion, or perhaps a single lobby group.

Computers can do much better. A prototype tool, presented in September at Bloomberg’s Data for Good Exchange 2015 conference, mines the Sunlight Foundation’s database of more than 500,000 bills and 200,000 resolutions for the 50 states from 2007 to 2015. It also compares them to 1,500 pieces of “model legislation” written by a few lobbying groups that made their work available, such as the conservative group ALEC (American Legislative Exchange Council) and the liberal group the State Innovation Exchange (formerly called ALICE).
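
The core operation—scoring hundreds of thousands of bills against each piece of model legislation—is a document-similarity task. The team’s exact method isn’t specified in this excerpt; a hedged sketch using TF-IDF cosine similarity, with placeholder texts:

```python
# Hedged sketch of bill-to-model-legislation matching; the real pipeline
# is not described in the excerpt, and the texts below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

model_bills = ["An act limiting municipal broadband networks ...",
               "An act establishing automatic voter registration ..."]
state_bills = ["A bill to restrict city-owned broadband service ...",
               "A bill concerning highway maintenance funding ..."]

vec = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vec.fit_transform(model_bills + state_bills)

# Similarity of each state bill to each piece of model legislation
sims = cosine_similarity(tfidf[len(model_bills):], tfidf[:len(model_bills)])
for bill, row in zip(state_bills, sims):
    print(f"{bill[:40]!r} -> model {row.argmax()} ({row.max():.2f})")
```

Phrase-level n-grams matter for this kind of matching: model legislation tends to be copied in long verbatim runs that single-word features would miss.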

The results are interesting. In one example of the program in use, the team—all from the Data Science for Social Good fellowship program in Chicago—created a graphic that presents the relative influence of ALEC and ALICE in different states. The thickness of each line in the graphic correlates to the percentage of bills introduced in each state that are modeled on either group’s legislation. So a relatively liberal state like New York is mostly ALICE bills, while a “swing” state like Illinois has a lot from both groups….

Along with researchers from the University of Chicago, Wikimedia Foundation, Microsoft Research, and Northwestern University, Walsh is also co-author of another paper presented at the Bloomberg conference that shows how data science can increase government transparency.

Walsh and these co-authors developed software that automatically identifies earmarks in U.S. Congressional bills, showing how representatives are benefiting their own states with pork barrel projects. They verified that it works by comparing it to the results of a massive effort from the U.S. Office of Management and Budget to analyze earmarks for a few limited years. Their results, extended back to 1995 in a public database, showed that there may be many more earmarks than anyone thought.

“Governments are making more data available. It’s something like a needle in a haystack problem, trying to extract all that information out,” says Walsh. “Both of these projects are really about shining light to these dark places where we don’t know what’s going on.”

The state legislation tracker data is available for download here, and the team is working on an expanded system that automatically downloads new state legislation so it can stay up to date…(More)”

Advancing Open and Citizen-Centered Government


The White House: “Today, the United States released our third Open Government National Action Plan, announcing more than 40 new or expanded initiatives to advance the President’s commitment to an open and citizen-centered government….In the third Open Government National Action Plan, the Administration both broadens and deepens efforts to help government become more open and more citizen-centered. The plan includes new and impactful steps the Administration is taking to openly and collaboratively deliver government services and to support open government efforts across the country. These efforts prioritize a citizen-centric approach to government, including improved access to publicly available data to provide everyday Americans with the knowledge and tools necessary to make informed decisions.

One example is the College Scorecard, which shares data through application programming interfaces (APIs) to help students and families make informed choices about education. Open APIs help create an ecosystem around government data in which civil society can provide useful visual tools, making this data more accessible, and commercial developers can enable even more value to be extracted to further empower students and their families. In addition to these newer approaches, the plan also highlights significant longstanding open government priorities such as access to information, fiscal transparency, and records management, and continues to push for greater progress in that work.
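
As a concrete example of the kind of API access described, a hedged sketch of querying the College Scorecard. The endpoint and field names follow the API’s published conventions but should be treated as assumptions, and a free key from api.data.gov is required:

```python
# Hedged sketch of a College Scorecard API query; endpoint and field
# names are assumptions based on the API's documented conventions.
import requests

URL = "https://api.data.gov/ed/collegescorecard/v1/schools"
params = {
    "api_key": "YOUR_API_KEY",  # placeholder
    "school.state": "NY",
    "fields": "school.name,latest.cost.tuition.in_state",
    "per_page": 5,
}

resp = requests.get(URL, params=params)
resp.raise_for_status()
for school in resp.json().get("results", []):
    print(school.get("school.name"),
          school.get("latest.cost.tuition.in_state"))
```

It is exactly this sort of programmatic access that lets third-party developers build comparison tools on top of the government’s data.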

The plan also focuses on supporting implementation of the landmark 2030 Agenda for Sustainable Development, which sets out a vision and priorities for global development over the next 15 years and was adopted last month by 193 world leaders including President Obama. The plan includes commitments to harness open government and progress toward the Sustainable Development Goals (SDGs) both in the United States and globally, including in the areas of education, health, food security, climate resilience, science and innovation, justice and law enforcement. It also includes a commitment to take stock of existing U.S. government data that relates to the 17 SDGs, and to creating and using data to support progress toward the SDGs.

Some examples of open government efforts newly included in the plan:

  • Promoting employment by unlocking workforce data, including training, skill, job, and wage listings.
  • Enhancing transparency and participation by expanding available Federal services to the Open311 platform currently available to cities, giving the public a seamless way to report problems and request assistance.
  • Releasing public information from the electronically filed tax forms of nonprofit and charitable organizations (990 forms) as open, machine-readable data.
  • Expanding access to justice through the White House Legal Aid Interagency Roundtable.
  • Promoting open and accountable implementation of the Sustainable Development Goals….(More)”