Analyzing 1.1 Billion NYC Taxi and Uber Trips


Todd W. Schneider: “The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015. Taken as a whole, the detailed trip-level data is more than just a vast list of taxi pickup and drop-off coordinates: it’s a story of New York. How bad is the rush hour traffic from Midtown to JFK? Where does the Bridge and Tunnel crowd hang out on Saturday nights? What time do investment bankers get to work? How has Uber changed the landscape for taxis? And could Bruce Willis and Samuel L. Jackson have made it from 72nd and Broadway to Wall Street in less than 30 minutes? The dataset addresses all of these questions and many more.

I mapped the coordinates of every trip to local census tracts and neighborhoods, then set about extracting stories and meaning from the data. This post covers a lot, but for those who want to pursue more analysis on their own: everything in this post—the data, software, and code—is freely available. Full instructions to download and analyze the data for yourself are available on GitHub.
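Mapping over a billion pickup coordinates to census tracts amounts to one very large point-in-polygon spatial join. Below is a minimal sketch of that step in Python with geopandas; file and column names are illustrative assumptions, and the post's actual pipeline (documented on GitHub) differs:

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical file names; the real trip data ships as monthly CSVs and
# tract polygons come from a shapefile such as NYC Planning's nyct2010.
trips = pd.read_csv("yellow_tripdata_sample.csv")
tracts = gpd.read_file("nyct2010.shp")

# Build point geometries from pickup coordinates (WGS84 lon/lat),
# then reproject into the tract layer's CRS.
pickups = gpd.GeoDataFrame(
    trips,
    geometry=[Point(xy) for xy in zip(trips["pickup_longitude"],
                                      trips["pickup_latitude"])],
    crs="EPSG:4326",
).to_crs(tracts.crs)

# Point-in-polygon join: tag each trip with the census tract it started in.
joined = gpd.sjoin(pickups, tracts, how="left", predicate="within")
print(joined.groupby("BoroCT2010").size().sort_values(ascending=False).head())
```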

Table of Contents

  1. Maps
  2. The Data
  3. Borough Trends, and the Rise of Uber
  4. Airport Traffic
  5. On the Realism of Die Hard 3
  6. How Does Weather Affect Taxi and Uber Ridership?
  7. NYC Late Night Taxi Index
  8. The Bridge and Tunnel Crowd
  9. Northside Williamsburg
  10. Privacy Concerns
  11. Investment Bankers
  12. Parting Thoughts…(More)

Open government data: Out of the box


The Economist on “The open-data revolution has not lived up to expectations. But it is only getting started…

The app that helped save Mr Rich’s leg is one of many that incorporate government data—in this case, supplied by four health agencies. Six years ago America became the first country to make all data collected by its government “open by default”, except for personal information and that related to national security. Almost 200,000 datasets from 170 outfits have been posted on the data.gov website. Nearly 70 other countries have also made their data available: mostly rich, well-governed ones, but also a few that are not, such as India (see chart). The Open Knowledge Foundation, a London-based group, reckons that over 1m datasets have been published on open-data portals using its CKAN software, developed in 2010.
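Because CKAN portals expose a standard JSON action API, the datasets counted above can be enumerated programmatically. A small sketch against data.gov's CKAN endpoint; the query term is illustrative:

```python
import json
from urllib.request import urlopen

# CKAN's standard action API; data.gov's catalog runs CKAN at catalog.data.gov.
URL = "https://catalog.data.gov/api/3/action/package_search?q=health&rows=3"

with urlopen(URL) as resp:
    payload = json.load(resp)

print("matching datasets:", payload["result"]["count"])
for ds in payload["result"]["results"]:
    org = (ds.get("organization") or {}).get("title", "unknown publisher")
    print("-", ds["title"], "|", org)
```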

Jakarta’s Participatory Budget


Ramda Yanurzha in GovInsider: “…This is a map of Musrenbang 2014 in Jakarta. Red is a no-go, green means the proposal is approved.

To give you a brief background, musrenbang is Indonesia’s flavor of participatory, bottom-up budgeting. The idea is that people can propose any development for their neighbourhood through a multi-stage budgeting process, thus actively participating in shaping the final budget for the city level, which will then determine the allocation for each city at the provincial level, and so on.

The catch is, I’m confident enough to say that not many people (especially in big cities) are actually aware of this process. While civic activists tirelessly lament that the process itself is neither inclusive nor transparent, I’m leaning towards a simpler explanation that most people simply couldn’t connect the dots.

People know that the public works agency fixed that 3-foot pothole last week. But it’s less clear how they can determine who is responsible for fixing a new streetlight in that dark alley and where the money comes from. Someone might have complained to the neighbourhood leader (Pak RT) and somehow the message gets through, but it’s very hard to trace how it got through. Just keep complaining to the black box until you don’t have to. There are very few people (mainly researchers) who get to see the whole picture.

This has now changed because the brand-new Jakarta open data portal provides musrenbang data from 2009. Who proposed what to whom, for how much, where it should be implemented (geotagged!), down to kelurahan/village level, and whether the proposal is accepted into the final city budget. For someone who advocates for better availability of open data in Indonesia and is eager to practice my data wrangling skills, it’s a goldmine.

Diving In

[Data screenshot: all the different units of goods proposed.]

The data is also, as expected, incredibly messy. While, surprisingly, most of the proposed projects are geotagged, there are a lot of formatting inconsistencies that make the clean-up stage painful. Some of them are minor (m? meter? meter2? m2? meter persegi?) while others are perplexing (latitude: -6,547,843,512,000 – yes, that’s a value of more than a billion). Annoyingly, hundreds of proposals point to the center of the National Monument, so it’s not exactly a representative dataset.
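A hedged sketch of the kind of normalization this clean-up implies; the file name, column names, and the comma-mangling hypothesis behind the billion-degree latitudes are assumptions, not the author's actual OpenRefine recipe:

```python
import pandas as pd

# Collapse spelling variants of area units seen in the data.
UNIT_MAP = {"m": "m", "meter": "m",
            "m2": "m2", "meter2": "m2", "meter persegi": "m2"}

def clean_unit(raw):
    u = str(raw).strip().lower()
    return UNIT_MAP.get(u, u)

def clean_latitude(raw):
    # Working hypothesis: "-6,547,843,512,000" is a decimal latitude
    # (-6.547843512) mangled by thousands separators. Strip the commas,
    # rescale until the magnitude is a single digit (Jakarta latitudes
    # are), and reject anything outside the greater-Jakarta band.
    try:
        x = float(str(raw).replace(",", ""))
    except ValueError:
        return None
    while abs(x) >= 10:
        x /= 10
    return x if -7.0 < x < -5.5 else None

df = pd.read_csv("musrenbang_2014.csv")  # hypothetical file name
df["unit_clean"] = df["unit"].map(clean_unit)
df["lat_clean"] = df["latitude"].map(clean_latitude)
```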

For fellow data wranglers, pull requests to improve the data are gladly welcomed over here. Ibam generously wrote an RT extractor to yield further location data, and I’m looking into OpenStreetMap RW boundary data to create a reverse geocoder for the points.

A couple of hours of scrubbing in OpenRefine yields a dataset clean enough to generate the CartoDB map embedded at the beginning of this piece. More precisely, it is a map of geotagged projects where each point is colored depending on whether it’s rejected or accepted.

Numbers and Patterns

There were 40,511 proposals, some of them merged into broader ones, which gives a grand total of 26,364 projects valued at over IDR 3,852,162,060,205, just over $250 million at the current exchange rate. This amount represents over 5% of Jakarta’s annual budget for 2015, with projects ranging from an IDR 27,500 (~$2) trash bin (that doesn’t sound right, does it?) in Sumur Batu to an IDR 54 billion, 1.5-kilometer drainage improvement in Koja….(More)”

RethinkCityHall.org


Press Release (Boston): “Mayor Martin J. Walsh today announced the launch of RethinkCityHall.org, a website designed to encourage civic participation in the City Hall campus plan study, a one-year comprehensive planning process that will serve as a roadmap for operational and design improvements to City Hall and the plaza.

This announcement is one of three interrelated efforts that the City is pursuing to reinvigorate and bring new life to both City Hall and City Hall Plaza. As part of the Campus Plan Request for Qualifications (RFQ) released on June 8, 2015, the City has selected Utile, a local architecture and planning firm, to partner with the City to lead the campus plan study. Utile is teamed with Grimshaw Architects and Reed Hilderbrand for the design phases of the effort.

“I am excited to have Utile on board as we work to identify ways to activate our civic spaces,” said Mayor Walsh. “As we progress in the planning process, it is important to take inventory of all of our assets to be able to identify opportunities for improvement. This study will help us develop a thoughtful and forward-thinking plan to reimagine City Hall and the plaza as thriving, healthy and innovative civic spaces.”

“We are energized by Mayor Walsh’s challenge and are excited to work with the various constituencies to develop an innovative plan,” said Tim Love, a principal at Utile. “Thinking about the functional, programmatic and experiential aspects of both the building and plaza provides the opportunity to fundamentally rethink City Hall.”

Both the City and Utile are committed to an open and interactive process that engages members of the public, community groups, and professional organizations; as part of that effort, the website will include information about stakeholder meetings and public forums. Additionally, the website will be updated on an ongoing basis with the research, analysis, concepts and design scenarios generated by the consultant team….(More)”

Will Open Data Policies Contribute to Solving Development Challenges?


Fabrizio Scrollini at IODC: “As the international open data charter gains momentum in the context of the wider development agenda related to the sustainable development goals set by the United Nations, a pertinent question to ask is: will open data policies contribute to solving development challenges? In this post I try to answer this question, grounded in recent Latin American experience, to contribute to a global debate.

Latin America has been exploring open data since 2013, when the first open data unconference (Abrelatam) and conference took place in Montevideo. In September 2015 in Santiago de Chile a vibrant community of activists, public servants, and entrepreneurs gathered for the third edition of Abrelatam and Condatos. It is now a more mature community. The days when it was sufficient to just open a few datasets and set up a portal are now gone. The focus of this meeting was on collaboration and the use of data to address several social challenges.

Take for instance the health sector. Transparency in this sector is key to delivering better development outcomes. One of the panels at Condatos showed three different ways to use data to promote transparency and citizen empowerment in this sector. A tu servicio, a joint venture of DATA and the Uruguayan Ministry of Health, helped to standardize and open public datasets that allowed around 30,000 users to improve the way they choose health providers. Government-civil society collaboration was crucial in this process in terms of pooling resources and skills. The first prototype was only possible because some data was already open.

This contrasts with Cuidados Intensivos, a Peruvian endeavour aiming to provide key information about the health sector. Peruvian activists had to file right-to-information requests, then transform and standardize the data to eventually release it. Both experiences demanded a great deal of technical, policy, and communication craft. And both show the attitudes the public sector can take: either engaging with, or at best ignoring, the potential of open data.

In the same sector, look at a recent study dealing with Dengue and open data developed by our research initiative. If international organizations and countries were persuaded to adopt common standards for reporting Dengue outbreaks, outbreaks could potentially be predicted, provided the right public data is available and standardized. Open data in this sector delivers not only accountability but also efficiency and foresight in allocating scarce resources.

Latin American countries – gathered in the open data group of the Red Gealc – acknowledge the increasing public value of open data. This group engaged constructively in Condatos with the principles enshrined in the charter and will foster the formalization of open data policies in the region. A data revolution won’t yield results if data is closed. When you open data you allow several initiatives to emerge and show their value.

Once a certain level of maturity is reached in a particular sector, more than data is needed. Standards are crucial to ensure comparability and to ease the collection, processing, and use of open government data. Fostering and engaging with open data users is also needed, as several strategies deployed by some Latin American cities show.

Coming back to our question: will open data policies contribute to solving development challenges? The Latin American experience shows evidence that they will….(More)”

Batea: a Wikipedia hack for medical students


Tom Sullivan at HealthCareIT: “Medical students use Wikipedia in great numbers, but what if it were a more trusted source of information?

That’s the idea behind Batea, a piece of software that essentially collects data from clinical reference URLs medical students visit, then aggregates that information to share with WikiProject Medicine, such that relevant medical editors can glean insights about how best to enhance Wikipedia’s medical content.
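The announcement doesn't detail Batea's internals, but the aggregation it describes amounts to discarding user identity and counting visits per article. A toy sketch under that assumption, with a hypothetical record format:

```python
from collections import Counter
from urllib.parse import urlparse, unquote

# Hypothetical donated-history records: (user_hash, visited_url) pairs.
visits = [
    ("u1", "https://en.wikipedia.org/wiki/Aspirin"),
    ("u2", "https://en.wikipedia.org/wiki/Aspirin"),
    ("u2", "https://en.wikipedia.org/wiki/Sepsis"),
]

def article_counts(records):
    """Count visits per Wikipedia article, discarding who visited."""
    counts = Counter()
    for _user, url in records:
        parts = urlparse(url)
        if parts.netloc.endswith("wikipedia.org") and parts.path.startswith("/wiki/"):
            counts[unquote(parts.path[len("/wiki/"):])] += 1
    return counts

print(article_counts(visits).most_common())
# [('Aspirin', 2), ('Sepsis', 1)]
```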

Batea takes its name from the Spanish word for a gold pan, according to Fred Trotter, a data journalist at DocGraph.

“It’s a data mining project,” Trotter explained, “so we wanted a short term that positively referenced mining.”

DocGraph built Batea with support from the Robert Wood Johnson Foundation and, prior to releasing it on Tuesday, operated beta testing pilots of the browser extension at the University of California, San Francisco and the University of Texas, Houston.

UCSF, for instance, has what Trotter described as “a unique program where medical students edit Wikipedia for credit. They helped us tremendously in testing the alpha versions of the software.”

Wikipedia houses some 25,000 medical articles that receive more than 200 million views each month, according to the DocGraph announcement, while 8,000 pharmacology articles are read more than 40 million times a month.

DocGraph is encouraging medical students around the country to download the Batea extension – and anonymously donate their clinically related browsing history. Should Batea gain critical mass, the potential exists for it to substantively enhance Wikipedia….(More)”

Fudging Nudging: Why ‘Libertarian Paternalism’ is the Contradiction It Claims It’s Not


Paper by Heidi M. Hurd: “In this piece I argue that so-called “libertarian paternalism” is as self-contradictory as it sounds. The theory of libertarian paternalism originally advanced by Richard Thaler and Cass Sunstein, and given further defense by Sunstein alone, is itself just a sexy ad campaign designed to nudge gullible readers into thinking that there is no conflict between libertarianism and welfare utilitarianism. But no one should lose sight of the fact that welfare utilitarianism just is welfare utilitarianism only if it sacrifices individual liberty whenever it is at odds with maximizing societal welfare. And thus no one who believes that people have rights to craft their own lives through the exercise of their own choices ought to be duped into thinking that just because paternalistic nudges are cleverly manipulative and often invisible, rather than overtly coercive, standard welfare utilitarianism can lay claim to being libertarian.

After outlining four distinct strains of libertarian theory and sketching their mutual incompatibility with so-called “libertarian paternalism,” I go on to demonstrate at some length how the two most prevalent strains — namely, opportunity set libertarianism and motivational libertarianism — make paternalistically-motivated nudges abuses of state power. As I argue, opportunity set libertarians should recognize nudges for what they are — namely, state incursions into the sphere of liberty in which individual choice is a matter of moral right, the boundaries of which are rightly defined, in part, by permissions to do actions that do not maximize welfare. And motivational libertarians should similarly recognize nudges for what they are — namely, illicitly motivated forms of legislative intervention that insult autonomy no less than do flat bans that leave citizens with no choice but to substitute the state’s agenda for their own. As I conclude, whatever its name, a political theory that recommends to state officials the use of “nudges” as means of ensuring that citizens advance the state’s understanding of their own best interests is no more compatible with libertarianism than is a theory that recommends more coercive means of paternalism….(More)”

Of Remixology: Ethics and Aesthetics after Remix


New book by David J. Gunkel: “Remix—or the practice of recombining preexisting content—has proliferated across media both digital and analog. Fans celebrate it as a revolutionary new creative practice; critics characterize it as a lazy and cheap (and often illegal) recycling of other people’s work. In Of Remixology, David Gunkel argues that to understand remix, we need to change the terms of the debate. The two sides of the remix controversy, Gunkel contends, share certain underlying values—originality, innovation, artistic integrity. And each side seeks to protect these values from the threat that is represented by the other. In reevaluating these shared philosophical assumptions, Gunkel not only provides a new way to understand remix, he also offers an innovative theory of moral and aesthetic value for the twenty-first century.

In a section called “Premix,” Gunkel examines the terminology of remix (including “collage,” “sample,” “bootleg,” and “mashup”) and its material preconditions, the technology of recording. In “Remix,” he takes on the distinction between original and copy; makes a case for repetition; and considers the question of authorship in a world of seemingly endless recompiled and repurposed content. Finally, in “Postmix,” Gunkel outlines a new theory of moral and aesthetic value that can accommodate remix and its cultural significance, remixing—or reconfiguring and recombining—traditional philosophical approaches in the process….(More)”

Tackling quality concerns around (volunteered) big data


University of Twente: “… Improvements in online information communication and mobile location-aware technologies have led to a dramatic increase in the amount of volunteered geographic information (VGI) in recent years. The collection of volunteered data on geographic phenomena has a rich history worldwide. For example, the Christmas Bird Count has studied the impacts of climate change on spatial distribution and population trends of selected bird species in North America since 1900. Nowadays, several citizen observatories collect information about our environment. This information is complementary or, in some cases, essential to tackle a wide range of geographic problems.

Despite the wide applicability and acceptability of VGI in science, many studies argue that the quality of the observations remains a concern. Data collected by volunteers often does not follow scientific principles of sampling design, and levels of expertise vary among volunteers. This makes it hard for scientists to integrate VGI into their research.

Low-quality, inconsistent observations can bias analysis and modelling results because they are not representative of the variable studied, or because they decrease the signal-to-noise ratio. Hence, the identification of inconsistent observations clearly benefits VGI-based applications and provides more robust datasets to the scientific community.

In their paper the researchers describe a novel automated workflow to identify inconsistencies in VGI. “Leveraging a digital control mechanism means we can give value to the millions of observations collected by volunteers” and “it allows a new kind of science where citizens can directly contribute to the analysis of global challenges like climate change” say Hamed Mehdipoor and Dr. Raul Zurita-Milla, who work at the Geo-Information Processing department of ITC….

While some inconsistent observations may reflect real, unusual events, the researchers demonstrated that such observations also bias trends (advancement rates), in this case of lilac flowering onset dates. This shows that identifying inconsistent observations is a prerequisite for studying and interpreting the impact of climate change on the timing of life-cycle events….(More)”
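The paper's exact workflow isn't reproduced in the excerpt, but the general move, flagging observations that sit far outside their local distribution, can be sketched with a generic median-absolute-deviation rule; the grouping, column names, and threshold below are assumptions, not the authors' method:

```python
import pandas as pd

def flag_inconsistent(df, group_col="region", value_col="onset_doy", k=3.5):
    """Mark onset dates more than k robust standard deviations from
    their group median. A generic outlier rule, not the paper's method."""
    def _flag(g):
        med = g[value_col].median()
        mad = (g[value_col] - med).abs().median()
        robust_sd = 1.4826 * mad if mad > 0 else 1e-9
        g = g.copy()
        g["inconsistent"] = (g[value_col] - med).abs() / robust_sd > k
        return g
    return df.groupby(group_col, group_keys=False).apply(_flag)

obs = pd.DataFrame({"region": ["NE"] * 5,
                    "onset_doy": [120, 118, 123, 119, 170]})
print(flag_inconsistent(obs))  # the 170 day-of-year entry gets flagged
```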

How Big Data is Helping to Tackle Climate Change


Bernard Marr at DataInformed: “Climate scientists have been gathering a great deal of data for a long time, but analytics technology has only recently caught up. Now that cloud, distributed storage, and massive amounts of processing power are affordable for almost everyone, those data sets are being put to use. On top of that, the growing number of Internet of Things devices we carry around is adding to the amount of data we are collecting. And the rise of social media means more and more people are reporting environmental data and uploading photos and videos of their environment, which also can be analyzed for clues.

Perhaps one of the most ambitious projects that employ big data to study the environment is Microsoft’s Madingley, which is being developed with the intention of creating a simulation of all life on Earth. The project already provides a working simulation of the global carbon cycle, and it is hoped that, eventually, everything from deforestation to animal migration, pollution, and overfishing will be modeled in a real-time “virtual biosphere.” Just a few years ago, the idea of a simulation of the entire planet’s ecosphere would have seemed like ridiculous, pie-in-the-sky thinking. But today it’s something into which one of the world’s biggest companies is pouring serious money. Microsoft is doing this because it believes that analytical technology has finally caught up with the ability to collect and store data.

Another data giant that is developing tools to facilitate analysis of climate and ecological data is EMC. Working with scientists at Acadia National Park in Maine, the company has developed platforms to pull in crowd-sourced data from citizen science portals such as eBird and iNaturalist. This allows park administrators to monitor the impact of climate change on wildlife populations as well as to plan and implement conservation strategies.

Last year, the United Nations, under its Global Pulse data analytics initiative, launched the Big Data Climate Challenge, a competition aimed at promoting innovative data-driven climate change projects. Among the first to receive recognition under the program is Global Forest Watch, which combines satellite imagery, crowd-sourced witness accounts, and public datasets to track deforestation around the world, which is believed to be a leading man-made cause of climate change. The project has been promoted as a way for ethical businesses to ensure that their supply chains are not complicit in deforestation.

Other initiatives are targeted at a more personal level, for example analyzing the transit routes available for an individual journey using Google Maps and making recommendations based on the carbon emissions of each route.
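A toy version of such a recommender scores each candidate route by distance times a per-mode emission factor and picks the lowest. The factors below are illustrative placeholders rather than published figures:

```python
# Illustrative kg-CO2-per-passenger-km factors; a real recommender would use
# published emission factors and live distances from a routing API.
EMISSION_FACTORS = {"car": 0.19, "bus": 0.09, "subway": 0.04, "bike": 0.0}

def rank_routes(routes):
    """routes: iterable of (label, mode, distance_km); lowest emissions first."""
    scored = [(label, mode, dist * EMISSION_FACTORS[mode])
              for label, mode, dist in routes]
    return sorted(scored, key=lambda r: r[2])

candidates = [("A", "car", 12.0), ("B", "subway", 14.5), ("C", "bus", 13.0)]
for label, mode, kg in rank_routes(candidates):
    print(f"route {label} ({mode}): {kg:.2f} kg CO2")
```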

The idea of “smart cities” is central to the concept of the Internet of Things – the idea that everyday objects and tools are becoming increasingly connected, interactive, and intelligent, and capable of communicating with each other independently of humans. Many of the ideas put forward by smart-city pioneers are grounded in climate awareness, such as reducing carbon dioxide emissions and energy waste across urban areas. Smart metering allows utility companies to increase or restrict the flow of electricity, gas, or water to reduce waste and ensure adequate supply at peak periods. Public transport can be efficiently planned to avoid wasted journeys and provide a reliable service that will encourage citizens to leave their cars at home.

These examples raise an important point: It’s apparent that data – big or small – can tell us if, how, and why climate change is happening. But, of course, this is only really valuable to us if it also can tell us what we can do about it. Some projects, such as Weathersafe, which helps coffee growers adapt to changing weather patterns and soil conditions, are designed to help humans deal with climate change. Others are designed to tackle the problem at the root, by highlighting the factors that cause it in the first place and showing us how we can change our behavior to minimize damage….(More)”