Safety Datapalooza Shows Power of Data.gov Communities


Lisa Nelson at DigitalGov: “The White House Office of Public Engagement held the first Safety Datapalooza illustrating the power of Data.gov communities. Federal Chief Technology Officer Todd Park and Deputy Secretary of Transportation John Porcari hosted the event, which touted the data available on Safety.Data.gov and the community of innovators using it to make effective tools for consumers.
The event showcased many of the tools that have been produced as a result of opening this safety data, including:

  • PulsePoint, from the San Ramon Fire Protection District, a lifesaving mobile app that allows CPR-trained volunteers to be notified if someone nearby is in need of emergency assistance;
  • Commute and crime maps, from Trulia, which allow home buyers to choose their new residence based on two important everyday factors; and
  • Hurricane App, from the American Red Cross, to monitor storm conditions, prepare your family and home, find help, and let others know you’re safe even if the power is out.

Safety data is far from alone in generating innovative ideas and gathering a community of developers and entrepreneurs. Data.gov currently has 16 topically diverse communities on land and sea; the Cities and Oceans communities are two such examples. Data.gov’s communities are a virtual meeting spot for interested parties across government, academia and industry to come together and put the data to use. Data.gov enables a whole set of tools to make these communities come to life: apps, blogs, challenges, forums, ranking, rating and wikis.
For a summary of the Safety Datapalooza, visit Transportation’s “Fast Lane” blog.”

EPA Launches New Citizen Science Website


Press Release: “The U.S. Environmental Protection Agency has revamped its Citizen Science website to provide new resources and success stories to assist the public in conducting scientific research and collecting data to better understand their local environment and address issues of concern. The website can be found at www.epa.gov/region2/citizenscience.
“Citizen Science is an increasingly important part of EPA’s commitment to using sound science and technology to protect people’s health and safeguard the environment,” said Judith A. Enck, EPA Regional Administrator. “The EPA encourages the public to use the new website as a tool in furthering their scientific investigations and developing solutions to pollution problems.”
The updated website now offers detailed information about air, water and soil monitoring, including recommended types of equipment and resources for conducting investigations. It also includes case studies and videotapes that showcase successful citizen science projects in New York and New Jersey, along with information on funding opportunities, quality assurance, and workshops and webinars.”

E-government and organisational transformation of government: Black box revisited?


New paper in Government Information Quarterly: “During the e-government era the role of technology in the transformation of public sector organisations has significantly increased, whereby the relationship between ICT and organisational change in the public sector has become the subject of increasingly intensive research over the last decade. However, an overview of the literature to date indicates that the impacts of e-government on the organisational transformation of administrative structures and processes are still relatively poorly understood and vaguely defined.

The main purpose of the paper is therefore the following: (1) to examine the interdependence of e-government development and organisational transformation in public sector organisations and propose a clearer explanation of ICT’s role as a driving force of organisational transformation in further e-government development; and (2) to specify the main characteristics of organisational transformation in the e-government era through the development of a new framework. This framework describes organisational transformation in two dimensions, i.e. the ‘depth’ and the ‘nature’ of changes, and specifies the key attributes related to the three typical organisational levels.”

The LinkedIn Volunteer Marketplace: Connecting Professionals to Nonprofit Volunteer Opportunities


LinkedIn: “Last spring, a shelter in Berkeley, CA needed an architect to help it expand its facilities. A young architect who lives nearby had just made a New Year’s resolution to join a nonprofit board. In an earlier era, they would not have known each other existed.
But in this instance the shelter’s executive director used LinkedIn to contact the architect – and the architect jumped at the opportunity to serve on the shelter’s board. The connection brought enormous value to both parties involved – the nonprofit shelter got the expertise it needed and the young architect was able to amplify her social impact while broadening her professional skills.
This story inspired me and my colleagues at LinkedIn. As someone who studies and invests (as a venture capitalist) in internet marketplaces, I realized the somewhat serendipitous connection between architect and shelter would happen more often if there were a dedicated volunteer marketplace. After all, there are hundreds of thousands of “nonprofit needs” in the world, and even more professionals who want to donate their skills to help meet these needs.
The challenge is that nonprofits and professionals don’t know how to easily find each other. LinkedIn Volunteer Marketplace aims to solve that problem.
Changing the professional definition of “opportunity”
When I talk with LinkedIn members, many tell me they aren’t actively looking for traditional job opportunities. Instead, they want to hone or leverage their skills while also making a positive impact on the world.
Students often fall into this category. Retired professionals and stay-at-home parents seek ways to continue to leverage their skills and experience. And while busy professionals who love their current gigs may not necessarily be looking for a new position, these are often the very people who are most actively engaged in “meaningful searches” – a volunteer opportunity that will enhance their life in ways beyond what their primary vocation provides.
By providing opportunities for all these different kinds of LinkedIn members, we aim to help the social sector by doing what we do best as a company: connecting talent with opportunity at massive scale.
And to ensure that the volunteer opportunities you see in the LinkedIn Volunteer Marketplace are high quality, we’re partnering with the most trusted organizations in this space, including Catchafire, Taproot Foundation, BoardSource and VolunteerMatch.”

Tech Policy Is Not A Religion


Opinion Piece by Robert Atkinson: “‘Digital libertarians’ and ‘digital technocrats’ want us to believe their way is the truth and the light. It’s not that black and white. Manichaeism, an ancient religion, took a dualistic view of the world. It described the struggle between a good, spiritual world of light, and an evil, material world of darkness. Listening to tech policy debates, especially in America, one would presume that Manichaeism is alive and well.
On one side (light or dark, depending on your view) are the folks who embrace free markets, bottom-up processes, multi-stakeholderism, open-source systems, and crowdsourced innovations. On the other are those who embrace government intervention, top-down processes, additional regulation, proprietary systems, and expert-based innovations.
For the first group, whom I’ll call the digital libertarians, government is the problem, not the solution. Tech enables freedom, and statist actions can only limit it.
According to this camp, tech is moving so fast that government can’t hope to keep up — the only workable governance system is a nimble one based on multi-stakeholder processes, such as ICANN and W3C. With Web 2.0, everyone can be a contributor, and it is through the proliferation of multiple and disparate voices that we discover the truth. And because of the ability of communities of coders to add their contributions, the only viable tech systems are based on open-source models.
For the second group, the digital technocrats, the problem is the anarchic, lawless, corporate-dominated nature of the digital world. Tech is so disruptive, including to long-established norms and laws, it needs to be limited and shaped, and only the strong hand of the state can do that. Because of the influence of tech on all aspects of society, any legitimate governance process must stem from democratic institutions — not from a select group of insiders — and that can only happen with government oversight such as through the UN’s International Telecommunication Union.
According to this camp, because there are so many uninformed voices on the Internet spreading urban myths like wildfire, we need carefully vetted experts, whether in media or other organizations, to sort through the mass of information and provide expert, unbiased analysis. And because IT systems are so critical to the safety and well-functioning of society, we need companies to build and profit from them through a closed-source model.
Of course, just as religious Manichaeism leads to distorted practices of faith, tech Manichaeism leads to distorted policy practices and views. Take Internet governance. The process of ensuring Internet governance and evolution is complex and rapidly changing. A strong case can be made for the multi-stakeholder process as the driving force.
But this situation doesn’t mean, as digital libertarians would assert, that governments should stay out of the Internet altogether. Governments are not, as digital libertarian John Perry Barlow arrogantly asserts, “weary giants of flesh and steel.” Governments can and do play legitimate roles in many Internet policy issues, from establishing cybersecurity guidelines to setting online sales tax policy to combatting spam and digital piracy to setting rules governing unfair and deceptive online marketing practices.
This assertion doesn’t mean governments always get things right. They don’t. But as the Information Technology and Innovation Foundation writes in its recent response to Barlow’s manifesto, to deny people the right to regulate Internet activity through their government officials ignores the significant role government can play in promoting the continued development of the Internet and digital economy.
At the same time, the digital technocrats must understand that the digital world is different from the analog one, and that old rules, regulations, and governing structures simply don’t apply. When ITU Secretary General Hamadoun Toure argues that “at the behest of all the world’s nations, the UN must lead this effort” to manage the global Internet, and that “for big commercial interests, it’s about maximizing the bottom line,” he’s ignoring the critical role that tech companies and other non-government stakeholders play in the Internet ecosystem.
Because digital technology is such a vastly complex system, digital libertarians claim that their “light” approach is superior to the “dark,” controlling, technocratic approach. In fact, this very complexity requires that we base Internet policy on pragmatism, not religion.
Conversely, because technology is so important to opportunity and the functioning of societies, digital technocrats assert that only governments can maximize these benefits. In fact, its importance requires us to respect its complexity and the role of private sector innovators in driving digital progress.
In short, the belief that one or the other of these approaches is sufficient in itself to maximize tech innovation is misleading at best and damaging at worst.”

Bad Data


Bad Data is a site providing real-world examples of how not to prepare or provide data. It showcases the poorly structured, the mis-formatted, or the just plain ugly. Its primary purpose is to educate – though there may also be some aspect of entertainment.
As a side-product it also provides a source of good practice material for budding data wranglers (the repo in fact began as a place to keep practice data for Data Explorer).
New examples are wanted and welcome.


Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk


New paper by Catherine E. Schmitt-Sands and Richard J. Smith: “While the internet has created new opportunities for research, managing the increased complexity of relationships and knowledge also creates challenges. Amazon.com has a Mechanical Turk service that allows people to crowdsource simple tasks for a nominal fee. The online workers may be anywhere in North America or India and range in ability. Social science researchers are only beginning to use this service. While researchers have used crowdsourcing to find research subjects or classify texts, we used Mechanical Turk to conduct a policy scan of local government websites. This article describes the process used to train and ensure quality of the policy scan. It also examines choices in the context of research ethics.”

Garbage In, Garbage Out… Or, How to Lie with Bad Data


Medium: For everyone who slept through Stats 101, Charles Wheelan’s Naked Statistics is a lifesaver. From batting averages and political polls to Schlitz ads and medical research, Wheelan “illustrates exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life” (New York Times). What follows is adapted from the book, out now in paperback.
Behind every important study there are good data that made the analysis possible. And behind every bad study . . . well, read on. People often speak about “lying with statistics.” I would argue that some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data on which the calculations are performed are bogus or inappropriate. Here are some common examples of “garbage in, garbage out.”

Selection Bias

….Selection bias can be introduced in many other ways. A survey of consumers in an airport is going to be biased by the fact that people who fly are likely to be wealthier than the general public; a survey at a rest stop on Interstate 90 may have the opposite problem. Both surveys are likely to be biased by the fact that people who are willing to answer a survey in a public place are different from people who would prefer not to be bothered. If you ask 100 people in a public place to complete a short survey, and 60 are willing to answer your questions, those 60 are likely to be different in significant ways from the 40 who walked by without making eye contact.
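The airport example can be made concrete with a small simulation. This is a minimal sketch under assumptions that are ours, not Wheelan’s: incomes are uniform between $20,000 and $200,000, and each passer-by agrees to answer with probability proportional to income. The function name survey is hypothetical.

```python
import random
from statistics import mean

def survey(n=100_000, seed=1):
    """Return (population mean, respondent mean, response rate).

    Hypothetical model: incomes (in $1,000s) are uniform between 20 and 200,
    and a person's chance of agreeing to answer is income / 200, so richer
    passers-by stop more often. Both choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    incomes = [rng.uniform(20, 200) for _ in range(n)]
    # Each person answers with a probability that grows with income.
    respondents = [x for x in incomes if rng.random() < x / 200]
    return mean(incomes), mean(respondents), len(respondents) / n

population_mean, respondent_mean, response_rate = survey()
print(f"population {population_mean:.0f}, respondents {respondent_mean:.0f}, "
      f"response rate {response_rate:.0%}")
```

Under these assumptions roughly 55 percent of people respond, close to the 60-out-of-100 example, yet respondents average about $135,000 against a true mean of about $110,000. The arithmetic is fine; the bias comes entirely from who chose to answer.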

Publication Bias

Positive findings are more likely to be published than negative findings, which can skew the results that we see. Suppose you have just conducted a rigorous, longitudinal study in which you find conclusively that playing video games does not prevent colon cancer. You’ve followed a representative sample of 100,000 Americans for twenty years; those participants who spend hours playing video games have roughly the same incidence of colon cancer as the participants who do not play video games at all. We’ll assume your methodology is impeccable. Which prestigious medical journal is going to publish your results?

None, for two reasons. First, there is no strong scientific reason to believe that playing video games has any impact on colon cancer, so it is not obvious why you were doing this study. Second, and more relevant here, the fact that something does not prevent cancer is not a particularly interesting finding. After all, most things don’t prevent cancer. Negative findings are not especially sexy, in medicine or elsewhere.
The net effect is to distort the research that we see, or do not see. Suppose that one of your graduate school classmates has conducted a different longitudinal study. She finds that people who spend a lot of time playing video games do have a lower incidence of colon cancer. Now that is interesting! That is exactly the kind of finding that would catch the attention of a medical journal, the popular press, bloggers, and video game makers (who would slap labels on their products extolling their health benefits). It wouldn’t be long before Tiger Moms all over the country were “protecting” their children from cancer by snatching books out of their hands and forcing them to play video games instead.
Of course, one important recurring idea in statistics is that unusual things happen every once in a while, just as a matter of chance. If you conduct 100 studies, one of them is likely to turn up results that are pure nonsense, like a statistical association between playing video games and a lower incidence of colon cancer. Here is the problem: The 99 studies that find no link between video games and colon cancer will not get published, because they are not very interesting. The one study that does find a statistical link will make it into print and get loads of follow-on attention. The bias stems not from the studies themselves but from the skewed information that actually reaches the public. Someone reading the scientific literature on video games and cancer would find only a single study, and that single study would suggest that playing video games can prevent cancer. In fact, 99 studies out of 100 would have found no such link.
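The “100 studies” argument can be checked by simulation. The sketch below runs 1,000 hypothetical studies of an effect that does not exist and assumes, for illustration only, that a journal prints a study only when it finds a “protective” result past the conventional one-sided cutoff of z = -1.96. The group sizes, the seed, and the function name run_null_studies are all assumptions.

```python
import math
import random
from statistics import mean

def run_null_studies(n_studies=1000, n_per_group=100, seed=7):
    """Simulate studies of an effect that does not exist.

    Each hypothetical study compares a 'gamers' group with a 'non-gamers'
    group drawn from the same distribution, so any difference is pure chance.
    Returns the z-statistic of every study.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of a difference in means (sd = 1)
    zs = []
    for _ in range(n_studies):
        gamers = [rng.gauss(0, 1) for _ in range(n_per_group)]
        others = [rng.gauss(0, 1) for _ in range(n_per_group)]
        zs.append((mean(gamers) - mean(others)) / se)
    return zs

zs = run_null_studies()
# Assumed journal rule: only a "protective" result below z = -1.96 gets printed.
published = [z for z in zs if z < -1.96]
print(f"{len(published)} of {len(zs)} null studies published")
print(f"mean z, all studies: {mean(zs):.2f}; published only: {mean(published):.2f}")
```

Roughly 2.5 percent of the null studies clear the bar by chance, so a reader of the published record sees only strongly “protective” results, even though across all studies the average effect is close to zero.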

Recall Bias

Memory is a fascinating thing—though not always a great source of good data. We have a natural human impulse to understand the present as a logical consequence of things that happened in the past—cause and effect. The problem is that our memories turn out to be “systematically fragile” when we are trying to explain some particularly good or bad outcome in the present. Consider a study looking at the relationship between diet and cancer. In 1993, a Harvard researcher compiled a data set comprising a group of women with breast cancer and an age-matched group of women who had not been diagnosed with cancer. Women in both groups were asked about their dietary habits earlier in life. The study produced clear results: The women with breast cancer were significantly more likely to have had diets that were high in fat when they were younger.
Ah, but this wasn’t actually a study of how diet affects the likelihood of getting cancer. This was a study of how getting cancer affects a woman’s memory of her diet earlier in life. All of the women in the study had completed a dietary survey years earlier, before any of them had been diagnosed with cancer. The striking finding was that women with breast cancer recalled a diet that was much higher in fat than what they actually consumed; the women with no cancer did not.

The New York Times Magazine described the insidious nature of this recall bias:

The diagnosis of breast cancer had not just changed a woman’s present and the future; it had altered her past. Women with breast cancer had (unconsciously) decided that a higher-fat diet was a likely predisposition for their disease and (unconsciously) recalled a high-fat diet. It was a pattern poignantly familiar to anyone who knows the history of this stigmatized illness: these women, like thousands of women before them, had searched their own memories for a cause and then summoned that cause into memory.

Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.

Survivorship Bias

Suppose a high school principal reports that test scores for a particular cohort of students have risen steadily for four years. The sophomore scores for this class were better than their freshman scores. The scores from junior year were better still, and the senior year scores were best of all. We’ll stipulate that there is no cheating going on, and not even any creative use of descriptive statistics. Every year this cohort of students has done better than it did the preceding year, by every possible measure: mean, median, percentage of students at grade level, and so on. Would you (a) nominate this school leader for “principal of the year” or (b) demand more data?

I say “b.” I smell survivorship bias, which occurs when some or many of the observations are falling out of the sample, changing the composition of the observations that are left and therefore affecting the results of any analysis. Let’s suppose that our principal is truly awful. The students in his school are learning nothing; each year half of them drop out. Well, that could do very nice things for the school’s test scores—without any individual student testing better. If we make the reasonable assumption that the worst students (with the lowest test scores) are the most likely to drop out, then the average test scores of those students left behind will go up steadily as more and more students drop out. (If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
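The principal’s numbers can be reproduced without any student improving. This minimal sketch assumes, hypothetically, 160 students whose scores never change and that the lowest-scoring half of those remaining drops out each year; cohort_averages is an illustrative name.

```python
from statistics import mean

def cohort_averages(scores, years=4):
    """Average score of the students still enrolled in each year.

    Hypothetical assumption: no student's score ever changes, and each year
    the lowest-scoring half of the remaining students drops out.
    """
    remaining = sorted(scores)
    averages = []
    for _ in range(years):
        averages.append(mean(remaining))
        remaining = remaining[len(remaining) // 2:]  # the bottom half leaves
    return averages

scores = list(range(1, 161))  # 160 students with fixed scores 1..160
print(cohort_averages(scores))  # → [80.5, 120.5, 140.5, 150.5]
```

The average climbs every year from freshman to senior, yet no individual score moved: only the composition of the room changed.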

Healthy User Bias

People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly! Whether the vitamins have any impact is a separate issue. Consider the following thought experiment. Suppose public health officials promulgate a theory that all new parents should put their children to bed only in purple pajamas, because that helps stimulate brain development. Twenty years later, longitudinal research confirms that having worn purple pajamas as a child does have an overwhelmingly large positive association with success in life. We find, for example, that 98 percent of entering Harvard freshmen wore purple pajamas as children (and many still do) compared with only 3 percent of inmates in the Massachusetts state prison system.

Of course, the purple pajamas do not matter; but having the kind of parents who put their children in purple pajamas does matter. Even when we try to control for factors like parental education, we are still going to be left with unobservable differences between those parents who obsess about putting their children in purple pajamas and those who don’t. As New York Times health writer Gary Taubes explains, “At its simplest, the problem is that people who faithfully engage in activities that are good for them—taking a drug as prescribed, for instance, or eating what they believe is a healthy diet—are fundamentally different from those who don’t.” This effect can potentially confound any study trying to evaluate the real effect of activities perceived to be healthful, such as exercising regularly or eating kale. We think we are comparing the health effects of two diets: kale versus no kale. In fact, if the treatment and control groups are not randomly assigned, we are comparing two diets that are being eaten by two different kinds of people. We have a treatment group that is different from the control group in two respects, rather than just one.
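The purple-pajamas thought experiment is a case of confounding by an unobserved trait, which a short simulation illustrates. In this hypothetical model a parental “diligence” score between 0 and 1 raises both the chance of choosing purple pajamas and the chance of the child’s success; the pajamas themselves have no effect. The model, the variable names, and the function success_rates are assumptions made for illustration.

```python
import random

def success_rates(randomize, n=100_000, seed=3):
    """Success rate among children who did and did not wear purple pajamas.

    Hypothetical model: an unobserved parental 'diligence' (0 to 1) raises
    both the chance of using purple pajamas and the chance of success.
    Nowhere in this code do the pajamas affect success.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # wore_pajamas -> [successes, total]
    for _ in range(n):
        diligence = rng.random()
        if randomize:
            wore = rng.random() < 0.5        # a coin flip decides, as in a trial
        else:
            wore = rng.random() < diligence  # diligent parents choose pajamas
        success = rng.random() < diligence   # success depends only on the parent
        counts[wore][0] += success
        counts[wore][1] += 1
    return {k: s / t for k, (s, t) in counts.items()}

observational = success_rates(randomize=False)
trial = success_rates(randomize=True)
print("observational:", {k: round(v, 2) for k, v in observational.items()})
print("randomized:   ", {k: round(v, 2) for k, v in trial.items()})
```

Without randomization, children in purple pajamas succeed about twice as often (roughly 67 percent versus 33 percent under this model). When a coin flip assigns the pajamas, both groups land near 50 percent, because random assignment breaks the link between the treatment and the kind of parent who chooses it.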

If statistics is detective work, then the data are the clues. My wife spent a year teaching high school students in rural New Hampshire. One of her students was arrested for breaking into a hardware store and stealing some tools. The police were able to crack the case because (1) it had just snowed and there were tracks in the snow leading from the hardware store to the student’s home; and (2) the stolen tools were found inside. Good clues help.
Like good data. But first you have to get good data, and that is a lot harder than it seems.

From funding agencies to scientific agency


New paper on “Collective allocation of science funding as an alternative to peer review”: “Publicly funded research involves the distribution of a considerable amount of money. Funding agencies such as the US National Science Foundation (NSF), the US National Institutes of Health (NIH) and the European Research Council (ERC) give billions of dollars or euros of taxpayers’ money to individual researchers, research teams, universities, and research institutes each year. Taxpayers accordingly expect that governments and funding agencies will spend their money prudently and efficiently.

Investing money to the greatest effect is not a challenge unique to research funding agencies and there are many strategies and schemes to choose from. Nevertheless, most funders rely on a tried and tested method in line with the tradition of the scientific community: the peer review of individual proposals to identify the most promising projects for funding. This method has been considered the gold standard for assessing the scientific value of research projects essentially since the end of the Second World War.

However, there is mounting critique of the use of peer review to direct research funding. High on the list of complaints is the cost, both in terms of time and money. In 2012, for example, NSF convened more than 17,000 scientists to review 53,556 proposals [1]. Reviewers generally spend considerable time and effort assessing and rating proposals, of which only a minority can eventually be funded. Of course, such a high rejection rate is also frustrating for the applicants. Scientists spend an increasing amount of time writing and submitting grant proposals. Overall, the scientific community invests an extraordinary amount of time, energy, and effort into the writing and reviewing of research proposals, most of which end up not getting funded at all. This time would be better invested in conducting the research in the first place.

Peer review may also be subject to biases, inconsistencies, and oversights. The need for review panels to reach consensus may lead to sub‐optimal decisions owing to the inherently stochastic nature of the peer review process. Moreover, in a period where the money available to fund research is shrinking, reviewers may tend to “play it safe” and select proposals that have a high chance of producing results, rather than more challenging and ambitious projects. Additionally, the structuring of funding around calls‐for‐proposals to address specific topics might inhibit serendipitous discovery, as scientists work on problems for which funding happens to be available rather than trying to solve more challenging problems.

The scientific community holds peer review in high regard, but it may not actually be the best possible system for identifying and supporting promising science. Many proposals have been made to reform funding systems, ranging from incremental changes to peer review—including careful selection of reviewers [2] and post‐hoc normalization of reviews [3]—to more radical proposals such as opening up review to the entire online population [4] or removing human reviewers altogether by allocating funds through an objective performance measure [5].

We would like to add another alternative inspired by the mathematical models used to search the internet for relevant information: a highly decentralized funding model in which the wisdom of the entire scientific community is leveraged to determine a fair distribution of funding. It would still require human insight and decision‐making, but it would drastically reduce the overhead costs and may alleviate many of the issues and inefficiencies of the proposal submission and peer review system, such as bias, “playing it safe”, or reluctance to support curiosity‐driven research.

Our proposed system would require funding agencies to give all scientists within their remit an unconditional, equal amount of money each year. However, each scientist would then be required to pass on a fixed percentage of their previous year’s funding to other scientists whom they think would make best use of the money (Fig 1). Every year, then, scientists would receive a fixed basic grant from their funding agency combined with an elective amount of funding donated by their peers. As a result of each scientist having to distribute a given percentage of their previous year’s budget to other scientists, money would flow through the scientific community. Scientists who are generally anticipated to make the best use of funding will accumulate more.”
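To see how money would “flow through the scientific community,” the rule can be iterated directly. This is an illustrative interpretation, not the authors’ implementation: it assumes every scientist donates the same share of last year’s funding and splits that donation among all other scientists in proportion to a fixed, hypothetical reputation weight. The parameter values and the name simulate_funding are assumptions.

```python
def simulate_funding(reputation, base=100.0, share=0.5, years=50):
    """Iterate a sketch of the peer-to-peer funding scheme described above.

    Assumptions (hypothetical, for illustration): every scientist donates the
    same fraction `share` of last year's funding and splits it among all
    *other* scientists in proportion to a fixed `reputation` weight.
    Returns each scientist's funding after `years` rounds.
    """
    n = len(reputation)
    total_rep = sum(reputation)
    funding = [base] * n
    for _ in range(years):
        incoming = [0.0] * n
        for donor in range(n):
            gift = share * funding[donor]
            others = total_rep - reputation[donor]  # weights of everyone else
            for recipient in range(n):
                if recipient != donor:
                    incoming[recipient] += gift * reputation[recipient] / others
        funding = [base + extra for extra in incoming]  # fixed grant + peer donations
    return funding

funding = simulate_funding([1.0, 1.0, 2.0, 4.0])
print([round(f, 1) for f in funding])
print(round(sum(funding), 1))  # prints 800.0
```

Because each year’s donations equal share × last year’s total, the total settles at n × base / (1 − share), here 800, while scientists held in higher esteem by their peers end up holding more of it.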

Open data movement faces fresh hurdles


SciDevNet: “The open-data community made great strides in 2013 towards increasing the reliability of and access to information, but more efforts are needed to increase its usability on the ground and the general capacity of those using it, experts say.
An international network of innovation hubs, the first extensive open data certification system and a data for development partnership are three initiatives launched last year by the fledgling Open Data Institute (ODI), a UK-based not-for-profit firm that champions the use of open data to aid social, economic and environmental development.
Before open data can be used effectively, the biggest hurdles to be cleared are agreeing common formats for data sets and improving their trustworthiness and searchability, says the ODI’s chief statistician, Ulrich Atz.
“As it is so new, open data is often inconsistent in its format, making it difficult to reuse. We see a great need for standards and tools,” he tells SciDev.Net. Data that is standardised is of “incredible value” he says, because this makes it easier and faster to use and gives it a longer useable lifetime.
The ODI — which celebrated its first anniversary last month — is attempting to achieve this with a first-of-its-kind certification system that gives publishers and users important details about online data sets, including publishers’ names and contact information, the type of sharing licence, the quality of information and how long it will be available.
Certificates encourage businesses and governments to make use of open data by guaranteeing their quality and usability, and making them easier to find online, says Atz.
Finding more and better ways to apply open data will also be supported by a growing network of ODI ‘nodes’: centres that bring together companies, universities and NGOs to support open-data projects and communities….
Because lower-income countries often lack well-established data collection systems, they have greater freedom to rethink how data are collected and how they flow between governments and civil society, he says.
But there is still a long way to go. Open-data projects currently rely on governments and other providers sharing their data on online platforms, whereas in a truly effective system, information would be published in an open format from the start, says Davies.
Furthermore, even where advances are being made at a strategic level, open-data initiatives are still having only a modest impact in the real world, he says.
“Transferring [progress at a policy level] into availability of data on the ground and the capacity to use it is a lot tougher and slower,” Davies says.”