Government data: How open is too open?


Sharon Fisher at HPE: “The notion of ‘open government’ appeals to both citizens and IT professionals seeking access to freely available government data. But is there such a thing as data access being too open? Governments may want to be transparent, yet they need to avoid releasing personally identifiable information.

There’s no question that open government data offers many benefits. It gives citizens access to the data their taxes paid for, enables government oversight, and powers the applications developed by government, vendors, and citizens that improve people’s lives.

However, data breaches and concerns about the amount of data that government is collecting make some people wonder: When is it too much?

“As we think through the big questions about what kind of data a state should collect, how it should use it, and how to disclose it, these nuances become not some theoretical issue but a matter of life and death to some people,” says Alexander Howard, deputy director of the Sunlight Foundation, a Washington nonprofit that advocates for open government. “There are people in government databases where the disclosure of their [physical] location is the difference between a life-changing day and Wednesday.”

Open data supporters point out that much of this data has been considered a public record all along and tout the value of its use in analytics. But having personal data aggregated in a single place that is accessible online—as opposed to, say, having to go to an office and physically look up each record—makes some people uneasy.

Privacy breaches, wholesale

“We’ve seen a real change in how people perceive privacy,” says Michael Morisy, executive director at MuckRock, a Cambridge, Massachusetts, nonprofit that helps media and citizens file public records requests. “It’s been driven by a long-standing concept in transparency: practical obscurity.” Even if something was technically a public record, effort needed to be expended to get one’s hands on it. That amount of work might be worth it for, say, someone running for office, but on the whole, private citizens didn’t have to worry. Things are different now, says Morisy. “With Google, and so much data being available at the click of a mouse or the tap of a phone, what was once practically obscure is now instantly available.”

People are sometimes also surprised to find out that public records can contain their personally identifiable information (PII), such as addresses, phone numbers, and even Social Security numbers. That may be on purpose or because someone failed to redact the data properly.

That’s had consequences. Over the years, there have been a number of incidents in which PII from public records, including addresses, was used to harass and sometimes even kill people. For example, in 1989, Rebecca Schaeffer was murdered by a stalker who learned her address from the Department of Motor Vehicles. Other examples of harassment via driver’s license records include thieves who tracked down the addresses of owners of expensive cars and activists who sent anti-abortion literature to women who had visited health clinics that performed abortions.

In response, in 1994, Congress enacted the Driver’s Privacy Protection Act to restrict the sale of such data. More recently, the state of Idaho passed a law protecting the identity of hunters who shot wolves, because the hunters were being harassed by wolf supporters. Similarly, the state of New York allowed concealed pistol permit holders to make their name and address private after a newspaper published an online interactive map showing the names and addresses of all handgun permit holders in Westchester and Rockland counties….(More)”.

When census taking is a recipe for controversy


Anjana Ahuja in the Financial Times: “Population counts are important tools for development, but also politically fraught…The UN describes a census as “among the most complex and massive peacetime exercises a nation undertakes”. Given that social trends, migration patterns and inequalities can be determined from questions that range from health to wealth, housing and even religious beliefs, censuses can also be controversial. So it is with the next one in the US, due to be conducted in 2020. The US Department of Justice has proposed that participants should be quizzed on their citizenship status. Vanita Gupta, president of the Leadership Conference on Civil and Human Rights, warned the journal Science that many would refuse to take part. Ms Gupta said that, in the current political climate, enquiring about citizenship “would destroy any chance of an accurate count, discard years of careful research and increase costs significantly”.

The row has taken on a new urgency because the 2020 census must be finalised by April. The DoJ claims that a citizenship question will ensure that ethnic minorities are treated fairly in the voting process. Currently, only about one in six households is asked about citizenship, with the results extrapolated for the whole population, a process observers say is statistically acceptable and less intrusive. In 2011, the census for England and Wales asked for country of birth and passports held — but not citizenship explicitly. It is one of those curious cases when fewer questions might lead to more accurate and useful data….(More)”.
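The extrapolation the observers describe — surveying a fraction of households and scaling the result to the whole population, with a quantifiable margin of error — can be sketched as follows. This is an illustrative example with made-up numbers, not the Census Bureau's actual methodology; it uses the standard normal-approximation confidence interval for a sampled proportion:

```python
import math

def extrapolate_proportion(successes, sample_size, population, z=1.96):
    """Estimate a population total from a sampled proportion, with a
    normal-approximation 95% confidence interval (z = 1.96)."""
    p_hat = successes / sample_size
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)  # standard error of p_hat
    estimate = p_hat * population
    margin = z * se * population
    return estimate, estimate - margin, estimate + margin

# Hypothetical survey: 9,000 of 10,000 sampled households answer "citizen",
# extrapolated to a population of 60,000 households.
est, lo, hi = extrapolate_proportion(9000, 10000, 60000)
print(round(est), round(lo), round(hi))  # prints: 54000 53647 54353
```

The width of that interval shrinks with sample size, which is why a one-in-six sample can be "statistically acceptable": the sampling error is small and known, whereas a question that depresses response rates introduces a bias no extrapolation can correct.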

Cops, Docs, and Code: A Dialogue between Big Data in Health Care and Predictive Policing


Paper by I. Glenn Cohen and Harry Graver: “Big data” has become the ubiquitous watchword of this decade. Predictive analytics, which is something we want to do with big data, is the use of electronic algorithms to forecast future events in real time. Predictive analytics is interfacing with the law in myriad settings: how votes are counted and voter rolls revised, the targeting of taxpayers for auditing, the selection of travelers for more intensive searching, pharmacovigilance, the creation of new drugs and diagnostics, etc.

In this paper, written for the symposium “Future Proofing the Law,” we want to engage in a bit of legal arbitrage; that is, we want to examine which insights from legal analysis of predictive analytics in better-trodden ground — predictive policing — can be useful for understanding relatively newer ground for legal scholars — the use of predictive analytics in health care. To the degree lessons can be learned from this dialogue, we think they go in both directions….(More)”.

Why the deliberative democracy framework doesn’t quite work for me


Essay by Peter Levine: “In some ways, I came of age in the field of deliberative democracy. I had an internship at the Kettering Foundation when I was a college sophomore (when the foundation defined itself more purely in deliberative terms than it does today). By that time, I had already taken a philosophy seminar on the great deliberative theorist Jürgen Habermas. In the three decades since then, I’ve served on the boards of Kettering, Everyday Democracy, and AmericaSPEAKS. I wrote a book with “deliberative democracy” in its subtitle and co-edited The Deliberative Democracy Handbook with John Gastil. I was one of many co-founders of the Deliberative Democracy Consortium and have served on its steering committee since the last century.

None of these groups is committed to deliberation in a narrow sense (although opinions differ within the field). For me, these are the main limitations of focusing on deliberation as the central topic or unit of analysis:

Deliberative values are worthy ones, but they are not the only worthy ones. My own values would also include personal liberties and nonnegotiable rights, concerns for nature, and virtues of the inner life, such as equanimity and personal development. Stating my values doesn’t substitute for an argument, but it may suffice to make the point that deliberation is not the only good thing, and it’s in tension with other goods. A deliberative democrat will reply that I should discuss my values with other people. And so I should–but that doesn’t mean that the norms intrinsic to deliberation trump all other norms. Nor are fellow citizens the only sources of guidance; introspecting, reading ancient texts, consulting legal precedents, and conducting scientific experiments are helpful, too.

By the same token, deliberative virtues are not the only civic virtues. Deliberation is about discourse–talking and listening–so its virtues are discursive ones: humility and openness, empathy, sincerity, and perhaps eloquence. (The list is contested.) But a good citizen may be hard-working, physically courageous, or aesthetically creative instead of especially good at deliberating. The people who physically built the Athenian agora were as important as the people who exchanged ideas in it.

Deliberation depends on social organization. In order for people to have something that’s worth discussing, they must already make, control, or influence things of value together. That requires social organization, whether in the form of a market, a commons, a voluntary association, a functional network, or a political institution. Discussion rarely precedes these forms, because people can’t and won’t come together in completely amorphous groupings. Discussion is more typically a moment in an ongoing process of governance. Often a small group of founders chooses the rules-in-use that create a group in which deliberation can occur.

Thus we should ask about leadership and rules, not just about deliberation….(More)”.

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor


Book by Virginia Eubanks: “The State of Indiana denies one million applications for healthcare, food stamps and cash benefits in three years—because a new computer system interprets any mistake as “failure to cooperate.” In Los Angeles, an algorithm calculates the comparative vulnerability of tens of thousands of homeless people in order to prioritize them for an inadequate pool of housing resources. In Pittsburgh, a child welfare agency uses a statistical model to try to predict which children might be future victims of abuse or neglect.

Since the dawn of the digital age, decision-making in finance, employment, politics, health and human services has undergone revolutionary change. Today, automated systems—rather than humans—control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data, the most invasive and punitive systems are aimed at the poor.

In Automating Inequality, Virginia Eubanks systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of heart-wrenching and eye-opening stories, from a woman in Indiana whose benefits are literally cut off as she lies dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile.

The U.S. has always used its most cutting-edge science and technology to contain, investigate, discipline and punish the destitute. Like the county poorhouse and scientific charity before them, digital tracking and automated decision-making hide poverty from the middle-class public and give the nation the ethical distance it needs to make inhumane choices: which families get food and which starve, who has housing and who remains homeless, and which families are broken up by the state. In the process, they weaken democracy and betray our most cherished national values….(More)”.

“Crowdsourcing” ten years in: A review


Kerri Wazny at the Journal of Global Health: “First coined by Howe in 2006, the field of crowdsourcing has grown exponentially. Despite its growth and its transcendence across many fields, the definition of crowdsourcing has still not been agreed upon, and examples are poorly indexed in peer-reviewed literature. Many examples of crowdsourcing have not been scaled up past the pilot phase.

In spite of this, crowdsourcing has great potential, especially in global health where resources are lacking. This narrative review seeks to review both indexed and grey crowdsourcing literature broadly in order to explore the current state of the field….(More)”.

Visualizing the Uncertainty in Data


Nathan Yau at FlowingData: “Data is a representation of real life. It’s an abstraction, and it’s impossible to encapsulate everything in a spreadsheet, which leads to uncertainty in the numbers.

How well does a sample represent a full population? How likely is it that a dataset represents the truth? How much do you trust the numbers?

Statistics is a game where you figure out these uncertainties and make estimated judgements based on your calculations. But standard errors, confidence intervals, and likelihoods often lose their visual space in data graphics, which leads to judgements based on simplified summaries expressed as means, medians, or extremes.

That’s no good. You miss out on the interesting stuff. The important stuff. So here are some visualization options for the uncertainties in your data, each with its pros, cons, and examples….(More)”.
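Before any of those visualization options can be drawn, the uncertainty itself has to be computed. A minimal sketch of the most common case — a mean with a 95% confidence interval, the quantity an error bar typically encodes — using only the standard library and illustrative numbers (the actual plotting is left to whatever charting tool you use):

```python
import statistics

def ci95(values):
    """Mean plus a normal-approximation 95% confidence interval,
    the numbers an error bar usually represents."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / len(values) ** 0.5  # standard error
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Two hypothetical measurement groups
samples = {
    "A": [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.9],
    "B": [6.2, 5.8, 7.1, 6.5, 6.9, 6.0, 6.4, 6.6],
}
for label, vals in samples.items():
    mean, lo, hi = ci95(vals)
    print(f"{label}: mean {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the mean is exactly the "visual space" Yau says the standard summaries squeeze out: two groups whose intervals overlap tell a very different story than two whose intervals are far apart, even if their means are identical in both cases.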

AI System Sorts News Articles By Whether or Not They Contain Actual Information


Michael Byrne at Motherboard: “…in a larger sense it’s worth wondering to what degree the larger news feed is being diluted by news stories that are not “content dense.” That is, what’s the real ratio between signal and noise, objectively speaking? To start, we’d need a reasonably objective metric of content density and a reasonably objective mechanism for evaluating news stories in terms of that metric.

In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles.

At a high level this works like most any other machine learning system. Start with a big batch of data—news articles, in this case—and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers….(More)”.
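The train-on-annotated-leads pipeline described above can be illustrated with a toy supervised classifier. This is emphatically not Nenkova and Yang's system — their paper uses richer features and models — just a minimal bag-of-words Naive Bayes sketch of the generic approach, with hypothetical hand-annotated leads as training data:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesLeadClassifier:
    """Toy bag-of-words Naive Bayes labeling article leads as
    'dense' (content-dense) or 'sparse', from annotated examples."""

    def fit(self, leads, labels):
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.label_counts = Counter(labels)      # class priors
        for lead, label in zip(leads, labels):
            self.word_counts[label].update(lead.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, lead):
        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            s = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            for word in lead.lower().split():
                # Laplace smoothing so unseen words don't zero out the score
                s += math.log((counts[word] + 1) / (total + len(self.vocab)))
            return s
        return max(self.label_counts, key=log_score)

# Hypothetical annotated leads standing in for a ground-truth dataset
train = [
    ("City council votes 7-2 to approve the 2019 transit budget", "dense"),
    ("Senate passes bill funding rural broadband expansion", "dense"),
    ("Here are five things you might not know about cats", "sparse"),
    ("You will never believe what happened next", "sparse"),
]
clf = NaiveBayesLeadClassifier().fit(*zip(*train))
print(clf.predict("Council approves broadband budget bill"))  # prints: dense
```

Scaled from four examples to thousands of annotated New York Times leads, this same fit-then-predict loop is the shape of the system the paragraph describes; the hard part, as the article notes, is the ground-truth annotation, not the classifier.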

Universities must prepare for a technology-enabled future


In The Conversation: “Automation and artificial intelligence technologies are transforming manufacturing, corporate work and the retail business, providing new opportunities for companies to explore and posing major threats to those that don’t adapt to the times. Equally daunting challenges confront colleges and universities, but they’ve been slower to acknowledge them.

At present, colleges and universities are most worried about competition from schools or training systems using online learning technology. But that is just one aspect of the technological changes already under way. For example, some companies are moving toward requiring that workers have specific skills training and certifications – as opposed to college degrees.

As a professor who researches artificial intelligence and offers distance learning courses, I can say that online education is a disruptive challenge for which colleges are ill-prepared. Lack of student demand is already closing 800 out of roughly 10,000 engineering colleges in India. And online learning has put as many as half the colleges and universities in the U.S. at risk of shutting down in the next couple of decades as remote students get comparable educations over the internet – without living on campus or taking classes in person. Unless universities move quickly to transform themselves into educational institutions for a technology-assisted future, they risk becoming obsolete….(More)”

Letters From Congress


From-Congress is an attempt to collect letters sent by representatives to their constituents. These letters often contain statements by the representative about positions that might otherwise be difficult to discover.

This project exists to increase the transparency and accountability of representatives in their districts….

If you would like to send a letter to your member of Congress, I highly recommend Resistbot. If you receive a reply, please consider uploading it here.

If in the past year you’ve received correspondence from your member of Congress, I would encourage you to upload it as well.

In the future, we would like to transcribe these letters (hopefully automatically) and put the text in each article. Along with providing accessibility for visually impaired readers, this will also allow searching of politicians’ viewpoints….(More)”

See also: Project Legisletters