Metadata Liberation Movement


Holman Jenkins in the Wall Street Journal: “The biggest problem, then, with metadata surveillance may simply be that the wrong agencies are in charge of it. One particular reason why this matters is that the potential of metadata surveillance might actually be quite large but is being squandered by secret agencies whose narrow interest is only looking for terrorists….
“Big data” is only as good as the algorithms used to find out things worth finding out. The efficacy and refinement of big-data techniques are advanced by repetition, by giving more chances to find something worth knowing. Bringing metadata out of its black box wouldn’t only be a way to improve public trust in what government is doing. It would be a way to get more real value for society out of techniques that are being squandered on a fairly minor threat.
Bringing metadata out of the black box would open up new worlds of possibility—from anticipating traffic jams to locating missing persons after a disaster. It would also create an opportunity to make big data more consistent with the constitutional prohibition of unwarranted search and seizure. In the first instance, with the computer withholding identifying details of the individuals involved, any red flag could be examined by a law-enforcement officer to see, based on accumulated experience, whether the indication is of interest.
If so, a warrant could be obtained to expose the identities involved. If not, the record could immediately be expunged. All this could take place in a reasonably aboveboard, legal fashion, open to inspection in court when and if charges are brought or—this would be a good idea—a court is informed of investigations that led to no action.
Our guess is that big data techniques would pop up way too many false positives at first, and only considerable learning and practice would allow such techniques to become a useful tool. At the same time, bringing metadata surveillance out of the shadows would help the Googles, Verizons and Facebooks defend themselves from a wholly unwarranted suspicion that user privacy is somehow better protected by French or British or (heavens) Chinese companies from their own governments than U.S. data is from the U.S. government.
Most of all, it would allow these techniques to be put to work on solving problems that are actual problems for most Americans, which terrorism isn’t.”

Predictive Policing: Don’t even think about it


The Economist: “PredPol is one of a range of tools using better data, more finely crunched, to predict crime. They seem to promise better law-enforcement. But they also bring worries about privacy, and of justice systems run by machines not people.
Criminal offences, like infectious disease, form patterns in time and space. A burglary in a placid neighbourhood represents a heightened risk to surrounding properties; the threat shrinks swiftly if no further offences take place. These patterns have spawned a handful of predictive products which seem to offer real insight. During a four-month trial in Kent, 8.5% of all street crime occurred within PredPol’s pink boxes, with plenty more next door to them; predictions from police analysts scored only 5%. An earlier trial in Los Angeles saw the machine score 6% compared with human analysts’ 3%.
Intelligent policing can convert these modest gains into significant reductions in crime…
Predicting and forestalling crime does not solve its root causes. Positioning police in hotspots discourages opportunistic wrongdoing, but may encourage other criminals to move to less likely areas. And while data-crunching may make it easier to identify high-risk offenders—about half of American states use some form of statistical analysis to decide when to parole prisoners—there is little that it can do to change their motivation.
Misuse and overuse of data can amplify biases…But mathematical models might make policing more equitable by curbing prejudice.”

9 models to scale open data – past, present and future


Open Knowledge Foundation Blog: “The possibilities of open data have been enthralling us for 10 years…But that excitement isn’t what matters in the end. What matters is scale – which organisational structures will make this movement explode? This post quickly and provocatively goes through some that haven’t worked (yet!) and some that have.
Ones that are working now
1) Form a community to enter in new data. Open Street Map and MusicBrainz are two big examples. It works as the community is the originator of the data. That said, neither has dominated its industry as much as I thought they would have by now.
2) Sell tools to an upstream generator of open data. This is what CKAN does for central Governments (and the new ScraperWiki CKAN tool helps with). It’s what mySociety does, when selling FixMyStreet installs to local councils, thereby publishing their potholes as RSS feeds.
3) Use open data (quietly). Every organisation does this and never talks about it. It’s key to quite old data resellers like Bloomberg. It is what most of ScraperWiki’s professional services customers ask us to do. The value to society is enormous and invisible. The big flaw is that it doesn’t help scale supply of open data.
4) Sell tools to downstream users. This isn’t necessarily open data specific – existing software like spreadsheets and Business Intelligence can be used with open or closed data. Lots of open data is on the web, so tools like the new ScraperWiki which work well with web data are particularly suited to it.
Ones that haven’t worked
5) Collaborative curation. ScraperWiki started as an audacious attempt to create an open data curation community, based on editing scraping code in a wiki. In its original form (now called ScraperWiki Classic) this didn’t scale. …With a few exceptions, notably OpenCorporates, there aren’t yet open data curation projects.
6) General purpose data marketplaces, particularly ones that are mainly reusing open data, haven’t taken off. They might do one day, however I think they need well-adopted higher level standards for data formatting and syncing first (perhaps something like dat, perhaps something based on CSV files).
Ones I expect more of in the future
These are quite exciting models which I expect to see a lot more of.
7) Give labour/money to upstream to help them create better data. This is quite new. The only, and most excellent, example of it is the UK’s National Archive curating the Statute Law Database. They do the work with the help of staff seconded from commercial legal publishers and other parts of Government.
It’s clever because it generates money for upstream, which people trust the most, and which has the most ability to improve data quality.
8) Viral open data licensing. MySQL made lots of money this way, offering proprietary dual licenses of GPLd software to embedded systems makers. In data this could use OKFN’s Open Database License, and organisations would pay when they wanted to mix the open data with their own closed data. I don’t know anyone actively using it, although Chris Taggart from OpenCorporates mentioned this model to me years ago.
9) Corporations release data for strategic advantage. Companies are starting to release their own data for strategic gain. This is very new. Expect more of it.”

BaltimoreCode.org


Press Release: “The City of Baltimore’s Chief Technology Officer Chris Tonjes and the non-partisan, non-profit OpenGov Foundation announced today the launch of BaltimoreCode.org, a free software platform that empowers all Baltimore residents to discover, access, and use local laws when they want, and how they want.

BaltimoreCode.org lifts and ‘liberates’ the Baltimore City Charter and Code from unalterable, often hard-to-find online files, such as PDFs, by inserting them into user-friendly, organized and modern website formats. This straightforward switch delivers significant results: more clarity, context, and public understanding of the laws’ impact on Baltimore citizens’ daily lives. For the first time, BaltimoreCode.org allows uninhibited reuse of City law data by everyday Baltimore residents to use, share, and spread as they see fit. Simply, BaltimoreCode.org gives citizens the information they need, on their terms.”

The Republic of Choosing


William H. Simon in the Boston Review: “Cass Sunstein went to Washington with the aim of putting some theory into practice. As administrator of the Office of Information and Regulatory Affairs (OIRA) during President Obama’s first term, he drew on the behavioral economics he helped develop as an academic. In his new book, Simpler, he reports on these efforts and elaborates a larger vision in which they exemplify “the future of government.”
Simpler reports some notable achievements, but it exaggerates the practical value of the behaviorist toolkit. The Obama administration’s most important policy initiatives make only minor use of it. Despite its upbeat tone, the book implies an oddly constrained conception of the means and ends of government. It sometimes calls to mind a doctor putting on a cheerful face to say that, while there is little he can do to arrest the disease, he will try to make the patient as comfortable as possible.
…The obverse of Sunstein’s preoccupation with choice architecture is his relative indifference to other approaches to making administration less rigid. Recall that among the problems Sunstein sees with conventional regulation are, first, that it mandates conduct in situations where the regulator doesn’t know with confidence what is the right thing to do, and second, that it is insufficiently sensitive to relevant local variations in taste or circumstances.
The most common way to deal with the first problem—insufficient information—is to build learning into the process of intervention: the regulator intervenes provisionally, studies the effects of her intervention, and adapts as she learns. It is commonplace for statutes to mandate or fund demonstration or pilot projects. More importantly, statutes often demand that both top administrators and frontline workers reassess and adjust their practices continuously. This approach is the central and explicit thrust of Race to the Top’s “instructional improvement systems,” and it recurs prominently in all the statutes mentioned so far.”

Capitol Words


About Capitol Words: “For every day Congress is in session, Capitol Words visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis. Capitol Words lets you see what are the most popular words spoken by lawmakers on the House and Senate floor.

Methodology

The contents of the Congressional Record are downloaded daily from the website of the Government Printing Office. The GPO distributes the Congressional Record in ZIP files containing the contents of the record in plain-text format.

Each text file is parsed and turned into an XML document, with things like the title and speaker marked up. The contents of each file are then split up into words and phrases — from one word to five.

The resulting data is saved to a search engine. Capitol Words has data from 1996 to the present.”
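The phrase-splitting step described above (every run of one to five consecutive words) can be sketched in a few lines of Python. This is only an illustration of the general n-gram technique, not Capitol Words' actual code: the function name `count_phrases`, the lowercasing, and the tokenization rule are all assumptions, and the sample sentence is invented.

```python
import re
from collections import Counter


def count_phrases(text, max_words=5):
    """Count every phrase of 1 to max_words consecutive words in text.

    Hypothetical helper: the tokenizer (lowercase letters, digits and
    apostrophes) is an assumption, not the rule Capitol Words uses.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Invented sample text standing in for one parsed speech.
sample = "I yield back the balance of my time. I yield back."
counts = count_phrases(sample)
print(counts["i yield back"])
```

A real pipeline would run this over each speaker-tagged document and send the counts to a search index, as the methodology describes. The sketch stops at counting.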

PCORI seeks the wisdom of crowds


Modern Healthcare: “The Patient-Centered Outcomes Research Institute is trying to live up to the first two words in its name. A team of researchers from the University of Michigan at Ann Arbor has been tapped by PCORI to scale up their prototype of a Web-based crowd-sourcing platform called WellSpringboard, which is designed to enable patients to propose ideas and pledge funds for clinical research.
Washington-based PCORI, an independent not-for-profit group established by the healthcare reform law, recently awarded the Michigan researchers $40,000, the top prize from its PCORI Challenge, a competition seeking novel approaches to connecting researchers with interested patients….
The platform works like this: A person has an idea for a research project and records a video explaining what that idea is. WellSpringboard posts the video on its site, sets a goal for funding and then spreads the word about the project using social media outlets like Facebook and Twitter. Once the funding target is reached, the project is opened up to researchers, who post their profiles to the site and whose applications are reviewed by a board of scientists and members of the public.”

It’s Time to Rewrite the Internet to Give Us Better Privacy, and Security


Larry Lessig in The Daily Beast: “Almost 15 years ago, as I was just finishing a book about the relationship between the Net (we called it “cyberspace” then) and civil liberties, a few ideas seemed so obvious as to be banal: First, life would move to the Net. Second, the Net would change as it did so. Gone would be simple privacy, the relatively anonymous default infrastructure for unmonitored communication; in its place would be a perpetually monitored, perfectly traceable system supporting both commerce and the government. That, at least, was the future that then seemed most likely, as business raced to make commerce possible and government scrambled to protect us (or our kids) from pornographers, and then pirates, and now terrorists.

But another future was also possible, and this was my third, and only important point: Recognizing these obvious trends, we just might get smart about how code (my shorthand for the technology of the Internet) regulates us, and just possibly might begin thinking smartly about how we could embed in that code the protections that the Constitution guarantees us. Because—and here was the punchline, the single slogan that all 724 people who read that book remember—code is law. And if code is law, then we need to be as smart about how code regulates us as we are about how the law does so….
But what astonishes me is that today, more than a decade into the 21st century, the world has remained mostly oblivious to these obvious points about the relationship between law and code….
the fact is that there is technology that could be deployed that would give many the confidence that none of us now have. “Trust us” does not compute. But trust and verify, with high-quality encryption, could. And there are companies, such as Palantir, developing technologies that could give us, and more importantly, reviewing courts, a very high level of confidence that data collected or surveilled was not collected or used in an improper way. Think of it as a massive audit log, recording how and who used what data for what purpose. We could code the Net in a string of obvious ways to give us even better privacy, while also enabling better security.”

Filling Power Vacuums in the New Global Legal Order


Paper by Anne-Marie Slaughter in the latest issue of Boston College Law Review: “In her Keynote Address at the October 12, 2012 Symposium, Filling Power Vacuums in the New Global Legal Order, Anne-Marie Slaughter describes the concepts of “power over” and “power with” in the global world of law. Power over is the ability to achieve the outcomes you want by commanding or manipulating others. Power with is the ability to mobilize people to do things. In the globalized world, power operates much more through power with than through power over. In contrast to the hierarchical power of national governments, globally it is more important to be central in the horizontal system of multiple sovereigns. This Address illustrates different examples of power over and power with. It concludes that in this globalized world, lawyers are ideally trained and positioned to exercise power.”

If My Data Is an Open Book, Why Can’t I Read It?


Natasha Singer in the New York Times: “Never mind all the hoopla about the presumed benefits of an “open data” society. In our day-to-day lives, many of us are being kept in the data dark.

“The fact that I am producing data and companies are collecting it to monetize it, if I can’t get a copy myself, I do consider it unfair,” says Latanya Sweeney, the director of the Data Privacy Lab at Harvard, where she is a professor of government and technology….

In fact, a few companies are challenging the norm of corporate data hoarding by actually sharing some information with the customers who generate it — and offering tools to put it to use. It’s a small but provocative trend in the United States, where only a handful of industries, like health care and credit, are required by federal law to provide people with access to their records.

Last year, San Diego Gas and Electric, a utility, introduced an online energy management program in which customers can view their electricity use in monthly, daily or hourly increments. There is even a practical benefit: customers can earn credits by reducing energy consumption during peak hours….”