Open Data Literature Review


Review by Emmie Tran and Ginny Scholtes: “Open data describes large datasets that governments at all levels release online and free of charge for analysis by anyone for any purpose. Entrepreneurs may use open data to create new products and services, and citizens may use it to gain insight into the government. A plethora of time saving and other useful applications have emerged from open data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. But data held by the government is implicitly or explicitly about individuals. While open government is often presented as an unqualified good, sometimes open data can identify individuals or groups, leading to invasions of privacy and disparate impact on vulnerable populations.

This review provides background to parties interested in open data, specifically for those attending the 19th Annual BCLT/BTLJ Symposium on open data. Part I defines open data, focusing on the origins of the open data movement and the types of data subject to government retention and public access. Part II discusses how open data can benefit society, and Part III delves into the many challenges and dangers of open data. Part IV addresses these challenges, looking at how the United States and other countries have implemented open data regimes, and considering some of the proposed measures to mitigate the dangers of open data….(More)”

The End of Asymmetric Information


Essay by Alex Tabarrok and Tyler Cowen: “Might the age of asymmetric information – for better or worse – be over? Market institutions are rapidly evolving to a situation where very often the buyer and the seller have roughly equal knowledge. Technological developments are giving everyone who wants it access to the very best information when it comes to product quality, worker performance, matches to friends and partners, and the nature of financial transactions, among many other areas.

These developments will have implications for how markets work, how much consumers benefit, and also economic policy and the law. As we will see, there may be some problematic sides to these new arrangements, specifically when it comes to privacy. Still, a large amount of economic regulation seems directed at a set of problems which, in large part, no longer exist…

Many “public choice” problems are really problems of asymmetric information. In William Niskanen’s (1974) model of bureaucracy, government workers usually benefit from larger bureaus, and they are able to expand their bureaus to inefficient size because they are the primary providers of information to politicians. Some bureaus, such as the NSA and the CIA, may still be able to use secrecy to benefit from information asymmetry. For instance they can claim to politicians that they need more resources to deter or prevent threats, and it is hard for the politicians to have well-informed responses on the other side of the argument. Timely, rich information about most other bureaucracies, however, is easily available to politicians and increasingly to the public as well. As information becomes more symmetric, Niskanen’s (1974) model becomes less applicable, and this may help check the growth of unneeded bureaucracy.

Cheap sensors are greatly extending how much information can be economically gathered and analyzed. It’s not uncommon for office workers to have every key stroke logged. When calling customer service, who has not been told “this call may be monitored for quality control purposes?” Service-call workers have their location tracked through cell phones. Even information that once was thought to be purely subjective can now be collected and analyzed, often with the aid of smart software or artificial intelligence. One firm, for example, uses badges equipped with microphones, accelerometers, and location sensors to measure tone of voice, posture, and body language, as well as who spoke to whom and for how long (Lohr 2014). The purpose is not only to monitor workers but to deduce when, where and why workers are the most productive. We are again seeing trade-offs which bring greater productivity, and limit asymmetric information, albeit at the expense of some privacy.

As information becomes more prevalent and symmetric, earlier solutions to asymmetric problems will become less necessary. When employers do not easily observe workers, for example, employers may pay workers unusually high wages, generating a rent. Workers will then work at high levels despite infrequent employer observation, to maintain their future rents (Shapiro and Stiglitz 1984). But those higher wages involved a cost, namely that fewer workers were hired, and the hires that were made often were directed to people who were already known to the firm. Better monitoring of workers will mean that employers will hire more people and furthermore they may be more willing to take chances on risky outsiders, rather than those applicants who come with impeccable pedigree. If the outsider does not work out and produce at an acceptable level, it is easy enough to figure this out and fire them later on….(More)”

The Healing Power of Your Own Medical Data


in the New York Times: “Steven Keating’s doctors and medical experts view him as a citizen of the future.

A scan of his brain eight years ago revealed a slight abnormality — nothing to worry about, he was told, but worth monitoring. And monitor he did, reading and studying about brain structure, function and wayward cells, and obtaining a follow-up scan in 2010, which showed no trouble.

But he knew from his research that his abnormality was near the brain’s olfactory center. So when he started smelling whiffs of vinegar last summer, he suspected they might be “smell seizures.”

He pushed doctors to conduct an M.R.I., and three weeks later, surgeons in Boston removed a cancerous tumor the size of a tennis ball from his brain.

At every stage, Mr. Keating, a 26-year-old doctoral student at the Massachusetts Institute of Technology’s Media Lab, has pushed and prodded to get his medical information, collecting an estimated 70 gigabytes of his own patient data by now. His case points to what medical experts say could be gained if patients had full and easier access to their medical information. Better-informed patients, they say, are more likely to take better care of themselves, comply with prescription drug regimens and even detect early-warning signals of illness, as Mr. Keating did.

“Today he is a big exception, but he is also a glimpse of what people will want: more and more information,” said Dr. David W. Bates, chief innovation officer at Brigham and Women’s Hospital.

Some of the most advanced medical centers are starting to make medical information more available to patients. Brigham and Women’s, where Mr. Keating had his surgery, is part of the Partners HealthCare Group, which now has 500,000 patients with web access to some of the information in their health records including conditions, medications and test results.

Other medical groups are beginning to allow patients online access to the notes taken by physicians about them, in an initiative called OpenNotes. In a yearlong evaluation project at medical groups in three states, more than two-thirds of the patients reported having a better understanding of their health and medical conditions, adopting healthier habits and taking their medications as prescribed more regularly.

The medical groups with OpenNotes programs include Beth Israel Deaconess Medical Center in Boston, Geisinger Health System in Pennsylvania, Harborview Medical Center in Seattle, the Mayo Clinic, the Cleveland Clinic and the Veterans Affairs department. By now, nearly five million patients in America have been given online access to their notes.

As an articulate young scientist who had studied his condition, Mr. Keating had a big advantage over most patients in obtaining his data. He knew what information to request, spoke the language of medicine and did not need help. The information he collected includes the video of his 10-hour surgery, dozens of medical images, genetic sequencing data and 300 pages of clinical documents. Much of it is on his website, and he has made his medical data available for research….

Opening data to patients raises questions. Will worried patients inundate physicians with time-consuming questions? Will sharing patient data add to legal risks? One detail in the yearlong study of OpenNotes underlines doctors’ concerns; 105 primary physicians completed the study, but 143 declined to participate.

Still, the experience of the doctors in the evaluation seemed reassuring. Only 3 percent said they spent more time answering patient questions outside of visits. Yet knowing that patients could read the notes, one-fifth of the physicians said they changed the way they wrote about certain conditions, like substance abuse and obesity.

Evidence of the benefit to individuals from sharing information rests mainly on a few studies so far. For example, 55 percent of the members of the epilepsy community on PatientsLikeMe, a patient network, reported that sharing information and experiences with others helped them learn about seizures, and 27 percent said it helped them be more adherent to their medications.

Mr. Keating has no doubts. “Data can heal,” he said. “There is a huge healing power to patients understanding and seeing the effects of treatments and medications.”

Health information, by its very nature, is personal. So even when names and other identifiers are stripped off, sharing personal health data more freely with patients, health care providers and researchers raises thorny privacy issues.

Mr. Keating says he is a strong believer in privacy, but he personally believes that the benefits outweigh the risks — and whether to share data or not should be an individual’s choice and an individual responsibility.

Not everyone, surely, would be as comfortable as Mr. Keating is sharing all his medical information. But he says he believes that people will increasingly want access to their medical data and will share it, especially younger people reared on social networks and smartphones.

“This is what the next generation, which lives on data, is going to want,” Mr. Keating said….(More)”

Sensor Law


Paper by Sandra Braman: “For over two decades, information policy-making for human society has been increasingly supplemented, supplanted, and/or superseded by machinic decision-making; over three decades since legal decision-making has been explicitly put in place to serve machinic rather than social systems; and over four decades since designers of the Internet took the position that they were serving non-human (machinic, or daemon) users in addition to humans. As the “Internet of Things” becomes more and more of a reality, these developments increasingly shape the nature of governance itself. This paper’s discussion of contemporary trends in these diverse modes of human-computer interaction at the system level — interactions between social systems and technological systems — introduces the changing nature of the law as a sociotechnical problem in itself. In such an environment, technological innovations are often also legal innovations, and legal developments require socio-technical analysis as well as social, legal, political, and cultural approaches.

Examples of areas in which sensors are already receiving legal attention are rife. A non-comprehensive listing includes privacy concerns beginning but not ending with those raised by sensors embedded in phones and geolocation devices, which are the most widely discussed and those of which the public is most aware. Sensor issues arise in environmental law, health law, marine law, intellectual property law, and as they are raised by new technologies in use for national security purposes that include those confidence- and security-building measures intended for peacekeeping. They are raised by liability issues for objects that range from cars to ovens. And sensor issues are at the core of concerns about “telemetric policing,” as that is coming into use not only in North America and Europe, but in societies such as that of Brazil as well.

Sensors are involved in every stage of legal processes, from identification of persons of interest to determination of judgments and consequences of judgments. Their use significantly alters the historically-developed distinction among types of decision-making meant to come into use at different stages of the process, raising new questions about when, and how, human decision-making needs to dominate and when, and how, technological innovation might need to be shaped by the needs of social rather than human systems.

This paper will focus on the legal dimensions of sensors used in ubiquitous embedded computing….(More)”

Why Google’s Waze Is Trading User Data With Local Governments


Parmy Olson at Forbes: “In Rio de Janeiro most eyes are on the final, nail-biting matches of the World Cup. Over in the command center of the city’s department of transport though, they’re on a different set of screens altogether.

Planners there are watching the aggregated data feeds of thousands of smartphones being walked or driven around a city, thanks to two popular travel apps, Waze and Moovit.

The goal is traffic management, and it involves swapping data for data. More cities are lining up to get access, and while the data the apps are sharing is all anonymous for now, identifying details could get more specific if cities like what they see, and people become more comfortable with being monitored through their smartphones in return for incentives.

Rio is the first city in the world to collect real-time data both from drivers who use the Waze navigation app and pedestrians who use the public-transportation app Moovit, giving it an unprecedented view on thousands of moving points across the sprawling city. Rio is also talking to the popular cycling app Strava to start monitoring how cyclists are moving around the city too.

All three apps are popular, consumer services which, in the last few months, have found a new way to make their crowdsourced data useful to someone other than advertisers. While consumers use Waze and Moovit to get around, both companies are flipping the use case and turning those millions of users into a network of sensors that municipalities can tap into for a better view on traffic and hazards. Local governments can also use these apps as a channel to send alerts.

On an average day in June, Rio’s transport planners could get an aggregated view of 110,000 drivers (half a million over the course of the month), and see nearly 60,000 incidents being reported each day – everything from built-up traffic, to hazards on the road, Waze says. Till now they’ve been relying on road cameras and other basic transport-department information.

What may be especially tantalizing for planners is the super-accurate read Waze gets on exactly where drivers are going, by pinging their phones’ GPS once every second. The app can tell how fast a driver is moving and even get a complete record of their driving history, according to Waze spokesperson Julie Mossler. (UPDATE: Since this story was first published Waze has asked to clarify that it separates users’ names and their 30-day driving info. The driving history is categorized under an alias.)

This passively-tracked GPS data “is not something we share,” she adds. Waze, which Google bought last year for $1.3 billion, can turn the data spigots on and off through its application programming interface (API).

Waze has been sharing user data with Rio since summer 2013 and it just signed up the State of Florida. It says more departments of transport are in the pipeline.

But none of these partnerships are making Waze any money. The app’s currency of choice is data. “It’s a two-way street,” says Mossler. “Literally.”

In return for its user updates, Waze gets real-time information from Rio on highways, from road sensors and even from cameras, while Florida will give the app data on construction projects or city events.

Florida’s department of transport could not be reached for comment, but one of its spokesmen recently told a local news station: “We’re going to share our information, our camera images, all of our information that comes from the sensors on the roadway, and Waze is going to share its data with us.”…

To get Moovit’s data, municipalities download a web interface that gives them an aggregated view of where pedestrians using Moovit are going. In return, the city feeds Moovit’s database with a stream of real-time GPS data for buses and trains, and can issue transport alerts to Moovit’s users. Erez notes the cities aren’t allowed to make “any sort of commercial approach to the users.”

Erez may be saving that for advertisers, an avenue he says he’s still exploring. For now getting data from cities is the bigger priority. It gives Moovit “a competitive advantage,” he says.

Cycling app Strava also recently started sharing its real-time user data as part of a paid-for service called Strava Metro.

Municipalities pay 80 cents a year for every Strava member being tracked. Metro only launched in May, but it already counts the state of Oregon; London, UK; Glasgow, Scotland; Queensland, Australia and Evanston, Illinois as customers.
….
Privacy advocates will naturally want to keep a wary eye on what data is being fed to cities, and that it doesn’t leak or get somehow misused by City Hall. The data-sharing might not be ubiquitous enough for that to be a problem yet, and it should be noted that any kind of deal making with the public sector can get wrapped up in bureaucracy and take years to get off the ground.

For now Waze says it’s acting for the public good….(More)”

Methods to Protect and Secure “Big Data” May Be Unknowingly Corrupting Research


New paper by John M. Abowd and Ian M. Schmutte: “…As the government and private companies increase the amount of data made available for public use (e.g. Census data, employment surveys, medical data), efforts to protect privacy and confidentiality (through statistical disclosure limitation or SDL) can often cause misleading and compromising effects on economic research and analysis, particularly in cases where data properties are unclear for the end-user.

Data swapping is a particularly insidious method of SDL and is frequently used by important data aggregators like the Census Bureau, the National Center for Health Statistics and others, which interferes with the results of empirical analysis in ways that few economists and other social scientists are aware of.

To encourage more transparency, the authors call for both government statistical agencies as well as the private sector (Amazon, Google, Microsoft, Netflix, Yahoo!, etc.) to release more information about parameters used in SDL methods, and insist that journals and editors publishing such research require documentation of the author’s entire methodological process….(More)”
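
To make the concern about data swapping concrete, here is a minimal sketch in Python. It is not the procedure used by the Census Bureau or any other agency: real swapping schemes usually pair records that match on selected key variables, whereas this toy version pairs records at random. The function name `swap_values`, the synthetic `schooling` and `income` fields, and the 50 percent swap rate are all assumptions made for illustration.

```python
import random
import statistics


def swap_values(records, key, rate, rng):
    """Return a copy of `records` in which `key` is exchanged between random pairs.

    `rate` is the approximate share of records that take part in a swap.
    """
    swapped = [dict(r) for r in records]
    n_pairs = int(len(swapped) * rate / 2)
    chosen = rng.sample(range(len(swapped)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        swapped[i][key], swapped[j][key] = swapped[j][key], swapped[i][key]
    return swapped


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Synthetic microdata (hypothetical): income rises with years of schooling.
rng = random.Random(0)
original = [
    {"schooling": s, "income": 3000 * s + rng.gauss(0, 5000)}
    for s in (rng.randint(8, 20) for _ in range(1000))
]

# "Released" file: incomes swapped among roughly half of the records.
released = swap_values(original, "income", rate=0.5, rng=rng)

before = pearson([r["schooling"] for r in original], [r["income"] for r in original])
after = pearson([r["schooling"] for r in released], [r["income"] for r in released])
print(f"correlation before swapping: {before:.2f}, after: {after:.2f}")
```

In this sketch the list of incomes in the released file is identical to the original, so published totals and averages are unaffected, yet the measured relationship between schooling and income is weaker. A researcher who does not know the swap rate has no way to correct for that, which is the kind of undisclosed parameter the authors ask agencies to document.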

Turning Government Data into Better Public Service


OMB Blog: “Every day, millions of people use their laptops, phones, and tablets to check the status of their tax refund, get the latest forecast from the National Weather Service, book a campsite at one of our national parks, and much more. There were more than 1.3 billion visits to websites across the Federal Government in just the past 90 days.

Today, during Sunshine Week when we celebrate openness and transparency in government, we are pleased to release the Digital Analytics Dashboard, a new window into the way people access the government online. For the first time, you can see how many people are using a Federal Government website, which pages are most popular, and which devices, browsers, and operating systems people are using. We’ll use the data from the Digital Analytics Program to focus our digital service teams on the services that matter most to the American people, and analyze how much progress we are making. The Dashboard will help government agencies understand how people find, access, and use government services online to better serve the public – all while protecting privacy. The program does not track individuals. It anonymizes the IP addresses of all visitors and then uses the resulting information in the aggregate….(More)”
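
The excerpt does not say how the program anonymizes IP addresses. One common approach is to truncate each address before it is stored and then report only counts. The Python sketch below illustrates that idea under stated assumptions: the function name `anonymize_ip`, the prefix lengths (24 bits for IPv4, 48 bits for IPv6), and the sample addresses are illustrative choices, not details taken from the Digital Analytics Program.

```python
import ipaddress
from collections import Counter


def anonymize_ip(address: str) -> str:
    """Zero the host portion of an address so it no longer points to one visitor.

    Assumed prefix lengths: keep the first 24 bits of IPv4, the first 48 bits of IPv6.
    """
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


# Hypothetical visit log using documentation-only address ranges.
visits = ["203.0.113.57", "203.0.113.200", "198.51.100.14", "2001:db8:abcd:12::1"]

# Only aggregate counts per truncated address are kept.
counts = Counter(anonymize_ip(v) for v in visits)
print(counts)
```

Two visitors on the same network collapse into a single truncated address, so the aggregate can still show roughly where traffic comes from without retaining any individual visitor's full address.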

Big Data Is an Economic Justice Issue, Not Just a Privacy Problem


in the Huffington Post: “The control of personal data by “big data” companies is not just an issue of privacy but is becoming a critical issue of economic justice, argues a new report issued by the organization Data Justice, which itself is being publicly launched in conjunction with the report…

At the same time, big data is fueling economic concentration across our economy. As a handful of data platforms generate massive amounts of user data, the barriers to entry rise, since potential competitors have little data themselves to entice advertisers compared with the incumbents, who have both the concentrated processing power and the supply of user data to dominate particular sectors. With little competition, companies end up with little incentive to either protect user privacy or share the economic value of that user data with the consumers generating those profits.

The report argues for a threefold approach to making big data work for everyone in the economy, not just for the big data platforms’ shareholders:

  • First, regulators need to strengthen user control of their own data by both requiring explicit consent for all uses of the data and better informing users of how it’s being used and how companies profit from that data.
  • Second, regulators need to factor control of data into merger review, and to initiate antitrust actions against companies like Google where monopoly control of a sector like search advertising has been established.
  • Third, policymakers should restrict practices that harm consumers, including banning price discrimination where consumers are not informed of all discount options available and bringing the participation of big data platforms in marketing financial services under the regulation of the Consumer Financial Protection Bureau.

Data Justice itself has been founded as an organization “to promote public education and new alliances to challenge the danger of big data to workers, consumers and the public.” It will work to educate the public, policymakers and organizational allies on how big data is contributing to economic inequality in the economy. Its new website at datajustice.org is intended to bring together a wide range of resources highlighting the economic justice aspects of big data.”

Netpolitik: What the Emergence of Networks Means for Diplomacy and Statecraft


Charlie Firestone and Leshuo Dong at the Aspen Journal of Ideas: “…The network is emerging as a dominant form of organization for our age of complexity. This is supported by technological and economic trends. Furthermore, enemies are networks, players are networks, even governments are becoming networks. It makes sense to understand network principles and apply them for use in the world of diplomacy. Accordingly, governments, organizations and individuals should heed these recommendations:

  • Understand and apply two-way communications and network principles to all forms of diplomacy with the aim of earning the sympathy, empathy and where applicable, the loyalty of future generations. This is a mindset shift for governments, diplomats and citizens around the world.
  • This means engaging the world’s populations to communicate with each other. That will entail physical connections to the global common medium, an ability to have what you send be received by others in the form you send it, end to end, and literacy in the communications methods of the day. The world’s population should have a meaningful right to connect.
  • Of course, if there is to be a global communications network, it needs to be safe, so governments remain in the role of protector of the environment needed for users to trust in their networks. States have a role to protect against cyberwar, cybercrimes, and loss of a person’s identity, i.e., security and privacy online. But these protections cannot be a screen for illegitimate governmental controls over or unwarranted surveillance of its citizens. Nor can governments be expected to shoulder that burden alone. Everyone will need to practice a basic level of Net hygiene and literacy as an element of their digital citizenship.

As networks proliferate, principles of netpolitik will emerge. Governments, businesses, non-governmental organizations, and every citizen would be well advised to be thinking in these terms in the years ahead….(More).”

Data-Driven Development Pathways for Progress


Report from the World Economic Forum: “Data is the lifeblood of sustainable development and holds tremendous potential for transformative positive change particularly for lower- and middle-income countries. Yet despite the promise of a “Data Revolution”, progress is not a certainty. Lack of clarity on privacy and ethical issues, asymmetric power dynamics and an array of entangled societal and commercial risks threaten to hinder progress.

Written by the World Economic Forum Global Agenda Council on Data-Driven Development, this report serves to clarify how big data can be leveraged to address the challenges of sustainable development. Providing a blueprint for balancing competing tensions, areas of focus include: addressing the data deficit of the Global South, establishing resilient governance and strengthening capacities at the community and individual level. (PDF)”