As Data Overflows Online, Researchers Grapple With Ethics


At The New York Times: “Scholars are exhilarated by the prospect of tapping into the vast troves of personal data collected by Facebook, Google, Amazon and a host of start-ups, which they say could transform social science research.

Once forced to conduct painstaking personal interviews with subjects, scientists can now sit at a screen and instantly play with the digital experiences of millions of Internet users. It is the frontier of social science — experiments on people who may never even know they are subjects of study, let alone explicitly consent.

“This is a new era,” said Jeffrey T. Hancock, a Cornell University professor of communication and information science. “I liken it a little bit to when chemistry got the microscope.”

But the new era has brought some controversy with it. Professor Hancock was a co-author of the Facebook study in which the social network quietly manipulated the news feeds of nearly 700,000 people to learn how the changes affected their emotions. When the research was published in June, the outrage was immediate…

Such testing raises fundamental questions. What types of experiments are so intrusive that they need prior consent or prompt disclosure after the fact? How do companies make sure that customers have a clear understanding of how their personal information might be used? Who even decides what the rules should be?

Existing federal rules governing research on human subjects, intended for medical research, generally require consent from those studied unless the potential for harm is minimal. But many social science scholars say the federal rules never contemplated large-scale research on Internet users and provide inadequate guidance for it.

For Internet projects conducted by university researchers, institutional review boards can be helpful in vetting projects. However, corporate researchers like those at Facebook don’t face such formal reviews.

Sinan Aral, a professor at the Massachusetts Institute of Technology’s Sloan School of Management who has conducted large-scale social experiments with several tech companies, said any new rules must be carefully formulated.

“We need to understand how to think about these rules without chilling the research that has the promise of moving us miles and miles ahead of where we are today in understanding human populations,” he said. Professor Aral is planning a panel discussion on ethics at an M.I.T. conference on digital experimentation in October. (The professor also does some data analysis for The New York Times Company.)

Mary L. Gray, a senior researcher at Microsoft Research and associate professor at Indiana University’s Media School, who has worked extensively on ethics in social science, said that too often, researchers conducting digital experiments work in isolation with little outside guidance.

She and others at Microsoft Research spent the last two years setting up an ethics advisory committee and training program for researchers in the company’s labs who are working with human subjects. She is now working with Professor Hancock to bring such thinking to the broader research world.

“If everyone knew the right thing to do, we would never have anyone hurt,” she said. “We really don’t have a place where we can have these conversations.”…

An Infographic That Maps 2,000 Years of Cultural History in 5 Minutes


In Wired: “…Last week in the journal Science, the researchers (led by University of Texas art historian Maximilian Schich) published a study that looked at the cultural history of Europe and North America by mapping the births and deaths of more than 150,000 notable figures—including everyone from Leonardo da Vinci to Ernest Hemingway. That data was turned into an amazing animated infographic that looks strikingly similar to the illustrated flight paths you find in the back of your inflight magazine. Blue dots indicate a birth; red ones mean death.

The researchers used data from Freebase, which touts itself as a “community curated database of people, places and things.” This gives the data a strong Western bent. You’ll notice that many parts of Asia and the Middle East (not to mention pre-colonized North America) are almost wholly ignored in this video. But to be fair, the abstract did acknowledge that the study was focused mainly on Europe and North America.
Still, mapping the geography of cultural migration does give you some insight into how the kind of culture we value has shifted over the centuries. It’s also a novel lens through which to view our more general history, as those migration trends likely illuminate bigger historical happenings like wars and the building of cross-country infrastructure.

Collective Genius


Linda A. Hill, Greg Brandeau, Emily Truelove, and Kent Lineback in Harvard Business Review: “Google’s astonishing success in its first decade now seems to have been almost inevitable. But step inside its systems infrastructure group, and you quickly learn otherwise. The company’s meteoric growth depended in large part on its ability to innovate and scale up its infrastructure at an unprecedented pace. Bill Coughran, as a senior vice president of engineering, led the group from 2003 to 2011. His 1,000-person organization built Google’s “engine room,” the systems and equipment that allow us all to use Google and its many services 24/7. “We were doing work that no one else in the world was doing,” he says. “So when a problem happened, we couldn’t just go out and buy a solution. We had to create it.”
Coughran joined Google in 2003, just five years after its founding. By then it had already reinvented the way it handled web search and data storage multiple times. His group was using Google File System (GFS) to store the massive amount of data required to support Google searches. Given Google’s ferocious appetite for growth, Coughran knew that GFS—once a groundbreaking innovation—would have to be replaced within a couple of years. The number of searches was growing dramatically, and Google was adding Gmail and other applications that needed not just more storage but storage of a kind different from what GFS had been optimized to handle.
Building the next-generation system—and the next one, and the one after that—was the job of the systems infrastructure group. It had to create the new engine room, in-house, while simultaneously refining the current one. Because this was Coughran’s top priority—and given that he had led the storied Bell Labs and had a PhD in computer science from Stanford and degrees in mathematics from Caltech—one might expect that he would first focus on developing a technical solution for Google’s storage problems and then lead his group through its implementation.
But that’s not how Coughran proceeded. To him, there was a bigger problem, a perennial challenge that many leaders inevitably come to contemplate: How do I build an organization capable of innovating continually over time? Coughran knew that the role of a leader of innovation is not to set a vision and motivate others to follow it. It’s to create a community that is willing and able to generate new ideas…”

In Tests, Scientists Try to Change Behaviors


Wall Street Journal: “Behavioral scientists look for environmental ‘nudges’ to influence how people act. Pelle Guldborg Hansen, a behavioral scientist, is trying to figure out how to board passengers on a plane with less fuss.
The goal is to make plane-boarding more efficient by coaxing passengers to want to be more orderly, not by telling them they must. It is one of many projects in which Dr. Hansen seeks to encourage people, when faced with options, to make better choices. Among these: prompting people to properly dispose of cigarette butts outside of bars and clubs and inducing hospital workers to use hand sanitizers.
Dr. Hansen, 37 years old, is director of the Initiative for Science, Society & Policy, a collaboration of the University of Southern Denmark and Roskilde University. The concept behind his work is known commonly as a nudge, dubbed such because of the popular 2008 book of the same name by U.S. academics Richard Thaler and Cass Sunstein that examined how people make decisions.
At the Copenhagen airport, Dr. Hansen recently deployed a team of three young researchers to mill about a gate in terminal B. The trio was dressed casually in jeans and wore backpacks. They blended in with the passengers, except for the badges they wore displaying airport credentials, and the clipboards and pens they carried to record how the boarding process unfolded.
Thirty-five minutes before a flight departed, the team got into position. Andreas Rathmann Jensen stood in one corner, then moved to another, so he could survey the entire gate area. He mapped where people were sitting and where they placed their bags. This behavior can vary depending, for example, on whether people are flying alone, with a partner or in a group.
Johannes Schuldt-Jensen circulated among the rows and counted how many bags were blocking seats and how many seats were empty as boarding time approached. He wore headphones, though he wasn’t listening to music, because people seem less suspicious of behavior when a person has headphones on, he says. Another researcher, Kasper Hulgaard, counted how many people were standing versus sitting.
The researchers are mapping out gate-seating patterns for a total of about 500 flights. Some early observations: The more people who are standing, the more chaotic boarding tends to be. Copenhagen airport seating areas are designed for groups, even though most travelers come solo or in pairs. Solo flyers like to sit in a corner and put their bag on an adjacent seat. Pairs of travelers tend to perch anywhere as long as they can sit side-by-side….”

Complexity, Governance, and Networks: Perspectives from Public Administration


Paper by Naim Kapucu: “Complex public policy problems require a productive collaboration among different actors from multiple sectors. Networks are widely applied as a public management tool and strategy, which warrants a deeper analysis of networks and network management in public administration. There is strong interest in networks in both the practice and the theory of public administration, which in turn requires an analysis of complex networks within public governance settings. In this essay I briefly discuss research streams on complex networks, network governance, and current research challenges in public administration.”

Quantifying the Interoperability of Open Government Datasets


Paper by Pieter Colpaert, Mathias Van Compernolle, Laurens De Vocht, Anastasia Dimou, Miel Vander Sande, Peter Mechant, Ruben Verborgh, and Erik Mannens, to be published in Computer: “Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only are relevant datasets identified, but machine-readable feedback is also provided to the data maintainer.”
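The abstract's pair-statement approach lends itself to a small illustration. The Python sketch below is an illustrative reading, not the paper's actual formulas: it assumes statements of the form (identifier in A, identifier in B, same concept?), and treats the "identifier interoperability" of dataset A with respect to dataset B as the fraction of A's identifiers that the statements match to some identifier in B. The statement format, function name, and metric definition are all assumptions for the sake of the example.

```python
# Toy sketch of measuring identifier overlap between two datasets
# from a list of pairwise "same concept?" statements.
# The statement format and the metric definition are illustrative
# assumptions, not the formulas from the paper.

def identifier_interoperability(statements, ids_a, ids_b):
    """Fraction of identifiers in dataset A that, according to the
    statements, denote the same concept as some identifier in B."""
    matched = {a for (a, b, same) in statements
               if same and a in ids_a and b in ids_b}
    return len(matched) / len(ids_a) if ids_a else 0.0

# Hypothetical example: three identifiers in A, two in B.
ids_a = {"a1", "a2", "a3"}
ids_b = {"b1", "b2"}
statements = [
    ("a1", "b1", True),   # a1 and b1 identify the same concept
    ("a2", "b2", False),  # a2 and b2 do not
    ("a3", "b2", True),   # a3 and b2 identify the same concept
]
print(identifier_interoperability(statements, ids_a, ids_b))  # 2 of 3 matched
```

Under this simplified reading, the statements double as machine-readable feedback: any pair judged not to identify the same concept points the data maintainer at a concrete mismatch.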

Policy bubbles: What factors drive their birth, maturity and death?


Moshe Maor at LSE Blog: “A policy bubble is a real or perceived policy overreaction that is reinforced by positive feedback over a relatively long period of time. This type of policy imposes objective and/or perceived social costs without producing offsetting objective and/or perceived benefits over a considerable length of time. A case in point is when government spending over a policy problem increases due to public demand for more policy while the severity of the problem decreases over an extended period of time. Another case is when governments raise ‘green’ or other standards due to public demand while the severity of the problem does not justify this move…
Drawing on insights from a variety of fields – including behavioural economics, psychology, sociology, political science and public policy – three phases of the life-cycle of a policy bubble may be identified: birth, maturity and death. A policy bubble may emerge when certain individuals perceive opportunities to gain from public policy or to exploit it by rallying support for the policy, promoting word-of-mouth enthusiasm and widespread endorsement of the policy, heightening expectations for further policy, and increasing demand for this policy….
How can one identify a policy bubble? A policy bubble may be identified by measuring parliamentary concerns, media concerns, public opinion regarding the policy at hand, and the extent of a policy problem, against the budget allocation to said policy over the same period, preferably over 50 years or more. Measuring the operation of different transmission mechanisms in emotional contagion and human herding, particularly the spread of social influence and feeling, can also work to identify a policy bubble.
Here, computer-aided content analysis of verbal and non-verbal communication in social networks, especially instant messaging, may capture emotional and social contagion. A further way to identify a policy bubble revolves around studying bubble expectations and individuals’ confidence over time by distributing a questionnaire to a random sample of the population, experts in the relevant policy sub-field, as well as decision makers, and comparing the results across time and nations.
To sum up, my interpretation of the process that leads to the emergence of policy bubbles allows for the possibility that different modes of policy overreaction lead to different types of human herding, thereby resulting in different types of policy bubbles. This interpretation has the added benefit of contributing to the explanation of economic, financial, technological and social bubbles as well.”

OkCupid reveals it’s been lying to some of its users. Just to see what’ll happen.


Brian Fung in the Washington Post: “It turns out that OkCupid has been performing some of the same psychological experiments on its users that landed Facebook in hot water recently.
In a lengthy blog post, OkCupid cofounder Christian Rudder explains that OkCupid has on occasion played around with removing text from people’s profiles, removing photos, and even telling some users they were an excellent match when in fact they were only a 30 percent match according to the company’s systems. Just to see what would happen.
OkCupid defends this behavior as something that any self-respecting Web site would do.
“OkCupid doesn’t really know what it’s doing. Neither does any other Web site,” Rudder wrote. “But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.”…
…We have a bigger problem on our hands: how to reconcile the sometimes valuable lessons of data science with the creep factor, particularly when you aren’t notified about being studied. But as I’ve written before, these kinds of studies happen all the time; it’s just rare that the public is presented with the results.
Short of banning the practice altogether, which seems totally unrealistic, corporate data science seems like an opportunity on a number of levels, particularly if it’s disclosed to the public. First, it helps us understand how human beings tend to behave at Internet scale. Second, it tells us more about how Internet companies work. And third, it helps consumers make better decisions about which services they’re comfortable using.
I suspect that what bothers us most of all is not that the research took place, but that we’re slowly coming to grips with how easily we ceded control over our own information — and how the machines that collect all this data may know more about us than we do ourselves. We had no idea we were even in a rabbit hole, and now we’ve discovered we’re 10 feet deep. As many as 62.5 percent of Facebook users don’t know the news feed is generated by a company algorithm, according to a recent study conducted by Christian Sandvig, an associate professor at the University of Michigan, and Karrie Karahalios, an associate professor at the University of Illinois.
OkCupid’s blog post is distinct in several ways from Facebook’s psychological experiment. OkCupid didn’t try to publish its findings in a scientific journal. It isn’t even claiming that what it did was science. Moreover, OkCupid’s research is legitimately useful to users of the service — in ways that Facebook’s research is arguably not….
But in any case, there’s no such motivating factor when it comes to Facebook. Unless you’re a page administrator or news organization, understanding how the newsfeed works doesn’t really help the average user in the way that understanding how OkCupid works does. That’s because people use Facebook for all kinds of reasons that have nothing to do with Facebook’s commercial motives. But people would stop using OkCupid if they discovered it didn’t “work.”
If you’re lying to your users in an attempt to improve your service, what’s the line between A/B testing and fraud?”

Request for Proposals: Exploring the Implications of Government Release of Large Datasets


“The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse.  This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:

    • Gain knowledge of the impact of the online release of large amounts of data generated by citizens’ interactions with government
    • Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
    • Begin building a body of research that addresses these issues

– BACKGROUND –

 
Governments at all levels are releasing large datasets for analysis by anyone for any purpose—“Open Data.”  Using Open Data, entrepreneurs may create new products and services, and citizens may use it to gain insight into the government.  A plethora of time saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods.  Sometimes governments release large datasets in order to encourage the development of unimagined new applications.  For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals—acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role as businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen.  While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry.  The citizen who foresees this growing transparency may be less willing to engage in government, as these transactions may be documented and released in a dataset to anyone to use for any imaginable purpose—including to deanonymize the database—forever.  Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities.  Hence, open data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern.  A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000.  Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium.  All grantees’ papers will be published in a dedicated monograph.  Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a follow-up workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not.  We are open to proposals that address the following difficult questions.  We are also open to a range of methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.

    • To what extent does existing Open Data made available by city and state governments affect individual profiling?  Do the effects change depending on the level of aggregation (neighborhood vs. cities)?  What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
    • Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
    • Is there something fundamentally different about government services and the government’s collection of citizens’ data for basic needs in modern society such as power and water that requires governments to exercise greater due care than commercial entities?
    • Companies have legal and practical mechanisms to shield data submitted to government from public release.  What mechanisms do individuals have or should have to address misuse of Open Data?  Could developments in the constitutional right to information policy as articulated in Whalen and Westinghouse Electric Co address Open Data privacy issues?
    • Collecting data costs money, and its release could affect civil liberties.  Yet it is being given away freely, sometimes to immensely profitable firms.  Should governments license data for a fee and/or impose limits on its use, given its value?
    • The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis.  Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles?   What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue?  A move from collection restrictions to use regulation raises a number of related issues, detailed below.
    • Are use restrictions efficacious in creating accountability?  Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability.  How could use regulations be implemented in the context of Open Data efficaciously?  Can a self-learning algorithm honor data use restrictions?
    • If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses?   How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff?  Will plaintiffs ever be able to beat First Amendment defenses?
    • The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data.  Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues.  What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
    • Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis.  When analysts are not involved in the collection of data, they are less likely to understand its context and limitations.  How do we ensure that this knowledge is maintained in a use regulation system?
    • Former President William Clinton was admitted under a pseudonym for a procedure at a New York Hospital in 2004.  The hospital detected 1,500 attempts by its own employees to access the President’s records.  With snooping such a tempting activity, how could incentives be crafted to cause self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
    • It is clear that data privacy regulation could hamper some big data efforts.  However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections.  What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
    • In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements.  Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment.  Is the answer to problems of distrust in government even more Open Data?
    • What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect?  Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?…
      To submit a proposal, visit the Conference Management Toolkit (CMT) here.
      Once you have created a profile, the site will allow you to submit your proposal.
      If you have questions, please contact Chris Hoofnagle, principal investigator on this project.”

The Innovators


Kirkus Review of “The Innovators: How a Group of Inventors, Hackers, Geniuses, and Geeks Created the Digital Revolution” by Walter Isaacson: “Innovation occurs when ripe seeds fall on fertile ground,” Aspen Institute CEO Isaacson (Steve Jobs, 2011, etc.) writes in this sweeping, thrilling tale of three radical innovations that gave rise to the digital age. First was the evolution of the computer, which Isaacson traces from its 19th-century beginnings in Ada Lovelace’s “poetical” mathematics and Charles Babbage’s dream of an “Analytical Engine” to the creation of silicon chips with circuits printed on them. The second was “the invention of a corporate culture and management style that was the antithesis of the hierarchical organization of East Coast companies.” In the rarefied neighborhood dubbed Silicon Valley, new businesses aimed for a cooperative, nonauthoritarian model that nurtured cross-fertilization of ideas. The third innovation was the creation of demand for personal devices: the pocket radio; the calculator, marketing brainchild of Texas Instruments; video games; and finally, the holy grail of inventions: the personal computer. Throughout his action-packed story, Isaacson reiterates one theme: Innovation results from both “creative inventors” and “an evolutionary process that occurs when ideas, concepts, technologies, and engineering methods ripen together.” Who invented the microchip? Or the Internet?
Mostly, Isaacson writes, these emerged from “a loosely knit cohort of academics and hackers who worked as peers and freely shared their creative ideas…. Innovation is not a loner’s endeavor.” Isaacson offers vivid portraits—many based on firsthand interviews—of mathematicians, scientists, technicians and hackers (a term that used to mean anyone who fooled around with computers), including the elegant, “intellectually intimidating” Hungarian-born John von Neumann; the impatient, egotistical William Shockley; Grace Hopper, who joined the Navy to pursue a career in mathematics; the “laconic yet oddly charming” J.C.R. Licklider, one father of the Internet; Bill Gates; Steve Jobs; and scores of others.
Isaacson weaves prodigious research and deftly crafted anecdotes into a vigorous, gripping narrative about the visionaries whose imaginations and zeal continue to transform our lives.”