Linda A. Hill, Greg Brandeau, Emily Truelove, and Kent Lineback in Harvard Business Review: “Google’s astonishing success in its first decade now seems to have been almost inevitable. But step inside its systems infrastructure group, and you quickly learn otherwise. The company’s meteoric growth depended in large part on its ability to innovate and scale up its infrastructure at an unprecedented pace. Bill Coughran, as a senior vice president of engineering, led the group from 2003 to 2011. His 1,000-person organization built Google’s “engine room,” the systems and equipment that allow us all to use Google and its many services 24/7. “We were doing work that no one else in the world was doing,” he says. “So when a problem happened, we couldn’t just go out and buy a solution. We had to create it.”
Coughran joined Google in 2003, just five years after its founding. By then it had already reinvented the way it handled web search and data storage multiple times. His group was using Google File System (GFS) to store the massive amount of data required to support Google searches. Given Google’s ferocious appetite for growth, Coughran knew that GFS—once a groundbreaking innovation—would have to be replaced within a couple of years. The number of searches was growing dramatically, and Google was adding Gmail and other applications that needed not just more storage but storage of a kind different from what GFS had been optimized to handle.
Building the next-generation system—and the next one, and the one after that—was the job of the systems infrastructure group. It had to create the new engine room, in-house, while simultaneously refining the current one. Because this was Coughran’s top priority—and given that he had led the storied Bell Labs and had a PhD in computer science from Stanford and degrees in mathematics from Caltech—one might expect that he would first focus on developing a technical solution for Google’s storage problems and then lead his group through its implementation.
But that’s not how Coughran proceeded. To him, there was a bigger problem, a perennial challenge that many leaders inevitably come to contemplate: How do I build an organization capable of innovating continually over time? Coughran knew that the role of a leader of innovation is not to set a vision and motivate others to follow it. It’s to create a community that is willing and able to generate new ideas…”
In Tests, Scientists Try to Change Behaviors
Wall Street Journal: “Behavioral scientists look for environmental ‘nudges’ to influence how people act. Pelle Guldborg Hansen, a behavioral scientist, is trying to figure out how to board passengers on a plane with less fuss.
The goal is to make plane-boarding more efficient by coaxing passengers to want to be more orderly, not by telling them they must. It is one of many projects in which Dr. Hansen seeks to encourage people, when faced with options, to make better choices. Among these: prompting people to properly dispose of cigarette butts outside of bars and clubs and inducing hospital workers to use hand sanitizers.
Dr. Hansen, 37 years old, is director of the Initiative for Science, Society & Policy, a collaboration of the University of Southern Denmark and Roskilde University. The concept behind his work is known commonly as a nudge, dubbed such because of the popular 2008 book of the same name by U.S. academics Richard Thaler and Cass Sunstein that examined how people make decisions.
At the Copenhagen airport, Dr. Hansen recently deployed a team of three young researchers to mill about a gate in terminal B. The trio was dressed casually in jeans and wore backpacks. They blended in with the passengers, except for the badges they wore displaying airport credentials, and the clipboards and pens they carried to record how the boarding process unfolds.
Thirty-five minutes before a flight departed, the team got into position. Andreas Rathmann Jensen stood in one corner, then moved to another, so he could survey the entire gate area. He mapped where people were sitting and where they placed their bags. This behavior can vary depending, for example, on whether people are flying alone, with a partner or in a group.
Johannes Schuldt-Jensen circulated among the rows and counted how many bags were blocking seats and how many seats were empty as boarding time approached. He wore headphones, though he wasn’t listening to music, because people seem less suspicious of behavior when a person has headphones on, he says. Another researcher, Kasper Hulgaard, counted how many people were standing versus sitting.
The researchers are mapping out gate-seating patterns for a total of about 500 flights. Some early observations: The more people who are standing, the more chaotic boarding tends to be. Copenhagen airport seating areas are designed for groups, even though most travelers come solo or in pairs. Solo flyers like to sit in a corner and put their bag on an adjacent seat. Pairs of travelers tend to perch anywhere as long as they can sit side-by-side….”
Complexity, Governance, and Networks: Perspectives from Public Administration
Paper by Naim Kapucu: “Complex public policy problems require a productive collaboration among different actors from multiple sectors. Networks are widely applied as a public management tool and strategy. This warrants a deeper analysis of networks and network management in public administration. There is a strong interest in both the practice and theory of networks in public administration. This requires an analysis of complex networks within public governance settings. In this essay I briefly discuss research streams on complex networks, network governance, and current research challenges in public administration.”
Quantifying the Interoperability of Open Government Datasets
Paper by Pieter Colpaert, Mathias Van Compernolle, Laurens De Vocht, Anastasia Dimou, Miel Vander Sande, Peter Mechant, Ruben Verborgh, and Erik Mannens, to be published in Computer: “Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only relevant datasets are identified, also machine-readable feedback is provided to the data maintainer.”
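The abstract names three metrics but does not define them, so the following is only a rough sketch of how such figures could be derived from a list of identifier-pair statements. The `Statement` structure, the function name, and all three metric definitions are illustrative assumptions, not the authors' formulas.

```python
from typing import Iterable, NamedTuple


class Statement(NamedTuple):
    """One judgement about a pair of identifiers (hypothetical structure)."""
    id_a: str           # identifier as used in dataset A
    id_b: str           # identifier as used in dataset B
    same_concept: bool  # True if both identify the same real-world concept


def interoperability_metrics(statements: Iterable[Statement]) -> dict:
    """Compute three illustrative metrics; the definitions are assumptions."""
    statements = list(statements)
    same = [s for s in statements if s.same_concept]

    # Assumed "relevance": how many concepts the two datasets have in common.
    relevance = len(same)

    # Assumed "identifier interoperability": the share of common concepts
    # for which both datasets already use the identical identifier.
    identical = sum(1 for s in same if s.id_a == s.id_b)
    identifier_interoperability = identical / relevance if relevance else 0.0

    # Assumed "conflict": the identical identifier is used for two
    # different concepts.
    conflicts = sum(
        1 for s in statements if not s.same_concept and s.id_a == s.id_b
    )

    return {
        "identifier_interoperability": identifier_interoperability,
        "relevance": relevance,
        "conflicts": conflicts,
    }


if __name__ == "__main__":
    example = [
        Statement("geo:Ghent", "geo:Ghent", True),
        Statement("a:Gent", "geo:Ghent", True),
        Statement("ex:1", "ex:1", False),
        Statement("ex:2", "ex:3", False),
    ]
    print(interoperability_metrics(example))
```

Under these assumed definitions, the conflict count is the kind of machine-readable feedback a data maintainer could act on directly, since each conflict points at one specific identifier.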
Policy bubbles: What factors drive their birth, maturity and death?
Moshe Maor at LSE Blog: “A policy bubble is a real or perceived policy overreaction that is reinforced by positive feedback over a relatively long period of time. This type of policy imposes objective and/or perceived social costs without producing offsetting objective and/or perceived benefits over a considerable length of time. A case in point is when government spending on a policy problem increases due to public demand for more policy while the severity of the problem decreases over an extended period of time. Another case is when governments raise ‘green’ or other standards due to public demand while the severity of the problem does not justify this move…
Drawing on insights from a variety of fields – including behavioural economics, psychology, sociology, political science and public policy – three phases of the life-cycle of a policy bubble may be identified: birth, maturity and death. A policy bubble may emerge when certain individuals perceive opportunities to gain from public policy or to exploit it by rallying support for the policy, promoting word-of-mouth enthusiasm and widespread endorsement of the policy, heightening expectations for further policy, and increasing demand for this policy….
How can one identify a policy bubble? A policy bubble may be identified by measuring parliamentary concerns, media concerns, public opinion regarding the policy at hand, and the extent of a policy problem, against the budget allocation to said policy over the same period, preferably over 50 years or more. Measuring the operation of different transmission mechanisms in emotional contagion and human herding, particularly the spread of social influence and feeling, can also work to identify a policy bubble.
Here, computer-aided content analysis of verbal and non-verbal communication in social networks, especially instant messaging, may capture emotional and social contagion. A further way to identify a policy bubble revolves around studying bubble expectations and individuals’ confidence over time by distributing a questionnaire to a random sample of the population, experts in the relevant policy sub-field, as well as decision makers, and comparing the results across time and nations.
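One simple way to picture the comparison of budget allocation against the extent of a policy problem is to look for sustained stretches in which spending rises while severity falls. The sketch below is an illustration under that assumption only. The function name, the input series, and the idea of using the longest run as a signal are hypothetical and do not come from Maor's article.

```python
from typing import Sequence


def longest_divergence_run(
    spending: Sequence[float], severity: Sequence[float]
) -> int:
    """Length of the longest run of consecutive periods in which
    spending rose while problem severity fell (illustrative measure)."""
    if len(spending) != len(severity):
        raise ValueError("both series must cover the same periods")

    longest = current = 0
    for i in range(1, len(spending)):
        rising_budget = spending[i] > spending[i - 1]
        falling_problem = severity[i] < severity[i - 1]
        if rising_budget and falling_problem:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the divergence was interrupted
    return longest


if __name__ == "__main__":
    # Made-up yearly figures, for illustration only.
    budget = [10, 12, 15, 15, 18, 21, 25]
    problem = [9, 8, 7, 7, 6, 5, 4]
    print(longest_divergence_run(budget, problem))
```

A long run would only be a prompt for closer study. The article also calls for evidence on parliamentary and media concern, public opinion, and expectations, which this sketch ignores.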
To sum up, my interpretation of the process that leads to the emergence of policy bubbles allows for the possibility that different modes of policy overreaction lead to different types of human herding, thereby resulting in different types of policy bubbles. This interpretation has the added benefit of contributing to the explanation of economic, financial, technological and social bubbles as well.”
OkCupid reveals it’s been lying to some of its users. Just to see what’ll happen.
Brian Fung in the Washington Post: “It turns out that OkCupid has been performing some of the same psychological experiments on its users that landed Facebook in hot water recently.
In a lengthy blog post, OkCupid cofounder Christian Rudder explains that OkCupid has on occasion played around with removing text from people’s profiles, removing photos, and even telling some users they were an excellent match when in fact they were only a 30 percent match according to the company’s systems. Just to see what would happen.
OkCupid defends this behavior as something that any self-respecting Web site would do.
“OkCupid doesn’t really know what it’s doing. Neither does any other Web site,” Rudder wrote. “But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.”…
we have a bigger problem on our hands: A problem about how to reconcile the sometimes valuable lessons of data science with the creep factor — particularly when you aren’t notified about being studied. But as I’ve written before, these kinds of studies happen all the time; it’s just rare that the public is presented with the results.
Short of banning the practice altogether, which seems totally unrealistic, corporate data science seems like an opportunity on a number of levels, particularly if it’s disclosed to the public. First, it helps us understand how human beings tend to behave at Internet scale. Second, it tells us more about how Internet companies work. And third, it helps consumers make better decisions about which services they’re comfortable using.
I suspect that what bothers us most of all is not that the research took place, but that we’re slowly coming to grips with how easily we ceded control over our own information — and how the machines that collect all this data may all know more about us than we do ourselves. We had no idea we were even in a rabbit hole, and now we’ve discovered we’re 10 feet deep. As many as 62.5 percent of Facebook users don’t know the news feed is generated by a company algorithm, according to a recent study conducted by Christian Sandvig, an associate professor at the University of Michigan, and Karrie Karahalios, an associate professor at the University of Illinois.
OkCupid’s blog post is distinct in several ways from Facebook’s psychological experiment. OkCupid didn’t try to publish its findings in a scientific journal. It isn’t even claiming that what it did was science. Moreover, OkCupid’s research is legitimately useful to users of the service — in ways that Facebook’s research is arguably not….
But in any case, there’s no such motivating factor when it comes to Facebook. Unless you’re a page administrator or news organization, understanding how the newsfeed works doesn’t really help the average user in the way that understanding how OkCupid works does. That’s because people use Facebook for all kinds of reasons that have nothing to do with Facebook’s commercial motives. But people would stop using OkCupid if they discovered it didn’t “work.”
If you’re lying to your users in an attempt to improve your service, what’s the line between A/B testing and fraud?”
Request for Proposals: Exploring the Implications of Government Release of Large Datasets
“The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse. This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:
- Gain knowledge of the impact of the online release of large amounts of data generated by citizens’ interactions with government
- Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
- Begin building a body of research that addresses these issues
– BACKGROUND –
Governments at all levels are releasing large datasets for analysis by anyone for any purpose—“Open Data.” Using Open Data, entrepreneurs may create new products and services, and citizens may use it to gain insight into the government. A plethora of time-saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets in order to encourage the development of unimagined new applications. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals—acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role as businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen. While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry. The citizen who foresees this growing transparency may be less willing to engage in government, as these transactions may be documented and released in a dataset to anyone to use for any imaginable purpose—including to deanonymize the database—forever. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, open data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000. Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium. All grantees’ papers will be published in a dedicated monograph. Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a followup workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not. We are open to proposals that address the following difficult questions. We are also open as to methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.
- To what extent does existing Open Data made available by city and state governments affect individual profiling? Do the effects change depending on the level of aggregation (neighborhood vs. cities)? What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
- Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
- Is there something fundamentally different about government services and the government’s collection of citizens’ data for basic needs in modern society such as power and water that requires governments to exercise greater due care than commercial entities?
- Companies have legal and practical mechanisms to shield data submitted to government from public release. What mechanisms do individuals have or should have to address misuse of Open Data? Could developments in the constitutional right to information privacy as articulated in Whalen and Westinghouse Electric Co. address Open Data privacy issues?
- Collecting data costs money, and its release could affect civil liberties. Yet it is being given away freely, sometimes to immensely profitable firms. Should governments license data for a fee and/or impose limits on its use, given its value?
- The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis. Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles? What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue? A move from collection restrictions to use regulation raises a number of related issues, detailed below.
- Are use restrictions efficacious in creating accountability? Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability. How could use regulations be implemented in the context of Open Data efficaciously? Can a self-learning algorithm honor data use restrictions?
- If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses? How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff? Will plaintiffs ever be able to beat First Amendment defenses?
- The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data. Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues. What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
- Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis. When analysts are not involved in the collection of data, they are less likely to understand its context and limitations. How do we ensure that this knowledge is maintained in a use regulation system?
- Former President William Clinton was admitted under a pseudonym for a procedure at a New York Hospital in 2004. The hospital detected 1,500 attempts by its own employees to access the President’s records. With snooping such a tempting activity, how could incentives be crafted to cause self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
- It is clear that data privacy regulation could hamper some big data efforts. However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections. What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
- In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements. Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment. Is the answer to problems of distrust in government even more Open Data?
- What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect? Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?…
To submit a proposal, visit the Conference Management Toolkit (CMT) here.
Once you have created a profile, the site will allow you to submit your proposal.
If you have questions, please contact Chris Hoofnagle, principal investigator on this project.”
The Innovators
Kirkus Review of “The Innovators: How a Group of Inventors, Hackers, Geniuses, and Geeks Created the Digital Revolution” by Walter Isaacson: “Innovation occurs when ripe seeds fall on fertile ground,” Aspen Institute CEO Isaacson (Steve Jobs, 2011, etc.) writes in this sweeping, thrilling tale of three radical innovations that gave rise to the digital age. First was the evolution of the computer, which Isaacson traces from its 19th-century beginnings in Ada Lovelace’s “poetical” mathematics and Charles Babbage’s dream of an “Analytical Engine” to the creation of silicon chips with circuits printed on them. The second was “the invention of a corporate culture and management style that was the antithesis of the hierarchical organization of East Coast companies.” In the rarefied neighborhood dubbed Silicon Valley, new businesses aimed for a cooperative, nonauthoritarian model that nurtured cross-fertilization of ideas. The third innovation was the creation of demand for personal devices: the pocket radio; the calculator, marketing brainchild of Texas Instruments; video games; and finally, the holy grail of inventions: the personal computer. Throughout his action-packed story, Isaacson reiterates one theme: Innovation results from both “creative inventors” and “an evolutionary process that occurs when ideas, concepts, technologies, and engineering methods ripen together.” Who invented the microchip? Or the Internet?
Mostly, Isaacson writes, these emerged from “a loosely knit cohort of academics and hackers who worked as peers and freely shared their creative ideas….Innovation is not a loner’s endeavor.” Isaacson offers vivid portraits—many based on firsthand interviews—of mathematicians, scientists, technicians and hackers (a term that used to mean anyone who fooled around with computers), including the elegant, “intellectually intimidating,” Hungarian-born John von Neumann; impatient, egotistical William Shockley; Grace Hopper, who joined the Navy to pursue a career in mathematics; “laconic yet oddly charming” J.C.R. Licklider, one father of the Internet; Bill Gates, Steve Jobs, and scores of others.
Isaacson weaves prodigious research and deftly crafted anecdotes into a vigorous, gripping narrative about the visionaries whose imaginations and zeal continue to transform our lives.”
A Different Idea of Our Declaration
Gordon S. Wood reviews Our Declaration: A Reading of the Declaration of Independence in Defense of Equality by Danielle Allen in the New York Review of Books: “If we read the Declaration of Independence slowly and carefully, Danielle Allen believes, then the document can become a basic primer for our democracy. It can be something that all of us—not just scholars and educated elites but common ordinary people—can participate in, and should participate in if we want to be good democratic citizens.
Allen, who is a professor of social science at the Institute for Advanced Study in Princeton, came to this extraordinary conclusion when she was teaching for a decade at the University of Chicago. But it was not the young bright-eyed undergraduates whom she taught by day who inspired her. Instead, it was the much older, life-tested adults whom she taught by night who created “the single most transformative experience” of her teaching career.
As she slowly worked her way through the 1,337 words of the Declaration of Independence with her night students, many of whom had no job or were working two jobs or were stuck in dead-end part-time jobs, Allen discovered that the document had meaning for them and that it was accessible to any reader or hearer of its words. By teaching the document to these adult students in the way that she did, she experienced “a personal metamorphosis.” For the first time in her life she came to realize that the Declaration makes a coherent philosophical argument about equality, an argument that could be made comprehensible to ordinary people who had no special training…”
'Big Data' Will Change How You Play, See the Doctor, Even Eat
We’re entering an age of personal big data, and its impact on our lives will surpass that of the Internet. Data will answer questions we could never before answer with certainty—everyday questions like whether that dress actually makes you look fat, or profound questions about precisely how long you will live.
Every 20 years or so, a powerful technology moves from the realm of backroom expertise and into the hands of the masses. In the late 1970s, computing made that transition—from mainframes in glass-enclosed rooms to personal computers on desks. In the late 1990s, the first web browsers made networks, which had been for science labs and the military, accessible to any of us, giving birth to the modern Internet.
Each transition touched off an explosion of innovation and reshaped work and leisure. In 1975, 50,000 PCs were in use worldwide. Twenty years later: 225 million. The number of Internet users in 1995 hit 16 million. Today it’s more than 3 billion. In much of the world, it’s hard to imagine life without constant access to both computing and networks.
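To put the figures above in proportion, the arithmetic can be worked through directly. This is a minimal sketch using only the numbers quoted in the passage. The helper names are made up for illustration, and the annual rate is computed only for PCs because the passage gives no year for the "today" Internet figure.

```python
def fold_increase(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# PCs in use: 50,000 in 1975, 225 million twenty years later.
pc_fold = fold_increase(50_000, 225_000_000)
pc_rate = compound_annual_growth(50_000, 225_000_000, 20)

# Internet users: 16 million in 1995, more than 3 billion "today".
internet_fold = fold_increase(16_000_000, 3_000_000_000)

if __name__ == "__main__":
    print(f"PCs grew {pc_fold:,.0f}-fold, about {pc_rate:.1%} per year")
    print(f"Internet users grew at least {internet_fold:,.1f}-fold")
```

The PC figure works out to a 4,500-fold increase, and the Internet figure to at least a 187.5-fold increase.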
The 2010s will be the coming-out party for data. Gathering, accessing and gleaning insights from vast and deep data has been a capability locked inside enterprises long enough. Cloud computing and mobile devices now make it possible to stand in a bathroom line at a baseball game while tapping into massive computing power and databases. On the other end, connected devices such as the Nest thermostat or Fitbit health monitor and apps on smartphones increasingly collect new kinds of information about everyday personal actions and habits, turning it into data about ourselves.
More than 80 percent of data today is unstructured: tangles of YouTube videos, news stories, academic papers, social network comments. Unstructured data has been almost impossible to search for, analyze and mix with other data. A new generation of computers—cognitive computing systems that learn from data—will read tweets or e-books or watch video, and comprehend its content. Somewhat like brains, these systems can link diverse bits of data to come up with real answers, not just search results.
Such systems can work in natural language. The progenitor is the IBM Watson computer that won on Jeopardy in 2011. Next-generation Watsons will work like a super-powered Google. (Google today is a data-searching wimp compared with what’s coming.)
Sports offers a glimpse into the data age. Last season the NBA installed in every arena technology that can “watch” a game and record, in 48 minutes of action, more than 4 million data points about every movement and shot. That alone could yield new insights for NBA coaches, such as which group of five players most efficiently passes the ball around….
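As a rough sense of scale, the quoted figure can be converted into a data rate. The sketch below assumes the 48 minutes refers to game-clock time and treats 4 million as a lower bound, as the passage says "more than". The function name is hypothetical.

```python
def data_points_per_second(data_points: int, minutes_of_action: int) -> float:
    """Average number of data points recorded per second of game action."""
    return data_points / (minutes_of_action * 60)


# "More than 4 million data points" in "48 minutes of action".
nba_rate = data_points_per_second(4_000_000, 48)

if __name__ == "__main__":
    print(f"At least {nba_rate:,.0f} data points per second of play")
```

That is roughly 1,400 data points for every second of play in a single game.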
Think again about life before personal computing and the Internet. Even if someone told you that you’d eventually carry a computer in your pocket that was always connected to global networks, you would’ve had a hard time imagining what that meant—imagining WhatsApp, Siri, Pandora, Uber, Evernote, Tinder.
As data about everything becomes ubiquitous and democratized, layered on top of computing and networks, it will touch off the most spectacular technology explosion yet. We can see the early stages now. “Big data” doesn’t even begin to describe the enormity of what’s coming next.”