Big Data Becomes a Mirror


Book Review of ‘Uncharted,’ by Erez Aiden and Jean-Baptiste Michel in the New York Times: “Why do English speakers say “drove” rather than “drived”?

As graduate students at the Harvard Program for Evolutionary Dynamics about eight years ago, Erez Aiden and Jean-Baptiste Michel pondered the matter and decided that something like natural selection might be at work. In English, the “-ed” past-tense ending of Proto-Germanic, like a superior life form, drove out the Proto-Indo-European system of indicating tenses by vowel changes. Only the small class of verbs we know as irregular managed to resist.

To test this evolutionary premise, Mr. Aiden and Mr. Michel wound up inventing something they call culturomics, the use of huge amounts of digital information to track changes in language, culture and history. Their quest is the subject of “Uncharted: Big Data as a Lens on Human Culture,” an entertaining tour of the authors’ big-data adventure, whose implications they wildly oversell….

Invigorated by the great verb chase, Mr. Aiden and Mr. Michel went hunting for bigger game. Given a large enough storehouse of words and a fine filter, would it be possible to see cultural change at the micro level, to follow minute fluctuations in human thought processes and activities? Tiny factoids, multiplied endlessly, might assume imposing dimensions.

By chance, Google Books, the megaproject to digitize every page of every book ever printed — all 130 million of them — was starting to roll just as the authors were looking for their next target of inquiry.

Meetings were held, deals were struck and the authors got to it. In 2010, working with Google, they perfected the Ngram Viewer, which takes its name from the computer-science term for a word or phrase. This “robot historian,” as they call it, can search the 30 million volumes already digitized by Google Books and instantly generate a usage-frequency timeline for any word, phrase, date or name, a sort of stock-market graph illustrating the ups and downs of cultural shares over time.

Mr. Aiden, now director of the Center for Genome Architecture at Rice University, and Mr. Michel, who went on to start the data-science company Quantified Labs, play the Ngram Viewer (books.google.com/ngrams) like a Wurlitzer…

The Ngram Viewer delivers the what and the when but not the why. Take the case of specific years. All years get attention as they approach, peak when they arrive, then taper off as succeeding years occupy the attention of the public. Mentions of the year 1872 had declined by half in 1896, a slow fade that took 23 years. The year 1973 completed the same trajectory in less than half the time.

“What caused that change?” the authors ask. “We don’t know. For now, all we have are the naked correlations: what we uncover when we look at collective memory through the digital lens of our new scope.” Someone else is going to have to do the heavy lifting.”

Web Science: Understanding the Emergence of Macro-Level Features on the World Wide Web


Monograph by Kieron O’Hara, Noshir S. Contractor, Wendy Hall, James A. Hendler and Nigel Shadbolt in Foundations and Trends in Web Sciences: “Web Science considers the development of Web Science since the publication of ‘A Framework for Web Science’ (Berners-Lee et al., 2006). This monograph argues that the requirement for understanding should ideally be accompanied by some measure of control, which makes Web Science crucial in the future provision of tools for managing our interactions, our politics, our economics, our entertainment, and – not least – our knowledge and data sharing…
In this monograph we consider the development of Web Science since the launch of this journal and its inaugural publication ‘A Framework for Web Science’ [44]. The theme of emergence is discussed as the characteristic phenomenon of Web-scale applications, where many unrelated micro-level actions and decisions, uninformed by knowledge about the macro-level, still produce noticeable and coherent effects at the scale of the Web. A model of emergence is mapped onto the multitheoretical multilevel (MTML) model of communication networks explained in [252]. Four specific types of theoretical problem are outlined. First, there is the need to explain local action. Second, the global patterns that form when local actions are repeated at scale have to be detected and understood. Third, those patterns feed back into the local, with intricate and often fleeting causal connections to be traced. Finally, as Web Science is an engineering discipline, issues of control of this feedback must be addressed. The idea of a social machine is introduced, where networked interactions at scale can help to achieve goals for people and social groups in civic society; an important aim of Web Science is to understand how such networks can operate, and how they can control the effects they produce on their own environment.”

Are Smart Cities Empty Hype?


Irving Wladawsky-Berger in the Wall Street Journal: “A couple of weeks ago I participated in an online debate sponsored by The Economist around the question: Are Smart Cities Empty Hype? Defending the motion was Anthony Townsend, research director at the Institute for the Future and adjunct faculty member at NYU’s Wagner School of Public Service. I took the opposite side, arguing the case against the motion.
The debate consisted of three phases spread out over roughly 10 days. We each first stated our respective positions in our opening statements, followed a few days later by our rebuttals, and then finally our closing statements. It was moderated by Ludwig Siegele, online business and finance editor at The Economist. Throughout the process, people were invited to vote on the motion, as well as to post their own comments.
The debate was inspired, I believe, by The Multiplexed Metropolis, an article Mr. Siegele published in the September 7 issue of The Economist which explored the impact of Big Data on cities. He wrote that the vast amounts of data generated by the many social interactions taking place in cities might lead to a kind of second electrification, transforming 21st century cities much as electricity did in the past. “Enthusiasts think that data services can change cities in this century as much as electricity did in the last one,” he noted. “They are a long way from proving their case.”
In my opening statement, I said that I strongly believe that digital technologies and the many data services they are enabling will make cities smarter and help transform them over time. My position is not surprising, given my affiliations with NYU’s Center for Urban Science and Progress (CUSP) and Imperial College’s Digital City Exchange, as well as my past involvements with IBM’s Smarter Cities and with Citigroup’s Citi for Cities initiatives. But, I totally understand why so many – almost half of those voting and quite a few who left comments – feel that smart cities are mostly hype. The case for smart cities is indeed far from proven.
Cities are the most complex social organisms created by humans. Just about every aspect of human endeavor is part of the mix of cities, and they all interact with each other leading to a highly dynamic system of systems. Moreover, each city has its own unique style and character. As is generally the case with transformative changes to highly complex systems, the evolution toward smart cities will likely take quite a bit longer than we anticipate, but the eventual impact will probably be more transformative than we can currently envision.
Electrification, for example, started in the U.S., Britain and other advanced nations around the 1880s and took decades to deploy and truly transform cities. The hype around smart cities that I worry the most about is underestimating their complexity and the amount of research, experimentation, and plain hard work that it will take to realize the promise. Smart cities projects are still in their very early stages. Some will work and some will fail. We have much to learn. Highly complex systems need time to evolve.
Commenting on the opening statements, Mr. Siegele noted: “Despite the motion being Are smart cities empty hype?, both sides have focused on whether these should be implemented top-down or bottom-up. Most will probably agree that digital technology can make cities smarter–meaning more liveable, more efficient, more sustainable and perhaps even more democratic. But the big question is how to get there and how smart cities will be governed.”…

Philosophical Engineering: Toward a Philosophy of the Web


New book by Harry Halpin (Editor) and Alexandre Monnin (Editor): “This is the first interdisciplinary exploration of the philosophical foundations of the Web, a new area of inquiry that has important implications across a range of domains.

  • Contains twelve essays that bridge the fields of philosophy, cognitive science, and phenomenology
  • Tackles questions such as the impact of Google on intelligence and epistemology, the philosophical status of digital objects, ethics on the Web, semantic and ontological changes caused by the Web, and the potential of the Web to serve as a genuine cognitive extension
  • Brings together insightful new scholarship from well-known analytic and continental philosophers, such as Andy Clark and Bernard Stiegler, as well as rising scholars in “digital native” philosophy and engineering
  • Includes an interview with Tim Berners-Lee, the inventor of the Web”…

Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation


New paper by Henry Sauermann and Chiara Franzoni: “Crowd-based knowledge production is attracting growing attention from scholars and practitioners. One key premise is that participants who have an intrinsic “interest” in a topic or activity are willing to expend effort at lower pay than in traditional employment relationships. However, it is not clear how strong and sustainable interest is as a source of motivation. We draw on research in psychology to discuss important static and dynamic features of interest and derive a number of research questions regarding interest-based effort in crowd-based projects. Among others, we consider the specific versus general nature of interest, highlight the potential role of matching between projects and individuals, and distinguish the intensity of interest at a point in time from the development and sustainability of interest over time. We then examine users’ participation patterns within and across 7 different crowd science projects that are hosted on a shared platform. Our results provide novel insights into contribution dynamics in crowd science projects. Moreover, given that extrinsic incentives such as pay, status, self-use, or career benefits are largely absent in these particular projects, the data also provide unique insights into the dynamics of interest-based motivation and into its potential as a driver of effort.”

Google Global Impact Award Expands Zooniverse


Press Release: “A $1.8 million Google Global Impact Award will enable Zooniverse, a nonprofit collaboration led by the Adler Planetarium and the University of Oxford, to make setting up a citizen science project as easy as starting a blog and could lead to thousands of innovative new projects around the world, accelerating the pace of scientific research.
The award supports the further development of the Zooniverse, the world’s leading ‘citizen science’ platform, which has already given more than 900,000 online volunteers the chance to contribute to science by taking part in activities including discovering planets, classifying plankton or searching through old ship’s logs for observations of interest to climate scientists. As part of the Global Impact Award, the Adler will receive $400,000 to support the Zooniverse platform.
With the Google Global Impact Award, Zooniverse will be able to rebuild their platform so that research groups with no web development expertise can build and launch their own citizen science projects.
“We are entering a new era of citizen science – this effort will enable prolific development of science projects in which hundreds of thousands of additional volunteers will be able to work alongside professional scientists to conduct important research – the potential for discovery is limitless,” said Michelle B. Larson, Ph.D., Adler Planetarium president and CEO. “The Adler is honored to join its fellow Zooniverse partner, the University of Oxford, as a Google Global Impact Award recipient.”
The Zooniverse – the world’s leading citizen science platform – is a global collaboration across several institutions that design and build citizen science projects. The Adler is a founding partner of the Zooniverse, which has already engaged more than 900,000 online volunteers as active scientists by discovering planets, mapping the surface of Mars and detecting solar flares. Adler-directed citizen science projects include: Galaxy Zoo (astronomy), Solar Stormwatch (solar physics), Moon Zoo (planetary science), Planet Hunters (exoplanets) and The Milky Way Project (star formation). The Zooniverse (zooniverse.org) also includes projects in environmental, biological and medical sciences. Google’s investment in the Adler and its Zooniverse partner, the University of Oxford, will further the global reach, making thousands of new projects possible.”

Selected Readings on Crowdsourcing Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing data was originally published in 2013.

As institutions seek to improve decision-making through data and put public data to use to improve the lives of citizens, new tools and projects are allowing citizens to play a role in both the collection and utilization of data. Participatory sensing and other citizen data collection initiatives, notably in the realm of disaster response, are allowing citizens to crowdsource important data, often using smartphones, that would be either impossible or burdensomely time-consuming for institutions to collect themselves. Civic hacking, often performed in hackathon events, on the other hand, is a growing trend in which governments encourage citizens to transform data from government and other sources into useful tools to benefit the public good.

Annotated Selected Reading List (in alphabetical order)

Baraniuk, Chris. “Power Politechs.” New Scientist 218, no. 2923 (June 29, 2013): 36–39. http://bit.ly/167ul3J.

  • In this article, Baraniuk discusses civic hackers, “an army of volunteer coders who are challenging preconceptions about hacking and changing the way your government operates. In a time of plummeting budgets and efficiency drives, those in power have realised they needn’t always rely on slow-moving, expensive outsourcing and development to improve public services. Instead, they can consider running a hackathon, at which tech-savvy members of the public come together to create apps and other digital tools that promise to enhance the provision of healthcare, schools or policing.”
  • While recognizing that “civic hacking has established a pedigree that demonstrates its potential for positive impact,” Baraniuk argues that a “more rigorous debate over how this activity should evolve, or how authorities ought to engage in it” is needed.

Barnett, Brandon, Muki Hansteen Izora, and Jose Sia. “Civic Hackathon Challenges Design Principles: Making Data Relevant and Useful for Individuals and Communities.” Hack for Change, https://bit.ly/2Ge6z09.

  • In this paper, researchers from Intel Labs offer “guiding principles to support the efforts of local civic hackathon organizers and participants as they seek to design actionable challenges and build useful solutions that will positively benefit their communities.”
  • The authors’ proposed design principles are:
    • Focus on the specific needs and concerns of people or institutions in the local community. Solve their problems and challenges by combining different kinds of data.
    • Seek out data far and wide (local, municipal, state, institutional, non-profits, companies) that is relevant to the concern or problem you are trying to solve.
    • Keep it simple! This can’t be overstated. Focus [on] making data easily understood and useful to those who will use your application or service.
    • Enable users to collaborate and form new communities and alliances around data.

Buhrmester, Michael, Tracy Kwang, and Samuel D. Gosling. “Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?” Perspectives on Psychological Science 6, no. 1 (January 1, 2011): 3–5. http://bit.ly/H56lER.

  • This article examines the capability of Amazon’s Mechanical Turk to act as a source of data for researchers, in addition to its traditional role as a microtasking platform.
  • The authors examine the demographics of MTurkers and find that “(a) MTurk participants are slightly more demographically diverse than are standard Internet samples and are significantly more diverse than typical American college samples; (b) participation is affected by compensation rate and task length, but participants can still be recruited rapidly and inexpensively; (c) realistic compensation rates do not affect data quality; and (d) the data obtained are at least as reliable as those obtained via traditional methods.”
  • The paper concludes that, just as MTurk can be a strong tool for crowdsourcing tasks, data derived from MTurk can be high quality while also being inexpensive and obtained rapidly.

Goodchild, Michael F., and J. Alan Glennon. “Crowdsourcing Geographic Information for Disaster Response: a Research Frontier.” International Journal of Digital Earth 3, no. 3 (2010): 231–241. http://bit.ly/17MBFPs.

  • This article examines the quality of geographic information generated by citizens, a new phenomenon, to assess whether such data can play a role in emergency management.
  • The authors argue that “[d]ata quality is a major concern, since volunteered information is asserted and carries none of the assurances that lead to trust in officially created data.”
  • Because time is crucial during emergencies, the authors argue that “the risks associated with volunteered information are often outweighed by the benefits of its use.”
  • The paper examines four wildfires in Santa Barbara in 2007-2009 to discuss current challenges with volunteered geographical data, and concludes that further research is required to answer how volunteer citizens can be used to provide effective assistance to emergency managers and responders.

Hudson-Smith, Andrew, Michael Batty, Andrew Crooks, and Richard Milton. “Mapping for the Masses: Accessing Web 2.0 Through Crowdsourcing.” Social Science Computer Review 27, no. 4 (November 1, 2009): 524–538. http://bit.ly/1c1eFQb.

  • This article describes the way in which “we are harnessing the power of web 2.0 technologies to create new approaches to collecting, mapping, and sharing geocoded data.”
  • The authors examine GMapCreator and MapTube, which allow users to perform a range of map-related functions, such as creating new maps, archiving existing maps, and sharing or producing bottom-up maps through crowdsourcing.
  • They conclude that “these tools are helping to define a neogeography that is essentially ‘mapping for the masses,’ while noting that there are many issues of quality, accuracy, copyright, and trust that will influence the impact of these tools on map-based communication.”

Kanhere, Salil S. “Participatory Sensing: Crowdsourcing Data from Mobile Smartphones in Urban Spaces.” In Distributed Computing and Internet Technology, edited by Chittaranjan Hota and Pradip K. Srimani, 19–26. Lecture Notes in Computer Science 7753. Springer Berlin Heidelberg, 2013. https://bit.ly/2zX8Szj.

  • This paper provides a comprehensive overview of participatory sensing — a “new paradigm for monitoring the urban landscape” in which “ordinary citizens can collect multi-modal data streams from the surrounding environment using their mobile devices and share the same using existing communications infrastructure.”
  • In addition to examining a number of innovative applications of participatory sensing, Kanhere outlines the following key research challenges:
    • Dealing with incomplete samples
    • Inferring user context
    • Protecting user privacy
    • Evaluating data trustworthiness
    • Conserving energy

We must create a culture of “open data makers”


Rufus Pollock (@rufuspollock), Founder and Director of the Open Knowledge Foundation: “Open data and open knowledge are fundamentally about empowerment, about giving people – citizens, journalists, NGOs, companies and policy-makers – access to the information they need to understand and shape the world around them.

Through openness, we can ensure that technology and data improve science, governance, and society. Without it, we may see the increasing centralisation of knowledge – and therefore power – in the hands of the few, and a huge loss in our potential, individually and collectively, to innovate, understand, and improve the world around us.

Open data is data that can be freely accessed, used, built upon and shared by anyone, for any purpose. With digital technology – from mobiles to the internet – increasingly everywhere, we’re seeing a data revolution. It’s a revolution both in the amount of data available and in our ability to use, and share, that data. And it’s changing everything we do – from how we travel home from work to how scientists do research, to how governments set policy….

it’s about people, the people who use data, and the people who use the insights from that data to drive change. We need to create a culture of “open data makers”, people able and ready to make apps and insights with open data. We need to connect open data with those who have the best questions and the biggest needs – a healthcare worker in Zambia, the London commuter travelling home – and go beyond the data geeks and the tech savvy.”

Britain’s Ministry of Nudges


in the New York Times: “A 24-year-old psychologist working for the British government, Mr. Gyani was supposed to come up with new ways to help people find work. He was intrigued by an obscure 1994 study that tracked a group of unemployed engineers in Texas. One group of engineers, who wrote about how it felt to lose their jobs, were twice as likely to find work as the ones who didn’t. Mr. Gyani took the study to a job center in Essex, northeast of London, where he was assigned for several months. Sure, it seemed crazy, but would it hurt to give it a shot? Hayley Carney, one of the center’s managers, was willing to try.

Ms. Carney walked up to a man slumped in a plastic chair in the waiting area as Mr. Gyani watched from across the room. The man — 28, recently separated and unemployed for most of his adult life — was “our most difficult case,” Ms. Carney said later.

“How would you like to write about your feelings” about being out of a job? she asked the man. Write for 20 minutes. Once a week. Whatever pops into your head.

An awkward silence followed. Maybe this was a bad idea, Mr. Gyani remembers thinking.

But then the man shrugged. Why not? And so, every week, after seeing a job adviser, he would stay and write. He wrote about applying for dozens of jobs and rarely hearing back, about not having anything to get up for in the morning, about his wife who had left him. He would reread what he had written the week before, and then write again.

Over several weeks, his words became less jumbled. He started to gain confidence, and his job adviser noticed the change. Before the month was out, he got a full-time job in construction — his first.

An Idea Born in America

Did the writing exercise help the man find a job? Even now it’s hard for Mr. Gyani to say for sure. But it was the start of a successful research trial at the Essex job center — one that is part of a much larger social experiment underway in Britain. A small band of psychologists and economists is quietly working to transform the nation’s policy making. Inspired by behavioral science, the group fans out across the country to job centers, schools and local government offices and tweaks bureaucratic processes to better suit human nature. The goal is to see if small interventions that don’t cost much can change behavior in large ways that serve both individuals and society.

It is an American idea, refined in American universities and popularized in 2008 with the best seller “Nudge,” by Richard H. Thaler and Cass R. Sunstein. Professor Thaler, a contributor to the Economic View column in Sunday Business, is an economist at the University of Chicago, and Mr. Sunstein was a senior regulatory official in the Obama administration, where he applied behavioral findings to a range of regulatory policies, but didn’t have the mandate or resources to run experiments.

But it is in Britain that such experiments have taken root. Prime Minister David Cameron has embraced the idea of testing the power of behavioral change to devise effective policies, seeing it not just as a way to help people make better decisions, but also to help government do more for less.

In 2010, Mr. Cameron set up the Behavioral Insights Team, or nudge unit, as it’s often called. Three years later, the team has doubled in size and is about to announce a joint venture with an external partner to expand the program.

The unit has been nudging people to pay taxes on time, insulate their attics, sign up for organ donation, stop smoking during pregnancy and give to charity — and has saved taxpayers tens of millions of pounds in the process, said David Halpern, its director. Every civil servant in Britain is now being trained in behavioral science. The nudge unit has a waiting list of government departments eager to work with it, and other countries, from Denmark to Australia, have expressed interest.

In fact, five years after it arrived in Washington, nudging appears to be entering the next stage, with a new team in the White House planning to run policy trials inspired in part by Britain’s program. “First the idea traveled to Britain and now the lessons are traveling back,” said Professor Thaler, who is an official but unpaid adviser to the nudge unit. “Britain is the first country that has mainstreamed this on a national level.”

See also: A Few Findings of Britain’s Nudge