Visual Insights: A Practical Guide to Making Sense of Data


New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.

The book, developed for use in an information visualization MOOC, covers data analysis algorithms that enable extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees); and to systems that drive research and development. Examples of projects undertaken for clients include an interactive visualization of the success of game player activity in World of Warcraft; a visualization of 311 number adoption that shows the diffusion of non-emergency calls in the United States; a return on investment study for two decades of HIV/AIDS research funding by NIAID; and a map showing the impact of the HiveNYC Learning Network.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”

Also check out the Information Visualization MOOC at http://ivmooc.cns.iu.edu/
 

The Impact of the Social Sciences


New book: The Impact of the Social Sciences: How Academics and Their Research Make a Difference by Simon Bastow, Jane Tinkler and Patrick Dunleavy.
The three-year Impact of Social Sciences Project has culminated in a monograph published by SAGE. The book presents a thorough analysis of how academic research in the social sciences achieves public policy impacts, contributes to economic prosperity, and informs public understanding of policy issues as well as economic and social changes. This book is essential reading for academics, researchers, university administrators, government and private funders, and anyone interested in the global conversation about joining up and increasing the societal value and impact of social science knowledge and research.
Resources:

  • View the data visualisations that appear in the book here.
  • Browse our Living Bibliography with links to further resources.
  • Research Design and Methods Appendix [PDF]
  • “Assessing the Impacts of Academic Social Science Research: Modelling the economic impact on the UK economy of UK-based academic social science research” [PDF] A report prepared for the LSE Public Policy Group by Cambridge Econometrics.

Research Blogs:

The Moneyball Effect: How smart data is transforming criminal justice, healthcare, music, and even government spending


TED: “When Anne Milgram became the Attorney General of New Jersey in 2007, she was stunned to find out just how little data was available on who was being arrested, who was being charged, who was serving time in jails and prisons, and who was being released. It turns out that most big criminal justice agencies like my own didn’t track the things that matter,” she says in today’s talk, filmed at TED@BCG. “We didn’t share data, or use analytics, to make better decisions and reduce crime.”
Milgram’s idea for how to change this: “I wanted to moneyball criminal justice.”
Moneyball, of course, is the name of a 2011 movie starring Brad Pitt and the book it’s based on, written by Michael Lewis in 2003. The term refers to a practice adopted by the Oakland A’s general manager Billy Beane in 2002 — the organization began basing decisions not on star power or scout instinct, but on statistical analysis of measurable factors like on-base and slugging percentages. This worked exceptionally well. On a tiny budget, the Oakland A’s made it to the playoffs in 2002 and 2003, and — since then — nine other major league teams have hired sabermetric analysts to crunch these types of numbers.
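The statistics behind the anecdote are simple ratios. Here is a minimal sketch in Python using the standard sabermetric definitions of on-base percentage and slugging percentage (the sample season line is invented for illustration):

    # Standard sabermetric definitions; the sample numbers below are invented.
    def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
        # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
        return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

    def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
        # SLG = total bases / AB, so extra-base hits weigh more heavily.
        total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
        return total_bases / at_bats

    # Hypothetical season: 150 hits (100 singles, 30 doubles, 5 triples, 15 HR),
    # 70 walks, 5 hit-by-pitch, 500 at-bats, 5 sacrifice flies.
    print(round(on_base_percentage(150, 70, 5, 500, 5), 3))     # 0.388
    print(round(slugging_percentage(100, 30, 5, 15, 500), 3))   # 0.47

The point of Beane's approach was that measures like these, which track run production directly, were cheaper to buy in the player market than the attributes scouts traditionally prized.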
Milgram is working hard to bring smart statistics to criminal justice. To hear the results she’s seen so far, watch this talk. And below, take a look at a few surprising sectors that are getting the moneyball treatment as well.

Moneyballing music. Last year, Forbes magazine profiled the firm Next Big Sound, a company using statistical analysis to predict how musicians will perform in the market. The idea is that — rather than relying on the instincts of A&R reps — past performance on Pandora, Spotify, Facebook, etc. can be used to predict future potential. The article reads, “For example, the company has found that musicians who gain 20,000 to 50,000 Facebook fans in one month are four times more likely to eventually reach 1 million. With data like that, Next Big Sound promises to predict album sales within 20% accuracy for 85% of artists, giving labels a clearer idea of return on investment.”
Moneyballing human resources. In November, The Atlantic took a look at the practice of “people analytics” and how it’s affecting employers. (Billy Beane had something to do with this idea — in 2012, he gave a presentation at the TLNT Transform Conference called “The Moneyball Approach to Talent Management.”) The article describes how Bloomberg reportedly logs its employees’ keystrokes and the casino, Harrah’s, tracks employee smiles. It also describes where this trend could be going — for example, how a video game called Wasabi Waiter could be used by employers to judge potential employees’ ability to take action, solve problems and follow through on projects. The article looks at the ways these types of practices are disconcerting, but also how they could level an inherently unequal playing field. After all, the article points out that gender, race, age and even height biases have been demonstrated again and again in our current hiring landscape.
Moneyballing healthcare. Many have wondered: what about a moneyball approach to medicine? (See this call out via Common Health, this piece in Wharton Magazine or this op-ed on The Huffington Post from the President of the New York State Health Foundation.) In his TED Talk, “What doctors can learn from each other,” Stefan Larsson proposed an idea that feels like something of an answer to this question. In the talk, Larsson gives a taste of what can happen when doctors and hospitals measure their outcomes and share this data with each other: they are able to see which techniques are proving the most effective for patients and make adjustments. (Watch the talk for a simple way surgeons can make hip surgery more effective.) He imagines a continuous learning process for doctors, one that could transform the healthcare industry to give better outcomes while also reducing cost.
Moneyballing government. This summer, John Bridgeland (the director of the White House Domestic Policy Council under President George W. Bush) and Peter Orszag (the director of the Office of Management and Budget in Barack Obama’s first term) teamed up to pen a provocative piece for The Atlantic called, “Can government play moneyball?” In it, the two write, “Based on our rough calculations, less than $1 out of every $100 of government spending is backed by even the most basic evidence that the money is being spent wisely.” The two explain how, for example, there are 339 federally-funded programs for at-risk youth, the grand majority of which haven’t been evaluated for effectiveness. And while many of these programs might show great results, some that have been evaluated show troubling results. (For example, Scared Straight has been shown to increase criminal behavior.) Yet, some of these ineffective programs continue because a powerful politician champions them. While Bridgeland and Orszag show why Washington is so averse to making data-based appropriation decisions, the two also see the ship beginning to turn around. They applaud the Obama administration for a 2014 budget with an “unprecedented focus on evidence and results.” The pair also gave a nod to the nonprofit Results for America, which advocates that for every $99 spent on a program, $1 be spent on evaluating it. The pair even suggest a “Moneyball Index” to encourage politicians not to support programs that don’t show results.
In any industry, figuring out what to measure, how to measure it and how to apply the information gleaned from those measurements is a challenge. Which of the applications of statistical analysis has you the most excited? And which has you the most terrified?”

Belonging: Solidarity and Division in Modern Societies


New book by Montserrat Guibernau: “It is commonly assumed that we live in an age of unbridled individualism, but in this important new book Montserrat Guibernau argues that the need to belong to a group or community – from peer groups and local communities to ethnic groups and nations – is a pervasive and enduring feature of modern social life.
The power of belonging stems from its potential to generate an emotional attachment capable of fostering a shared identity, loyalty and solidarity among members of a given community. It is this strong emotional dimension that enables belonging to act as a trigger for political mobilization and, in extreme cases, to underpin collective violence.
Among the topics examined in this book are identity as a political instrument; emotions and political mobilization; the return of authoritarianism and the rise of the new radical right; symbols and the rituals of belonging; loyalty, the nation and nationalism. It includes case studies from Britain, Spain, Catalonia, Germany, the Middle East and the United States.”

Sharing and Caring


Tom Slee: “A new wave of technology companies claims to be expanding the possibilities of sharing and collaboration, and is clashing with established industries such as hospitality and transit. These companies make up what is being called the “sharing economy”: they provide web sites and applications through which individual residents or drivers can offer to “share” their apartment or car with a guest, for a price.
The industries they threaten have long been subject to city-level consumer protection and zoning regulations, but sharing economy advocates claim that these rules are rendered obsolete by the Internet. Battle lines are being drawn between the new companies and city governments. Where’s a good leftist to stand in all of this?
To figure this out, we need to look at the nature of the sharing economy. Some would say it fits squarely into an ideology of unregulated free markets, as described recently by David Golumbia here in Jacobin. Others note that the people involved in American technology industries lean liberal. There’s also a clear Euro/American split in the sharing economy: while the Americans are entrepreneurial and commercial in the way they drive the initiative, the Europeans focus more on the civic, the collaborative, and the non-commercial.
The sharing economy invokes values familiar to many on the Left: decentralization, sustainability, community-level connectedness, and opposition to hierarchical and rigid regulatory regimes, seen most clearly in the movement’s bible What’s Mine is Yours: The Rise of Collaborative Consumption by Rachel Botsman and Roo Rogers. It’s the language of co-operatives and of civic groups.
There’s a definite green slant to the movement, too: ideas of “sharing rather than owning” make appeals to sustainability, and the language of sharing also appeals to anti-consumerist sentiments popular on the Left: property and consumption do not make us happy, and we should put aside the pursuit of possessions in favour of connections and experiences. All of which leads us to ideas of community: the sharing economy invokes images of neighbourhoods, villages, and “human-scale” interactions. Instead of buying from a mega-store, we get to share with neighbours.
These ideals have been around for centuries, but the Internet has given them a new slant. An influential line of thought emphasizes that the web lowers the “transaction costs” of group formation and collaboration. The key text is Yochai Benkler’s 2006 book The Wealth of Networks, which argues that the Internet brings with it an alternative style of economic production: networked rather than managed, self-organized rather than ordered. It’s a language associated strongly with both the Left (who see it as an alternative to monopoly capital) and the free-market libertarian right (who see it as an alternative to the state).
Clay Shirky’s 2008 book Here Comes Everybody popularized the ideas further, and in 2012 Steven Johnson announced the appearance of the “Peer Progressive” in his book Future Perfect. The idea of internet-enabled collaboration in the “real” world is a next step from online collaboration in the form of open source software, open government data, and Wikipedia, and the sharing economy is its manifestation.
As with all things technological, there’s an additional angle: the involvement of capital…”

Predictive Modeling With Big Data: Is Bigger Really Better?


New paper by Enric Junqué de Fortuny, David Martens, and Foster Provost in Big Data: “With the increasingly widespread collection and processing of “big data,” there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data—such as data on low-level human behavior—we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets—plus the skill to take advantage of them—potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.”
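For readers who want to see the mechanics, here is a minimal sketch, not the authors' implementation, of multivariate Bernoulli Naive Bayes arranged for sparse data. The key observation is that the Bernoulli log-likelihood splits as log P(x|c) = sum_j log(1 - p_jc) + sum over active features of log(p_jc / (1 - p_jc)), so the first term can be precomputed per class and scoring reduces to a sparse matrix product over non-zero entries only:

    # Minimal multivariate Bernoulli Naive Bayes for sparse binary data.
    # A sketch of the general technique, not the paper's implementation.
    import numpy as np
    from scipy.sparse import csr_matrix

    def fit(X, y, alpha=1.0):
        # X: sparse binary matrix (n_samples x n_features); y: label array.
        classes = np.unique(y)
        priors, consts, weights = [], [], []
        for c in classes:
            Xc = X[y == c]
            # Laplace-smoothed probability that each feature is "on" in class c.
            p = (np.asarray(Xc.sum(axis=0)).ravel() + alpha) / (Xc.shape[0] + 2 * alpha)
            priors.append(np.log(Xc.shape[0] / X.shape[0]))
            consts.append(np.log(1 - p).sum())           # all-features-absent baseline
            weights.append(np.log(p) - np.log(1 - p))    # adjustment for present features
        return classes, np.array(priors), np.array(consts), np.vstack(weights)

    def predict(X, model):
        classes, priors, consts, weights = model
        # One sparse product per class: cost scales with the non-zeros of X,
        # not with n_samples * n_features.
        scores = X @ weights.T + consts + priors
        return classes[np.asarray(scores).argmax(axis=1)]

    # Tiny demo with invented data: 3 samples, 5 binary features, 2 classes.
    X = csr_matrix(np.array([[1, 0, 1, 0, 0],
                             [0, 1, 0, 0, 1],
                             [1, 0, 0, 1, 0]]))
    y = np.array([0, 1, 0])
    print(predict(X, fit(X, y)))   # -> [0 1 0]

Because the per-class constant absorbs every absent feature, adding millions of mostly-zero features (the "fine-grained" behavioral data the paper studies) adds almost nothing to prediction cost.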

How Government Can Make Open Data Work


Joel Gurin in Information Week: “At the GovLab at New York University, where I am senior adviser, we’re taking a different approach than McKinsey’s to understand the evolving value of government open data: We’re studying open data companies from the ground up. I’m now leading the GovLab’s Open Data 500 project, funded by the John S. and James L. Knight Foundation, to identify and examine 500 American companies that use government open data as a key business resource.
Our preliminary results show that government open data is fueling companies both large and small, across the country, and in many sectors of the economy, including health, finance, education, energy, and more. But it’s not always easy to use this resource. Companies that use government open data tell us it is often incomplete, inaccurate, or trapped in hard-to-use systems and formats.
It will take a thorough and extended effort to make government data truly useful. Based on what we are hearing and the research I did for my book, here are some of the most important steps the federal government can take, starting now, to make it easier for companies to add economic value to the government’s data.
1. Improve data quality
The Open Data Policy not only directs federal agencies to release more open data; it also requires them to release information about data quality. Agencies will have to begin improving the quality of their data simply to avoid public embarrassment. We can hope and expect that they will do some data cleanup themselves, demand better data from the businesses they regulate, or use creative solutions like turning to crowdsourcing for help, as USAID did to improve geospatial data on its grantees.
2. Keep improving open data resources
The government has steadily made Data.gov, the central repository of federal open data, more accessible and useful, including a significant relaunch last week. To the agency’s credit, the GSA, which administers Data.gov, plans to keep working to make this key website still better. As part of implementing the Open Data Policy, the administration has also set up Project Open Data on GitHub, the world’s largest community for open-source software. These resources will be helpful for anyone working with open data either inside or outside of government. They need to be maintained and continually improved.
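For developers, the payoff of these resources is that the catalog is already machine-readable: Data.gov runs on CKAN, so its holdings can be searched programmatically. A minimal sketch in Python using CKAN's standard package_search action (the query term and row count are arbitrary examples):

    # Search the Data.gov catalog via CKAN's standard action API.
    # The query "health" and rows=5 are arbitrary illustration values.
    import requests

    resp = requests.get(
        "https://catalog.data.gov/api/3/action/package_search",
        params={"q": "health", "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["result"]
    print(result["count"], "matching datasets; first five:")
    for dataset in result["results"]:
        print("-", dataset["title"])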
3. Pass DATA
The Digital Accountability and Transparency Act would bring transparency to federal government spending at an unprecedented level of detail. The Act has strong bipartisan support. It passed the House with only one dissenting vote and was unanimously approved by a Senate committee, but still needs full Senate approval and the President’s signature to become law. DATA is also supported by technology companies who see it as a source of new open data they can use in their businesses. Congress should move forward and pass DATA as the logical next step in the work that the Obama administration’s Open Data Policy has begun.
4. Reform the Freedom of Information Act
Since it was passed in 1966, the federal Freedom of Information Act has gone through two major revisions, both of which strengthened citizens’ ability to access many kinds of government data. It’s time for another step forward. Current legislative proposals would establish a centralized web portal for all federal FOIA requests, strengthen the FOIA ombudsman’s office, and require agencies to post more high-interest information online before they receive formal requests for it. These changes could make more information from FOIA requests available as open data.
5. Engage stakeholders in a genuine way
Up to now, the government’s release of open data has largely been a one-way affair: Agencies publish datasets that they hope will be useful without consulting the organizations and companies that want to use them. Other countries, including the UK, France, and Mexico, are building in feedback loops from data users to government data providers, and the US should, too. The Open Data Policy calls for agencies to establish points of contact for public feedback. At the GovLab, we hope that the Open Data 500 will help move that process forward. Our research will provide a basis for new, productive dialogue between government agencies and the businesses that rely on them.
6. Keep using federal challenges to encourage innovation
The federal Challenge.gov website applies the best principles of crowdsourcing and collective intelligence. Agencies should use this approach extensively, and should pose challenges using the government’s open data resources to solve business, social, or scientific problems. Other approaches to citizen engagement, including federally sponsored hackathons and the White House Champions of Change program, can play a similar role.
Through the Open Data Policy and other initiatives, the Obama administration has set the right goals. Now it’s time to implement and move toward what US CTO Todd Park calls “data liberation.” Thousands of companies, organizations, and individuals will benefit.”

Introduction to Computational Social Science: Principles and Applications


New book by Claudio Cioffi-Revilla: “This reader-friendly textbook is the first work of its kind to provide a unified Introduction to Computational Social Science (CSS). Four distinct methodological approaches are examined in detail, namely automated social information extraction, social network analysis, social complexity theory and social simulation modeling. The coverage of these approaches is supported by a discussion of the historical context, as well as by a list of texts for further reading. Features: highlights the main theories of the CSS paradigm as causal explanatory frameworks that shed new light on the nature of human and social dynamics; explains how to distinguish and analyze the different levels of analysis of social complexity using computational approaches; discusses a number of methodological tools; presents the main classes of entities, objects and relations common to the computational analysis of social complexity; examines the interdisciplinary integration of knowledge in the context of social phenomena.”
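Of the four approaches named in the blurb, social network analysis is the easiest to make concrete in a few lines. A toy sketch using the networkx library (the friendship graph is invented; this illustrates the general technique, not material from the book):

    # Toy social network analysis: rank people by betweenness centrality,
    # i.e., who brokers between otherwise separate clusters. Invented data.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("ana", "ben"), ("ana", "cruz"), ("ben", "cruz"),   # one cluster
        ("dee", "eli"), ("dee", "fay"), ("eli", "fay"),     # another cluster
        ("cruz", "dee"),                                    # the bridging tie
    ])
    for person, score in sorted(nx.betweenness_centrality(G).items(),
                                key=lambda kv: -kv[1]):
        print(f"{person}: {score:.2f}")

The two bridge members, cruz and dee, come out on top, which is the kind of structural fact about social complexity that no amount of per-individual data would reveal.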

Social Media: A Critical Introduction


New book: “Now more than ever, we need to understand social media – the good as well as the bad. We need critical knowledge that helps us to navigate the controversies and contradictions of this complex digital media landscape. Only then can we make informed judgements about what’s happening in our media world, and why.
Showing the reader how to ask the right kinds of questions about social media, Christian Fuchs takes us on a journey across social media, delving deep into case studies on Google, Facebook, Twitter, WikiLeaks and Wikipedia. The result lays bare the structures and power relations at the heart of our media landscape.
This book is the essential, critical guide for understanding social media and for all students of media studies and sociology. Readers will never look at social media the same way again.”
Sample chapters:

  • Twitter and Democracy: A New Public Sphere?
  • Introduction: What is a Critical Introduction to Social Media?

How should we analyse our lives?


Gillian Tett in the Financial Times on the challenge of using the new form of data science: “A few years ago, Alex “Sandy” Pentland, a professor of computational social sciences at MIT Media Lab, conducted a curious experiment at a Bank of America call centre in Rhode Island. He fitted 80 employees with biometric devices to track all their movements, physical conversations and email interactions for six weeks, and then used a computer to analyse “some 10 gigabytes of behaviour data”, as he recalls.
The results showed that the workers were isolated from each other, partly because at this call centre, like others of its ilk, the staff took their breaks in rotation so that the phones were constantly manned. In response, Bank of America decided to change its system to enable staff to hang out together over coffee and swap ideas in an unstructured way. Almost immediately there was a dramatic improvement in performance. “The average call-handle time decreased sharply, which means that the employees were much more productive,” Pentland writes in his forthcoming book Social Physics. “[So] the call centre management staff converted the break structure of all their call centres to this new system and forecast a $15m per year productivity increase.”
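Stripped of the sensors, the headline claim reduces to a before/after comparison of average call-handle time under the two break policies. A purely hypothetical sketch with pandas (every column name and number is invented; this is an illustration, not Pentland's analysis):

    # Hypothetical before/after comparison of call-handle time; all data invented.
    import pandas as pd

    calls = pd.DataFrame({
        "break_policy": ["rotating"] * 4 + ["shared"] * 4,
        "handle_time_sec": [310, 295, 330, 305, 262, 250, 275, 258],
    })
    by_policy = calls.groupby("break_policy")["handle_time_sec"].mean()
    print(by_policy)
    change = (by_policy["shared"] - by_policy["rotating"]) / by_policy["rotating"]
    print(f"change in average handle time: {change:.1%}")   # -15.7% here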
When I first heard Pentland relate this tale, I was tempted to give a loud cheer on behalf of all long-suffering call centre staff and corporate drones. Pentland’s data essentially give credibility to a point that many people know instinctively: that it is horribly dispiriting – and unproductive – to have to toil in a tiny isolated cubicle by yourself all day. Bank of America deserves credit both for letting Pentland’s team engage in this people-watching – and for changing its coffee-break schedule in response.
But there is a bigger issue at stake here too: namely how academics such as Pentland analyse our lives. We have known for centuries that cultural and social dynamics influence how we behave but until now academics could usually only measure this by looking at micro-level data, which were often subjective. Anthropology (a discipline I know well) is a case in point: anthropologists typically study cultures by painstakingly observing small groups of people and then extrapolating this in a subjective manner.

Pentland and others like him are now convinced that the great academic divide between “hard” and “soft” sciences is set to disappear, since researchers these days can gather massive volumes of data about human behaviour with precision. Sometimes this information is volunteered by individuals, on sites such as Facebook; sometimes it can be gathered from the electronic traces – the “digital breadcrumbs” – that we all deposit (when we use a mobile phone, say) or deliberately collected with biometric devices like the ones used at Bank of America. Either way, it can enable academics to monitor and forecast social interaction in a manner we could never have dreamed of before. “Social physics helps us understand how ideas flow from person to person . . . and ends up shaping the norms, productivity and creative output of our companies, cities and societies,” writes Pentland. “Just as the goal of traditional physics is to understand how the flow of energy translates into change in motion, social physics seeks to understand how the flow of ideas and information translates into changes in behaviour….

But perhaps the most important point is this: whether you love or hate this new form of data science, the genie cannot be put back in the bottle. The experiments that Pentland and many others are conducting at call centres, offices and other institutions across America are simply the leading edge of a trend.

The only question now is whether these powerful new tools will be mostly used for good (to predict traffic queues or flu epidemics) or for more malevolent ends (to enable companies to flog needless goods, say, or for government control). Sadly, “social physics” and data crunching don’t offer any prediction on this issue, even though it is one of the dominant questions of our age.”