The Field Guide to Data Science


Booz Allen Hamilton: “Data Science is the competitive advantage of the future for organizations interested in turning their data into a product through analytics. Industries from health, to national security, to finance, to energy can be improved by creating better data analytics through Data Science. The winners and the losers in the emerging data economy are going to be determined by their Data Science teams.
Booz Allen Hamilton created The Field Guide to Data Science to help organizations of all types and missions understand how to make use of data as a resource. The text spells out what Data Science is and why it matters to organizations as well as how to create Data Science teams. Along the way, our team of experts provides field-tested approaches, personal tips and tricks, and real-life case studies. Senior leaders will walk away with a deeper understanding of the concepts at the heart of Data Science. Practitioners will add to their toolboxes.
In The Field Guide to Data Science, our Booz Allen experts provide their insights in the following areas:

  • Start Here for the Basics provides an introduction to Data Science, including what makes Data Science unique from other analysis approaches. We will help you understand Data Science maturity within an organization and how to create a robust Data Science capability.
  • Take Off the Training Wheels is the practitioner’s guide to Data Science. We share our established processes, including our approach to decomposing complex Data Science problems, the Fractal Analytic Model. We conclude with the Guide to Analytic Selection to help you select the right analytic techniques to conquer your toughest challenges.
  • Life in the Trenches gives a firsthand account of life as a Data Scientist. We share insights on a variety of Data Science topics through illustrative case studies. We provide tips and tricks from our own experiences on these real-life analytic challenges.
  • Putting it All Together highlights our successes creating Data Science solutions for our clients. It follows several projects from data to insights and shows the impact Data Science can have on your organization…”

The Emerging Science of Computational Anthropology


Emerging Technology From the arXiv: The increasing availability of big data from mobile phones and location-based apps has triggered a revolution in the understanding of human mobility patterns. This data shows the ebb and flow of the daily commute in and out of cities, the pattern of travel around the world and even how disease can spread through cities via their transport systems.
So there is considerable interest in looking more closely at human mobility patterns to see just how well they can be predicted and how these predictions might be used in everything from disease control and city planning to traffic forecasting and location-based advertising.
Today we get an insight into the kind of detail that is possible thanks to the work of Zimo Yang at Microsoft Research in Beijing and a few pals. These guys start with the hypothesis that people who live in a city have a pattern of mobility that is significantly different from those who are merely visiting. By dividing travelers into locals and non-locals, their ability to predict where people are likely to visit dramatically improves.
Zimo and co begin with data from a Chinese location-based social network called Jiepang.com. This is similar to Foursquare in the US. It allows users to record the places they visit, to connect with friends at these locations and to find others with similar interests.
The data points are known as check-ins and the team downloaded more than 1.3 million of them from five big cities in China: Beijing, Shanghai, Nanjing, Chengdu and Hong Kong. They then used 90 per cent of the data to train their algorithms and the remaining 10 per cent to test them. The Jiepang data includes the users’ hometowns so it’s easy to see whether an individual is checking in in their own city or somewhere else.
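For readers who want to see what a 90/10 split looks like in practice, here is a minimal Python sketch. It is not the team's code: the record layout (user, hometown, city, location), the random shuffle and the function name split_check_ins are all assumptions made for illustration, and the article does not say how the researchers actually partitioned their data.

```python
import random

def split_check_ins(check_ins, train_fraction=0.9, seed=0):
    """Shuffle the check-ins and split them into training and test sets.

    Assumption: a plain random split. The article does not say whether the
    researchers split at random, by time, or by user.
    """
    shuffled = list(check_ins)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical check-in records: (user, hometown, city, location).
check_ins = [
    (f"user{i % 7}", "Beijing" if i % 3 else "Shanghai", "Beijing", f"place{i % 5}")
    for i in range(100)
]

train, test = split_check_ins(check_ins)

# A check-in is "local" when the user's hometown matches the city.
local_train = [c for c in train if c[1] == c[2]]
print(len(train), len(test), len(local_train))
```

Because the hometown travels with every record, labelling a check-in as local or non-local is a single comparison, which is the property the article points to.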
The question that Zimo and co want to answer is the following: given a particular user and their current location, where are they most likely to visit in the near future? In practice, that means analysing the user’s data, such as their hometown and the locations recently visited, and coming up with a list of other locations that they are likely to visit based on the type of people who visited these locations in the past.
Zimo and co used their training dataset to learn the mobility pattern of locals and non-locals and the popularity of the locations they visited. The team then applied this to the test dataset to see whether their algorithm was able to predict where locals and non-locals were likely to visit.
They found that their best results came from analysing the pattern of behaviour of a particular individual and estimating the extent to which this person behaves like a local. That produced a weighting called the indigenization coefficient that the researchers could then use to determine the mobility patterns this person was likely to follow in future.
In fact, Zimo and co say they can spot non-locals in this way without even knowing their home location. “Because non-natives tend to visit popular locations, like the Imperial Palace in Beijing and the Bund in Shanghai, while natives usually check in around their homes and workplaces,” they add.
The team say this approach considerably outperforms the mixed algorithms that use only individual visiting history and location popularity. “To our surprise, a hybrid algorithm weighted by the indigenization coefficients outperforms the mixed algorithm accounting for additional demographical information.”
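The idea of weighting predictions by how local someone appears can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the paper: the coefficient is simply passed in as a number between 0 and 1 (the article does not say how it is computed), the blend is a plain weighted average, and the functions popularity and hybrid_scores and the sample data are hypothetical.

```python
from collections import Counter

def popularity(check_ins, want_local):
    """Share of visits per location, counting only local or only non-local check-ins."""
    counts = Counter(
        location
        for hometown, city, location in check_ins
        if (hometown == city) == want_local
    )
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()} if total else {}

def hybrid_scores(local_pop, visitor_pop, coefficient):
    """Blend the two profiles; coefficient=1 means 'behaves entirely like a local'.

    Assumption: a simple linear mix, chosen for illustration only.
    """
    locations = set(local_pop) | set(visitor_pop)
    return {
        loc: coefficient * local_pop.get(loc, 0.0)
        + (1 - coefficient) * visitor_pop.get(loc, 0.0)
        for loc in locations
    }

# Hypothetical training check-ins: (hometown, city, location).
training = [
    ("Beijing", "Beijing", "office park"),
    ("Beijing", "Beijing", "office park"),
    ("Beijing", "Beijing", "noodle shop"),
    ("Shanghai", "Beijing", "Imperial Palace"),
    ("Shanghai", "Beijing", "Imperial Palace"),
    ("Chengdu", "Beijing", "noodle shop"),
]

local_pop = popularity(training, want_local=True)
visitor_pop = popularity(training, want_local=False)

local_like = hybrid_scores(local_pop, visitor_pop, coefficient=0.9)
visitor_like = hybrid_scores(local_pop, visitor_pop, coefficient=0.1)

print(max(local_like, key=local_like.get))
print(max(visitor_like, key=visitor_like.get))
```

In this toy data, a user who mostly behaves like a local is steered toward the office park, while a user who mostly behaves like a visitor is steered toward the Imperial Palace, mirroring the contrast the researchers describe.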
It’s easy to imagine how such an algorithm might be useful for businesses who want to target certain types of travelers or local people. But there is a more interesting application too.
Zimo and co say that it is possible to monitor the way an individual’s mobility patterns change over time. So if a person moves to a new city, it should be possible to see how long it takes them to settle in.
One way of measuring this is in their mobility patterns: whether they are more like those of a local or a non-local. “We may be able to estimate whether a non-native person will behave like a native person after a time period and if so, how long in average a person takes to become a native-like one,” say Zimo and co.
That could have a fascinating impact on the way anthropologists study migration and the way immigrants become part of a local community. This is computational anthropology, a science that is clearly in its early stages but one that has huge potential for the future.
Ref: arxiv.org/abs/1405.7769 : Indigenization of Urban Mobility

Big Data, new epistemologies and paradigm shifts


Paper by Rob Kitchin in the Journal “Big Data and Society”: “This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.”

OSTP’s Own Open Government Plan


Nick Sinai and Corinna Zarek: “The White House Office of Science and Technology Policy (OSTP) today released its 2014 Open Government Plan. The OSTP plan highlights three flagship efforts as well as the team’s ongoing work to embed the open government principles of transparency, participation, and collaboration into its activities.
OSTP advises the President on the effects of science and technology on domestic and international affairs. The work of the office includes policy efforts encompassing science, environment, energy, national security, technology, and innovation. This plan builds off of the 2010 and 2012 Open Government Plans, updating progress on past initiatives and adding new subject areas based on 2014 guidance.
Agencies began releasing biennial Open Government Plans in 2010, with direction from the 2009 Open Government Directive. These plans serve as a roadmap for agency openness efforts, explaining existing practices and announcing new endeavors to be completed over the coming two years. Agencies build these plans in consultation with civil society stakeholders and the general public. Open government is a vital component of the President’s Management Agenda and our overall effort to ensure the government is expanding economic growth and opportunity for all Americans.
OSTP’s 2014 flagship efforts include:

  • Access to Scientific Collections: OSTP is leading agencies in developing policies that will improve the management of and access to scientific collections that agencies own or support. Scientific collections are assemblies of physical objects that are valuable for research and education—including drilling cores from the ocean floor and glaciers, seeds, space rocks, cells, mineral samples, fossils, and more. Agency policies will help make scientific collections and information about scientific collections more transparent and accessible in the coming years.
  • We the Geeks: We the Geeks Google+ Hangouts feature informal conversations with experts to highlight the future of science, technology, and innovation in the United States. Participants can join the conversation on Twitter by using the hashtag #WeTheGeeks and asking questions of the presenters throughout the hangout.
  • “All Hands on Deck” on STEM Education: OSTP is helping lead President Obama’s commitment to an “all-hands-on-deck approach” to providing students with skills they need to excel in science, technology, engineering, and math (STEM). In support of this goal, OSTP is bringing together government, industry, non-profits, philanthropy, and others to expand STEM education engagement and awareness through events like the annual White House Science Fair and the upcoming White House Maker Faire.

OSTP looks forward to implementing the 2014 Open Government Plan over the coming two years to continue building on its strong tradition of transparency, participation, and collaboration—with and for the American people.”

Cataloging the World


New book on “Paul Otlet and the Birth of the Information Age”: “The dream of capturing and organizing knowledge is as old as history. From the archives of ancient Sumeria and the Library of Alexandria to the Library of Congress and Wikipedia, humanity has wrestled with the problem of harnessing its intellectual output. The timeless quest for wisdom has been as much about information storage and retrieval as creative genius.
In Cataloging the World, Alex Wright introduces us to a figure who stands out in the long line of thinkers and idealists who devoted themselves to the task. Beginning in the late nineteenth century, Paul Otlet, a librarian by training, worked at expanding the potential of the catalog card, the world’s first information chip. From there followed universal libraries and museums, connecting his native Belgium to the world by means of a vast intellectual enterprise that attempted to organize and code everything ever published. Forty years before the first personal computer and fifty years before the first browser, Otlet envisioned a network of “electric telescopes” that would allow people everywhere to search through books, newspapers, photographs, and recordings, all linked together in what he termed, in 1934, a réseau mondial–essentially, a worldwide web.
Otlet’s life achievement was the construction of the Mundaneum–a mechanical collective brain that would house and disseminate everything ever committed to paper. Filled with analog machines such as telegraphs and sorters, the Mundaneum–what some have called a “Steampunk version of hypertext”–was the embodiment of Otlet’s ambitions. It was also short-lived. By the time the Nazis, who were pilfering libraries across Europe to collect information they thought useful, carted away Otlet’s collection in 1940, the dream had ended. Broken, Otlet died in 1944.
Wright’s engaging intellectual history gives Otlet his due, restoring him to his proper place in the long continuum of visionaries and pioneers who have struggled to classify knowledge, from H.G. Wells and Melvil Dewey to Vannevar Bush, Ted Nelson, Tim Berners-Lee, and Steve Jobs. Wright shows that in the years since Otlet’s death the world has witnessed the emergence of a global network that has proved him right about the possibilities–and the perils–of networked information, and his legacy persists in our digital world today, captured for all time…”

CollaborativeScience.org: Sustaining Ecological Communities Through Citizen Science and Online Collaboration


David Mellor at CommonsLab: “In any endeavor, there can be a tradeoff between intimacy and impact. The same is true for science in general and citizen science in particular. Large projects with thousands of collaborators can have incredible impact and robust, global implications. On the other hand, locally based projects can foster close-knit ties that encourage collaboration and learning, but face an uphill battle when it comes to creating rigorous and broadly relevant investigations. Online collaboration has the potential to harness the strengths of both of these strategies if a space can be created that allows for the easy sharing of complex ideas and conservation strategies.
CollaborativeScience.org was created by researchers from five different universities to train Master Naturalists in ecology, scientific modeling and adaptive management, and then give these capable volunteers a space to put their training to work and create conservation plans in collaboration with researchers and land managers.
We are focusing on scientific modeling throughout this process because environmental managers and ecologists have been trained to intuitively create explanations based on a very large number of related observations. As new data are collected, these explanations are revised and are put to use in generating new, testable hypotheses. The modeling tools that we are providing to our volunteers allow them to formalize this scientific reasoning by adding information, sources and connections, then making predictions based on possible changes to the system. We integrate their projects into the well-established citizen science tools at CitSci.org and guide them through the creation of an adaptive management plan, a proven conservation project framework…”

Innovation And Inequality


Edited book on “Emerging Technologies in an Unequal World”: “Susan Cozzens, Dhanaraj Thakur, and the other co-authors ask how the benefits and costs of emerging technologies are distributed amongst different countries – some rich and some poor. Examining the case studies of five technologies across eight countries in Africa, Europe and the Americas, the book finds that the distributional dynamics around a given technology are influenced by the way entrepreneurs and others package the technology, how governments promote it and the existing local skills and capacity to use it. These factors create social and economic boundaries where the technology stops diffusing between and within countries. The book presents a series of recommendations for policy-makers and private sector actors to move emerging technologies beyond these boundaries and improve their distributional outcomes.
Offering a broad range of mature and relatively new emerging technologies from a diverse set of countries, the study will strongly appeal to policy-makers in science, technology and innovation policy. It will also benefit students and academics interested in innovation, science, technology and innovation policy, the economics of innovation, as well as the history and sociology of technology.”

The Weird, Wild World of Citizen Science Is Already Here


David Lang in Wired: “Up and down the west coast of North America, countless numbers of starfish are dying. The affliction, known as Sea Star Wasting Syndrome, is already being called the biggest die-off of sea stars in recorded history, and we’re still in the dark as to what’s causing it or what it means. It remains an unsolved scientific mystery. The situation is also shaping up as a case study of an unsung scientific opportunity: the rise of citizen science and exploration.
The sea star condition was first noticed by Laura James, a diver and underwater videographer based in Seattle. As sea stars began washing up on the shore near her home with lesions and missing limbs, she became concerned and notified scientists. Similar sightings started cropping up all along the West Coast, with gruesome descriptions of sea stars that were disintegrating in a matter of days, and populations that had been decimated. As scientists race to understand what’s happening, they’ve enlisted the help of amateurs like James to move faster. Pete Raimondi’s lab at UC Santa Cruz has created the Sea Star Wasting Map, the baseline for monitoring the issue, to capture the diverse set of contributors and collaborators.
The map is one of many new models of citizen-powered science–a blend of amateurs and professionals, looking and learning together–that are beginning to emerge. Just this week, NASA endorsed a group of amateur astronomers to attempt to rescue a vintage U.S. spacecraft. NASA doesn’t have the money to do it, and this passionate group of citizen scientists can handle it.
Unfortunately, the term “citizen science” is terrible. It’s vague enough to be confusing, yet specific enough to seem exclusive. It’s too bad, too, because the idea of citizen science is thrilling. I love the notion that I can participate in the expanding pool of human knowledge and understanding, even though the extent of my formal science education is a high school biology class. To me, it seemed a genuine invitation to be curious. A safe haven for beginners. A license to explore.
Not everyone shares my romantic perspective, though. If you ask a university researcher, they’re likely to explain citizen science as a way for the public to contribute data points to larger, professionally run studies, like participating in the galaxy-spotting website Zooniverse or taking part in the annual Christmas Bird Count with the Audubon Society. It’s a model on the scientific fringes, using broad participation to fill the gaps in necessary data.
There’s power in this diffuse definition, though, as long as new interpretations are welcomed and encouraged. By inviting and inspiring people to ask their own questions, citizen science can become much more than a way of measuring bird populations. From the drone-wielding conservationists in South Africa to the makeshift biolabs in Brooklyn, a widening circle of participants are wearing the amateur badge with honor. And all of these groups–the makers, the scientists, the hobbyists–are converging to create a new model for discovery. In other words, the maker movement and the traditional science world are on a collision course.
To understand the intersection, it helps to know where each of those groups is coming from….”

The Cultural Imaginary of the Internet


Book by Majid Yar on Virtual Utopias and Dystopias: “Contemporary culture offers contradictory views of the internet and new media technologies, painting them in extremes of optimistic enthusiasm and pessimistic foreboding. While some view them as a repository of hopes for democracy, freedom and self-realisation, others consider these developments as sources of alienation, dehumanisation and danger. This book explores such representations, and situates them within the traditions of utopian and dystopian thought that have shaped the Western cultural imaginary. Ranging from ancient poetry to post-humanism, and classical sociology to science fiction, it uncovers the roots of our cultural responses to the internet, which are centred upon a profoundly ambivalent reaction to technological modernity. Majid Yar argues that it is only by better understanding our society’s reactions to technological innovation that we can develop a balanced and considered response to the changes and challenges that the internet brings in its wake.”

Introducing the Data Visualization Checklist


Stephanie Evergreen: “This post has been a long time coming. Ann Emery and I knew some time ago that evaluators and social scientists had a thirst for better graphs, a clear understanding of why better graphs were necessary, but they lacked efficient guidance on how, exactly, to make a graph better. Introducing the Data Visualization Checklist.
Download this checklist and refer to it when you are constructing your next data visualization so that what you produce rocks worlds. Use the checklist to gauge the effectiveness of graphs you’ve already made and adjust places where you don’t score full points. Make copies and slip them into your staff mailboxes.
What’s in the Checklist?
We compiled a set of best practices based on extensive research, tested against the practical day-to-day realities of evaluation practice and the pragmatic needs of our stakeholders. This guidance may not apply to other fields. In fact, we pilot-tested the checklist with a dozen data visualists and found that those who were not in a social science field found more areas of disagreement. That’s ok. Their dissemination purposes are different from ours. Their audiences are not our audiences. You, evaluator, will find clear guidelines on how to make the best use of a graph’s text, color, arrangement, and overall design. We also included a data visualization anatomy chart on the last page of the checklist to illustrate key concepts and point out terminology…”