Analyzing the Analyzers


An Introspective Survey of Data Scientists and Their Work, by Harlan Harris, Sean Murphy, and Marck Vaisman: “There has been intense excitement in recent years around activities labeled “data science,” “big data,” and “analytics.” However, the lack of clarity around these terms and, particularly, around the skill sets and capabilities of their practitioners has led to inefficient communication between “data scientists” and the organizations requiring their services. This lack of clarity has frequently led to missed opportunities. To address this issue, we surveyed several hundred practitioners via the Web to explore the varieties of skills, experiences, and viewpoints in the emerging data science community.

We used dimensionality reduction techniques to divide potential data scientists into five categories based on their self-ranked skill sets (Statistics, Math/Operations Research, Business, Programming, and Machine Learning/Big Data), and four categories based on their self-identification (Data Researchers, Data Businesspeople, Data Engineers, and Data Creatives). Further examining the respondents based on their division into these categories provided additional insights into the types of professional activities, educational background, and even scale of data used by different types of Data Scientists.
In this report, we combine our results with insights and data from others to provide a better understanding of the diversity of practitioners, and to argue for the value of clearer communication around roles, teams, and careers.”
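The abstract names the technique only as “dimensionality reduction,” so as a rough illustration, here is a minimal sketch of clustering self-ranked skill data with PCA and k-means from scikit-learn. The respondent ratings are synthetic and the pipeline is a stand-in, not the authors’ actual method.

```python
# Minimal sketch: cluster survey respondents by self-ranked skills.
# Synthetic ratings and the PCA + k-means pipeline are illustrative
# stand-ins; the report says only that "dimensionality reduction
# techniques" were used.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
skills = ["Statistics", "Math/OR", "Business", "Programming", "ML/Big Data"]

# Fake self-rankings for 250 respondents: each ranks the five skill groups 1-5.
ratings = np.array([rng.permutation(5) + 1 for _ in range(250)])

# Project the five-dimensional rankings onto two components, then cluster
# respondents into four groups (mirroring the four self-ID categories).
components = PCA(n_components=2).fit_transform(ratings)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)

for k in range(4):
    mean_ranks = ratings[labels == k].mean(axis=0)
    top = skills[int(np.argmax(mean_ranks))]
    print(f"cluster {k}: {(labels == k).sum()} respondents, top skill: {top}")
```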

Visualizing 3 Billion Tweets


Eric Gundersen from Mapbox: “This is a look at 3 billion tweets – every geotagged tweet since September 2011, mapped, showing facets of Twitter’s ecosystem and userbase in incredible new detail, revealing demographic, cultural, and social patterns down to city-level detail, across the entire world. We were brought in by the data team at Gnip, who have awesome APIs and raw access to the Twitter firehose, and together Tom and data artist Eric Fischer used our open source tools to visualize the data and build interfaces that let you explore the stories of space, language, and access to technology.
This is big data, and there’s a significant level of geographic overlap between tweets, so Eric wrote an open-source tool that de-duplicated 2.7 billion overlapping datapoints, leaving 280 million unique locations…”
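The excerpt doesn’t describe how Fischer’s tool decides that two datapoints overlap; below is a minimal sketch of one common approach, snapping coordinates to a fixed-precision grid and keeping one point per cell. The precision value is an assumption for illustration, not the tool’s actual logic.

```python
# Minimal sketch of de-duplicating overlapping geotagged points by snapping
# them to a fixed-precision grid. Rounding to 5 decimal places (~1 m at the
# equator) is an assumed precision, not the actual algorithm.
def dedupe(points, precision=5):
    """Return unique (lat, lon) pairs after rounding to `precision` decimals."""
    seen = set()
    unique = []
    for lat, lon in points:
        key = (round(lat, precision), round(lon, precision))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

tweets = [(40.74126, -73.98935), (40.741259, -73.989351), (51.50134, -0.14189)]
print(dedupe(tweets))  # the first two points collapse into one location
```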

Visualizing the Stunning Growth of 8 Years of OpenStreetMap


Emily Badger in Atlantic Cities: “The U.S. OpenStreetMap community gathered in San Francisco over the weekend for its annual conference, the State of the Map. The loose citizen-cartography collective has now been incrementally mapping the world since 2004. While they were taking stock, it turns out the global open mapping effort has now mapped data on more than 78 million buildings and 21 million miles of road (if you wanted to drive all those roads at, say, 60 miles an hour, it would take you some 40 years to do it).
And more than a million people have chipped away at this in an impressively democratic manner: 83.6 percent of the changes in the whole database have been made by 99.9 percent of contributors.
These numbers come from the OpenStreetMap 2013 Data Report, which also contains, of course, more maps. The report, created by MapBox, includes a beautiful worldwide visualization of all the road updates made as OpenStreetMap has grown, with some of the earliest imports of data shown in green and blue, and more recent ones in white. You can navigate the full map here (scroll down), but we’ve grabbed a couple of snapshots for you as well.”
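The article’s driving estimate holds up: 21 million miles at a constant 60 mph works out to 350,000 hours, or roughly 40 years of nonstop driving.

```python
# Sanity check of the article's back-of-the-envelope driving estimate.
miles = 21_000_000   # miles of mapped road
mph = 60             # assumed constant speed
hours = miles / mph              # 350,000 hours behind the wheel
years = hours / (24 * 365)       # ~40 years of nonstop driving
print(f"{hours:,.0f} hours ≈ {years:.1f} years")
```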

Data-Smart City Solutions


Press Release: “Today the Ash Center for Democratic Governance and Innovation at Harvard Kennedy School announced the launch of Data-Smart City Solutions, a new initiative aimed at using big data and analytics to transform the way local government operates. Bringing together leading industry, academic, and government officials, the initiative will offer city leaders a national depository of cases and best practice examples where cities and private partners use analytics to solve city problems. Data-Smart City Solutions is funded by Bloomberg Philanthropies and the John D. and Catherine T. MacArthur Foundation.

Data-Smart City Solutions highlights best practices, curates resources, and supports cities embarking on new data projects. The initiative’s website contains feature-length articles on how data drives innovation in different policy areas, profile pieces on municipal leaders at the forefront of implementing data analytics in their cities, and resources for interested officials to begin data projects in their own communities.
Recent articles include an assessment of Boston’s Adopt-a-Hydrant program as a potential harbinger of future city work promoting civic engagement and infrastructure maintenance, and a feature on how predictive technology is transforming police work. The site also spotlights municipal use of data such as San Francisco’s efforts to integrate data from different social service departments to better identify and serve at-risk youth. In addition to visiting the initiative’s website, Data-Smart City Solutions’ work is chronicled in their newsletter as well as on their Twitter page.”

The Use of Data Visualization in Government


Report by Genie Stowers for The IBM Center for The Business of Government: “The purpose of this report is to help public sector managers understand one of the more important areas of data analysis today—data visualization. Data visualizations are more sophisticated, fuller graphic designs than the traditional spreadsheet charts, usually with more than two variables and, typically, incorporating interactive features. Data are here to stay, growing exponentially, and data analysis is taking off, pushed forward as a result of the convergence of:
• New technologies
• Open data and big data movements
• The drive to more effectively engage citizens
• The creation and distribution of more and more data…
This report contains numerous examples of visualizations that include geographical and health data, or population and time data, or financial data represented in both absolute and relative terms—and each communicates more than simply the data that underpin it. In addition to these many examples of visualizations, the report discusses the history of this technique, and describes tools that can be used to create visualizations from many different kinds of data sets. Government managers can use these tools—including Many Eyes, Tableau, and HighCharts—to create their own visualizations from their agency’s data.
The report presents case studies on how visualization techniques are now being used by two local governments, one state government, and three federal government agencies. Each case study discusses the audience for visualization. Understanding audience is important, as government organizations provide useful visualizations to different audiences, including the media, political oversight organizations, constituents, and internal program teams. To assist in effectively communicating to these audiences, the report details attributes of meaningful visualizations: relevance, meaning, beauty, ease of use, legibility, truthfulness, accuracy, and consistency among them.”
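As a rough illustration of the report’s point about charts that carry more than two variables, here is a minimal sketch using matplotlib (a freely available stand-in for the tools named above) to encode four variables on one scatter plot. The agency data is synthetic.

```python
# Minimal sketch of a multi-variable chart: four variables on one plot
# (x position, y position, marker color, marker size). All data is synthetic.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
population = rng.uniform(10_000, 1_000_000, 30)    # variable 1 (x)
spending = population * rng.uniform(50, 150, 30)   # variable 2 (y)
outcome = rng.uniform(0, 100, 30)                  # variable 3 (color)
caseload = rng.uniform(100, 5000, 30)              # variable 4 (size)

fig, ax = plt.subplots()
sc = ax.scatter(population, spending, c=outcome, s=caseload / 25, cmap="viridis")
ax.set_xlabel("Population served")
ax.set_ylabel("Program spending ($)")
fig.colorbar(sc, label="Outcome score")
plt.show()
```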

Big Data Is Not Our Master. Humans create technology. Humans can control it.


Chris Hughes in New Republic: “We’ve known for a long time that big companies can stalk our every digital move and customize our every Web interaction. Our movements are tracked by credit cards, Gmail, and tollbooths, and we haven’t seemed to care all that much.
That is, until this week’s news of government eavesdropping, with the help of these very same big companies—Verizon, Facebook, and Google, among others. For the first time, America is waking up to the realities of what all this information—known in the business as “big data”—enables governments and corporations to do….
We are suddenly wondering, Can the rise of enormous data systems that enable this surveillance be stopped or controlled? Is it possible to turn back the clock?
Technologists see the rise of big data as the inevitable march of history, impossible to prevent or alter. Viktor Mayer-Schönberger and Kenneth Cukier’s recent book Big Data is emblematic of this argument: They say that we must cope with the consequences of these changes, but they never really consider the role we play in creating and supporting these technologies themselves….
But these well-meaning technological advocates have forgotten that as a society, we determine our own future and set our own standards, norms, and policy. Talking about technological advancements as if they are pre-ordained science erases the role of human autonomy and decision-making in inventing our own future. Big data is not a Leviathan that must be coped with, but a technological trend that we have made possible and support through social and political policy.”

Smart Citizen Kit enables crowdsourced environmental monitoring


Emma Hutchings at PSFK: “The Smart Citizen Kit is a crowdsourced environmental monitoring platform. By scattering devices around the world, the creators hope to build a global network of sensors that report local environmental conditions like CO and NO2 levels, light, noise, temperature and humidity.
Organized by the Fab Lab at the Institute for Advanced Architecture of Catalonia, a team of scientists, architects, and engineers is paving the way to humanize environmental monitoring. The open-source platform consists of Arduino-compatible hardware, a data visualization web API, and a mobile app. Users are invited to take part in the interactive global environmental database, visualizing their data and comparing it with others around the world.”
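As a sketch of how a node on such a platform might report readings, the snippet below POSTs a JSON reading to a placeholder endpoint. The URL, device identifier, and payload fields are all hypothetical; the real Smart Citizen API is documented by the project itself.

```python
# Minimal sketch of a sensor node reporting readings to a web API.
# The endpoint URL and payload fields here are hypothetical; consult the
# Smart Citizen documentation for the platform's actual API.
import json
import urllib.request

reading = {
    "device_id": "demo-node-01",   # hypothetical identifier
    "co_ppm": 0.9,
    "no2_ppb": 18.4,
    "light_lux": 312,
    "noise_db": 54.2,
    "temp_c": 21.7,
    "humidity_pct": 48.0,
}

req = urllib.request.Request(
    "https://example.org/api/readings",   # placeholder URL, not the real service
    data=json.dumps(reading).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```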

Why Big Data Is Not Truth


Quentin Hardy in the New York Times: “Kate Crawford, a researcher at Microsoft Research, calls the problem “Big Data fundamentalism” — the idea that with larger data sets, we get closer to objective truth. Speaking at a conference in Berkeley, Calif., on Thursday, she identified what she calls “six myths of Big Data.”
Myth 1: Big Data is New
In 1997, there was a paper that discussed the difficulty of visualizing Big Data, and in 1999, a paper that discussed the problems of gaining insight from the numbers in Big Data. That indicates that two prominent issues today in Big Data, display and insight, have been around for a while…
Myth 2: Big Data Is Objective
Over 20 million Twitter messages about Hurricane Sandy were posted last year. … “These were very privileged urban stories.” And some people, privileged or otherwise, put information like their home addresses on Twitter in an effort to seek aid. That sensitive information is still out there, even though the threat is gone.
Myth 3: Big Data Doesn’t Discriminate
“Big Data is neither color blind nor gender blind,” Ms. Crawford said. “We can see how it is used in marketing to segment people.” …
Myth 4: Big Data Makes Cities Smart
…, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments.
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. … (A toy sketch of this uniqueness effect appears after the list of myths below.)
Myth 6: You Can Opt Out
… given the ways that information can be obtained in these big systems, “what are the chances that your personal information will never be used?”
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, “We need to think about how we will navigate these systems. Not just individually, but as a society.”
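To see why so few points suffice (Myth 5), the toy simulation below generates sparse synthetic call traces and checks how often four known (hour, antenna) observations match exactly one person. All numbers here are made up; the Nature study analyzed 1.5 million real mobility traces.

```python
# Toy illustration of the "four points" de-anonymization result: because
# individual traces are sparse in the space of (hour, antenna) cells, a
# handful of observations is usually enough to single one person out.
import random

random.seed(42)
N_USERS, N_ANTENNAS, N_HOURS, TRACE_LEN = 1000, 200, 24 * 30, 50

# Each user's "anonymized" trace: a set of (hour, antenna) observations.
traces = [
    {(random.randrange(N_HOURS), random.randrange(N_ANTENNAS)) for _ in range(TRACE_LEN)}
    for _ in range(N_USERS)
]

unique = 0
trials = 500
for _ in range(trials):
    target = random.choice(traces)
    known = set(random.sample(sorted(target), 4))        # four known observations
    matches = sum(1 for t in traces if known <= t)       # traces containing all four
    unique += matches == 1
print(f"{unique / trials:.0%} of targets uniquely identified by 4 points")
```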

Techs and the City


Alec Appelbaum, who teaches at Pratt Institute in The New York Times: “THIS spring New York City is rolling out its much-ballyhooed bike-sharing program, which relies on a sophisticated set of smartphone apps and other digital tools to manage it. The city isn’t alone: across the country, municipalities are buying ever more complicated technological “solutions” for urban life.

But higher tech is not always essential tech. Cities could instead be making savvier investments in cheaper technology that may work better to stoke civic involvement than the more complicated, expensive products being peddled by information-technology developers….

To be sure, big tech can zap some city weaknesses. According to I.B.M., its predictive-analysis technology, which examines historical data to estimate the next crime hot spots, has helped Memphis lower its violent crime rate by 30 percent.
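The I.B.M. system itself is proprietary, but the basic idea of estimating hot spots from historical data can be sketched naively: bin past incidents into grid cells and rank the cells. The incident data below is synthetic and the method is a deliberately crude stand-in for a real predictive-policing model.

```python
# Naive hot-spot estimation: count historical incidents per grid cell and
# rank the cells. Synthetic coordinates; real systems are far more elaborate.
from collections import Counter
import random

random.seed(7)
# Synthetic historical incidents as (x, y) coordinates in a 10x10 city grid.
incidents = [(random.triangular(0, 10, 3), random.triangular(0, 10, 7))
             for _ in range(500)]

cells = Counter((int(x), int(y)) for x, y in incidents)
print("Top 5 predicted hot spots (cell, incident count):")
for cell, count in cells.most_common(5):
    print(cell, count)
```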

But many problems require a decidedly different approach. Take the seven-acre site in Lower Manhattan called the Seward Park Urban Renewal Area, where 1,000 mixed-income apartments are set to rise. A working-class neighborhood that fell to bulldozers in 1969, it stayed bare as co-ops nearby filled with affluent families, including my own.

In 2010, with the city ready to invite developers to bid for the site, long-simmering tensions between nearby public-housing tenants and wealthier dwellers like me turned suddenly — well, civil.

What changed? Was it some multimillion-dollar “open democracy” platform from Cisco, or a Big Data program to suss out the community’s real priorities? Nope. According to Dominic Pisciotta Berg, then the chairman of the local community board, it was plain old e-mail, and the dialogue it facilitated. “We simply set up an e-mail box dedicated to receiving e-mail comments” on the renewal project, and organizers would then “pull them together by comment type and then consolidate them for display during the meetings,” he said. “So those who couldn’t be there had their voices considered and those who were there could see them up on a screen and adopted, modified or rejected.”

Through e-mail conversations, neighbors articulated priorities — permanently affordable homes, a movie theater, protections for small merchants — that even a supercomputer wouldn’t necessarily have identified in the data.

The point is not that software is useless. But like anything else in a city, it’s only as useful as its ability to facilitate the messy clash of real human beings and their myriad interests and opinions. And often, it’s the simpler software, the technology that merely puts people in contact and steps out of the way, that works best.”

The Dictatorship of Data


Kenneth Cukier and Viktor Mayer-Schönberger in MIT Technology Review: “Big data is poised to transform society, from how we diagnose illness to how we educate children, even making it possible for a car to drive itself. Information is emerging as a new economic input, a vital resource. Companies, governments, and even individuals will be measuring and optimizing everything possible.
But there is a dark side. Big data erodes privacy. And when it is used to make predictions about what we are likely to do but haven’t yet done, it threatens freedom as well. Yet big data also exacerbates a very old problem: relying on the numbers when they are far more fallible than we think. Nothing underscores the consequences of data analysis gone awry more than the story of Robert McNamara.”