Learning to See Data


Benedict Carey in the New York Times: “FOR the past year or so genetic scientists at the Albert Einstein College of Medicine in New York have been collaborating with a specialist from another universe: Daniel Kohn, a Brooklyn-based painter and conceptual artist.

Mr. Kohn has no training in computers or genetics, and he’s not there to conduct art therapy classes. His role is to help the scientists with a signature 21st-century problem: Big Data overload.

Advanced computing produces waves of abstract digital data that in many cases defy interpretation; there’s no way to discern a meaningful pattern in any intuitive way. To extract some order from this chaos, analysts need to continually reimagine the ways in which they represent their data — which is where Mr. Kohn comes in. He spent 10 years working with scientists and knows how to pose useful questions. He might ask, for instance, What if the data were turned sideways? Or upside down? Or what if you could click on a point on the plotted data and see another dimension?….

And so it is in many fields, whether predicting climate, flagging potential terrorists or making economic forecasts. The information is all there, great expanding mountain ranges of it. What’s lacking is the tracker’s instinct for picking up a trail, the human gut feeling for where to start looking to find patterns and meaning. But can such creative instincts really be trained systematically? And even if they could, wouldn’t it take years to do so?

The answers are yes and no, at least when it comes to some advanced skills. And that should give analysts drowning in data some cause for optimism.

Scientists working in a little-known branch of psychology called perceptual learning have shown that it is possible to fast-forward a person’s gut instincts both in physical fields, like flying an airplane, and more academic ones, like deciphering advanced chemical notation. The idea is to train specific visual skills, usually with computer-game-like modules that require split-second decisions. Over time, a person develops a “good eye” for the material, and with it an ability to extract meaningful patterns instantaneously.

Perceptual learning is such an elementary skill that people forget they have it. It’s what we use as children to make distinctions between similar-looking letters, like U and V, long before we can read. It’s the skill needed to distinguish an A sharp from a B flat (both the notation and the note), or between friendly insurgents and hostiles in a fast-paced video game. By the time we move on to sentences and melodies and more cerebral gaming — “chunking” the information into larger blocks — we’ve forgotten how hard it was to learn all those subtle distinctions in the first place….(More)

Can Big Data Measure Livability in Cities?


PlaceILive: “Big data helps us measure and predict consumer behavior, hurricanes and even pregnancies. It has revolutionized the way we access and use information. That being said, so far big data has not been able to tackle bigger issues like urbanization or improve the livability of cities.

A new startup, www.placeilive.com, thinks big data should and can be used to measure livability. They aggregated open data from government institutions and social media to create a tool that can calculate just that. ….PlaceILive wants to help people and governments better understand their cities, so that they can make smarter decisions. Cities can be more sustainable, while its users save money and time when they are choosing a new home.

Not everyone is eager to read long lists of raw data. Therefore they created appealing, user-friendly maps that visualize the statistics, offering the user fast and accessible information on the neighborhoods that matter to them.

Another cornerstone of PlaceILive is their Life Quality Index: an algorithm that takes aspects like transportation, safety, and affordability into account, making it possible for people to easily compare the livability of different houses. You can read more on the methodology and sources here.

In its beta form, the site features five cities—New York City, Chicago, San Francisco, London and Berlin. When you click on the New York portal, for instance, you can search for the place you want to know more about by borough, zip code, or address. Using New York as an example, it looks like this….(More)

Modeling Mobility with Open Data


Book edited by Behrisch, Michael, and Weber, Melanie: “This contributed volume contains the conference proceedings of the Simulation of Urban Mobility (SUMO) conference 2014, Berlin. The included research papers cover a wide range of topics in traffic planning and simulation, including open data, vehicular communication, e-mobility, urban mobility, multimodal traffic as well as usage approaches. The target audience primarily comprises researchers and experts in the field, but the book may also be beneficial for graduate students.”

Eight ways to make government more experimental


Jonathan Breckon et al at NESTA: “When the banners and bunting have been tidied away after the May election, and a new bunch of ministers sit at their Whitehall desks, could they embrace a more experimental approach to government?

Such an approach requires a degree of humility.  Facing up to the fact that we don’t have all the answers for the next five years.  We need to test things out, evaluate new ways of doing things with the best of social science, and grow what works.  And drop policies that fail.

But how best to go about it?  Here are our 8 ways to make it a reality:

  1. Make failure OK. A more benign attitude to risk is central to experimentation.  As a 2003 Cabinet Office review entitled Trying it Out said, a pilot that reveals a policy to be flawed should be ‘viewed as a success rather than a failure, having potentially helped to avert a potentially larger political and/or financial embarrassment’. Pilots are particularly important in fast moving areas such as technology to try promising fresh ideas in real-time. Our ‘Visible Classroom’ pilot tried an innovative approach to teacher CPD developed from technology for television subtitling.
  2. Avoid making policies that are set in stone.  Allowing policy to be more project-based, flexible and time-limited could encourage room for manoeuvre, according to a previous Nesta report State of Uncertainty; Innovation policy through experimentation.  The Department for Work and Pensions’ Employment Retention and Advancement pilot scheme to help people back to work was designed to influence the shape of legislation. It allowed for amendments and learning as it was rolled out.  We need more policy experiments like this.
  3. Work with the grain of the current policy environment. Experimenters need to be opportunists. We need to be nimble and flexible, ready to seize windows of opportunity to experiment. Some services have to be rolled out in stages due to budget constraints. This offers opportunities to try things out before going national. For instance, the Mexican Oportunidades anti-poverty experiments, which eventually reached 5.8 million households in all Mexican states, had to be trialled first in a handful of areas. Greater devolution is creating a patchwork of different policy priorities, funding and delivery models – so-called ‘natural experiments’. Let’s seize the opportunity to deliberately test and compare across different jurisdictions. What about a trial of basic income in Northern Ireland, for example, along the lines of recent Finnish proposals, or universal free childcare in Scotland?
  4. Experiments need the most robust and appropriate evaluation methods such as, if appropriate, Randomised Controlled Trials. Other methods, such as qualitative research may be needed to pry open the ‘black box’ of policies – to learn about why and how things are working. Civil servants should use the government trial advice panel as a source of expertise when setting up experiments.
  5. Grow the public debate about the importance of experimentation. Facebook had to apologise after a global backlash to psychological experiments on 689,000 of its users. Approval by ethics committees – normal practice for trials in hospitals and universities – is essential, but we can’t just rely on experts. We need a dedicated public understanding of experimentation programmes, perhaps run by Evidence Matters or Ask for Evidence campaigns at Sense about Science. Taking part in an experiment in itself can be a learning opportunity, creating an appetite amongst the public, something we have found from running an RCT with schools.
  6. Create ‘Skunkworks’ institutions. New or improved institutional structures within government can also help with experimentation. The Behavioural Insights Team, located in Nesta, operates a classic ‘skunkworks’ model, semi-detached from day-to-day bureaucracy. The nine UK What Works Centres help try things out semi-detached from central power, such as the Education Endowment Foundation, which sources innovations widely from across the public and private sectors, including Nesta, rather than generating ideas exclusively in house or in government.
  7. Find low-cost ways to experiment. People sometimes worry that trials are expensive and complicated.  This does not have to be the case. Experiments to encourage organ donation by the Government Digital Service and Behavioural Insights Team involved an estimated cost of £20,000.  This was because the digital experiments didn’t involve setting up expensive new interventions – just changing messages on  web pages for existing services. Some programmes do, however, need significant funding to evaluate and budgets need to be found for it. A memo from the White House Office for Management and Budget has asked for new Government schemes seeking funding to allocate a proportion of their budgets to ‘randomized controlled trials or carefully designed quasi-experimental techniques’.
  8. Be bold. A criticism of some experiments is that they only deal with the margins of policy and delivery. Government officials and researchers should set up more ambitious experiments on nationally important big-ticket issues, from counter-terrorism to innovation in jobs and housing….(More)

New York Police to Use Social Media to Connect With Residents


Benjamin Mueller and Jeffrey E. Singer at the New York Times: “The New York Police Department has faced its share of pushback on social media, most memorably when it solicited photos of police interactions on Twitter under the hashtag #myNYPD. Images of aggression by officers upended that campaign.

Now, the department is seeking to turn New Yorkers’ penchant for online complaints to its gain by crowdsourcing their concerns. It has even consulted another sector troubled by social media gripes — the airline industry — to become more responsive to problems voiced online.

“They’re very good at managing customer complaints,” said Zachary Tumin, deputy commissioner for strategic initiatives and leader of the department’s social media efforts, who visited Delta Air Lines’ Atlanta headquarters this month. “That’s an area we need to explore.”

The department’s fleet of commanding officers has found its footing on Twitter in recent months, using the site to herald arrests, announce transportation delays and spread information about suspects. Now, the officers are planning to use that online visibility to draw ground-level information on crimes and conditions, a potential boost to a department seeking to align its “broken windows” crime-fighting objectives with local communities’ needs….

In a pilot program starting next month in the 109th Precinct in Queens, police officials will use a platform called IdeaScale to solicit tips and concerns from residents. The platform, which some government agencies have used internally as a brainstorming tool, promotes the posts that other users agree deserve attention.

In that way, officials argue, the police will be able to look beyond departmentwide priorities and focus on concerns that resonate in smaller communities….(More)”

Twitter for government: Indonesians get social media for public services


Medha Basu at FutureGov: “One of the largest users of social media in the world, Indonesians are taking it a step further with a new social network just for public services.

Enda Nasution and his team have built an app called Sebangsa, or Same Nation, featuring Facebook-like timelines (or Twitter-like feeds) for citizens to share about public services.

They want to introduce an idea they call “social government” in Indonesia, Nasution told FutureGov, going beyond e-government and open government to build a social relationship between the government and citizens….

It has two features that stand out. One called Sebangsa911 is for Indonesians to post emergencies, much like they might on Twitter or Facebook when they see an accident on the road or a crowd getting violent, for instance. Indonesia does not have any single national emergency number.

Another feature is called Sebangsa1800 which is a channel for people to post reviews, questions and complaints on public services and consumer products.

Why another social network?
But why build another social network when there are millions of users on Facebook and Twitter already? One reason is to provide a service that focuses on Indonesians, Nasution said – the app is in Bahasa.

Another is because existing social networks are not built specifically for public services. If you post a photo of an accident on Twitter, how many and how fast people see it depends on how many followers you have, Nasution said. These reports are also unstructured because they are “scattered all over Twitter”, he said. The app “introduces a little bit of structure to the reports”….(More)”

Why Google’s Waze Is Trading User Data With Local Governments


Parmy Olson at Forbes: “In Rio de Janeiro most eyes are on the final, nail-biting matches of the World Cup. Over in the command center of the city’s department of transport though, they’re on a different set of screens altogether.

Planners there are watching the aggregated data feeds of thousands of smartphones being walked or driven around a city, thanks to two popular travel apps, Waze and Moovit.

The goal is traffic management, and it involves swapping data for data. More cities are lining up to get access, and while the data the apps are sharing is all anonymous for now, identifying details could get more specific if cities like what they see, and people become more comfortable with being monitored through their smartphones in return for incentives.

Rio is the first city in the world to collect real-time data both from drivers who use the Waze navigation app and pedestrians who use the public-transportation app Moovit, giving it an unprecedented view on thousands of moving points across the sprawling city. Rio is also talking to the popular cycling app Strava to start monitoring how cyclists are moving around the city too.

All three apps are popular consumer services which, in the last few months, have found a new way to make their crowdsourced data useful to someone other than advertisers. While consumers use Waze and Moovit to get around, both companies are flipping the use case and turning those millions of users into a network of sensors that municipalities can tap into for a better view on traffic and hazards. Local governments can also use these apps as a channel to send alerts.

On an average day in June, Rio’s transport planners could get an aggregated view of 110,000 drivers (half a million over the course of the month), and see nearly 60,000 incidents being reported each day – everything from built-up traffic, to hazards on the road, Waze says. Till now they’ve been relying on road cameras and other basic transport-department information.

What may be especially tantalizing for planners is the super-accurate read Waze gets on exactly where drivers are going, by pinging their phones’ GPS once every second. The app can tell how fast a driver is moving and even get a complete record of their driving history, according to Waze spokesperson Julie Mossler. (UPDATE: Since this story was first published Waze has asked to clarify that it separates users’ names and their 30-day driving info. The driving history is categorized under an alias.)

This passively-tracked GPS data “is not something we share,” she adds. Waze, which Google bought last year for $1.3 billion, can turn the data spigots on and off through its application programming interface (API).

Waze has been sharing user data with Rio since summer 2013 and it just signed up the State of Florida. It says more departments of transport are in the pipeline.

But none of these partnerships are making Waze any money. The app’s currency of choice is data. “It’s a two-way street,” says Mossler. “Literally.”

In return for its user updates, Waze gets real-time information from Rio on highways, from road sensors and even from cameras, while Florida will give the app data on construction projects or city events.

Florida’s department of transport could not be reached for comment, but one of its spokesmen recently told a local news station: “We’re going to share our information, our camera images, all of our information that comes from the sensors on the roadway, and Waze is going to share its data with us.”…

To get Moovit’s data, municipalities download a web interface that gives them an aggregated view of where pedestrians using Moovit are going. In return, the city feeds Moovit’s database with a stream of real-time GPS data for buses and trains, and can issue transport alerts to Moovit’s users. Erez notes the cities aren’t allowed to make “any sort of commercial approach to the users.”

Erez may be saving that for advertisers, an avenue he says he’s still exploring. For now getting data from cities is the bigger priority. It gives Moovit “a competitive advantage,” he says.

Cycling app Strava also recently started sharing its real-time user data as part of a paid-for service called Strava Metro.

Municipalities pay 80 cents a year for every Strava member being tracked. Metro only launched in May, but it already counts the state of Oregon; London, UK; Glasgow, Scotland; Queensland, Australia; and Evanston, Illinois as customers.
….
Privacy advocates will naturally want to keep a wary eye on what data is being fed to cities, and that it doesn’t leak or get somehow misused by City Hall. The data-sharing might not be ubiquitous enough for that to be a problem yet, and it should be noted that any kind of deal making with the public sector can get wrapped up in bureaucracy and take years to get off the ground.

For now Waze says it’s acting for the public good….(More)

Methods to Protect and Secure “Big Data” May Be Unknowingly Corrupting Research


New paper by John M. Abowd and Ian M. Schmutte: “…As the government and private companies increase the amount of data made available for public use (e.g. Census data, employment surveys, medical data), efforts to protect privacy and confidentiality (through statistical disclosure limitation or SDL) can often cause misleading and compromising effects on economic research and analysis, particularly in cases where data properties are unclear for the end-user.

Data swapping, a particularly insidious method of SDL, is frequently used by important data aggregators like the Census Bureau, the National Center for Health Statistics and others. It interferes with the results of empirical analysis in ways that few economists and other social scientists are aware of.

To encourage more transparency, the authors call for both government statistical agencies and the private sector (Amazon, Google, Microsoft, Netflix, Yahoo!, etc.) to release more information about parameters used in SDL methods, and insist that journals and editors publishing such research require documentation of the author’s entire methodological process….(More)
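
To make the concern about data swapping concrete, the following is a minimal sketch and not any agency's actual procedure. It assumes a deliberately simplified scheme in which a fixed share of records trade values of one variable at random; real SDL implementations typically restrict swaps to similar records, and their parameters are generally not published. The variable names, the synthetic data, and the 30 percent swap rate are all hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def swap_values(values, swap_rate, rng):
    """Return a copy of `values` in which about `swap_rate` of the records
    have exchanged values pairwise (simplified, assumed swapping scheme)."""
    out = list(values)
    n = len(out)
    k = int(n * swap_rate) // 2 * 2  # even number of records to swap
    chosen = rng.sample(range(n), k)
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)

# Synthetic (hypothetical) microdata: income depends on years of education.
education = [rng.gauss(12, 3) for _ in range(5000)]
income = [2.0 * e + rng.gauss(0, 4) for e in education]

# The "published" file: income values swapped between 30% of records.
swapped_income = swap_values(income, swap_rate=0.3, rng=rng)

original_corr = pearson(education, income)
swapped_corr = pearson(education, swapped_income)

# The one-variable distribution is untouched, so summary tables still match.
same_marginal = sorted(income) == sorted(swapped_income)

print(f"correlation before swapping: {original_corr:.3f}")
print(f"correlation after swapping:  {swapped_corr:.3f}")
print(f"income distribution unchanged: {same_marginal}")
```

In this toy setting the distribution of income is exactly preserved while its measured relationship with education weakens, which is the kind of distortion an analyst cannot correct for without knowing the swap rate.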

Why governments need guinea pigs for policies


Jonathan Breckon in the Guardian: “People are unlikely to react positively to the idea of using citizens as guinea pigs; many will be downright disgusted. But there are times when government must experiment on us in the search for knowledge and better policy….

Though history calls into question the ethics of experimentation, unless we try things out, we will never learn. The National Audit Office says that £66bn worth of government projects have no plans to evaluate their impact. It is unethical to roll out policies in this arbitrary way. We have to experiment on a small scale to have a better understanding of how things work before rolling out policies across the UK. This is just as relevant to social policy, as it is to science and medicine, as set out in a new report by the Alliance for Useful Evidence.

Whether it’s the best ways to teach our kids to read, designing programmes to get unemployed people back to work, or encouraging organ donation – if the old ways don’t work, we have to test new ones. And that testing can’t always be done by a committee in Whitehall or in a university lab.

Experimentation can’t happen in isolation. What works in Lewisham or Londonderry might not work in Lincoln – or indeed across the UK. For instance, there is a huge amount of debate around the current practice of teaching children to read and spell using phonics, which was based on a small-scale study in Clackmannanshire, as well as evidence from the US. A government-commissioned review on the evidence for phonics led professor Carole Torgerson, then at York University, to warn against making national policy off the back of just one small Scottish trial.

One way round this problem is to do larger experiments. The increasing use of the internet in public services allows for more and faster experimentation, on a larger scale for lower cost – the randomised controlled trial on voter mobilisation that went to 61 million users in the 2010 US midterm elections, for example. However, the use of the internet doesn’t get us off the ethical hook. Facebook had to apologise after a global backlash to secret psychological tests on their 689,000 users.

Contentious experiments should be approved by ethics committees – normal practice for trials in hospitals and universities.

We are also not interested in freewheeling trial-and-error; robust and appropriate research techniques to learn from experiments are vital. It’s best to see experimentation as a continuum, ranging from the messiness of attempts to try something new to experiments using the best available social science, such as randomised controlled trials.

Experimental government means avoiding an approach where everything is fixed from the outset. What we need is “a spirit of experimentation, unburdened by promises of success”, as recommended by the late professor Roger Jowell, author of the 2003 Cabinet Office report, Trying it out [pdf]….(More)”

Big Data for Social Good


Introduction to a Special Issue of the Journal “Big Data” by Charlie Catlett and Rayid Ghani: “…organizations focused on social good are realizing the potential as well but face several challenges as they seek to become more data-driven. The biggest challenge they face is a paucity of examples and case studies on how data can be used for social good. This special issue of Big Data is targeted at tackling that challenge and focuses on highlighting some exciting and impactful examples of work that uses data for social good. The special issue is just one example of the recent surge in such efforts by the data science community. …

This special issue solicited case studies and problem statements that would either highlight (1) the use of data to solve a social problem or (2) social challenges that need data-driven solutions. From roughly 20 submissions, we selected 5 articles that exemplify this type of work. These cover five broad application areas: international development, healthcare, democracy and government, human rights, and crime prevention.

“Understanding Democracy and Development Traps Using a Data-Driven Approach” (Ranganathan et al.) details a data-driven model between democracy, cultural values, and socioeconomic indicators to identify a model of two types of “traps” that hinder the development of democracy. They use historical data to detect causal factors and make predictions about the time expected for a given country to overcome these traps.

“Targeting Villages for Rural Development Using Satellite Image Analysis” (Varshney et al.) discusses two case studies that use data and machine learning techniques for international economic development—solar-powered microgrids in rural India and targeting financial aid to villages in sub-Saharan Africa. In the process, the authors stress the importance of understanding the characteristics and provenance of the data and the criticality of incorporating local “on the ground” expertise.

In “Human Rights Event Detection from Heterogeneous Social Media Graphs,” Chen and Neil describe efficient and scalable techniques to use social media in order to detect emerging patterns in human rights events. They test their approach on recent events in Mexico and show that they can accurately detect relevant human rights–related tweets prior to international news sources, and in some cases, prior to local news reports, which could potentially lead to more timely, targeted, and effective advocacy by relevant human rights groups.

“Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets” (Wang et al.) describes a case study with the Cambridge Police Department, using a subspace clustering method to analyze the department’s full housebreak database, which contains detailed information from thousands of crimes from over a decade. They find that the method allows human crime analysts to handle vast amounts of data and provides new insights into true patterns of crime committed in Cambridge….(More)