Can We Use Data to Stop Deadly Car Crashes?


Allison Shapiro in Pacific Standard Magazine: “In 2014, New York City Mayor Bill de Blasio decided to adopt Vision Zero, a multi-national initiative dedicated to eliminating traffic-related deaths. Under Vision Zero, city services, including the Department of Transportation, began an engineering and public relations plan to make the streets safer for drivers, pedestrians, and cyclists. The plan included street re-designs, improved accessibility measures, and media campaigns on safer driving.

The goal may be an old one, but the approach is innovative: When New York City officials wanted to reduce traffic deaths, they turned to crowdsourcing and data analysis.

Many cities in the United States—from Washington, D.C., all the way to Los Angeles—have adopted some version of Vision Zero, which began in Sweden in 1997. It’s part of a growing trend to make cities “smart” by integrating data collection into things like infrastructure and policing.

Map of high crash corridors in Portland, Oregon. (Map: Portland Bureau of Transportation)

Cities have access to an unprecedented amount of data about traffic patterns, driving violations, and pedestrian concerns. Although advocacy groups say Vision Zero is moving too slowly, de Blasio has invested another $115 million in this data-driven approach.

Interactive safety map. (Map: District Department of Transportation)

De Blasio may have been vindicated. A 2015 year-end report released by the city last week analyzes the successes and shortfalls of data-driven city life, and the early results look promising. In 2015, fewer New Yorkers lost their lives in traffic accidents than in any year since 1910, according to the report, despite the fact that the population has almost doubled in those 105 years.

Below are some of the project highlights.

New Yorkers were invited to add to this public dialogue map, where they could list information ranging from “not enough time to cross” to “red light running.” The Department of Transportation ended up with over 10,000 comments, which led to 80 safety projects in 2015, including the creation of protected bike lanes, the introduction of leading pedestrian intervals, and the simplification of complex intersections….

Data collected from the public dialogue map, town hall meetings, and past traffic accidents led to “changes to signals, street geometry and markings and regulations that govern actions like turning and parking. These projects simplify driving, walking and bicycling, increase predictability, improve visibility and reduce conflicts,” according to Vision Zero in NYC….(More)”
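
The report does not describe the DOT’s internal tooling, but the core aggregation step behind turning 10,000 comments into a ranked project list is easy to picture. Here is a minimal, hypothetical sketch in Python; the corridor names and comments are invented, not drawn from the actual map:

```python
# Hypothetical sketch: ranking corridors by volume of crowdsourced safety
# comments. Not NYC DOT's actual pipeline; the sample data is invented.
from collections import Counter

comments = [
    {"corridor": "Queens Blvd",  "issue": "not enough time to cross"},
    {"corridor": "Queens Blvd",  "issue": "red light running"},
    {"corridor": "Atlantic Ave", "issue": "red light running"},
    {"corridor": "Queens Blvd",  "issue": "not enough time to cross"},
]

# Rank corridors by how often residents flagged them, so engineering
# review can start where complaints cluster.
by_corridor = Counter(c["corridor"] for c in comments)
for corridor, count in by_corridor.most_common():
    print(f"{corridor}: {count} reports")
```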

Anonymous hackers could be Islamic State’s online nemesis


At the Conversation: “One of the key issues the West has had to face in countering Islamic State (IS) is the jihadi group’s mastery of online propaganda, seen in hundreds of thousands of messages celebrating the atrocities against civilians and spreading the message of radicalisation. It seems clear that efforts to counter IS online are missing the mark.

An internal US State Department assessment noted in June 2015 how the violent narrative of IS had “trumped” the efforts of the world’s richest and most technologically advanced nations. Meanwhile in Europe, Interpol set out to track and take down social media accounts linked to IS, as if that would solve the problem – when in fact doing so meant potentially missing out on intelligence-gathering opportunities.

Into this vacuum has stepped Anonymous, a loose, fragmented network of hacktivists that has for years launched occasional cyberattacks against government, corporate and civil society organisations. The group announced its intention to take on IS and its propaganda online, using its networks to crowdsource the identities of IS-linked accounts. Under the banners of #OpIsis and #OpParis, Anonymous published lists of thousands of Twitter accounts said to belong to IS members or sympathisers, claiming more than 5,500 had been removed.

The group pursued a similar approach following the attacks on Charlie Hebdo magazine in January 2015, with @OpCharlieHebdo taking down more than 200 jihadist Twitter accounts, bringing down the website Ansar-Alhaqq.net and publishing a list of 25,000 accounts alongside a guide on how to locate pro-IS material online….

Members of Anonymous have been prosecuted for cyberattacks in many countries under cybercrime laws, as their activities are not seen as legitimate protest. It is worth mentioning the ethical debate around hacktivism: some see cyberattacks that take down accounts or websites as infringing on others’ freedom of expression, while others argue that hacktivism should instead create technologies to circumvent censorship, enable digital equality and open access to information….(More)”

The promise and perils of predictive policing based on big data


H. V. Jagadish in the Conversation: “Police departments, like everyone else, would like to be more effective while spending less. Given the tremendous attention to big data in recent years, and the value it has provided in fields ranging from astronomy to medicine, it should be no surprise that police departments are using data analysis to inform deployment of scarce resources. Enter the era of what is called “predictive policing.”

Some form of predictive policing is likely now in force in a city near you. Memphis was an early adopter. Cities from Minneapolis to Miami have embraced predictive policing. Time magazine named predictive policing (with particular reference to the city of Santa Cruz) one of the 50 best inventions of 2011. New York City Police Commissioner William Bratton recently said that predictive policing is “the wave of the future.”

The term “predictive policing” suggests that the police can anticipate a crime and be there to stop it before it happens and/or apprehend the culprits right away. As the Los Angeles Times points out, it depends on “sophisticated computer analysis of information about previous crimes, to predict where and when crimes will occur.”

At a very basic level, it’s easy for anyone to read a crime map and identify neighborhoods with higher crime rates. It’s also easy to recognize that burglars tend to target businesses at night, when they are unoccupied, and to target homes during the day, when residents are away at work. The challenge is to take a combination of dozens of such factors to determine where crimes are more likely to happen and who is more likely to commit them. Predictive policing algorithms are getting increasingly good at such analysis. Indeed, such was the premise of the movie Minority Report, in which the police can arrest and convict murderers before they commit their crime.

Predicting a crime with certainty is something that science fiction can have a field day with. But as a data scientist, I can assure you that in reality we can come nowhere close to certainty, even with advanced technology. To begin with, predictions can be only as good as the input data, and quite often these input data have errors.

But even with perfect, error-free input data and unbiased processing, ultimately what the algorithms are determining are correlations. Even if we have perfect knowledge of your troubled childhood, your socializing with gang members, your lack of steady employment, your wacko posts on social media and your recent gun purchases, all that the best algorithm can do is to say it is likely, but not certain, that you will commit a violent crime. After all, to treat such predictions as guaranteed is to deny free will….

What data can do is give us probabilities, rather than certainty. Good data coupled with good analysis can give us very good estimates of probability. If you sum probabilities over many instances, you can usually get a robust estimate of the total.

For example, data analysis can provide a probability that a particular house will be broken into on a particular day based on historical records for similar houses in that neighborhood on similar days. An insurance company may add this up over all days in a year to decide how much to charge for insuring that house….(More)”
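
To make that arithmetic concrete, here is a toy illustration (our own sketch, not any vendor’s algorithm): the invented numbers give each day a small burglary probability, and summing them yields the yearly expectation an insurer could price against.

```python
# Toy illustration of summing probabilities over many instances.
# Hypothetical model: a given house faces a 0.4% burglary risk on weekend
# days and 0.2% on weekdays; the numbers are invented for illustration.
daily_prob = [0.004 if day % 7 in (5, 6) else 0.002 for day in range(365)]

# Expectation is additive over days, so the yearly expected count is the sum.
expected_break_ins = sum(daily_prob)
print(f"Expected break-ins per year: {expected_break_ins:.3f}")
# Any single day's prediction remains deeply uncertain; the aggregate is
# what an insurer could reasonably act on.
```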

Beyond the Quantified Self: Thematic exploration of a dataistic paradigm


Minna Ruckenstein and Mika Pantzar in New Media and Society: “This article investigates the metaphor of the Quantified Self (QS) as it is presented in the magazine Wired (2008–2012). Four interrelated themes—transparency, optimization, feedback loop, and biohacking—are identified as formative in defining a new numerical self and promoting a dataist paradigm. Wired captures certain interests and desires with the QS metaphor, while ignoring and downplaying others, suggesting that the QS positions self-tracking devices and applications as interfaces that energize technological engagements, thereby pushing us to rethink life in a data-driven manner. The thematic analysis of the QS is treated as a schematic aid for raising critical questions about self-quantification, for instance, detecting the merging of epistemological claims, technological devices, and market-making efforts. From this perspective, another definition of the QS emerges: a knowledge system that remains flexible in its aims and can be used as a resource for epistemological inquiry and in the formation of alternative paradigms….(More)”

The Magazine of Early American Datasets


Mark Boonshoft at The Junto: “Data. Before postmodernism, or environmental history, or the cultural turn, or the geographic turn, and even before the character on the old Star Trek series, historians began to gather and analyze quantitative evidence to understand the past. As computers became common during the 1970s and 1980s, scholars responded by painstakingly compiling and analyzing datasets, using that evidence to propose powerful new historical interpretations. Today, much of that information (as well as data compiled since) is in danger of disappearing. For that and other reasons, we have developed a website designed to preserve and share the datasets permanently (or at least until aliens destroy our planet). We appeal to all early American historians (not only the mature ones from earlier decades) to take the time both to preserve and to share their statistical evidence with present and future scholars. It will not only be a legacy to the profession but also will encourage historians to share their data more openly and to provide a foundation on which scholars can build.

In coordination with the McNeil Center for Early American Studies and specialists at the University of Pennsylvania Libraries, in addition to bepress, we have established the Magazine of Early American Datasets (MEAD), available at http://repository.upenn.edu/mead/. We’d love to have your datasets, your huddled 1’s and 0’s (and other numbers and letters) yearning to be free. The best would be in either .csv or, if you have commas in your data, .txt, because both of those are non-proprietary and somewhat close to universal. However, if the data is in other forms, like Access, Excel or SPSS, that will do fine as well. Ultimately, we should be able to convert files to a more permanent database and to preserve those files in perpetuity. In addition, we are asking scholars, out of the goodness of their hearts and commitment to the profession, to upload a separate document as a codebook explaining the meaning of the variables. The files will all be available to any scholar regardless of their academic affiliation.
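
For contributors whose data sits in proprietary formats, the conversion MEAD asks for takes only a few lines. A minimal sketch, assuming pandas (with openpyxl installed for Excel files and pyreadstat for SPSS files) and invented file names:

```python
# Hypothetical sketch of preparing a dataset for MEAD in the
# non-proprietary formats the post recommends. File names are invented.
import pandas as pd

df = pd.read_excel("tax_lists_1780.xlsx")   # or: pd.read_spss("tax_lists_1780.sav")

# Plain CSV when values contain no commas...
df.to_csv("tax_lists_1780.csv", index=False)
# ...or tab-delimited .txt when they do, as suggested above.
df.to_csv("tax_lists_1780.txt", sep="\t", index=False)
```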

How will a free, open, centralized data center benefit early American historians, and why should you participate in using and sharing data? Let us count just a few ways. In our experience, most historians of early America are extremely generous in sharing not only their expertise but also their evidence with other scholars. However, that generally occurs on an individual, case-by-case basis in a somewhat serendipitous fashion. A centralized website would permit scholars quickly to investigate what quantitative evidence is available on which they might begin to construct their own research. Ideally, scholars setting out on a new topic might be guided somewhat by the existence and availability of data. Moreover, it would set a precedent that future historians might follow: routinely sharing their evidence, either before or after their publications analyzing the data have appeared in print or online….(More)”

Want to Invest in Your City? Try the New Kickstarter for Municipal Bonds


Kyle Chayka in Pacific Standard Magazine: “… The San Francisco-based Neighborly launched in 2013 as a kind of community-based Kickstarter, helping users fund projects close to home. But the site recently pivoted toward municipal bonds, highlighting investment opportunities with a slick, Silicon Valley-style interface that makes supporting a local infrastructure project as cool as backing a new model of wrist-wearable computer. It’s bringing innovation to a dusty, though increasingly popular, sector. “You’d be shocked to find how much of the [municipal bonds] process is still being done by email and phone calls,” says Rodrigo Davies, Neighborly’s chief product officer. “This market is really not as modern as you would think.”… Neighborly enters into a gray space between crowdfunding and crowd-investing. The former is what we associate with Kickstarter and Indiegogo, which lump together many small donations into totals that can reach into the millions. In crowdfunding, donations are often made for no guaranteed return. Contrary to what it might suggest, Kickstarter isn’t selling any products; it’s just giving users the opportunity to freely give away money for a legally non-binding promise of a reward, often in the form of a theoretical product. …

Crowd-investing, in contrast, exchanges money for equity in a company, or in Neighborly’s case, a city. Shares of stock or debt purchased through crowd-investing ideally result in profit for the holder, though they can hold as much risk as any vaporware crowdfunding project. But crowd-investing remains largely illegal, despite President Obama’s signing of the JOBS Act in early 2012, which was supposed to clear its path to legitimacy.

The obstacle is that the government’s job is to mitigate the financial risks its citizens can take. That’s why Quire, a start-up that allows fans of popular tech businesses to invest in them directly, is still only open to “accredited investors,” defined by the government as someone “with income exceeding $200,000 in each of the two most recent years” or who has an individual net worth of over $1 million. Legally, a large investment is categorized as too much risk for anyone under that threshold.

That’s exactly the demographic Neighborly is targeting for municipal bonds, which start in minimum denominations of $5,000. “Bond brokers wouldn’t even look at you unless you have $50,000 to $100,000 to invest,” Davies says. The new platform, however, doesn’t discriminate. “We’re looking at people who live in the cities where the projects are happening … in their mid-20s to early 40s, who have some money that they want to invest for the future,” he says. “They put it in a bank savings account or invest it in some funds that they don’t necessarily understand. They should be investing to earn better returns, but they’re not necessarily experienced with financial markets. Those people could benefit a ton from investing in their cities.”…(More)

Advances in Crowdsourcing


New book edited by Garrigos-Simon, Fernando J., Gil-Pechuán, Ignacio, Estelles-Miguel, Sofia: “This book attempts to link some of the recent advances in crowdsourcing with advances in innovation and management. It contributes to the literature in several ways. First, it provides a global definition, insights and examples of this managerial perspective resulting in a theoretical framework. Second, it explores the relationship between crowdsourcing and technological innovation, the development of social networks and new behaviors of Internet users. Third, it explores different crowdsourcing applications in various sectors such as medicine, tourism, information and communication technology (ICT), and marketing. Fourth, it observes the ways in which crowdsourcing can improve production, finance, management and overall managerial performance.

Crowdsourcing, also known as “massive outsourcing” or “voluntary outsourcing,” is the act of taking a job or a specific task usually performed by an employee of a company or contractors, and outsourcing it to a large group of people or a community (crowd or mass) via the Internet, through an open call. The term was coined by Jeff Howe in a 2006 issue of Wired magazine. It is being developed in different sciences (e.g., medicine, engineering, ICT, management) and is used in the most successful companies of the modern era (e.g., Apple, Facebook, Inditex, Starbucks). These developments in crowdsourcing have theoretical and practical implications, which will be explored in this book.

Including contributions from international academics, scholars and professionals within the field, this book provides a global, multidimensional perspective on crowdsourcing…(More)”

Crowdsourced website flags up sexism in the workplace


Springwise: “Female jobseekers can now review the treatment of women in their potential workplace via an online platform called InHerSight. The website collates anonymous reviews from former and current employees — both male and female — so that women can find out more about the company’s policies, office culture and other potential issues before applying for or accepting a job there.

A recent survey by Cosmopolitan magazine found that one in three women are sexually harassed at work and InHerSight enables those women to communicate misconduct and other problematic corporate policies. Importantly, they can do so without fear of recrimination or consequence, since the scorecards are entirely anonymous. Users can complete surveys about their experience at any given company — either adding to an existing score or creating a new profile — by scoring them on 14 categories including their stance on maternity leave, flexible work hours and female representation in top positions. They can also leave a written review of the company. The crowdsourced data is then used to create comprehensive scorecards for other users to view.
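
The scorecard aggregation itself is presumably simple averaging. A rough sketch of the idea (InHerSight’s actual scoring method is not public; the ratings below are invented, and only three of the site’s 14 categories are shown):

```python
# Rough sketch: rolling anonymous category ratings into a company
# scorecard. Not InHerSight's actual method; the data is invented.
from statistics import mean

reviews = [
    {"maternity leave": 4, "flexible hours": 5, "women in leadership": 2},
    {"maternity leave": 3, "flexible hours": 4, "women in leadership": 3},
]

# Average each category across all anonymous reviews.
scorecard = {cat: mean(r[cat] for r in reviews) for cat in reviews[0]}
for category, score in scorecard.items():
    print(f"{category}: {score:.1f}/5")
```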

Founder Ursula Mead envisions the site as a TripAdvisor for women in the workplace and hopes that by holding companies accountable for their support for women, it will encourage them to review and improve their treatment….(More)”

Wittgenstein, #TheDress and Google’s search for a bigger truth


Robert Shrimsley at the Financial Times: “As the world burnt with a BuzzFeed-prompted debate over whether a dress was black and blue or white and gold, the BBC published a short article posing the question everyone was surely asking: “What would Wittgenstein say about that dress?”

Wittgenstein died in 1951, so we cannot know if the philosopher of language, truth and context would have been a devotee of BuzzFeed. (I guess it depends on whether we are talking of the early or the late Ludwig. The early Wittgenstein, it is well known, was something of an enthusiast for LOLs, whereas the later was more into WTFs and OMGs.)

The dress will now join the pantheon of web phenomena such as “Diet Coke and Mentos” and “Charlie bit my finger”. But this trivial debate on perceived truth captured in miniature a wider issue for the web: how to distil fact from noise when opinion drowns out information and value is determined by popularity.

At about the same time as the dress was turning the air blue — or was it white? — the New Scientist published a report on how one web giant might tackle this problem, a development in which Wittgenstein might have been very interested. The magazine reported on a Google research paper about how the company might reorder its search rankings to promote sites that could be trusted to tell the truth. (Google produces many such papers a year so this is a long way short of official policy.) It posits a formula for finding and promoting sites with a record of reliability.

This raises an interesting question over how troubled we should be by the notion that a private company with its own commercial interests and a huge concentration of power could be the arbiter of truth. There is no current reason to see sinister motives in Google’s search for a better web: it is both honourable and good business. But one might ask how, for example, Google Truth might determine established truths on net neutrality….

The paper suggests using fidelity to proven facts as a proxy for trust. This is easiest with single facts, such as a date or place of birth. For example, it suggests that claiming Barack Obama was born in Kenya would push a site down the rankings. This would be good for politics, but facts are not always neutral. Google would risk being depicted as part of “the mainstream media”. Fox Search, here we come….(More)”
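
To make the mechanism concrete, here is a drastically simplified sketch of fact-fidelity scoring. The Google paper’s actual “knowledge-based trust” model is a far more sophisticated probabilistic one; the knowledge base, claims, and neutral-prior fallback below are toy stand-ins built around the article’s Obama example.

```python
# Drastically simplified sketch of fact-fidelity scoring: rate a site by
# the fraction of its checkable claims that agree with a reference
# knowledge base. Toy data; not Google's actual model.
KNOWLEDGE_BASE = {("Barack Obama", "place of birth"): "Hawaii"}

def trust_score(claims):
    """Fraction of a site's (subject, predicate, object) claims that match
    the knowledge base; 0.5 if nothing is checkable (an assumed neutral
    prior for this sketch)."""
    checkable = [(s, p, o) for s, p, o in claims if (s, p) in KNOWLEDGE_BASE]
    if not checkable:
        return 0.5
    correct = sum(KNOWLEDGE_BASE[(s, p)] == o for s, p, o in checkable)
    return correct / len(checkable)

print(trust_score([("Barack Obama", "place of birth", "Hawaii")]))  # 1.0
print(trust_score([("Barack Obama", "place of birth", "Kenya")]))   # 0.0, demoted
```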

CrowdFlower Launches Open Data Project


Anthony Ha at TechCrunch: “Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of that data to the public through its new Data for Everyone initiative…. The hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years it’s become more focused on gathering location data to power mobile ads.)…

As for the data that’s available now, …there’s a lot of Twitter sentiment analysis covering things like attitudes towards brands and products, yogurt (?), and climate change. Among the more recent data sets, I was particularly taken by the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black…. (More)”
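
Exploring one of these releases takes only a few lines. A minimal sketch, with a hypothetical file name and column header standing in for whatever a given Data for Everyone CSV actually uses:

```python
# Minimal sketch of exploring a crowdsourced sentiment dataset. The file
# name and "sentiment" column are hypothetical; check the actual CSV header.
import csv
from collections import Counter

with open("climate_tweets.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Tally the crowd's sentiment labels across all rows.
print(Counter(row["sentiment"] for row in rows))
```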