The promise and perils of predictive policing based on big data

H. V. Jagadish in the Conversation: “Police departments, like everyone else, would like to be more effective while spending less. Given the tremendous attention to big data in recent years, and the value it has provided in fields ranging from astronomy to medicine, it should be no surprise that police departments are using data analysis to inform deployment of scarce resources. Enter the era of what is called “predictive policing.”

Some form of predictive policing is likely now in force in a city near you. Memphis was an early adopter. Cities from Minneapolis to Miami have embraced predictive policing. Time magazine named predictive policing (with particular reference to the city of Santa Cruz) one of the 50 best inventions of 2011. New York City Police Commissioner William Bratton recently said that predictive policing is “the wave of the future.”

The term “predictive policing” suggests that the police can anticipate a crime and be there to stop it before it happens and/or apprehend the culprits right away. As the Los Angeles Times points out, it depends on “sophisticated computer analysis of information about previous crimes, to predict where and when crimes will occur.”

At a very basic level, it’s easy for anyone to read a crime map and identify neighborhoods with higher crime rates. It’s also easy to recognize that burglars tend to target businesses at night, when they are unoccupied, and to target homes during the day, when residents are away at work. The challenge is to take a combination of dozens of such factors to determine where crimes are more likely to happen and who is more likely to commit them. Predictive policing algorithms are getting increasingly good at such analysis. Indeed, such was the premise of the movie Minority Report, in which the police can arrest and convict murderers before they commit their crime.

Predicting a crime with certainty is something that science fiction can have a field day with. But as a data scientist, I can assure you that in reality we can come nowhere close to certainty, even with advanced technology. To begin with, predictions can be only as good as the input data, and quite often these input data have errors.

But even with perfect, error-free input data and unbiased processing, ultimately what the algorithms are determining are correlations. Even if we have perfect knowledge of your troubled childhood, your socializing with gang members, your lack of steady employment, your wacko posts on social media and your recent gun purchases, all that the best algorithm can do is to say it is likely, but not certain, that you will commit a violent crime. After all, to treat such predictions as guaranteed is to deny free will….

What data can do is give us probabilities, rather than certainty. Good data coupled with good analysis can give us very good estimates of probability. If you sum probabilities over many instances, you can usually get a robust estimate of the total.

For example, data analysis can provide a probability that a particular house will be broken into on a particular day based on historical records for similar houses in that neighborhood on similar days. An insurance company may add this up over all days in a year to decide how much to charge for insuring that house….(More)”
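The summing-of-probabilities idea in the insurance example can be sketched in a few lines. The daily probability, loss amount, and pricing margin below are invented for illustration, not drawn from any actual insurer:

```python
# Hypothetical illustration of summing per-day probabilities: the expected
# number of break-ins over a year is the sum of the daily probabilities,
# and an insurer can price a policy from that total.

daily_probability = 0.0002           # assumed chance of a break-in on any given day
days_per_year = 365
average_loss = 20_000.0              # assumed payout per break-in, in dollars

expected_break_ins = daily_probability * days_per_year    # ~0.073 per year
expected_annual_loss = expected_break_ins * average_loss  # ~$1,460
premium = expected_annual_loss * 1.25                     # margin for costs and profit

print(round(expected_break_ins, 3))    # 0.073
print(round(expected_annual_loss, 2))  # 1460.0
print(round(premium, 2))               # 1825.0
```

The point of the sketch is the one the article makes: no single day's prediction is certain, but the total over many instances is robust enough to price against.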

Beyond the Quantified Self: Thematic exploration of a dataistic paradigm

Minna Ruckenstein and Mika Pantzar in New Media and Society: “This article investigates the metaphor of the Quantified Self (QS) as it is presented in the magazine Wired (2008–2012). Four interrelated themes—transparency, optimization, feedback loop, and biohacking—are identified as formative in defining a new numerical self and promoting a dataist paradigm. Wired captures certain interests and desires with the QS metaphor, while ignoring and downplaying others, suggesting that the QS positions self-tracking devices and applications as interfaces that energize technological engagements, thereby pushing us to rethink life in a data-driven manner. The thematic analysis of the QS is treated as a schematic aid for raising critical questions about self-quantification, for instance, detecting the merging of epistemological claims, technological devices, and market-making efforts. From this perspective, another definition of the QS emerges: a knowledge system that remains flexible in its aims and can be used as a resource for epistemological inquiry and in the formation of alternative paradigms….(More)”

The Magazine of Early American Datasets

Mark Boonshoft at The Junto: “Data. Before postmodernism, or environmental history, or the cultural turn, or the geographic turn, and even before the character on the old Star Trek series, historians began to gather and analyze quantitative evidence to understand the past. As computers became common during the 1970s and 1980s, scholars responded by painstakingly compiling and analyzing datasets, using that evidence to propose powerful new historical interpretations. Today, much of that information (as well as data compiled since) is in danger of disappearing. For that and other reasons, we have developed a website designed to preserve and share the datasets permanently (or at least until aliens destroy our planet). We appeal to all early American historians (not only the mature ones from earlier decades) to take the time both to preserve and to share their statistical evidence with present and future scholars. It will not only be a legacy to the profession but also will encourage historians to share their data more openly and to provide a foundation on which scholars can build.

In coordination with the McNeil Center for Early American Studies and specialists at the University of Pennsylvania Libraries, in addition to bepress, we have established the Magazine of Early American Datasets (MEAD), available at We’d love to have your datasets, your huddled 1’s and 0’s (and other numbers and letters) yearning to be free. The best would be in either .csv or, if you have commas in your data, .txt, because both of those are non-proprietary and somewhat close to universal. However, if the data is in other forms, like Access, Excel or SPSS, that will do fine as well. Ultimately, we should be able to convert files to a more permanent database and to preserve those files in perpetuity. In addition, we are asking scholars, out of the goodness of their hearts and commitment to the profession, to upload a separate document as a codebook explaining the meaning of the variables. The files will all be available to any scholar regardless of their academic affiliation.

How will a free, open centralized data center benefit Early American Historians and why should you participate in using and sharing data? Let us count just a few ways. In our experience, most historians of early America are extremely generous in sharing not only their expertise but also their evidence with other scholars. However, that generally occurs on an individual, case-by-case basis in a somewhat serendipitous fashion. A centralized website would permit scholars quickly to investigate whether quantitative evidence is available on which they might begin to construct their own research. Ideally, scholars setting out on a new topic might be guided somewhat by the existence and availability of data. Moreover, it would set a precedent that future historians might follow—routinely sharing their evidence, either before or after their publications analyzing the data have appeared in print or online….(More)”

Want to Invest in Your City? Try the New Kickstarter for Municipal Bonds

Kyle Chayka in Pacific Standard Magazine: “… The San Francisco-based Neighborly launched in 2013 as a kind of community-based Kickstarter, helping users fund projects close to home. But the site recently pivoted toward presenting a better interface for municipal bonds, highlighting investment opportunities with a slick, Silicon Valley-style interface that makes supporting a local infrastructure project as cool as backing a new model of wrist-wearable computer. It’s bringing innovation to a dusty, though increasingly popular, sector. “You’d be shocked to find how much of the [municipal bonds] process is still being done by email and phone calls,” says Rodrigo Davies, Neighborly’s chief product officer. “This market is really not as modern as you would think.”….Neighborly enters into a gray space between crowdfunding and crowd-investing. The former is what we associate with Kickstarter and Indiegogo, which lump together many small donations into totals that can reach into the millions. In crowdfunding, donations are often made for no guaranteed return. Contrary to what it might suggest, Kickstarter isn’t selling any products; it’s just giving users the opportunity to freely give away money for a legally non-binding promise of a reward, often in the form of a theoretical product. …

Crowd-investing, in contrast, exchanges money for equity in a company, or in Neighborly’s case, a city. Shares of stock or debt purchased through crowd-investing ideally result in profit for the holder, though they can hold as much risk as any vaporware crowdfunding project. But crowd-investing remains largely illegal, despite President Obama’s signing of the JOBS Act in early 2012, which was supposed to clear its path to legitimacy.

The obstacle is that the government’s job is to mitigate the financial risks its citizens can take. That’s why Quire, a start-up that allows fans of popular tech businesses to invest in them directly, is still only open to “accredited investors,” defined by the government as someone “with income exceeding $200,000 in each of the two most recent years” or who has an individual net worth of over $1 million. Legally, a large investment is categorized as too much risk for anyone under that threshold.

That’s exactly the demographic Neighborly is targeting for municipal bonds, which start in minimum denominations of $5,000. “Bond brokers wouldn’t even look at you unless you have $50-100,000 to invest,” Davies says. The new platform, however, doesn’t discriminate. “We’re looking at people who live in the cities where the projects are happening … in their mid-20s to early 40s, who have some money that they want to invest for the future,” he says. “They put it in a bank savings account or invest it in some funds that they don’t necessarily understand. They should be investing to earn better returns, but they’re not necessarily experienced with financial markets. Those people could benefit a ton from investing in their cities.”…(More)

Advances in Crowdsourcing

New book edited by Garrigos-Simon, Fernando J., Gil-Pechuán, Ignacio, Estelles-Miguel, Sofia: “This book attempts to link some of the recent advances in crowdsourcing with advances in innovation and management. It contributes to the literature in several ways. First, it provides a global definition, insights and examples of this managerial perspective resulting in a theoretical framework. Second, it explores the relationship between crowdsourcing and technological innovation, the development of social networks and new behaviors of Internet users. Third, it explores different crowdsourcing applications in various sectors such as medicine, tourism, information and communication technology (ICT), and marketing. Fourth, it observes the ways in which crowdsourcing can improve production, finance, management and overall managerial performance.

Crowdsourcing, also known as “massive outsourcing” or “voluntary outsourcing,” is the act of taking a job or a specific task usually performed by an employee of a company or contractors, and outsourcing it to a large group of people or a community (crowd or mass) via the Internet, through an open call. The term was coined by Jeff Howe in a 2006 issue of Wired magazine. It is being developed in different sciences (e.g., medicine, engineering, ICT, management) and is used in the most successful companies of the modern era (e.g., Apple, Facebook, Inditex, Starbucks). The developments in crowdsourcing have theoretical and practical implications, which will be explored in this book.

Including contributions from international academics, scholars and professionals within the field, this book provides a global, multidimensional perspective on crowdsourcing.​…(More)”

Crowdsourced website flags up sexism in the workplace

Springwise: “Female jobseekers can now review the treatment of women in their potential workplace via an online platform called InHerSight. The website collates anonymous reviews from former and current employees — both male and female — so that women can find out more about the company’s policies, office culture and other potential issues before applying for or accepting a job there.

A recent survey by Cosmopolitan magazine found that one in three women are sexually harassed at work and InHerSight enables those women to communicate misconduct and other problematic corporate policies. Importantly, they can do so without fear of recrimination or consequence, since the scorecards are entirely anonymous. Users can complete surveys about their experience at any given company — either adding to an existing score or creating a new profile — by scoring them on 14 categories including their stance on maternity leave, flexible work hours and female representation in top positions. They can also leave a written review of the company. The crowdsourced data is then used to create comprehensive scorecards for other users to view.

Founder Ursula Mead envisions the site as a TripAdvisor for women in the workplace and hopes that by holding companies accountable for their support for women, it will encourage them to review and improve their treatment….(More)”
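The aggregation InHerSight describes, anonymous per-category ratings pooled into a company scorecard, might look something like the following sketch. The category names, rating scale, and scores here are invented for illustration, not InHerSight's actual 14 categories:

```python
from statistics import mean

# Hypothetical sketch: each anonymous review scores a company on some of
# the site's categories (names below are illustrative), and the scorecard
# is the per-category average across all reviews.

reviews = [
    {"maternity_leave": 4, "flexible_hours": 5, "female_leadership": 2},
    {"maternity_leave": 3, "flexible_hours": 4},
    {"flexible_hours": 3, "female_leadership": 1},
]

def scorecard(reviews):
    """Pool ratings by category, then average each category."""
    totals = {}
    for review in reviews:
        for category, rating in review.items():
            totals.setdefault(category, []).append(rating)
    return {category: round(mean(ratings), 2)
            for category, ratings in totals.items()}

print(scorecard(reviews))
# {'maternity_leave': 3.5, 'flexible_hours': 4.0, 'female_leadership': 1.5}
```

Because only the averages are published, no individual reviewer's ratings can be read back out of the scorecard, which is what makes the anonymity promise workable.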

Wittgenstein, #TheDress and Google’s search for a bigger truth

Robert Shrimsley at the Financial Times: “As the world burnt with a BuzzFeed-prompted debate over whether a dress was black and blue or white and gold, the BBC published a short article posing the question everyone was surely asking: “What would Wittgenstein say about that dress?”

Wittgenstein died in 1951, so we cannot know if the philosopher of language, truth and context would have been a devotee of BuzzFeed. (I guess it depends on whether we are talking of the early or the late Ludwig. The early Wittgenstein, it is well known, was something of an enthusiast for LOLs, whereas the later was more into WTFs and OMGs.)

The dress will now join the pantheon of web phenomena such as “Diet Coke and Mentos” and “Charlie bit my finger”. But this trivial debate on perceived truth captured in miniature a wider issue for the web: how to distil fact from noise when opinion drowns out information and value is determined by popularity.

At about the same time as the dress was turning the air blue — or was it white? — the New Scientist published a report on how one web giant might tackle this problem, a development in which Wittgenstein might have been very interested. The magazine reported on a Google research paper about how the company might reorder its search rankings to promote sites that could be trusted to tell the truth. (Google produces many such papers a year so this is a long way short of official policy.) It posits a formula for finding and promoting sites with a record of reliability.

This raises an interesting question over how troubled we should be by the notion that a private company with its own commercial interests and a huge concentration of power could be the arbiter of truth. There is no current reason to see sinister motives in Google’s search for a better web: it is both honourable and good business. But one might ask how, for example, Google Truth might determine established truths on net neutrality….

The paper suggests using fidelity to proved facts as a proxy for trust. This is easiest with single facts, such as a date or place of birth. For example, it suggests claiming Barack Obama was born in Kenya would push a site down the rankings. This would be good for politics but facts are not always neutral. Google would risk being depicted as part of “the mainstream media”. Fox Search here we come….(More)”
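The paper's core idea, using agreement with established facts as a proxy for trust, can be caricatured in a few lines. The fact triples, the site names, and the neutral prior below are illustrative assumptions, not Google's actual method:

```python
# Rough sketch of "fidelity to proved facts as a proxy for trust":
# compare the (subject, attribute) -> value claims a site makes against a
# reference set of established facts, and score the site by the share it
# gets right. All facts and sites below are invented for illustration.

established_facts = {
    ("Barack Obama", "born_in"): "Hawaii",
    ("Ludwig Wittgenstein", "died_in"): "1951",
}

def trust_score(site_claims):
    """Fraction of a site's checkable claims that match established facts."""
    checkable = [(k, v) for k, v in site_claims.items() if k in established_facts]
    if not checkable:
        return 0.5  # no evidence either way: fall back to a neutral prior
    correct = sum(1 for k, v in checkable if established_facts[k] == v)
    return correct / len(checkable)

site_a = {("Barack Obama", "born_in"): "Hawaii",
          ("Ludwig Wittgenstein", "died_in"): "1951"}
site_b = {("Barack Obama", "born_in"): "Kenya"}

print(trust_score(site_a))  # 1.0
print(trust_score(site_b))  # 0.0
```

Even this toy version exposes the article's worry: whoever curates `established_facts` decides which sites sink in the rankings.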

CrowdFlower Launches Open Data Project

Anthony Ha at Techcrunch: “Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of that data to the public through its new Data for Everyone initiative…. The hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years, it’s become more focused on gathering location data to power mobile ads.)…

As for the data that’s available now, …There’s a lot of Twitter sentiment analysis covering things like attitudes towards brands and products, yogurt (?), and climate change. Among the more recent data sets, I was particularly taken with the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black…. (More)”

This vending machine will deny you snacks based on medical records

Springwise: “Businesses often stand by the motto ‘the customer is always right’ — but are they? We’ve already seen a few services that deny consumers what they want based on their personal info. For example, Billboard Brasil’s Fan Check Machine only gave out copies of the music magazine if the buyer could prove they owned tracks by the artist on the cover. Now the Luce X2 Touch TV vending machine uses facial recognition and customers’ medical records to determine if they should be allowed to buy an unhealthy snack.
Created by Italy-based Rhea Vendors and recently launched in the UK, the machine features a 22-inch touchscreen display that lets customers select an item just like a standard vending machine. However, before the snack is released, customers with an account can go through a facial recognition check.
The technology detects the customer’s age, build and mood in order to determine whether the purchase is a wise decision. The machine can also be programmed to access information about the user’s medical records and purchase history. If the algorithms decide that purchasing a coffee with 3 sugars or the fourth candy bar of the day is a bad idea for their health or mood, it can refuse to vend the product.
While some customers won’t appreciate their private data being analyzed or getting rejected by a lifeless machine, the idea could be a savior for those on a diet….(More)”
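The refusal logic described in the excerpt amounts to a few rules evaluated against the customer's records. This sketch invents its own thresholds and record fields; Rhea Vendors' actual rules are not public in the article:

```python
# Illustrative, made-up rules for a health-aware vending machine mirroring
# the article's examples: refuse a sugary drink flagged by medical records,
# and refuse the fourth identical snack of the day.

def should_vend(item, customer):
    if item["sugars"] >= 3 and customer.get("diabetic"):
        return False  # medical record flags a heavily sugared drink
    if customer["purchases_today"].count(item["name"]) >= 3:
        return False  # fourth identical snack of the day is refused
    return True

customer = {"diabetic": False, "purchases_today": ["candy bar"] * 3}
print(should_vend({"name": "candy bar", "sugars": 2}, customer))    # False
print(should_vend({"name": "apple juice", "sugars": 1}, customer))  # True
```

Whatever the real rules are, the design question the article raises is the same: the decision turns on personal data the machine is trusted to hold.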

HyperCities: Thick Mapping in the Digital Humanities

Book by Todd Presner, David Shepard, Yoh Kawano: “The prefix “hyper” refers to multiplicity and abundance. More than a physical space, a hypercity is a real city overlaid with information networks that document the past, catalyze the present, and project future possibilities. Hypercities are always under construction.
Todd Presner, David Shepard, and Yoh Kawano put digital humanities theory into practice to chart the proliferating cultural records of places around the world. A digital platform transmogrified into a book, it explains the ambitious online project of the same name that maps the historical layers of city spaces in an interactive, hypermedia environment. The authors examine the media archaeology of Google Earth and the cultural–historical meaning of map projections, and explore recent events—the “Arab Spring” and the Fukushima nuclear power plant disaster—through social media mapping that incorporates data visualizations, photographic documents, and Twitter streams. A collaboratively authored and designed work, HyperCities includes a “ghost map” of downtown Los Angeles, polyvocal memory maps of LA’s historic Filipinotown, avatar-based explorations of ancient Rome, and hour-by-hour mappings of the Tehran election protests of 2009.
Not a book about maps in the literal sense, HyperCities describes thick mapping: the humanist project of participating and listening that transforms mapping into an ethical undertaking. Ultimately, the digital humanities do not consist merely of computer-based methods for analyzing information. They are a means of integrating scholarship with the world of lived experience, making sense of the past in the layered spaces of the present for the sake of the open future.”