Can crowdsourcing decipher the roots of armed conflict?


Stephanie Kanowitz at GCN: “Researchers at Pennsylvania State University and the University of Texas at Dallas are proving that there’s accuracy, not just safety, in numbers. The Correlates of War project, a long-standing effort that studies the history of warfare, is now experimenting with crowdsourcing as a way to more quickly and inexpensively create a global conflict database that could help explain when and why countries go to war.

The goal is to facilitate the collection, dissemination and use of reliable data in international relations, but a byproduct has emerged: the development of technology that uses machine learning and natural language processing to efficiently, cost-effectively and accurately create databases from news articles that detail militarized interstate disputes.

The project is in its fifth iteration, having released the fourth set of Militarized Interstate Dispute (MID) data in 2014. To create those earlier versions, researchers paid subject-matter experts such as political scientists to read and hand code newswire articles about disputes, identifying features of possible militarized incidents. Now, however, they’re soliciting help from anyone and everyone — and finding the results are much the same as what the experts produced, except the results come in faster and with significantly less expense.

As news articles come across the wire, the researchers pull them and formulate questions about them that help evaluate the military events. Next, the articles and questions are loaded onto Amazon Mechanical Turk, a marketplace for crowdsourcing. The project assigns articles to readers, who typically spend about 10 minutes reading an article and responding to the questions. The readers submit the answers to the project researchers, who review them. The project assigns the same article to multiple workers and uses computer algorithms to combine the data into one annotation.
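The workflow above maps onto Mechanical Turk’s HIT model: the same article is offered to several independent workers, each of whom answers the coding questions. As a hedged illustration only (not the project’s actual pipeline), a minimal boto3 sketch of posting one article might look like the following; the title, reward, timing values and question_xml placeholder are all assumptions.

```python
# Hypothetical sketch: post one news article plus its coding questions to
# Amazon Mechanical Turk so that several workers annotate the same item.
# Assumes AWS credentials are configured; question_xml would be an
# HTMLQuestion/ExternalQuestion document built from the article and questions.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def post_article_hit(question_xml, n_workers=5):
    """Create one HIT and ask n_workers independent readers to code it."""
    response = mturk.create_hit(
        Title="Code a news article for possible militarized dispute features",
        Description="Read a short newswire article and answer coding questions.",
        Keywords="coding, news, political science",
        Reward="1.00",                     # USD per assignment (illustrative)
        MaxAssignments=n_workers,          # same article goes to several readers
        AssignmentDurationInSeconds=1800,  # time allowed per reader
        LifetimeInSeconds=86400,           # HIT stays available for one day
        Question=question_xml,
    )
    return response["HIT"]["HITId"]
```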

A systematic comparison of the crowdsourced responses with those of trained subject-matter experts showed that the crowdsourced work was accurate for 68 percent of the news reports coded. More important, the aggregation of answers for each article showed that common answers from multiple readers strongly correlated with correct coding. This allowed researchers to easily flag the articles that required deeper expert involvement and process the majority of the news items in near-real time and at limited cost….(more)”
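The aggregation and flagging described above can be approximated with simple majority voting: keep the most common answer to each question and route low-agreement items to experts. The sketch below is illustrative rather than the project’s actual algorithm; the 0.7 agreement threshold is an assumption.

```python
# Illustrative majority-vote aggregation of crowdsourced answers for one article.
# answers maps each question ID to the list of answers given by the workers.
from collections import Counter

def aggregate_article(answers, agreement_threshold=0.7):
    """Combine multiple workers' answers; flag questions with weak consensus."""
    combined, needs_expert = {}, []
    for question_id, worker_answers in answers.items():
        counts = Counter(worker_answers)
        top_answer, top_count = counts.most_common(1)[0]
        agreement = top_count / len(worker_answers)
        combined[question_id] = top_answer
        if agreement < agreement_threshold:   # weak consensus -> expert review
            needs_expert.append(question_id)
    return combined, needs_expert

# Example: five workers judged whether the article describes a use of force.
combined, flagged = aggregate_article({"use_of_force": ["yes", "yes", "no", "yes", "yes"]})
print(combined, flagged)   # {'use_of_force': 'yes'} []
```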

Big Data in U.S. Agriculture


Megan Stubbs at the Congressional Research Service: “Recent media and industry reports have employed the term big data as a key to the future of increased food production and sustainable agriculture. A recent hearing on the private elements of big data in agriculture suggests that Congress too is interested in potential opportunities and challenges big data may hold. While there appears to be great interest, the subject of big data is complex and often misunderstood, especially within the context of agriculture.

There is no commonly accepted definition of the term big data. It is often used to describe a modern trend in which the combination of technology and advanced analytics creates a new way of processing information that is more useful and timely. In other words, big data is just as much about new methods for processing data as about the data themselves. It is dynamic, and when analyzed can provide a useful tool in a decisionmaking process. Most see big data in agriculture at the end use point, where farmers use precision tools to potentially create positive results like increased yields, reduced inputs, or greater sustainability. While this is certainly the more intriguing part of the discussion, it is but one aspect and does not necessarily represent a complete picture.

Both private and public big data play a key role in the use of technology and analytics that drive a producer’s evidence-based decisions. Public-level big data represent records collected, maintained, and analyzed through publicly funded sources, specifically by federal agencies (e.g., farm program participant records and weather data). Private big data represent records generated at the production level and originate with the farmer or rancher (e.g., yield, soil analysis, irrigation levels, livestock movement, and grazing rates). While discussed separately in this report, public and private big data are typically combined to create a more complete picture of an agricultural operation and therefore better decisionmaking tools.

Big data may significantly affect many aspects of the agricultural industry, although the full extent and nature of its eventual impacts remain uncertain. Many observers predict that the growth of big data will bring positive benefits through enhanced production, resource efficiency, and improved adaptation to climate change. While lauded for its potentially revolutionary applications, big data is not without issues. From a policy perspective, issues related to big data involve nearly every stage of its existence, including its collection (how it is captured), management (how it is stored and managed), and use (how it is analyzed and used). It is still unclear how big data will progress within agriculture due to technical and policy challenges, such as privacy and security, for producers and policymakers. As Congress follows the issue a number of questions may arise, including a principal one—what is the federal role?…(More)”

Predictive Analytics


Revised book by Eric Siegel: “Prediction is powered by the world’s most potent, flourishing unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.

Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future drives millions of decisions more effectively, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.
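As a rough, hypothetical illustration of what “putting odds on the future” looks like in practice, the sketch below scores how likely a customer is to cancel a subscription; the features and numbers are invented for illustration and are not drawn from the book.

```python
# Minimal sketch of scoring individuals: a classifier estimates each
# customer's probability of cancelling, using made-up illustrative features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [months_as_customer, support_calls_last_90_days, monthly_spend]
X = np.array([[24, 0, 80.0], [3, 5, 20.0], [12, 1, 55.0], [2, 4, 15.0]])
y = np.array([0, 1, 0, 1])  # 1 = cancelled, 0 = stayed

model = LogisticRegression().fit(X, y)
new_customer = np.array([[6, 3, 30.0]])
odds_of_cancelling = model.predict_proba(new_customer)[0, 1]
print(f"Estimated probability of cancelling: {odds_of_cancelling:.2f}")
```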

In this lucid, captivating introduction — now in its Revised and Updated edition — former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:

    • What type of mortgage risk Chase Bank predicted before the recession.
    • Predicting which people will drop out of school, cancel a subscription, or get divorced before they even know it themselves.
    • Why early retirement predicts a shorter life expectancy and vegetarians miss fewer flights.
    • Five reasons why organizations predict death — including one health insurance company.
    • How U.S. Bank and Obama for America calculated — and Hillary for America 2016 plans to calculate — the way to most strongly persuade each individual.
    • Why the NSA wants all your data: machine learning supercomputers to fight terrorism.
    • How IBM’s Watson computer used predictive modeling to answer questions and beat the human champs on TV’s Jeopardy!
    • How companies ascertain untold, private truths — how Target figures out you’re pregnant and Hewlett-Packard deduces you’re about to quit your job.
    • How judges and parole boards rely on crime-predicting computers to decide how long convicts remain in prison.
    • 183 examples from Airbnb, the BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, LinkedIn, Match.com, MTV, Netflix, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more….(More)”

 

Humanity 360: World Humanitarian Data and Trends 2015


OCHA: “WORLD HUMANITARIAN DATA AND TRENDS

Highlights major trends, challenges and opportunities in the nature of humanitarian crises, showing how the humanitarian landscape is evolving in a rapidly changing world.


LEAVING NO ONE BEHIND: HUMANITARIAN EFFECTIVENESS IN THE AGE OF THE SUSTAINABLE DEVELOPMENT GOALS

Exploring what humanitarian effectiveness means in today’s world ‐ better meeting the needs of people in crisis, better moving people out of crisis.


TOOLS FOR DATA COORDINATION AND COLLECTION

 

HereHere


HereHere NYC generates weekly cartoons for NYC neighborhoods based on public data. We sum up how your neighborhood, or other NYC neighborhoods you care about, are doing via a weekly email digest, neighborhood-specific Twitter & Instagram feeds, and deeper data and context.

HereHere is a research project from FUSE Labs, Microsoft Research that explores:

  • Creating compelling stories with data to engage larger communities
  • Inventing new habits for connecting to the hyperlocal
  • Using cartoons as a tool to drive data engagement

HereHere does not use sentiment analysis, but uses a research platform with the intention of surfacing the most pertinent information with a human perspective. …

How It Works

Several times a day we grab the freshest NYC 311 data. The data comes in as a long list of categorized concerns issued by people in NYC (either via phone, email, or text message) and ranges from heating complaints to compliments to concerns about harboring bees and everything in between.

We separate the data by neighborhood for each of the 42 neighborhoods throughout the 5 boroughs of NYC, and count the total of each concern per neighborhood.
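As a hedged sketch of that counting step, the snippet below tallies requests per neighborhood and complaint type from a hypothetical 311 extract; the file name and column names are assumptions, not HereHere’s actual schema.

```python
# Illustrative tally of 311 concerns per neighborhood from a CSV extract
# with one row per service request. File and column names are assumptions.
import pandas as pd

requests = pd.read_csv("nyc_311_extract.csv", parse_dates=["created_date"])

counts = (
    requests
    .groupby(["neighborhood", "complaint_type"])
    .size()
    .rename("total")
    .reset_index()
)
# One row per (neighborhood, complaint_type) pair with its request count.
print(counts.head())
```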

Next, we process the data through the Sentient Data Server. SDS equips each neighborhood with a personality (like a character in a movie or videogame) and we calculate the character’s response to the latest data based on pace, position and trend. For example, a neighborhood might be delighted if, after several days of more than 30 heating complaints, heating complaints drop to zero; or a neighborhood might be ashamed to see a sudden rise in homeless person assistance requests.
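The Sentient Data Server itself is internal to the project, but the trend logic described above can be sketched generically: compare the latest count for a concern against its recent pace and map the change to a reaction. The thresholds and mood labels below are illustrative assumptions, not HereHere’s actual rules.

```python
# Illustrative reaction logic: compare today's count for one concern against
# the average of the preceding days and label the change with a "mood".
def neighborhood_reaction(daily_counts, high_level=30):
    """daily_counts: list of daily totals for one concern, oldest first."""
    *previous, today = daily_counts
    recent = previous[-7:]                      # up to one week of history
    recent_pace = sum(recent) / max(len(recent), 1)
    if recent_pace > high_level and today == 0:
        return "delighted"      # e.g. heating complaints drop to zero
    if today > 2 * recent_pace and today > 0:
        return "ashamed"        # e.g. sudden rise in assistance requests
    return "neutral"

# A weekly cartoon could then focus on the concern with the sharpest change.
print(neighborhood_reaction([35, 32, 40, 31, 33, 36, 38, 0]))  # -> "delighted"
```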

 

HereHere determines the most critical 311 issues for each neighborhood each week and uses that to procedurally generate a weekly cartoon for each neighborhood.

 HereHere summarizes the 311 concerns into categories for a quick sense of what’s happening in each neighborhood…(More)

How Facebook Makes Us Dumber


Cass Sunstein in BloombergView: “Why does misinformation spread so quickly on social media? Why doesn’t it get corrected? When the truth is so easy to find, why do people accept falsehoods?

A new study focusing on Facebook users provides strong evidence that the explanation is confirmation bias: people’s tendency to seek out information that confirms their beliefs, and to ignore contrary information.

Confirmation bias turns out to play a pivotal role in the creation of online echo chambers. This finding bears on a wide range of issues, including the current presidential campaign, the acceptance of conspiracy theories and competing positions in international disputes.

The new study, led by Michela Del Vicario of Italy’s Laboratory of Computational Social Science, explores the behavior of Facebook users from 2010 to 2014. One of the study’s goals was to test a question that continues to be sharply disputed: When people are online, do they encounter opposing views, or do they create the virtual equivalent of gated communities?

Del Vicario and her coauthors explored how Facebook users spread conspiracy theories (using 32 public web pages); science news (using 35 such pages); and “troll” pages, which intentionally spread false information (using two web pages). Their data set is massive: It covers all Facebook posts during the five-year period. They explored which Facebook users linked to one or more of the 69 web pages, and whether they learned about those links from their Facebook friends.

In sum, the researchers find a lot of communities of like-minded people. Even if they are baseless, conspiracy theories spread rapidly within such communities.

More generally, Facebook users tended to choose and share stories containing messages they accept, and to neglect those they reject. If a story fits with what people already believe, they are far more likely to be interested in it and thus to spread it….(More)”

Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues


Press Release: “A new report from the Federal Trade Commission outlines a number of questions for businesses to consider to help ensure that their use of big data analytics, while producing many benefits for consumers, avoids outcomes that may be exclusionary or discriminatory.

“Big data’s role is growing in nearly every area of business, affecting millions of consumers in concrete ways,” said FTC Chairwoman Edith Ramirez. “The potential benefits to consumers are significant, but businesses must ensure that their big data use does not lead to harmful exclusion or discrimination.”

The report, Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues, looks specifically at big data at the end of its lifecycle – how it is used after being collected and analyzed – and draws on information from the FTC’s 2014 workshop, “Big Data: A Tool for Inclusion or Exclusion?,” as well as the Commission’s seminar on Alternative Scoring Products. The Commission also considered extensive public comments and additional public research in compiling the report.

The report highlights a number of innovative uses of big data that are providing benefits to underserved populations, including increased educational attainment, access to credit through non-traditional methods, specialized health care for underserved communities, and better access to employment.

In addition, the report looks at possible risks that could result from biases or inaccuracies about certain groups, including more individuals mistakenly denied opportunities based on the actions of others, exposing sensitive information, creating or reinforcing existing disparities, assisting in the targeting of vulnerable consumers for fraud, creating higher prices for goods and services in lower-income communities and weakening the effectiveness of consumer choice.

The report outlines some of the various laws that apply to the use of big data, especially in regard to possible issues of discrimination or exclusion, including the Fair Credit Reporting Act, the FTC Act and equal opportunity laws. It also provides a range of questions for businesses to consider when they examine whether their big data programs comply with these laws.

The report also proposes four key policy questions that are drawn from research into the ways big data can both present and prevent harms. The policy questions are designed to help companies determine how best to maximize the benefit of their use of big data while limiting possible harms, by examining both practical questions of accuracy and built-in bias as well as whether the company’s use of big data raises ethical or fairness concerns….(More)”

Managing Secrecy


Clare Birchall in the International Journal of Communication: “As many anthropologists and sociologists have long argued, understanding the meaning and place of secrets is central to an adequate representation of society. This article extends previous accounts of secrecy in social, governmental, and organizational settings to configure secrecy as one form of visibility management among others. Doing so helps to remove the secret from a post-Enlightenment value system that deems secrets bad and openness good. Once secrecy itself is seen as a neutral phenomenon, we can focus on the politicality or ethics of any particular distribution of the visible, sayable, and knowable. Alongside understanding the work secrecy performs in contemporary society, this article argues that we can also seek inspiration from the secret as a methodological tool and political tactic. Moving beyond the claim to privacy, a claim that has lost bite in this era of state and consumer dataveillance, a “right to opacity”—the right to not be transparent, legible, seen—might open up an experience of subjectivity and responsibility beyond the circumscribed demands of the current politicotechnological management of visibilities….(More)”

Open data set to reshape charity and activism in 2016


The Guardian: “In 2015 the EU launched the world’s first international data portal, the Chinese government pledged to make state data public, and the UK lost its open data crown to Taiwan. Troves of data were unlocked by governments around the world last year, but the usefulness of much of that data is still to be determined by the civic groups, businesses and governments who use it. So what’s in the pipeline? And how will the open data ecosystem grow in 2016? We asked the experts.

1. Data will be seen as infrastructure (Heather Savory, director general for data capability, Office for National Statistics)….

2. Journalists, charities and civil society bodies will embrace open data (Hetan Shah, executive director, the Royal Statistical Society)…

3. Activists will take it upon themselves to create data (Pavel Richter, chief executive, Open Knowledge International)….

 

4. Data illiteracy will come at a heavy price (Sir Nigel Shadbolt, principal, Jesus College, Oxford, professorial research fellow in computer science, University of Oxford and chairman and co-founder of the Open Data Institute…)

5. We’ll create better tools to build a web of data (Dr Elena Simperl, associate professor, electronics and computer science, University of Southampton) …(More)”

Privacy, security and data protection in smart cities: a critical EU law perspective


CREATe Working Paper by Lilian Edwards: “Smart cities” are a buzzword of the moment. Although legal interest is growing, most academic responses, at least in the EU, are still from the technological, urban studies, environmental and sociological sectors rather than the legal sector, and have primarily laid emphasis on the social, urban, policing and environmental benefits of smart cities, rather than their challenges, in an often rather uncritical fashion. However, a growing backlash from the privacy and surveillance sectors warns of the potential threat to personal privacy posed by smart cities. A key issue is the lack of opportunity in an ambient or smart city environment for the giving of meaningful consent to processing of personal data; other crucial issues include the degree to which smart cities collect private data from inevitable public interactions, the “privatisation” of ownership of both infrastructure and data, the repurposing of “big data” drawn from the IoT in smart cities and the storage of that data in the Cloud.

This paper, drawing on author engagement with smart city development in Glasgow as well as the results of an international conference in the area curated by the author, argues that smart cities combine the three greatest current threats to personal privacy, with which regulation has so far failed to deal effectively: the Internet of Things (IoT) or “ubiquitous computing”; “Big Data”; and the Cloud. While these three phenomena have been examined extensively in much privacy literature (particularly the last two), both in the US and EU, the combination is under-explored. Furthermore, US legal literature and solutions (if any) are not simply transferable to the EU because of the US’s lack of an omnibus data protection (DP) law. I will discuss how and if EU DP law controls possible threats to personal privacy from smart cities and suggest further research on two possible solutions: one, a mandatory holistic privacy impact assessment (PIA) exercise for smart cities; two, code solutions for flagging the need for, and consequences of, giving consent to collection of data in ambient environments….(More)