Linguistic Mapping Reveals How Word Meanings Sometimes Change Overnight


Emerging Technology From the arXiv: “In October 2012, Hurricane Sandy approached the eastern coast of the United States. At the same time, the English language was undergoing a small earthquake of its own. Just months before, the word “sandy” was an adjective meaning “covered in or consisting mostly of sand” or “having a light yellowish-brown color.” Almost overnight, this word gained an additional meaning as a proper noun for one of the costliest storms in U.S. history.
A similar change occurred to the word “mouse” in the early 1970s when it gained the new meaning of “computer input device.” In the 1980s, the word “apple” became a proper noun synonymous with the computer company. And later, the word “windows” followed a similar course after the release of the Microsoft operating system.
All this serves to show how language constantly evolves, often slowly but at other times almost overnight. Keeping track of these new senses and meanings has always been hard. But not anymore.
Today, Vivek Kulkarni at Stony Brook University in New York and a few pals show how they have tracked these linguistic changes by mining the corpus of words stored in databases such as Google Books, movie reviews from Amazon, and of course the microblogging site Twitter.
These guys have developed three ways to spot changes in the language. The first is a simple count of how often words are used, using tools such as Google Trends. For example, in October 2012, the frequencies of the words “Sandy” and “hurricane” both spiked in the runup to the storm. However, only one of these words changed its meaning, something that a frequency count cannot spot.
So Kulkarni and co have a second method in which they label all of the words in the databases according to their parts of speech, whether a noun, a proper noun, a verb, an adjective, and so on. This clearly reveals a change in the way the word “Sandy” was used, from adjective to proper noun, while also showing that the word “hurricane” had not changed.
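The parts-of-speech method can be sketched in a few lines: tag each word in successive time slices, then compare each word’s tag distribution across slices. The tagged snippets below are invented for illustration, not taken from the paper’s data:

```python
from collections import Counter

# Toy (word, tag) corpora for two time slices; tags follow the Penn
# Treebank convention (JJ = adjective, NNP = proper noun, NN = noun).
corpus_2011 = [("a", "DT"), ("sandy", "JJ"), ("beach", "NN"),
               ("the", "DT"), ("sandy", "JJ"), ("shore", "NN")]
corpus_2012 = [("hurricane", "NN"), ("sandy", "NNP"),
               ("sandy", "NNP"), ("approached", "VBD"),
               ("a", "DT"), ("sandy", "JJ"), ("path", "NN")]

def tag_distribution(corpus, word):
    """Return the relative frequency of each POS tag for `word`."""
    tags = Counter(tag for w, tag in corpus if w == word)
    total = sum(tags.values())
    return {tag: count / total for tag, count in tags.items()}

before = tag_distribution(corpus_2011, "sandy")  # adjective only
after = tag_distribution(corpus_2012, "sandy")   # mostly proper noun
```

A large shift between `before` and `after` flags a candidate change in usage, while a stable distribution (as for “hurricane”) does not.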
The parts of speech technique is useful but not infallible. It cannot pick up the change in meaning of the word “mouse,” because both the old and new senses are nouns. So the team have a third approach.
This maps the linguistic vector space in which words are embedded. The idea is that words in this space are close to other words that appear in similar contexts. For example, the word “big” is close to words such as “large,” “huge,” “enormous,” and so on.
By examining the linguistic space at different points in history, it is possible to see how meanings have changed. For example, in the 1950s, the word “gay” was close to words such as “cheerful” and “dapper.” Today, however, it has moved significantly to be closer to words such as “lesbian,” “homosexual,” and so on.
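The underlying comparison is a similarity measure, typically cosine similarity, between a word’s embedding vectors at different points in time, or between a word and its neighbours within one snapshot. A minimal sketch with invented three-dimensional vectors (real models learn hundreds of dimensions from the corpus):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy embeddings for two historical snapshots.
vectors_1950s = {"gay": [0.9, 0.1, 0.0],
                 "cheerful": [0.8, 0.2, 0.1],
                 "homosexual": [0.1, 0.2, 0.9]}
vectors_2000s = {"gay": [0.1, 0.1, 0.9],
                 "cheerful": [0.8, 0.2, 0.1],
                 "homosexual": [0.1, 0.2, 0.9]}

# Low similarity between the two snapshots of the same word signals
# that its meaning has drifted.
drift = cosine(vectors_1950s["gay"], vectors_2000s["gay"])
```

In the first snapshot “gay” sits nearest “cheerful”; in the second it sits nearest “homosexual,” and the cross-snapshot similarity is low, which is exactly the signal the mapping exploits.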
Kulkarni and co examine three different databases to see how words have changed: the set of five-word sequences that appear in the Google Books corpus, Amazon movie reviews since 2000, and messages posted on Twitter between September 2011 and October 2013.
Their results reveal not only which words have changed in meaning, but when the change occurred and how quickly. For example, before the 1970s, the word “tape” was used almost exclusively to describe adhesive tape but then gained an additional meaning of “cassette tape.”…”

USDA Opens VIVO Research Networking Tool to Public


Sharon Durham at the USDA: “VIVO, a Web application used internally by U.S. Department of Agriculture (USDA) scientists since 2012 to allow better national networking across disciplines and locations, is now available to the public. USDA VIVO will be a “one-stop shop” for Federal agriculture expertise and research outcomes. “USDA employs over 5,000 researchers to ensure our programs are based on sound public policy and the best available science,” said USDA Chief Scientist and Undersecretary for Research, Education, and Economics Dr. Catherine Woteki. “USDA VIVO provides a powerful Web search tool for connecting interdisciplinary researchers, research projects and outcomes with others who might bring a different approach or scope to a research project. Inviting private citizens to use the system will increase the potential for collaboration to solve food- and agriculture-related problems.”
The idea behind USDA VIVO is to link researchers with peers and potential collaborators to ignite synergy among our nation’s best scientific minds and to spark unique approaches to some of our toughest agricultural problems. This efficient networking tool enables scientists to easily locate others with a particular expertise. VIVO also makes it possible to quickly identify scientific expertise and respond to emerging agricultural issues, such as specific plant and animal diseases or pests.
USDA’s Agricultural Research Service (ARS), Economic Research Service, National Institute of Food and Agriculture, National Agricultural Statistics Service and Forest Service are the first five USDA agencies to participate in VIVO. The National Agricultural Library, which is part of ARS, will host the Web application. USDA hopes to add other agencies in the future.
VIVO was in part developed under a $12.2 million grant from the National Center for Research Resources, part of the National Institutes of Health (NIH). The grant, made under the 2009 American Recovery and Reinvestment Act, was provided to the University of Florida and collaborators at Cornell University, Indiana University, Weill Cornell Medical College, Washington University in St. Louis, the Scripps Research Institute and the Ponce School of Medicine.
VIVO’s underlying database draws information about research being conducted by USDA scientists from official public systems of record and then makes it uniformly available for searching. The data can then be easily leveraged in other applications. In this way, USDA is also making its research projects and related impacts available to the Federal RePORTER tool, released by NIH on September 22, 2014. Federal RePORTER is part of a collaborative effort between Federal entities and other research institutions to create a repository that will be useful to assess the impact of Federal research and development investments.”

Colombia’s Data-Driven Fight Against Crime


One Monday in 1988, El Mundo newspaper of Medellín, Colombia, reported, as it did every Monday, on the violent deaths in the city of two million people over the weekend. An article giving an hour-by-hour description of the deaths from Saturday night to Sunday night was remarkable for, among other things, the journalist’s skill in finding different ways to report a murder. “Someone took the life of Luís Alberto López at knife point … Luís Alberto Patiño ceased to exist with a bullet in his head … Mario Restrepo turned up dead … An unidentified person killed Néstor Alvarez with three shots.” In reporting 27 different murders, the author repeated his phrasing only once.

….What Guerrero did to make Cali safer was remarkable because it worked, and because of the novelty of his strategy. Before becoming mayor, Guerrero was not a politician, but a Harvard-trained epidemiologist who was president of the Universidad del Valle in Cali. He set out to prevent murder the way a doctor prevents disease. What public health workers are doing now to stop the spread of Ebola, Guerrero did in Cali to stop the spread of violence.

Although his ideas have now been used in dozens of cities throughout Latin America, they are worth revisiting because they are not employed in the places that need them most. The most violent places in Latin America are Honduras, El Salvador and Guatemala — indeed, they are among the most violent countries in the world not at war. The wave of youth migration to the United States is from these countries, and the refugees are largely fleeing violence.

One small municipality in El Salvador, Santa Tecla, has employed Cali’s strategies for about 10 years, and its homicide rate has dropped. But Santa Tecla is an anomaly. Most of the region’s cities have not tried to do what Guerrero did — and they are failing to protect their citizens….

Guerrero went on to spread his ideas. Working with the Pan-American Health Organization and the Inter-American Development Bank, he took his epidemiological methods to 18 other countries.

“The approach was very low-cost and pragmatic,” said Joan Serra Hoffman, a senior specialist in crime and violence prevention in Latin America and the Caribbean at the World Bank. “You could see it was conceived by someone who was an academic and a policy maker. It can be fully operational for between $50,000 and $80,000.”…

How to use the Internet to end corrupt deals between companies and governments


Stella Dawson at the Thomson Reuters Foundation: “Every year governments worldwide spend more than $9.5 trillion on public goods and services, but finding out who won those contracts, why and whether they deliver as promised is largely invisible.
Enter the Open Contracting Data Standard (OCDS).
Canada, Colombia, Costa Rica and Paraguay became the first countries to announce on Tuesday that they have adopted the new global standards for publishing contracts online as part of a project to shine a light on how public money is spent and to combat massive corruption in public procurement.
“The mission is to end secret deals between companies and governments,” said Gavin Hayman, the incoming executive director for Open Contracting Partnership.
The concept is simple. Under Open Contracting, the government publishes online the projects it is putting out for bid and the terms; companies submit bids online; the winning contract is published including the reasons why; and then citizens can monitor performance according to the terms of the contract.
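The four-stage process maps naturally onto one machine-readable record per contracting process. The sketch below is illustrative only; the field names are simplified and are not the exact OCDS schema (the published standard defines its own release structure):

```python
import json

# A simplified, hypothetical record following the spirit of the open
# contracting cycle described above: tender, bids, award, and the
# published reasons for the award.
release = {
    "ocid": "ocds-example-0001",  # unique contracting-process identifier
    "tender": {
        "title": "Road resurfacing, District 4",
        "value": {"amount": 250000, "currency": "USD"},
    },
    "bids": [
        {"bidder": "Acme Construction", "amount": 240000},
        {"bidder": "Roadworks Ltd", "amount": 255000},
    ],
    "award": {
        "supplier": "Acme Construction",
        "amount": 240000,
        "reason": "Lowest compliant bid",  # the published "why"
    },
}

# Publishing the record as JSON lets citizens and tools monitor
# every stage of the process against the contract's terms.
published = json.dumps(release, indent=2)
```

Because every stage lives in one structured document, anyone can diff what was promised against what was delivered.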
The Open Contracting initiative, developed by the World Wide Web Foundation with the support of the World Bank and Omidyar Network, has been several years in the making and is part of a broader global movement to increase the accountability of governments by using Internet technologies to make them more transparent.
A pioneer in data transparency was the Extractive Industries Transparency Initiative, a global coalition of governments, companies and civil society that works on improving accountability by publishing the revenues received in 35 member countries for their natural resources.
Publish What You Fund is a similar initiative for the aid industry. It delivered a common open standard in 2011 for donor countries to publish how much money they gave in development aid and details of what projects that money funded and where.
There’s also the Open Government Partnership, an international forum of 65 countries, each of which adopts an action plan laying out how it will improve the quality of government through collaboration with civil society, frequently using new technologies.
All of these initiatives have helped crack open the door of government.
What’s important about Open Contracting is the sheer scale of impact it could have. Public procurement accounts for about 15 percent of global GDP and according to Anne Jellema, CEO of the World Wide Web Foundation which seeks to expand free access to the web worldwide and backed the OCDS project, corruption adds an estimated $2.3 trillion to the cost of those contracts every year.
A study by the Center for Global Development, a Washington-based think tank, looked at four countries already publishing their contracts online — the United Kingdom, Georgia, Colombia and Slovakia. It found that open contracting increased visibility and encouraged more companies to submit bids, that quality and price competitiveness improved, and that citizen monitoring meant better service delivery….”
 

Good data make better cities


Stephen Goldsmith and Susan Crawford at the Boston Globe: “…Federal laws prevent sharing of information among state workers helping the same family. In one state’s public health agency, workers fighting obesity cannot receive information from another official inside the same agency assigned to a program aimed at fighting diabetes. In areas where citizens are worried about environmental justice, sensors collecting air quality information are feared — because they could monitor the movements of people. Cameras that might provide a crucial clue to the identity of a terrorist are similarly feared because they might capture images of innocent bystanders.
In order for the public to develop confidence that data tools work for its betterment, not against it, we have work to do. Leaders need to establish policies covering data access, retention, security, and transparency. Forensic capacity — to look back and see who had access to what for what reason — should be a top priority in the development of any data system. So too should clear consequences for data misuse by government employees.
If we get this right, the payoffs for democracy will be enormous. Data can provide powerful insights into the equity of public services and dramatically increase the effectiveness of social programs. Existing 311 digital systems can become platforms for citizen engagement rather than just channels for complaints. Government services can be graded by citizens and improved in response to a continuous loop of interaction. Cities can search through anonymized data in a huge variety of databases for correlations between particular facts and desired outcomes and then apply that knowledge to drive toward results — what can a city do to reduce rates of obesity and asthma? What bridges are in need of preventative maintenance? And repurposing dollars from ineffective programs and vendors to interventions that work will help cities be safer, cleaner, and more effective.
The digital revolution has finally reached inside the walls of city hall, making this the best time within living memory to be involved in local government. We believe that doing many small things right using data will build trust, making it more likely that citizens will support their city’s need to do big things — including addressing economic dislocation.
Data rules should genuinely protect individuals, not limit our ability to serve them better. When it comes to data, unreasoning fear is our greatest enemy…”

The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance


New Paper by Ming-Hsiang Tsou et al in the Journal of Medical Internet Research: “Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surveillance due to the availability of large amounts of data. In this study, tweets, or posts of 140 characters or less, from the website Twitter were collected and analyzed for their potential as surveillance for seasonal influenza.
Objective: There were three aims: (1) to improve the correlation of tweets to sentinel-provided influenza-like illness (ILI) rates by city through filtering and a machine-learning classifier, (2) to observe correlations of tweets for emergency department ILI rates by city, and (3) to explore correlations for tweets to laboratory-confirmed influenza cases in San Diego.
Methods: Tweets containing the keyword “flu” were collected within a 17-mile radius from 11 US cities selected for population and availability of ILI data. At the end of the collection period, 159,802 tweets were used for correlation analyses with sentinel-provided ILI and emergency department ILI rates as reported by the corresponding city or county health department. Two separate methods were used to observe correlations between tweets and ILI rates: filtering the tweets by type (non-retweets, retweets, tweets with a URL, tweets without a URL), and the use of a machine-learning classifier that determined whether a tweet was “valid”, or from a user who was likely ill with the flu.
Results: Correlations varied by city but general trends were observed. Non-retweets and tweets without a URL had higher and more significant (P<.05) correlations than retweets and tweets with a URL. Correlations of tweets to emergency department ILI rates were higher than the correlations observed for sentinel-provided ILI for most of the cities. The machine-learning classifier yielded the highest correlations for many of the cities when using the sentinel-provided or emergency department ILI as well as the number of laboratory-confirmed influenza cases in San Diego. High correlation values (r=.93) with significance at P<.001 were observed for laboratory-confirmed influenza cases for most categories and tweets determined to be valid by the classifier.
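The correlation values reported here are Pearson coefficients between weekly series. A minimal sketch with invented weekly counts (not the study’s data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented weekly "flu"-tweet counts and sentinel ILI rates for one city.
weekly_tweets = [120, 180, 260, 400, 350, 210, 150]
weekly_ili = [1.1, 1.6, 2.4, 3.9, 3.2, 2.0, 1.3]

r = pearson(weekly_tweets, weekly_ili)  # near 1 when tweets track ILI
```

The study’s filtering step (dropping retweets and URL-bearing tweets, or keeping only classifier-validated tweets) amounts to computing `r` on a cleaner `weekly_tweets` series.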
Conclusions: Compared to tweet analyses in the previous influenza season, this study demonstrated increased accuracy in using Twitter as a supplementary surveillance tool for influenza: better filtering and classification methods yielded higher correlations for the 2013-2014 influenza season than for the previous season, in which emergency department ILI rates were better correlated to tweets than sentinel-provided ILI rates. Further investigations in the field would require expansion with regard to the locations that tweets are collected from, as well as the availability of more ILI data…”

Off the map


The Economist: “Rich countries are deluged with data; developing ones are suffering from drought…
AFRICA is the continent of missing data. Fewer than half of births are recorded; some countries have not taken a census in several decades. On maps only big cities and main streets are identified; the rest looks as empty as the Sahara. Lack of data afflicts other developing regions, too. The self-built slums that ring many Latin American cities are poorly mapped, and even estimates of their population are vague. Afghanistan is still using census figures from 1979—and that count was cut short after census-takers were killed by mujahideen.
As rich countries collect and analyse data from as many objects and activities as possible—including thermostats, fitness trackers and location-based services such as Foursquare—a data divide has opened up. The lack of reliable data in poor countries thwarts both development and disaster-relief. When Médecins Sans Frontières (MSF), a charity, moved into Liberia to combat Ebola earlier this year, maps of the capital, Monrovia, fell far short of what was needed to provide aid or track the disease’s spread. Major roads were marked, but not minor ones or individual buildings.
Poor data afflict even the highest-profile international development effort: the Millennium Development Goals (MDGs). The targets, which include ending extreme poverty, cutting infant mortality and getting all children into primary school, were set by UN members in 2000, to be achieved by 2015. But, according to a report by an independent UN advisory group published on November 6th, as the deadline approaches, the figures used to track progress are shaky. The availability of data on 55 core indicators for 157 countries has never exceeded 70%, it found (see chart)….
Some of the data gaps are now starting to be filled from non-government sources. A volunteer effort called Humanitarian OpenStreetMap Team (HOT) improves maps with information from locals and hosts “mapathons” to identify objects shown in satellite images. Spurred by pleas from those fighting Ebola, the group has intensified its efforts in Monrovia since August; most of the city’s roads and many buildings have now been filled in (see maps). Identifying individual buildings is essential, since in dense slums without formal roads they are the landmarks by which outbreaks can be tracked and assistance targeted.
On November 7th a group of charities including MSF, Red Cross and HOT unveiled MissingMaps.org, a joint initiative to produce free, detailed maps of cities across the developing world—before humanitarian crises erupt, not during them. The co-ordinated effort is needed, says Ivan Gayton of MSF: aid workers will not use a map with too little detail, and are unlikely, without a reason, to put work into improving a map they do not use. The hope is that the backing of large charities means the locals they work with will help.
In Kenya and Namibia mobile-phone operators have made call-data records available to researchers, who have used them to combat malaria. By comparing users’ movements with data on outbreaks, epidemiologists are better able to predict where the disease might spread. mTrac, a Ugandan programme that replaces paper reports from health workers with texts sent from their mobile phones, has made data on medical cases and supplies more complete and timely. The share of facilities that have run out of malaria treatments has fallen from 80% to 15% since it was introduced.
Private-sector data are also being used to spot trends before official sources become aware of them. Premise, a startup in Silicon Valley that compiles economics data in emerging markets, has found that as the number of cases of Ebola rose in Liberia, the price of staple foods soared: a health crisis risked becoming a hunger crisis. In recent weeks, as the number of new cases fell, prices did, too. The authorities already knew that travel restrictions and closed borders would push up food prices; they now have a way to measure and track price shifts as they happen….”

Spain is trialling city monitoring using sound


Springwise: “There’s more traffic on today’s city streets than there ever has been, and managing it all can prove to be a headache for local authorities and transport bodies. In the past, we’ve seen the City of Calgary in Canada detect drivers’ Bluetooth signals to develop a map of traffic congestion. Now the EAR-IT project in Santander, Spain, is using acoustic sensors to measure the sounds of city streets and determine real time activity on the ground.
Launched as part of the autonomous community’s SmartSantander initiative, the experimental scheme placed hundreds of acoustic processing units around the region. These pick up the sounds being made in any given area and, when processed through an audio recognition engine, can provide data about what’s going on on the street. Smaller ‘motes’ were also developed to provide more accurate location information about each sound.
Created by members of Portugal’s UNINOVA institute and IT consultants EGlobalMark, the system was able to use city noises to detect things such as traffic congestion, parking availability and the location of emergency vehicles based on their sirens. It could then automatically trigger smart signs to display up-to-date information, for example.
The team particularly focused on a junction near the city hospital that’s a hotspot for motor accidents. Rather than force ambulance drivers to risk passing through a red light and into lateral traffic, the sensors were able to detect when and where an emergency vehicle was coming through and automatically change the lights in their favor.
The system could also be used to pick up ‘sonic events’ such as gunshots or explosions and detect their location. The researchers have also trialled an indoor version that can sense if an elderly resident has fallen over or to turn lights off when the room becomes silent.”

Why the World Needs Anonymous


Gabriella Coleman at MIT Technology Review: Anonymity is under attack, and yet the actions of a ragtag band of hackers, activists, and rabble-rousers reveal how important it remains.
“It’s time to end anonymous comments sections,” implored Kevin Wallsten and Melinda Tarsi in the Washington Post this August. In the U.K., a parliamentary committee has even argued for a “cultural shift” against treating pseudonymous comments as trustworthy. This assault is matched by pervasive practices of monitoring and surveillance, channeled through a stunning variety of mechanisms—from CCTV cameras to the constant harvesting of digital data.
But just as anonymity’s value has sunk to a new low in the eyes of some, a protest movement in favor of concealment has appeared. The hacker collective Anonymous is most famous for its controversial crusades against the likes of dictators, corporations, and pseudo-religions like Scientology. But the group is also the embodiment of this new spirit.
Anonymous may strike a reader as unique, but its efforts represent just the latest in experimentation with anonymous speech as a conduit for political expression. Anonymous expression has been foundational to our political culture, characterizing monumental declarations like the Federalist Papers, and the Supreme Court has repeatedly granted anonymous speech First Amendment protection.
The actions of this group are also important because anonymity remains important to us all. Universally enforcing disclosure of real identities online would limit the possibilities for whistle-blowing and voicing unpopular beliefs—processes essential to any vibrant democracy. And just as anonymity can engender disruptive and antisocial behavior such as trolling, it can provide a means of pushing back against increased surveillance.
By performing a role increasingly unavailable to most Internet users as they participate in social networks and other gated communities requiring real names, Anonymous dramatizes the existence of other possibilities. Its members symbolically incarnate struggles against the constant, blanket government surveillance revealed by Edward Snowden and many before him.
As an anthropologist who has spent half a dozen years studying Anonymous, I have had the unique opportunity to witness and experience just how these activists conceive of and enact obfuscation. It is far from being implemented mindlessly. Indeed, there are important ethical lessons that we can draw from their successes and failures.
Often Anonymous activists, or “Anons,” interact online under the cover of pseudo-anonymity. Typically, this takes the form of a persistent nickname, otherwise known as a handle, around which a reputation necessarily accrues. Among the small fraction of law-breaking Anons, pseudo-anonymity is but one among a roster of tactics for achieving operational security. These include both technical solutions, such as encryption and anonymizing software, and cultivation of the restraint necessary to prevent the disclosure of personal information.
The great majority of Anonymous participants are neither hackers nor lawbreakers but must nonetheless remain circumspect in what they reveal about themselves and others. Sometimes, ignorance is the easiest way to ensure protection. A participant who helped build up one of the larger Anonymous accounts erected a self-imposed fortress between herself and the often-private Internet Relay Chat channels where law-breaking Anons cavorted and planned actions. It was a “wall,” as she put it, which she sought never to breach.
During the course of my research, I eschewed anonymity and mitigated risk by erecting the same wall, making sure not to climb over it. But some organizers were more intrepid. Since they associated with lawbreakers or even witnessed planning of illegal activity on IRC, they chose to cloak themselves for self-protection.
Regardless of the reasons for maintaining anonymity, it shaped many of the ethical norms and mores of the group. The source of this ethic is partly indebted to 4chan, a hugely popular, and deeply subversive, image board that enforced the name “Anonymous” for all users, thus hatching the idea’s potential (see “Radical Opacity”)….
See also: Hacker, Hoaxer, Whistleblower, Spy: The Many Faces of Anonymous.

Open Elections


“Welcome to OpenElections! Our goal is to create the first free, comprehensive, standardized, linked set of election data for the United States, including federal and statewide offices. No freely available comprehensive source of official election results exists. The current options for election data can be difficult to find and use or financially out-of-reach for most journalists and civic hackers. We want the people who work with election data to be able to get what they need, whether that’s a CSV file for stories and data analysis or JSON usable for web applications and interactive graphics. OpenElections is generously supported by the John S. and James L. Knight Foundation’s Knight News Challenge.”
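Converting a standardized results CSV into the JSON records a web application needs is a short step; the column names below are illustrative, not OpenElections’ actual schema:

```python
import csv
import io
import json

# A hypothetical standardized results file; real OpenElections data
# would be fetched from its repositories instead.
raw_csv = """county,office,candidate,votes
Adams,President,Smith,10324
Adams,President,Jones,9871
"""

# csv.DictReader yields one dict per row, with all values as strings.
rows = list(csv.DictReader(io.StringIO(raw_csv)))
for row in rows:
    row["votes"] = int(row["votes"])  # restore numeric type for analysis

# The same records now serve both audiences: analysts keep the CSV,
# web developers consume the JSON.
as_json = json.dumps(rows)
```

The point of a standardized layout is exactly this: one parser handles results from every state, instead of one scraper per election office.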