Urban Informatics


Special issue of Data Engineering: “Most data related to people and the built world originates in urban settings. There is increasing demand to capture and exploit this data to support efforts in areas such as Smart Cities, City Science and Intelligent Transportation Systems. Urban informatics deals with the collection, organization, dissemination and analysis of urban information used in such applications. However, the dramatic growth in the volume of this urban data creates challenges for existing data-management and analysis techniques. The collected data is also increasingly diverse, with a wide variety of sensor, GIS, imagery and graph data arising in cities. To address these challenges, urban informatics requires development of advanced data-management approaches, analysis methods, and visualization techniques. It also provides an opportunity to confront the “Variety” axis of Big Data head on. The contributions in this issue cross the spectrum of urban information, from its origin, to archiving and retrieval, to analysis and visualization. …

Collaborative Sensing for Urban Transportation (by Sergio Ilarri et al.)

Open Civic Data: Of the People, For the People, By the People (by Arnaud Sahuguet et al., The GovLab)

Plenario: An Open Data Discovery and Exploration Platform for Urban Science (by Charlie Catlett et al.)

Riding from Urban Data to Insight Using New York City Taxis (by Juliana Freire et al.)…(More)”

Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism


Paper by Stefan Baack in Big Data & Society: “This article shows how activists in the open data movement re-articulate notions of democracy, participation, and journalism by applying practices and values from open source culture to the creation and use of data. Focusing on the Open Knowledge Foundation Germany and drawing from a combination of interviews and content analysis, it argues that this process leads activists to develop new rationalities around datafication that can support the agency of datafied publics. Three modulations of open source are identified: First, by regarding data as a prerequisite for generating knowledge, activists transform the sharing of source code to include the sharing of raw data. Sharing raw data should break the interpretative monopoly of governments and would allow people to make their own interpretation of data about public issues. Second, activists connect this idea to an open and flexible form of representative democracy by applying the open source model of participation to political participation. Third, activists acknowledge that intermediaries are necessary to make raw data accessible to the public. This leads them to an interest in transforming journalism to become an intermediary in this sense. At the same time, they try to act as intermediaries themselves and develop civic technologies to put their ideas into practice. The article concludes by suggesting that the practices and ideas of open data activists are relevant because they illustrate the connection between datafication and open source culture and help to understand how datafication might support the agency of publics and actors outside big government and big business….(More)”

The case for data ethics


Steven Tiell at Accenture: “Personal data is the coin of the digital realm, which for business leaders creates a critical dilemma. Companies are being asked to gather more types of data faster than ever to maintain a competitive edge in the digital marketplace; at the same time, however, they are being asked to provide pervasive and granular control mechanisms over the use of that data throughout the data supply chain.

The stakes couldn’t be higher. If organizations, or the platforms they use to deliver services, fail to secure personal data, they expose themselves to tremendous risk—from eroding brand value and the hard-won trust of established vendors and customers to ceding market share, from violating laws to costing top executives their jobs.

To distinguish their businesses in this marketplace, leaders should be asking themselves two questions. What are the appropriate standards and practices our company needs to have in place to govern the handling of data? And how can our company make strong data controls a value proposition for our employees, customers and partners?

Defining effective compliance activities to support legal and regulatory obligations can be a starting point. However, mere compliance with existing regulations—which are, for the most part, focused on privacy—is insufficient. Respect for privacy is a byproduct of high ethical standards, but it is only part of the picture. Companies need to embrace data ethics, an expansive set of practices and behaviors grounded in a moral framework for the betterment of a community (however defined).

RAISING THE BAR

Why ethics? When communities of people—in this case, the business community at large—encounter new influences, the way they respond to and engage with those influences becomes the community’s shared ethics. Individuals who behave in accordance with these community norms are said to be moral, and those who are exemplary are able to gain the trust of their community.

Over time, as ethical standards within a community shift, the bar for trustworthiness is raised on the assumption that participants in civil society must, at a minimum, adhere to the rule of law. And thus, to maintain moral authority and a high degree of trust, actors in a community must constantly evolve to adopt the highest ethical standards.

Actors in the big data community, where security and privacy are at the core of relationships with stakeholders, must adhere to a high ethical standard to gain this trust. This requires them to go beyond privacy law and existing data control measures. It will also reward those who practice strong ethical behaviors and a high degree of transparency at every stage of the data supply chain. The most successful actors will become the platform-based trust authorities, and others will depend on these platforms for disclosure, sharing and analytics of big data assets.

Data ethics becomes a value proposition only once controls and capabilities are in place to granularly manage data assets at scale throughout the data supply chain. It is also beneficial when a community shares the same behavioral norms and taxonomy to describe the data itself, the ethical decision points along the data supply chain, and how those decisions lead to beneficial or harmful impacts….(More)”

Data, Human Rights & Human Security


Paper by Mark Latonero and Zachary Gold: “In today’s global digital ecosystem, mobile phone cameras can document and distribute images of physical violence. Drones and satellites can assess disasters from afar. Big data collected from social media can provide real-time awareness about political protests. Yet practitioners, researchers, and policymakers face unique challenges and opportunities when assessing technological benefit, risk, and harm. How can these technologies be used responsibly to assist those in need, prevent abuse, and protect people from harm?”

Mark Latonero and Zachary Gold address the issues in this primer for technologists, academics, businesses, governments, NGOs, and intergovernmental organizations — anyone interested in the future of human rights and human security in a data-saturated world….(Download PDF)”

The Data Revolution


Review of Rob Kitchin’s The Data Revolution: Big Data, Open Data, Data Infrastructures & their Consequences by David Moats in Theory, Culture and Society: “…As an industry, academia is not immune to cycles of hype and fashion. Terms like ‘postmodernism’, ‘globalisation’, and ‘new media’ have each had their turn filling the top line of funding proposals. Although they are each grounded in tangible shifts, these terms become stretched and fudged to the point of becoming almost meaningless. Yet, they elicit strong, polarised reactions. For at least the past few years, ‘big data’ seems to be the buzzword, which elicits funding, as well as the ire of many in the social sciences and humanities.

Rob Kitchin’s book The Data Revolution is one of the first systematic attempts to strip back the hype surrounding our current data deluge and take stock of what is really going on. This is crucial because this hype is underpinned by very real societal change, threats to personal privacy and shifts in store for research methods. The book acts as a helpful wayfinding device in an unfamiliar terrain, which is still being reshaped, and is admirably written in a language relevant to social scientists, comprehensible to policy makers and accessible even to the less tech savvy among us.

The Data Revolution seems to present itself as the definitive account of this phenomenon but in filling this role it ends up adopting a somewhat diplomatic posture. Kitchin takes all the correct and reasonable stances on the matter and advocates all the right courses of action but he is not able to, in the context of this book, pursue these propositions fully. This review will attempt to tease out some of these latent potentials and how they might be pushed in future work, in particular the implications of the ‘performative’ character of both big data narratives and data infrastructures for social science research.

Kitchin’s book starts with the observation that ‘data’ is a misnomer – etymologically data should refer to phenomena in the world which can be abstracted, measured, etc., as opposed to the representations and measurements themselves, which should by all rights be called ‘capta’. This is ironic because the worst offenders in what Kitchin calls “data boosterism” seem to conflate data with ‘reality’, unmooring data from its conditions of production and making the relationship between the two seem given or natural.

As Kitchin notes, following Bowker (2005), ‘raw data’ is an oxymoron: data are not so much mined as produced and are necessarily framed technically, ethically, temporally, spatially and philosophically. This is the central thesis of the book, that data and data infrastructures are not neutral and technical but also social and political phenomena. For those at the critical end of research with data, this is a starting assumption, but one which not enough practitioners heed. Most of the book is thus an attempt to flesh out these rapidly expanding data infrastructures and their politics….

Kitchin is at his best when revealing the gap between the narratives and the reality of data analysis, such as the fallacy of empiricism – the assertion that, given the granularity and completeness of big data sets and the availability of machine learning algorithms which identify patterns within data (with or without the supervision of human coders), data can “speak for themselves”. Kitchin reminds us that no data set is complete and even these out-of-the-box algorithms are underpinned by theories and assumptions in their creation, and require context-specific knowledge to unpack their findings. Kitchin also rightly raises concerns about the limits of big data, that access to and interoperability of data are not given and that these gaps and silences are also patterned (Twitter is biased as a sample towards middle-class, white, tech-savvy people). Yet, this language of veracity and reliability seems to suggest that big data is being conceptualised in relation to traditional surveys, or that our population is still the nation state, when big data could helpfully force us to reimagine our analytic objects and truth conditions and, more pressingly, our ethics (Rieder, 2013).

However, performativity may again complicate things. As Kitchin observes, supermarket loyalty cards do not just create data about shopping, they encourage particular sorts of shopping; when research subjects change their behaviour to cater to the metrics and surveillance apparatuses built into platforms like Facebook (Bucher, 2012), then these are no longer just data points representing the social, but partially constitutive of new forms of sociality (this is also true of other types of data as discussed by Savage (2010), but in perhaps less obvious ways). This might have implications for how we interpret data, the distribution between quantitative and qualitative approaches (Latour et al., 2012) or even more radical experiments (Wilkie et al., 2014). Kitchin is relatively cautious about proposing these sorts of possibilities, which is not the remit of the book, though it clearly leaves the door open…(More)”

How a Mexico City Traffic Experiment Connects to Community Trust


Zoe Mendelson in Next City: “Last November, Gómez-Mont, Jose Castillo, an urban planning professor at Harvard’s Graduate School of Design, and Carlos Gershenson, their data analyst, won the Audi Urban Future award for their plan to use big data to solve Mexico City’s traffic problem. The plan consists of three parts, the first a data-donating platform that collects information on origin and destination, transit times, and modes of transit. The app, Living Mobs, is now in use in beta form. The plan also establishes data-sharing partnerships with companies, educational institutions and government agencies. So far, they’ve already signed on Yaxi, Microsoft, Movistar and Uber, among others, and collected 14,000 datasets.

The data will be a welcome new resource for the city. “We just don’t have enough,” explains Gómez-Mont, “we call it ‘big city, little data.’” The city’s last origin-destination survey, conducted in 2007, only captured data from 50,000 people, which at the time was somewhat of a feat. Now, just one of their current data-sharing partners, Yaxi, alone has 10,000 cars circulating. Still, they have one major obstacle to a comprehensive citywide survey that can only be partially addressed by their data-donating platform (which also, of course, does depend on people having smartphones): 60 percent of transportation in Mexico City is on a hard-to-track informal bus system.

The data will eventually end up in an app that gives people real-time transit information. But an underlying idea — that traffic can be solved simply by asking people to take turns — is the project’s most radical and interesting component. Gómez-Mont paints a seductive alternative futuristic vision of incentivized negotiation of the city.

“Say I wake up and while getting ready for work I check and see that Periférico is packed and I say, ‘OK, today I’m going to use my bike or take public transit,’ and maybe I earn some kind of City Points, which translates into a tax break. Or maybe I’m on Periférico and earn points for getting off to relieve congestion.” She even envisions a system through which people could submit their calendar data weeks in advance. With the increasing popularity of Google Calendar and other similar systems that sync with smartphones, advanced “data donation” doesn’t seem that far-fetched.

Essentially, the app would create the opportunity for an entire city to behave as a group and solve its own problems together in real time.

Gómez-Mont insists that mobility is not just a problem for the government to solve. “It’s also very much about citizens and how we behave and what type of culture is embedded in the world outside of the government,” she notes….(More)”.

Researcher uncovers inherent biases of big data collected from social media sites


Phys.org: “With every click, Facebook, Twitter and other social media users leave behind digital traces of themselves, information that can be used by businesses, government agencies and other groups that rely on “big data.”

But while the information derived from social network sites can shed light on social behavioral traits, some analyses based on this type of data collection are prone to bias from the get-go, according to new research by Northwestern University professor Eszter Hargittai, who heads the Web Use Project.

Since people don’t randomly join Facebook, Twitter or LinkedIn—they deliberately choose to engage—the data are potentially biased in terms of demographics, socioeconomic background or Internet skills, according to the research. This has implications for businesses, municipalities and other groups who use such data, because it excludes certain segments of the population and could lead to unwarranted or faulty conclusions, Hargittai said.

The study, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” was published last month in the journal The Annals of the American Academy of Political and Social Science and is part of a larger, ongoing study.

The buzzword “big data” refers to automatically generated information about people’s behavior. It’s called “big” because it can easily include millions of observations if not more. In contrast to surveys, which require explicit responses to questions, big data is created when people do things using a service or system.

“The problem is that the only people whose behaviors and opinions are represented are those who decided to join the site in the first place,” said Hargittai, the April McClain-Delaney and John Delaney Professor in the School of Communication. “If people are analyzing big data to answer certain questions, they may be leaving out entire groups of people and their voices.”

For example, a city could use Twitter to collect local opinion regarding how to make the community more “age-friendly” or whether more bike lanes are needed. In those cases, “it’s really important to know that people aren’t on Twitter randomly, and you would only get a certain type of person’s response to the question,” said Hargittai.

“You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products,” she said. “It really has implications for every kind of group.”…
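
A toy simulation makes the sampling problem visible. The numbers below are invented, not drawn from Hargittai’s study; they simply show how an opinion estimate based only on a platform’s users can diverge from the city-wide figure when platform membership is not random.

```python
# Assumed, illustrative numbers only: simulate a city where Twitter users
# differ systematically from other residents, then compare the city-wide
# level of support for bike lanes with the Twitter-only estimate.
import random

random.seed(0)
CITY_SIZE = 100_000
TWITTER_RATE = 0.25        # assume only a quarter of residents are on Twitter

population = []
for _ in range(CITY_SIZE):
    on_twitter = random.random() < TWITTER_RATE
    # Assume support for new bike lanes is higher among Twitter users.
    supports = random.random() < (0.70 if on_twitter else 0.45)
    population.append((on_twitter, supports))

true_support = sum(s for _, s in population) / CITY_SIZE
twitter_sample = [s for on, s in population if on]
twitter_support = sum(twitter_sample) / len(twitter_sample)

print(f"city-wide support:     {true_support:.1%}")     # ~51%
print(f"Twitter-only estimate: {twitter_support:.1%}")  # ~70% -- overstated
```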

More information: “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” The Annals of the American Academy of Political and Social Science 659 (May 2015): 63–76. DOI: 10.1177/0002716215570866

A computational algorithm for fact-checking


Kurzweil News: “Computers can now do fact-checking for any body of knowledge, according to Indiana University network scientists, writing in an open-access paper published June 17 in PLoS ONE.

Using factual information from summary infoboxes from Wikipedia* as a source, they built a “knowledge graph” with 3 million concepts and 23 million links between them. A link between two concepts in the graph can be read as a simple factual statement, such as “Socrates is a person” or “Paris is the capital of France.”

In the first use of this method, IU scientists created a simple computational fact-checker that assigns “truth scores” to statements concerning history, geography and entertainment, as well as random statements drawn from the text of Wikipedia. In multiple experiments, the automated system consistently matched the assessment of human fact-checkers in terms of the humans’ certitude about the accuracy of these statements.

Dealing with misinformation and disinformation

In what the IU scientists describe as an “automatic game of trivia,” the team applied their algorithm to answer simple questions related to geography, history, and entertainment, including statements that matched states or nations with their capitals, presidents with their spouses, and Oscar-winning film directors with the movie for which they won the Best Picture award. The majority of tests returned highly accurate truth scores.

Lastly, the scientists used the algorithm to fact-check excerpts from the main text of Wikipedia, which were previously labeled by human fact-checkers as true or false, and found a positive correlation between the truth scores produced by the algorithm and the answers provided by the fact-checkers.

Significantly, the IU team found their computational method could even assess the truthfulness of statements about information not directly contained in the infoboxes. For example, it correctly scored the fact that Steve Tesich — the Serbian-American screenwriter of the classic Hoosier film “Breaking Away” — graduated from IU, even though that information is not specifically addressed in the infobox about him.
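
To make the idea concrete, here is a minimal sketch (not the IU team’s code) of how indirect connections in such a knowledge graph might be turned into a truth score. The toy graph, node names and scoring rule are illustrative assumptions; the paper defines its own measure over the full Wikipedia infobox graph.

```python
# Illustrative sketch: a tiny knowledge graph and a truth score that rewards
# short paths through specific, low-degree concepts. Not the authors' method.
import math
import networkx as nx

# Toy undirected knowledge graph: each edge is a simple factual statement.
G = nx.Graph()
G.add_edges_from([
    ("Steve Tesich", "Breaking Away"),                # wrote
    ("Breaking Away", "Bloomington, Indiana"),        # set in
    ("Indiana University", "Bloomington, Indiana"),   # located in
    ("Socrates", "Person"),
])

def truth_score(graph, subject, obj):
    """Score a candidate statement (subject, obj) by the 'cheapest' path
    between the two concepts, penalizing generic, high-degree hubs."""
    if not (graph.has_node(subject) and graph.has_node(obj)):
        return 0.0
    if graph.has_edge(subject, obj):
        return 1.0  # directly stated fact
    try:
        path = nx.shortest_path(graph, subject, obj)
    except nx.NetworkXNoPath:
        return 0.0
    # Penalize paths that pass through well-connected intermediate nodes.
    penalty = sum(math.log(graph.degree(v)) for v in path[1:-1])
    return 1.0 / (1.0 + penalty)

print(truth_score(G, "Steve Tesich", "Indiana University"))  # indirect support > 0
print(truth_score(G, "Socrates", "Indiana University"))      # no path -> 0.0
```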

Using multiple sources to improve accuracy and richness of data

“The measurement of the truthfulness of statements appears to rely strongly on indirect connections, or ‘paths,’ between concepts,” said Giovanni Luca Ciampaglia, a postdoctoral fellow at the Center for Complex Networks and Systems Research in the IU Bloomington School of Informatics and Computing, who led the study….

“These results are encouraging and exciting. We live in an age of information overload, including abundant misinformation, unsubstantiated rumors and conspiracy theories whose volume threatens to overwhelm journalists and the public. Our experiments point to methods to abstract the vital and complex human task of fact-checking into a network analysis problem, which is easy to solve computationally.”

Expanding the knowledge base

Although the experiments were conducted using Wikipedia, the IU team’s method does not assume any particular source of knowledge. The scientists aim to conduct additional experiments using knowledge graphs built from other sources of human knowledge, such as Freebase, the open-knowledge base built by Google, and note that multiple information sources could be used together to account for different belief systems….(More)”

‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators


Paper by Naren Ramakrishnan et al: “We describe the design, implementation, and evaluation of EMBERS, an automated, 24×7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings….(More)”
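
As a hedged illustration of the fusion-and-suppression idea mentioned in the abstract — not EMBERS’s actual engine — the sketch below combines per-source probabilities with assumed weights and only issues a forecast when the fused score clears a threshold.

```python
# Schematic sketch only: the data sources, weights, and threshold are invented
# for illustration; EMBERS's real models and engine are described in the paper.
from typing import Dict

def fuse(alerts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-source probabilities for one (country, date)."""
    total = sum(weights[src] for src in alerts)
    return sum(weights[src] * p for src, p in alerts.items()) / total

def should_issue(alerts: Dict[str, float], weights: Dict[str, float],
                 threshold: float = 0.6) -> bool:
    """Suppress the forecast unless the fused score clears the threshold."""
    return fuse(alerts, weights) >= threshold

weights = {"twitter": 0.5, "news": 0.3, "economic": 0.2}   # assumed weights
alerts = {"twitter": 0.8, "news": 0.7, "economic": 0.3}    # model outputs for one day

print(fuse(alerts, weights))          # 0.67
print(should_issue(alerts, weights))  # True -> issue a civil-unrest forecast
```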

Big Data’s Impact on Public Transportation


InnovationEnterprise: “Getting around any big city can be a real pain. Traffic jams seem to be a constant complaint, and simply getting to work can turn into a chore, even on the best of days. With more people than ever before flocking to the world’s major metropolitan areas, the issues of crowding and inefficient transportation only stand to get much worse. Luckily, the traditional methods of managing public transportation could be on the verge of changing thanks to advances in big data. While big data use cases have been a part of the business world for years now, city planners and transportation experts are quickly realizing how valuable it can be when making improvements to city transportation. That hour-long commute may no longer be something travelers will have to worry about in the future.

In much the same way that big data has transformed businesses around the world by offering greater insight into the behavior of their customers, it can also provide a deeper look at travellers. Like retail customers, commuters have certain patterns they like to keep to when on the road or riding the rails. Travellers also have their own motivations and desires, and getting to the heart of their actions is all part of what big data analytics is about. By analyzing these actions and the factors that go into them, transportation experts can gain a better understanding of why people choose certain routes or why they prefer one method of transportation over another. Based on these findings, planners can then figure out where to focus their efforts and respond to the needs of millions of commuters.

Gathering the accurate data needed to make knowledgeable decisions regarding city transportation can be a challenge in itself, especially considering how many people commute to work in a major city. New methods of data collection have made that effort easier and a lot less costly. One way that’s been implemented is through the gathering of call data records (CDR). From regular transactions made from mobile devices, information about location, time, and duration of an action (like a phone call) can give data scientists the necessary details on where people are traveling to, how long it takes them to get to their destination, and other useful statistics. The valuable part of this data is the sample size, which provides a much bigger picture of the transportation patterns of travellers.
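
As a rough illustration of what that processing might look like, the sketch below rolls hypothetical CDR-style rows into origin-destination counts. The record layout and zone names are assumptions; real CDR schemas, and the anonymization steps they require, vary by carrier and jurisdiction.

```python
# Hedged sketch: aggregate (device_id, timestamp, zone) rows into a crude
# origin-destination count, a stand-in for a traditional travel survey.
from collections import defaultdict

# One row per network transaction (the schema is assumed for illustration).
records = [
    ("a1", 1_433_140_200, "Centro"),
    ("a1", 1_433_143_800, "Polanco"),
    ("b2", 1_433_141_000, "Coyoacan"),
    ("b2", 1_433_150_000, "Centro"),
]

def origin_destination_counts(rows):
    """Count trips between consecutive zones observed for each device."""
    by_device = defaultdict(list)
    for device, ts, zone in rows:
        by_device[device].append((ts, zone))

    od = defaultdict(int)
    for device, events in by_device.items():
        events.sort()                      # order each device's trace by time
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:             # ignore records within the same zone
                od[(origin, dest)] += 1
    return dict(od)

print(origin_destination_counts(records))
# {('Centro', 'Polanco'): 1, ('Coyoacan', 'Centro'): 1}
```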

That’s not the only way cities are using big data to improve public transportation, though. Melbourne in Australia has long been considered one of the world’s best cities for public transit, and much of that is thanks to big data. With big data and ad hoc analysis, Melbourne’s acclaimed tram system can automatically reconfigure routes in response to sudden problems or challenges, such as a major city event or natural disaster. Data is also used in this system to fix problems before they turn serious. Sensors located in equipment like tram cars and tracks can detect when maintenance is needed on a specific part. Crews are quickly dispatched to repair what needs fixing, and the tram system continues to run smoothly. This is similar to the idea of the Internet of Things, wherein embedded sensors collect data that is then analyzed to identify problems and improve efficiency.
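
The sketch below is a hedged illustration of that predictive-maintenance idea rather than Melbourne’s actual system: it flags a sensor whose recent readings drift well above its historical baseline so a crew can be dispatched early. The sensor names, readings and three-sigma rule are invented for the example.

```python
# Illustrative only: flag track sensors whose recent vibration readings sit
# far above their historical baseline. Data and thresholds are assumptions.
from statistics import mean, stdev

def needs_maintenance(history, recent, sigma=3.0):
    """True if the mean of recent readings exceeds the historical mean
    by more than `sigma` standard deviations."""
    baseline, spread = mean(history), stdev(history)
    return mean(recent) > baseline + sigma * spread

sensors = {
    "track-segment-12": ([0.8, 0.9, 0.85, 0.95, 0.9], [2.4, 2.6, 2.5]),
    "track-segment-13": ([0.7, 0.75, 0.8, 0.72, 0.78], [0.74, 0.79, 0.76]),
}

for sensor_id, (history, recent) in sensors.items():
    if needs_maintenance(history, recent):
        print(f"dispatch crew to {sensor_id}")  # -> track-segment-12
```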

Sao Paulo, Brazil, is another city that sees the value of using big data for its public transportation. The city’s efforts concentrate on improving the management of its bus fleet. With big data collected in real time, the city can get a more accurate picture of just how many people are riding the buses, which routes are on time, how drivers respond to changing conditions, and many other factors. Based on this information, Sao Paulo can optimize its operations, providing added vehicles where demand is genuine whilst finding which routes are the most efficient. Without big data analytics, this process would have taken a very long time and would likely be hit-or-miss in terms of accuracy, but now, big data provides more certainty in a shorter amount of time….(More)”