Paper by Bailey Smith for The Wilson Center’s Science and Technology Innovation Program: “New ways to gather data are on the rise. One of these ways is through citizen science. According to a new paper by Bailey Smith, JD, federal agencies can feel confident about using citizen science for a few reasons. First, the legal system provides significant protection from liability through the Federal Tort Claims Act (FTCA) and Administrative Procedure Act (APA). Second, training and technological innovation have made it easier for the non-scientist to collect high quality data.”
When Big Data Maps Your Safest, Shortest Walk Home
Sarah Laskow at NextCity: “Boston University and University of Pittsburgh researchers are trying to do the same thing that got the creators of the app SketchFactor into so much trouble over the summer. They’re trying to show people how to avoid dangerous spots on city streets while walking from one place to another.
“What we are interested in is finding paths that offer trade-offs between safety and distance,” Esther Galbrun, a postdoc at Boston University, recently said in New York at the 3rd International Workshop on Urban Computing, held in conjunction with KDD 2014.
She was presenting “Safe Navigation in Urban Environments,” which describes a set of algorithms that would give a person walking through a city options for getting from one place to another — the shortest path, the safest path and a number of alternatives that balance both factors. The paper takes existing algorithms, well defined in theory — nothing new or fancy, Galbrun says — and applies them to a problem that people face every day.
Imagine, she suggests, that a person is standing at the Philadelphia Museum of Art, and he wants to walk home, to his place on Wharton Street. (Galbrun and her colleagues looked at Philadelphia and Chicago because those cities have made their crime data openly available.) The walk is about three miles, and one option would be to take the shortest path back. But maybe he’s worried about safety. Maybe he’s willing to take a little bit of a longer walk if it means he has to worry less about crime. What route should he take then?
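The paper’s exact algorithms aren’t reproduced here, but linear scalarization is one standard way to expose this kind of safety-versus-distance trade-off: score each edge by a weighted blend of length and risk, and re-run a shortest-path search as the weight shifts. The sketch below does this over a toy street graph; the node names, edge lengths, risk values and the risk-to-meters scale factor are all invented for illustration.

```python
# A minimal sketch of the shortest-vs-safest trade-off, assuming linear
# scalarization over a toy street graph. All numbers here are invented.
import networkx as nx

G = nx.Graph()
G.add_edge("museum", "a", length=800, risk=0.10)   # short but riskier leg
G.add_edge("a", "wharton", length=2400, risk=0.90)
G.add_edge("museum", "b", length=1500, risk=0.05)  # longer but calmer leg
G.add_edge("b", "wharton", length=2200, risk=0.10)

def path_cost(G, path, key):
    """Sum one edge attribute along a path."""
    return sum(G[u][v][key] for u, v in zip(path, path[1:]))

# lam=0 recovers the shortest path, lam=1 the safest,
# and intermediate values yield the balanced alternatives.
for lam in (0.0, 0.5, 1.0):
    def blend(u, v, d, lam=lam):
        # 10000 is an arbitrary factor putting risk on a meters-like scale.
        return (1 - lam) * d["length"] + lam * 10000 * d["risk"]
    path = nx.dijkstra_path(G, "museum", "wharton", weight=blend)
    print(lam, path, path_cost(G, path, "length"), path_cost(G, path, "risk"))
```

Sweeping the trade-off parameter and keeping only the non-dominated routes is the usual way to surface the handful of alternatives the paper describes.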
Services like Google Maps have excelled at finding the shortest, most direct routes from Point A to Point B. But, increasingly, urban computing is looking to capture other aspects of moving about a place. “Fast is only one option,” says co-author Konstantinos Pelechrinis. “There are noble objectives beyond the shortest path that you can put inside this navigation problem.” You might look for the path that will burn the most calories; a Yahoo! lab has considered how to send people along the most scenic route.
But working on routes that do more than give simple directions can have its pitfalls. The SketchFactor app relies both on crime data, when it’s available, and crowdsourced comments to reveal potential trouble spots to users. When it was released this summer, tech reporters and other critics immediately started talking about how it could easily become a conduit for racism. (“Sketchy” is, after all, a very subjective measure.)
So far, though, the problem with the SketchFactor app is less that it offers racially skewed perspectives than that the information it does offer is pretty useless — if entertaining. A pinpoint marked “very sketchy” is just as likely to flag an incident like a Jewish man eating pork products or hipster kids making too much noise as it is to flag a mugging.
Here, then, is a clear example of how Big Data has an advantage over Big Anecdata. The SafePath set-up measures risk more objectively and elegantly. It pulls in openly available crime data and considers simple data like time, location and types of crime. While a crime occurs at a discrete point, the researchers wanted to estimate the risk of a crime on every street, at every point. So they use a mathematical tool that smooths out the crime data over the space of the city and allows them to measure the relative risk of witnessing a crime on every street segment in a city….”
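The excerpt doesn’t name the estimator, but kernel density estimation is the standard smoothing tool matching this description: discrete incident points become a continuous risk surface that can then be sampled at every street segment. A minimal sketch, with invented coordinates:

```python
# A sketch of smoothing discrete crime incidents into per-segment risk,
# assuming Gaussian kernel density estimation; all coordinates invented.
import numpy as np
from scipy.stats import gaussian_kde

# Reported incident locations as (x, y) map coordinates, shape (2, n).
crimes = np.array([[0.20, 0.30], [0.25, 0.35], [0.80, 0.90], [0.21, 0.28]]).T

kde = gaussian_kde(crimes, bw_method=0.3)  # bandwidth sets how far risk spreads

# Midpoints of three street segments where we want a relative risk score.
segments = np.array([[0.22, 0.31], [0.50, 0.50], [0.79, 0.88]]).T
risk = kde(segments)
print(risk / risk.max())  # relative risk per segment, scaled to 0..1
```

In the paper’s framing, per-segment scores like these would become the risk weights that the route search trades off against distance.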
The Decalogue of Policy Making 2.0: Results from Analysis of Case Studies on the Impact of ICT for Governance and Policy Modelling
Paper by Sotirios Koussouris, Fenareti Lampathaki, Gianluca Misuraca, Panagiotis Kokkinakos, and Dimitrios Askounis: “Despite the availability of a myriad of Information and Communication Technologies (ICT) based tools and methodologies for supporting governance and the formulation of policies, including modelling expected impacts, these have proved to be unable to cope with the dire challenges of the contemporary society. In this chapter we present the results of the analysis of a set of promising cases researched in order to understand the possible impact of what we define ‘Policy Making 2.0’, which refers to ‘a set of methodologies and technological solutions aimed at enabling better, timely and participative policy-making’. Based on the analysis of these cases we suggest a bouquet of (mostly ICT-related) practical and research recommendations that are relevant to researchers, practitioners and policy makers in order to guide the introduction and implementation of Policy Making 2.0 initiatives. We argue that this ‘decalogue’ of Policy Making 2.0 could be an operational checklist for future research and policy to further explore the potential of ICT tools for governance and policy modelling, so as to make next generation policy making more ‘intelligent’ and hopefully able to solve or anticipate the societal challenges we are (and will be) confronted with today and in the future.”
Using Crowds for Evaluation Tasks: Validity by Numbers vs. Validity by Expertise
Paper by Christoph Hienerth and Frederik Riar: “Developing and commercializing novel ideas is central to innovation processes. As the outcome of such ideas cannot fully be foreseen, the evaluation of them is crucial. With the rise of the internet and ICT, more and new kinds of evaluations are done by crowds. This raises the question whether individuals in crowds possess the necessary capabilities to evaluate and whether their outcomes are valid. As empirical insights are not yet available, this paper deals with the examination of evaluation processes and general evaluation components, and the discussion of the underlying characteristics and mechanisms of these components affecting evaluation outcomes (i.e. evaluation validity). We further investigate differences between firm- and crowd-based evaluation using different cases of applications, and develop a theoretical framework towards evaluation validity, i.e. validity by numbers vs. validity by expertise. The identified factors that influence the validity of evaluations are: (1) the number of evaluation tasks, (2) complexity, (3) expertise, (4) costs, and (5) time to outcome. For each of these factors, hypotheses are developed based on theoretical arguments. We conclude with implications, proposing a model of evaluation validity.”
A Few Useful Things to Know about Machine Learning
A new research paper by Pedro Domingos: “Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.”
Follow the money: A study of cashtags on Twitter
Behavior Analysis in Social Media
Paper by Reza Zafarani and Huan Liu in IEEE Intelligent Systems (Volume 29, Issue 4, 2014): “With the rise of social media, information sharing has been democratized. As a result, users are given opportunities to exhibit different behaviors such as sharing, posting, liking, commenting, and befriending conveniently and on a daily basis. By analyzing behaviors observed on social media, we can categorize these behaviors into individual and collective behavior. Individual behavior is exhibited by a single user, whereas collective behavior is observed when a group of users behave together. For instance, users using the same hashtag on Twitter or migrating to another social media site are examples of collective behavior. User activities on social media generate behavioral data, which is massive, expansive, and indicative of user preferences, interests, opinions, and relationships. This behavioral data provides a new lens through which we can observe and analyze individual and collective behaviors of users.”
Federalism and Municipal Innovation: Lessons from the Fight Against Vacant Properties
New Paper by Benton Martin: “Cities possess a far greater ability to be trailblazers on a national scale than local officials may imagine. Realizing this, city advocates continue to call for renewed recognition by state and federal officials of the benefits of creative local problem-solving. The goal is admirable but warrants caution. The key to successful local initiatives lies not in woolgathering about cooperation with other levels of government but in identifying potential conflicts and using hard work and political savvy to build constituencies and head off objections. To demonstrate that point, this Article examines the legal status of local governments and recent efforts to regulate vacant property through land banking and registration ordinances.”
Assessing Social Value in Open Data Initiatives: A Framework
Paper by Gianluigi Viscusi, Marco Castelli and Carlo Batini in Future Internet Journal: “Open data initiatives are characterized, in several countries, by a great extension of the number of data sets made available for access by public administrations, constituencies, businesses and other actors, such as journalists, international institutions and academics, to mention a few. However, most of the open data sets rely on selection criteria, based on a technology-driven perspective, rather than a focus on the potential public and social value of data to be published. Several experiences and reports confirm this issue, such as those of the Open Data Census. However, there are also relevant best practices. The goal of this paper is to investigate the different dimensions of a framework suitable to support public administrations, as well as constituencies, in assessing and benchmarking the social value of open data initiatives. The framework is tested on three initiatives, referring to three different countries, Italy, the United Kingdom and Tunisia. The countries have been selected to provide a focus on European and Mediterranean countries, considering also the difference in legal frameworks (civil law vs. common law countries).”
Big Data: Google Searches Predict Unemployment in Finland
Paper by Joonas Tuhkuri: “There are over 3 billion searches globally on Google every day. This report examines whether Google search queries can be used to predict the present and the near future unemployment rate in Finland. Predicting the present and the near future is of interest, as the official records of the state of the economy are published with a delay. To assess the information contained in Google search queries, the report compares a simple predictive model of unemployment to a model that contains a variable, Google Index, formed from Google data. In addition, cross-correlation analysis and Granger-causality tests are performed. Compared to a simple benchmark, Google search queries improve the prediction of the present by 10% measured by mean absolute error. Moreover, predictions using search terms perform 39% better than the benchmark for near-future unemployment 3 months ahead. Google search queries also tend to improve the prediction accuracy around turning points. The results suggest that Google searches contain useful information about the present and the near future unemployment rate in Finland.”
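The report’s data and exact specification aren’t reproduced here, but the benchmark-versus-Google-Index comparison it describes can be sketched with ordinary least squares on synthetic stand-in series; “google_index” is a hypothetical name for the search-volume variable, not the report’s own.

```python
# A minimal sketch of the comparison in the abstract: a simple lagged
# benchmark for unemployment vs. the same model augmented with a Google
# Index regressor, scored by out-of-sample mean absolute error (MAE).
# Both series below are synthetic stand-ins, not the report's data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120  # monthly observations
google_index = rng.normal(size=n)
# Synthetic unemployment rate that partly tracks the search index.
unemp = 8 + 0.5 * np.cumsum(rng.normal(scale=0.1, size=n)) + 0.3 * google_index

lag = unemp[:-1]   # last month's rate, the benchmark predictor
y = unemp[1:]      # this month's rate, the target
split = 90         # train/test cut for out-of-sample scoring

def mae(X):
    """Fit OLS on the training window, report test-window MAE."""
    X = sm.add_constant(X)
    fit = sm.OLS(y[:split], X[:split]).fit()
    return np.abs(y[split:] - fit.predict(X[split:])).mean()

mae_bench = mae(lag.reshape(-1, 1))
mae_google = mae(np.column_stack([lag, google_index[1:]]))
print(f"benchmark MAE {mae_bench:.3f}, with Google Index {mae_google:.3f}")
```

The report’s cross-correlation and Granger-causality checks would sit on top of a comparison like this one, asking whether search volumes lead the official series rather than merely co-move with it.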