Shane Harris in Foreign Policy: “…, Singapore has become a laboratory not only for testing how mass surveillance and big-data analysis might prevent terrorism, but for determining whether technology can be used to engineer a more harmonious society….Months after the virus abated, Ho and his colleagues ran a simulation using Poindexter’s TIA ideas to see whether they could have detected the outbreak. Ho will not reveal what forms of information he and his colleagues used — by U.S. standards, Singapore’s privacy laws are virtually nonexistent, and it’s possible that the government collected private communications, financial data, public transportation records, and medical information without any court approval or private consent — but Ho claims that the experiment was very encouraging. It showed that if Singapore had previously installed a big-data analysis system, it could have spotted the signs of a potential outbreak two months before the virus hit the country’s shores. Prior to the SARS outbreak, for example, there were reports of strange, unexplained lung infections in China. Threads of information like that, if woven together, could in theory warn analysts of pending crises.
The RAHS system was operational a year later, and it immediately began “canvassing a range of sources for weak signals of potential future shocks,” one senior Singaporean security official involved in the launch later recalled.
The system uses a mixture of proprietary and commercial technology and is based on a “cognitive model” designed to mimic the human thought process — a key design feature influenced by Poindexter’s TIA system. RAHS, itself, doesn’t think. It’s a tool that helps human beings sift huge stores of data for clues on just about everything. It is designed to analyze information from practically any source — the input is almost incidental — and to create models that can be used to forecast potential events. Those scenarios can then be shared across the Singaporean government and be picked up by whatever ministry or department might find them useful. Using a repository of information called an ideas database, RAHS and its teams of analysts create “narratives” about how various threats or strategic opportunities might play out. The point is not so much to predict the future as to envision a number of potential futures that can tell the government what to watch and when to dig further.
The officials running RAHS today are tight-lipped about exactly what data they monitor, though they acknowledge that a significant portion of “articles” in their databases come from publicly available information, including news reports, blog posts, Facebook updates, and Twitter messages. (“These articles have been trawled in by robots or uploaded manually” by analysts, says one program document.) But RAHS doesn’t need to rely only on open-source material or even the sorts of intelligence that most governments routinely collect: In Singapore, electronic surveillance of residents and visitors is pervasive and widely accepted…”
Request for Proposals: Exploring the Implications of Government Release of Large Datasets
“The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse. This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:
- Gain knowledge of the impact of the online release of large amounts of data generated by citizens’ interactions with government
- Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
- Begin building a body of research that addresses these issues
– BACKGROUND –
Governments at all levels are releasing large datasets for analysis by anyone for any purpose—“Open Data.” Using Open Data, entrepreneurs may create new products and services, and citizens may use it to gain insight into the government. A plethora of time saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets in order to encourage the development of unimagined new applications. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals—acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role as businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen. While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry. The citizen who foresees this growing transparency may be less willing to engage in government, as these transactions may be documented and released in a dataset to anyone to use for any imaginable purpose—including to deanonymize the database—forever. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, open data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000. Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium. All grantees’ papers will be published in a dedicated monograph. Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a followup workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not. We are open to proposals that address the following difficult questions. We are also open to all methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.
- To what extent does existing Open Data made available by city and state governments affect individual profiling? Do the effects change depending on the level of aggregation (neighborhood vs. cities)? What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
- Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
- Is there something fundamentally different about government services and the government’s collection of citizens’ data for basic needs in modern society such as power and water that requires governments to exercise greater due care than commercial entities?
- Companies have legal and practical mechanisms to shield data submitted to government from public release. What mechanisms do individuals have or should have to address misuse of Open Data? Could developments in the constitutional right to information policy as articulated in Whalen and Westinghouse Electric Co address Open Data privacy issues?
- Collecting data costs money, and its release could affect civil liberties. Yet it is being given away freely, sometimes to immensely profitable firms. Should governments license data for a fee and/or impose limits on its use, given its value?
- The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis. Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles? What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue? A move from collection restrictions to use regulation raises a number of related issues, detailed below.
- Are use restrictions efficacious in creating accountability? Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability. How could use regulations be implemented in the context of Open Data efficaciously? Can a self-learning algorithm honor data use restrictions?
- If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses? How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff? Will plaintiffs ever be able to beat First Amendment defenses?
- The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data. Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues. What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
- Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis. When analysts are not involved in the collection of data, they are less likely to understand its context and limitations. How do we ensure that this knowledge is maintained in a use regulation system?
- Former President William Clinton was admitted under a pseudonym for a procedure at a New York Hospital in 2004. The hospital detected 1,500 attempts by its own employees to access the President’s records. With snooping such a tempting activity, how could incentives be crafted to cause self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
- It is clear that data privacy regulation could hamper some big data efforts. However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections. What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
- In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements. Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment. Is the answer to problems of distrust in government even more Open Data?
- What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect? Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?…
To submit a proposal, visit the Conference Management Toolkit (CMT) here.
Once you have created a profile, the site will allow you to submit your proposal.
If you have questions, please contact Chris Hoofnagle, principal investigator on this project.”
Sharing Data Is a Form of Corporate Philanthropy
Matt Stempeck in HBR Blog: “Ever since the International Charter on Space and Major Disasters was signed in 1999, satellite companies like DMC International Imaging have had a clear protocol with which to provide valuable imagery to public actors in times of crisis. In a single week this February, DMCii tasked its fleet of satellites on flooding in the United Kingdom, fires in India, floods in Zimbabwe, and snow in South Korea. Official crisis response departments and relevant UN departments can request on-demand access to the visuals captured by these “eyes in the sky” to better assess damage and coordinate relief efforts.
Back on Earth, companies create, collect, and mine data in their day-to-day business. This data has quickly emerged as one of this century’s most vital assets. Public sector and social good organizations may not have access to the same amount, quality, or frequency of data. This imbalance has inspired a new category of corporate giving foreshadowed by the 1999 Space Charter: data philanthropy.
The satellite imagery example is an area of obvious societal value, but data philanthropy holds even stronger potential closer to home, where a wide range of private companies could give back in meaningful ways by contributing data to public actors. Consider two promising contexts for data philanthropy: responsive cities and academic research.
The centralized institutions of the 20th century allowed for the most sophisticated economic and urban planning to date. But in recent decades, the information revolution has helped the private sector speed ahead in data aggregation, analysis, and applications. It’s well known that there’s enormous value in real-time usage of data in the private sector, but there are similarly huge gains to be won in the application of real-time data to mitigate common challenges.
What if sharing economy companies shared their real-time housing, transit, and economic data with city governments or public interest groups? For example, Uber maintains a “God’s Eye view” of every driver on the road in a city.
Imagine combining this single data feed with an entire portfolio of real-time information. An early leader in this space is the City of Chicago’s urban data dashboard, WindyGrid. The dashboard aggregates an ever-growing variety of public datasets to allow for more intelligent urban management.
Over time, we could design responsive cities that react to this data. A responsive city is one where services, infrastructure, and even policies can flexibly respond to the rhythms of its denizens in real-time. Private sector data contributions could greatly accelerate these nascent efforts.
Data philanthropy could similarly benefit academia. Access to data remains an unfortunate barrier to entry for many researchers. The result is that only researchers with access to certain data, such as full-volume social media streams, can analyze and produce knowledge from this compelling information. Twitter, for example, sells access to a range of real-time APIs to marketing platforms, but the price point often exceeds researchers’ budgets. To accelerate the pursuit of knowledge, Twitter has piloted a program called Data Grants offering access to segments of their real-time global trove to select groups of researchers. With this program, academics and other researchers can apply to receive access to relevant bulk data downloads, such as a period of time before and after an election, or a certain geographic area.
Humanitarian response, urban planning, and academia are just three sectors within which private data can be donated to improve the public condition. There are many more possible applications, but few examples to date. For companies looking to expand their corporate social responsibility initiatives, sharing data should be part of the conversation…
Companies considering data philanthropy can take the following steps:
- Inventory the information your company produces, collects, and analyzes. Consider which data would be easy to share and which data will require long-term effort.
- Think about who could benefit from this information. Who in your community doesn’t have access to this information?
- Who could be harmed by the release of this data? If the datasets are about people, have they consented to its release? (i.e. don’t pull a Facebook emotional manipulation experiment).
- Begin conversations with relevant public agencies and nonprofit partners to get a sense of the sort of information they might find valuable and their capacity to work with the formats you might eventually make available.
- If you expect an onslaught of interest, an application process can help qualify partnership opportunities to maximize positive impact relative to time invested in the program.
- Consider how you’ll handle distribution of the data to partners. Even if you don’t have the resources to set up an API, regular releases of bulk data could still provide enormous value to organizations used to relying on less-frequently updated government indices.
- Consider your needs regarding privacy and anonymization. Strip the data of anything remotely resembling personally identifiable information (here are some guidelines).
- If you’re making data available to researchers, plan to allow researchers to publish their results without obstruction. You might also require them to share the findings with the world under Open Access terms….”
Selected Readings on Sentiment Analysis
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.
Sentiment Analysis is a field of Computer Science that uses techniques from natural language processing, computational linguistics, and machine learning to predict subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although it is technically a subfield focusing on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).
The rise of Web 2.0 and increased information flow has led to an increase in interest towards Sentiment Analysis — especially as applied to social networks and media. Events causing large spikes in media — such as the 2012 Presidential Election Debates — are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.
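As a rough illustration of what "predicting subjective meaning from text" can look like in its simplest form, the sketch below scores a sentence against a small word list. The lexicon, the negation rule, and the function names are all hypothetical and chosen only for this example; the systems described in the readings rely on far richer features and trained models.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); real systems use
# much larger curated or learned lexicons, or trained classifiers.
POSITIVE = {"good", "great", "strong", "support", "win", "clear"}
NEGATIVE = {"bad", "weak", "oppose", "fail", "wrong", "poor"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text: str) -> int:
    """Count positive minus negative words.

    A negator flips the polarity of the word immediately after it only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score


def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `label("The plan is not bad")` treats "not bad" as mildly positive, while a sentence with no lexicon words comes back as neutral. Word counting of this kind misses sarcasm, context, and stance toward a specific target, which is part of what the papers below try to address.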
Selected Reading List (in alphabetical order)
- Choi, Tan, Lee, Danescu-Niculescu-Mizil, Spindel — Hedge Detection as a Lens on Framing in the GMO Debates: A Position Paper — a position paper proposing hedge detection as a way to study whether adopting a “scientific tone” signals a stance in the debate on GMOs.
- Christina Michael, Francesca Toni, and Krysia Broda — Sentiment Analysis for Debates — a paper looking at several techniques and applications of Sentiment Analysis on online debates.
- Akiko Murakami, Rudy Raymond — Support or Oppose? Classifying Positions in Online Debates from Reply Activities and Opinion Expressions — a paper seeking to identify the general positions of users in online debates by exploiting local information in their remarks within the debate, and using Sentiment Analysis on the text.
- Bo Pang, Lillian Lee — Opinion Mining & Sentiment Analysis — a general survey on Sentiment Analysis and approaches, with examples of applications.
- Ranade, Gupta, Varma, Mamidi — Online debate summarization using topic directed sentiment analysis — a paper aiming to summarize online debates by extracting highly topic relevant and sentiment rich sentences.
- Jodi Schneider — Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities — a paper describing a new possible domain for argumentation mining: debates in open online collaboration communities.
Annotated Selected Reading List (in alphabetical order)
Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP
- Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
- Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature.
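To make the idea of measuring hedging concrete, here is a naive sketch that counts hedge cue words per 100 tokens. The cue list and the function name are assumptions made for illustration; this is not the detection approach proposed in the paper, which treats hedge detection as a harder problem than keyword matching.

```python
import re

# Hypothetical, deliberately small cue list (an assumption for this sketch).
HEDGE_CUES = {
    "may", "might", "could", "possibly", "perhaps",
    "suggest", "suggests", "appear", "appears", "likely",
}


def hedge_rate(text: str) -> float:
    """Return the number of hedge cue words per 100 tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in HEDGE_CUES for token in tokens)
    return 100.0 * hits / len(tokens)
```

Comparing `hedge_rate` across two collections of articles would give a first, crude signal of which side hedges more, though words such as "may" are ambiguous and a keyword count will both over- and under-count real hedges.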
Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv
- This project aims to expand on existing solutions used for automatic sentiment analysis on text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results for enhancing the ease of understanding the debates and for showing underlying trends. Finally, it evaluates proposed techniques on an existing debate system for social networking.
Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm
- In this paper, the authors propose a method for the task of identifying the general positions of users in online debates, i.e., support or oppose the main topic of an online debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their remarks within the debate. The supporting or opposing remarks are made by directly replying to the opinion, or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
- A prior study has shown that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy for the identification task than methods based solely on the contents of the remarks. In this paper, it is shown that incorporating the textual content of the remarks into the link-based method can yield higher accuracy in the identification task.
Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and trends in information retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD
- This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn
- Social networking sites provide users a virtual community interaction platform to share their thoughts, life experiences and opinions. Online debate forum is one such platform where people can take a stance and argue in support or opposition of debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
- This paper aims to summarize online debates by extracting highly topic relevant and sentiment rich sentences. The proposed approach takes into account topic relevant, document relevant and sentiment based features to capture topic opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics and a software package that compares an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system. This system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic directed sentiment features are most important to generate effective debate summaries.
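The ROUGE comparison mentioned above can be sketched in its simplest variant, unigram recall (often called ROUGE-1 recall): the share of words in a human-written reference summary that also appear in the system summary. The function name and tokenization are assumptions for this sketch, and the actual ROUGE package covers further variants that are not shown here.

```python
import re
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of a candidate summary against one reference summary.

    Each reference word is matched at most as many times as it occurs
    in the candidate (clipped counts).
    """
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total
```

With the reference "the debate favors the motion" and the candidate "the motion wins the debate", four of the five reference words are covered, so the recall is 0.8. A score like this rewards word overlap only and says nothing about whether the summary is fluent or faithful.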
Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx
- Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the authors describe a new possible domain for argumentation mining: debates in open online collaboration communities.
- Based on their experience with manual annotation of arguments in debates, the authors propose argumentation mining as the basis for three kinds of support tools, for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.
When Technologies Combine, Amazing Innovation Happens
FastCoexist: “Innovation occurs both within fields, and in combinations of fields. It’s perhaps the latter that ends up being most groundbreaking. When people of disparate expertise, mindset and ideas work together, new possibilities pop up.
In a new report, the Institute for the Future argues that “technological change is increasingly driven by the combination and recombination of foundational elements.” So, when we think about the future, we need to consider not just fundamental advances (say, in computing, materials, bioscience) but also the intersection of these technologies.
The report uses combination-analysis in the form of a map. IFTF selects 13 “territories”–what it calls “frontiers of innovation”–and then examines the linkages and overlaps. The result is 20 “combinatorial forecasts.” “These are the big stories, hot spots that will shape the landscape of technology in the coming decade,” the report explains. “Each combinatorial forecast emerges from the intersection of multiple territories.”…
Quantified Experiences
Advances in brain-imaging techniques will bring new transparency to our thoughts and feelings. “Assigning precise measurements to feelings like pain through neurofeedback and other techniques could allow for comparison, modulation, and manipulation of these feelings,” the report says. “Direct measurement of our once-private thoughts and feelings can help us understand other people’s experience but will also present challenges regarding privacy and definition of norms.”…
Code Is The Law
The law enforcement of the future may increasingly rely on sensors and programmable devices. “Governance is shifting from reliance on individual responsibility and human policing toward a system of embedded protocols and automatic rule enforcement,” the report says. That in turn means greater power for programmers who are effectively laying down the parameters of the new relationship between government and governed….”
Privacy-Invading Technologies and Privacy by Design
New book by Demetrius Klitou: “Challenged by rapidly developing privacy-invading technologies (PITs), this book provides a convincing set of potential policy recommendations and practical solutions for safeguarding both privacy and security. It shows that benefits such as public security do not necessarily come at the expense of privacy and liberty overall.
Backed up by comprehensive study of four specific PITs – Body scanners; Public space CCTV microphones; Public space CCTV loudspeakers; and Human-implantable microchips (RFID implants/GPS implants) – the author shows how laws that regulate the design and development of PITs may more effectively protect privacy than laws that only regulate data controllers and the use of such technologies. New rules and regulations should therefore incorporate fundamental privacy principles through what is known as ‘Privacy by Design’.
The numerous sources explored by the author provide a workable overview of the positions of academia, industry, government and relevant international organizations and NGOs.
- Explores a relatively novel approach of protecting privacy
- Offers a convincing set of potential policy recommendations and practical solutions
- Provides a workable overview of the positions of academia, industry, government and relevant international organizations and NGOs”
No silver bullet: De-identification still doesn’t work
Arvind Narayanan and Edward W. Felten: “Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.” In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. Specifically, we argue that:
- There is no known effective method to anonymize location data, and no evidence that it’s meaningfully achievable.
- Computing re-identification probabilities based on proof-of-concept demonstrations is silly.
- Cavoukian and Castro ignore many realistic threats by focusing narrowly on a particular model of re-identification.
- Cavoukian and Castro concede that de-identification is inadequate for high-dimensional data. But nowadays most interesting datasets are high-dimensional.
- Penetrate-and-patch is not an option.
- Computer science knowledge is relevant and highly available.
- Cavoukian and Castro apply different standards to big data and re-identification techniques.
- Quantification of re-identification probabilities, which permeates Cavoukian and Castro’s arguments, is a fundamentally meaningless exercise.
Data privacy is a hard problem. Data custodians face a choice between roughly three alternatives: sticking with the old habit of de-identification and hoping for the best; turning to emerging technologies like differential privacy that involve some trade-offs in utility and convenience; and using legal agreements to limit the flow and use of sensitive data. These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances. Change is difficult. When faced with the challenge of fostering data science while preventing privacy risks, the urge to preserve the status quo is understandable. However, this is incompatible with the reality of re-identification science. If a “best of both worlds” solution exists, de-identification is certainly not that solution. Instead of looking for a silver bullet, policy makers must confront hard choices.”
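The point about high-dimensional data can be illustrated with a toy example. The sketch below is not from Narayanan and Felten's response; the records, the column names and the `unique_fraction` helper are all hypothetical. It only shows that the share of records that are unique, and therefore candidates for re-identification by anyone who knows those attributes, tends to grow as more columns are retained.

```python
from collections import Counter


def unique_fraction(records, columns):
    """Return the fraction of records whose values on `columns` match no other record."""
    keys = [tuple(record[column] for column in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical "de-identified" records: names removed, other attributes kept.
records = [
    {"zip": "10001", "birth_year": 1980, "sex": "F"},
    {"zip": "10001", "birth_year": 1980, "sex": "M"},
    {"zip": "10001", "birth_year": 1975, "sex": "F"},
    {"zip": "10002", "birth_year": 1980, "sex": "F"},
    {"zip": "10002", "birth_year": 1980, "sex": "F"},
]

# Each added column splits the records into smaller groups.
for columns in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(columns, unique_fraction(records, columns))
```

With these five made-up records, no record is unique on ZIP code alone, one in five is unique on ZIP code plus birth year, and three in five are unique once sex is added.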
Urban Analytics (Updated and Expanded)
As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. In this edition, we explore the literature on Urban Analytics. To suggest additional readings on this or any other topic, please email [email protected].
Urban Analytics places better information in the hands of citizens as well as government officials to empower people to make more informed choices. Today, we are able to gather real-time information about traffic, pollution, noise, and environmental and safety conditions by culling data from a range of tools: from the low-cost sensors in mobile phones to more robust monitoring tools installed in our environment. With data collected and combined from the built, natural and human environments, we can develop more robust predictive models and use those models to make policy smarter.
With the computing power to transmit and store the data from these sensors, and the tools to translate raw data into meaningful visualizations, we can identify problems as they happen, design new strategies for city management, and target the application of scarce resources where they are most needed.
Selected Reading List (in alphabetical order)
- L. Amini, E. Bouillet, F. Calabrese, L. Gasparini and O. Verscheure — Challenges and Results in City-scale Sensing — a paper examining research challenges related to cities’ use of machine learning, optimization, visualization and semantic analysis.
- M. Batty, K. W. Axhausen, F. Giannotti, A. Pozdnoukhov, A. Bazzani, M. Wachowicz, G. Ouzounis and Y. Portugali – Smart Cities of the Future – a paper exploring the goals and research challenges of merging ICT with traditional city infrastructures.
- Paul Budde — Smart Cities of Tomorrow — a paper on strategies for creating smart cities with cohesive and open telecommunication and software architecture.
- G. Cardone, L. Foschini, P. Bellavista, A. Corradi, C. Borcea, M. Talasila and R. Curtmola — Fostering Participaction in Smart Cities: A Geo-social Crowdsensing Platform — a paper on employing collective intelligence in smart cities.
- Chien-Chu Chen – The Trend towards ‘Smart Cities’ – a study of existing smart city initiatives from around the world.
- A. Domingo, B. Bellalta, M. Palacin, M. Oliver and E. Almirall – Public Open Sensor Data: Revolutionizing Smart Cities – a paper proposing a platform for cities to leverage public open sensor data.
- C. Harrison, B. Eckman, R. Hamilton, P. Hartswick, J. Kalagnanam, J. Paraszczak and P. Williams — Foundations for Smarter Cities — a paper describing the information technology foundation and principles for smart cities.
- José M. Hernández-Muñoz, Jesús Bernat Vercher, Luis Muñoz, José A. Galache, Mirko Presser, Luis A. Hernández Gómez, and Jan Pettersson — Smart Cities at the Forefront of the Future Internet — a paper exploring the notion of transforming a smart city into an open innovation platform.
- Jung Hoon-Lee, Marguerite Gong Hancock, Mei-Chih Hu – Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco – a paper proposing a conceptual framework for smart city initiatives.
- Maged N. Kamel Boulos and Najeeb M. Al-Shorbaji – On the Internet of Things, smart cities and the WHO Healthy Cities – an article describing the opportunity for smart city initiatives and the Internet of Things (IoT) to help improve health outcomes and the environment.
- Sallie Ann Keller, Steven E. Koonin and Stephanie Shipp – Big Data and City Living – What Can It Do for Us? – an article exploring the benefits and challenges related to cities leveraging big data.
- Rob Kitchin — The Real-Time City? Big Data and Smart Urbanism — a paper discussing how cities’ use of big data enables real-time analysis and new modes of technocratic urban governance.
- Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum eds. – Privacy, Big Data, and the Public Good: Frameworks for Engagement – a book focusing on the legal, practical, and statistical approaches for maximizing the use of massive datasets, including those supporting urban analytics, while minimizing information risk.
- A. Mostashari, F. Arnold, M. Maurer and J. Wade — Citizens as Sensors: The Cognitive City Paradigm — a paper introducing the concept of the “cognitive city” — a city that can learn to improve its service conditions by planning, deciding and acting on perceived conditions.
- M. Oliver, M. Palacin, A. Domingo and V. Valls — Sensor Information Fueling Open Data — a paper introducing the concept of sensor networks and their role in a smart cities framework.
- Charith Perera, Arkady Zaslavsky, Peter Christen and Dimitrios Georgakopoulos – Sensing as a service model for smart cities supported by Internet of Things – a paper focused on the parallel advancements of smart city initiatives and the Internet of Things (IoT).
- Hans Schaffers, Nicos Komninos, Marc Pallot, Brigitte Trousse, Michael Nilsson and Alvaro Oliveira – Smart Cities and the Future Internet: Towards Cooperation Frameworks for Open Innovation – a paper exploring the present and future of citizen participation in smart service delivery.
- G. Suciu, A. Vulpe, S. Halunga, O. Fratu, G. Todoran and V. Suciu — Smart Cities Built on Resilient Cloud Computing and Secure Internet of Things — a paper proposing a new cloud-based platform for provision and support of ubiquitous connectivity and real-time applications and services in smart cities.
- Anthony Townsend — Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia — a book exploring the diversity of motivations, challenges and potential benefits of smart cities in our “era of mass urbanization and technological ubiquity.”
Annotated Selected Reading List (in alphabetical order)
Amini, L., E. Bouillet, F. Calabrese, L. Gasparini, and O. Verscheure. “Challenges and Results in City-scale Sensing.” In IEEE Sensors, 59–61, 2011. http://bit.ly/1doodZm.
- This paper examines “how city requirements map to research challenges in machine learning, optimization, control, visualization, and semantic analysis.”
- The authors raise several research challenges, including how to extract accurate information when the data is noisy and sparse; how to represent findings from digital pervasive technologies; and how people interact with one another and their environment.
Batty, M., K. W. Axhausen, F. Giannotti, A. Pozdnoukhov, A. Bazzani, M. Wachowicz, G. Ouzounis, and Y. Portugali. “Smart Cities of the Future.” The European Physical Journal Special Topics 214, no. 1 (November 1, 2012): 481–518. http://bit.ly/HefbjZ.
- This paper explores the goals and research challenges involved in the development of smart cities that merge ICT with traditional infrastructures through digital technologies.
- The authors put forth several research objectives, including: 1) to explore the notion of the city as a laboratory for innovation; 2) to develop technologies that ensure equity and fairness and realize a better quality of city life; and 3) to develop technologies that ensure informed participation and create shared knowledge for democratic city governance.
- The paper also examines several contemporary smart city initiatives, expected paradigm shifts in the field, benefits, risks and impacts.
Budde, Paul. “Smart Cities of Tomorrow.” In Cities for Smart Environmental and Energy Futures, edited by Stamatina Th Rassia and Panos M. Pardalos, 9–20. Energy Systems. Springer Berlin Heidelberg, 2014. http://bit.ly/17MqPZW.
- This paper examines the components and strategies involved in the creation of smart cities featuring “cohesive and open telecommunication and software architecture.”
- In this study of smart cities, the author examines smart and renewable energy; next-generation networks; smart buildings; smart transport; and smart government.
- The author concludes that the development of smart cities requires information and communication technology (ICT) to build more horizontal collaborative structures, real-time analysis of useful data, and people and/or machines able to make instant decisions related to social and urban life.
Cardone, G., L. Foschini, P. Bellavista, A. Corradi, C. Borcea, M. Talasila, and R. Curtmola. “Fostering Participaction in Smart Cities: a Geo-social Crowdsensing Platform.” IEEE Communications Magazine 51, no. 6 (2013): 112–119. http://bit.ly/17iJ0vZ.
- This article examines “how and to what extent the power of collective although imprecise intelligence can be employed in smart cities.”
- To tackle problems of managing the crowdsensing process, this article proposes a “crowdsensing platform with three main original technical aspects: an innovative geo-social model to profile users along different variables, such as time, location, social interaction, service usage, and human activities; a matching algorithm to autonomously choose people to involve in participActions and to quantify the performance of their sensing; and a new Android-based platform to collect sensing data from smart phones, automatically or with user help, and to deliver sensing/actuation tasks to users.”
Chen, Chien-Chu. “The Trend towards ‘Smart Cities.’” International Journal of Automation and Smart Technology. June 1, 2014. http://bit.ly/1jOOaAg.
- In this study, Chen explores the ambitions, prevalence and outcomes of a variety of smart cities, organized into five categories:
- Transportation-focused smart cities
- Energy-focused smart cities
- Building-focused smart cities
- Water-resources-focused smart cities
- Governance-focused smart cities
- The study finds that the “Asia Pacific region accounts for the largest share of all smart city development plans worldwide, with 51% of the global total. Smart city development plans in the Asia Pacific region tend to be energy-focused smart city initiatives, aimed at easing the pressure on energy resources that will be caused by continuing rapid urbanization in the future.”
- North America is likewise generally geared toward energy-focused smart city development plans. “In North America, there has been a major drive to introduce smart meters and smart electric power grids, integrating the electric power sector with information and communications technology (ICT) and replacing obsolete electric power infrastructure, so as to make cities’ electric power systems more reliable (which in turn can help to boost private-sector investment, stimulate the growth of the ‘green energy’ industry, and create more job opportunities).”
- Looking to Taiwan as an example, Chen argues that, “Cities in different parts of the world face different problems and challenges when it comes to urban development, making it necessary to utilize technology applications from different fields to solve the unique problems that each individual city has to overcome; the emphasis here is on the development of customized solutions for smart city development.”
Domingo, A., B. Bellalta, M. Palacin, M. Oliver and E. Almirall. “Public Open Sensor Data: Revolutionizing Smart Cities.” Technology and Society Magazine, IEEE 32, No. 4. Winter 2013. http://bit.ly/1iH6ekU.
- In this article, the authors explore the “enormous amount of information collected by sensor devices” that allows for “the automation of several real-time services to improve city management by using intelligent traffic-light patterns during rush hour, reducing water consumption in parks, or efficiently routing garbage collection trucks throughout the city.”
- They argue that, “To achieve the goal of sharing and open data to the public, some technical expertise on the part of citizens will be required. A real environment – or platform – will be needed to achieve this goal.” They go on to introduce a variety of “technical challenges and considerations involved in building an Open Sensor Data platform,” including:
- Scalability
- Reliability
- Low latency
- Standardized formats
- Standardized connectivity
- The authors conclude that, despite incredible advancements in urban analytics and open sensing in recent years, “Today, we can only imagine the revolution in Open Data as an introduction to a real-time world mashup with temperature, humidity, CO2 emission, transport, tourism attractions, events, water and gas consumption, politics decisions, emergencies, etc., and all of this interacting with us to help improve the future decisions we make in our public and private lives.”
Harrison, C., B. Eckman, R. Hamilton, P. Hartswick, J. Kalagnanam, J. Paraszczak, and P. Williams. “Foundations for Smarter Cities.” IBM Journal of Research and Development 54, no. 4 (2010): 1–16. http://bit.ly/1iha6CR.
- This paper describes the information technology (IT) foundation and principles for Smarter Cities.
- The authors introduce three foundational concepts of smarter cities: instrumented, interconnected and intelligent.
- They also describe some of the major needs of contemporary cities, and conclude that creating the Smarter City implies capturing and accelerating flows of information both vertically and horizontally.
Hernández-Muñoz, José M., Jesús Bernat Vercher, Luis Muñoz, José A. Galache, Mirko Presser, Luis A. Hernández Gómez, and Jan Pettersson. “Smart Cities at the Forefront of the Future Internet.” In The Future Internet, edited by John Domingue, Alex Galis, Anastasius Gavras, Theodore Zahariadis, Dave Lambert, Frances Cleary, Petros Daras, et al., 447–462. Lecture Notes in Computer Science 6656. Springer Berlin Heidelberg, 2011. http://bit.ly/HhNbMX.
- This paper explores how the “Internet of Things (IoT) and Internet of Services (IoS), can become building blocks to progress towards a unified urban-scale ICT platform transforming a Smart City into an open innovation platform.”
- The authors examine the SmartSantander project to argue that, “the different stakeholders involved in the smart city business is so big that many non-technical constraints must be considered (users, public administrations, vendors, etc.).”
- The authors also discuss the need for infrastructures, for instance at the European level, to support realistic, large-scale, experimentally driven research.
Hoon-Lee, Jung, Marguerite Gong Hancock, Mei-Chih Hu. “Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco.” Technological Forecasting and Social Change. October 3, 2013. http://bit.ly/1rzID5v.
- In this study, the authors aim to “shed light on the process of building an effective smart city by integrating various practical perspectives with a consideration of smart city characteristics taken from the literature.”
- They propose a conceptual framework based on case studies from Seoul and San Francisco built around the following dimensions:
- Urban openness
- Service innovation
- Partnerships formation
- Urban proactiveness
- Smart city infrastructure integration
- Smart city governance
- The authors conclude with a summary of research findings featuring “8 stylized facts”:
- Movement towards more interactive services engaging citizens;
- Open data movement facilitates open innovation;
- Diversifying service development: exploit or explore?
- How to accelerate adoption: top-down public driven vs. bottom-up market driven partnerships;
- Advanced intelligent technology supports new value-added smart city services;
- Smart city services combined with robust incentive systems empower engagement;
- Multiple device & network accessibility can create network effects for smart city services;
- Centralized leadership implementing a comprehensive strategy boosts smart initiatives.
Kamel Boulos, Maged N. and Najeeb M. Al-Shorbaji. “On the Internet of Things, smart cities and the WHO Healthy Cities.” International Journal of Health Geographics 13, No. 10. 2014. http://bit.ly/Tkt9GA.
- In this article, the authors give a “brief overview of the Internet of Things (IoT) for cities, offering examples of IoT-powered 21st century smart cities, including the experience of the Spanish city of Barcelona in implementing its own IoT-driven services to improve the quality of life of its people through measures that promote an eco-friendly, sustainable environment.”
- The authors argue that one of the central needs for harnessing the power of the IoT and urban analytics is for cities to “involve and engage its stakeholders from a very early stage (city officials at all levels, as well as citizens), and to secure their support by raising awareness and educating them about smart city technologies, the associated benefits, and the likely challenges that will need to be overcome (such as privacy issues).”
- They conclude that, “The Internet of Things is rapidly gaining a central place as key enabler of the smarter cities of today and the future. Such cities also stand better chances of becoming healthier cities.”
Keller, Sallie Ann, Steven E. Koonin, and Stephanie Shipp. “Big Data and City Living – What Can It Do for Us?” Significance 9, no. 4 (2012): 4–7. http://bit.ly/166W3NP.
- This article provides a short introduction to Big Data, its importance, and the ways in which it is transforming cities. After an overview of the social benefits of big data in an urban context, the article examines its challenges, such as privacy concerns and institutional barriers.
- The authors argue that new approaches are needed for making data available for research without violating the privacy of entities included in the datasets. They believe that balancing privacy and accessibility issues will require new government regulations and incentives.
Kitchin, Rob. “The Real-Time City? Big Data and Smart Urbanism.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, July 3, 2013. http://bit.ly/1aamZj2.
- This paper focuses on “how cities are being instrumented with digital devices and infrastructure that produce ‘big data’ which enable real-time analysis of city life, new modes of technocratic urban governance, and a re-imagining of cities.”
- The paper presents “a number of projects that seek to produce a real-time analysis of the city and provides a critical reflection on the implications of big data and smart urbanism.”
Mostashari, A., F. Arnold, M. Maurer, and J. Wade. “Citizens as Sensors: The Cognitive City Paradigm.” In 2011 8th International Conference Expo on Emerging Technologies for a Smarter World (CEWIT), 1–5, 2011. http://bit.ly/1fYe9an.
- This paper argues that “implementing sensor networks are a necessary but not sufficient approach to improving urban living.”
- The authors introduce the concept of the “Cognitive City” – a city that can not only operate more efficiently due to networked architecture, but can also learn to improve its service conditions, by planning, deciding and acting on perceived conditions.
- Based on this conceptualization of a smart city as a cognitive city, the authors propose “an architectural process approach that allows city decision-makers and service providers to integrate cognition into urban processes.”
Oliver, M., M. Palacin, A. Domingo, and V. Valls. “Sensor Information Fueling Open Data.” In Computer Software and Applications Conference Workshops (COMPSACW), 2012 IEEE 36th Annual, 116–121, 2012. http://bit.ly/HjV4jS.
- This paper introduces the concept of sensor networks as a key component in the smart cities framework, and shows how real-time data provided by different city network sensors enrich Open Data portals and require a new architecture to deal with massive amounts of continuously flowing information.
- The authors’ main conclusion is that by providing a framework to build new applications and services using public static and dynamic data that promote innovation, a real-time open sensor network data platform can have several positive effects for citizens.
Perera, Charith, Arkady Zaslavsky, Peter Christen and Dimitrios Georgakopoulos. “Sensing as a service model for smart cities supported by Internet of Things.” Transactions on Emerging Telecommunications Technologies 25, Issue 1. January 2014. http://bit.ly/1qJLDP9.
- This paper looks into the “enormous pressure towards efficient city management” that has “triggered various Smart City initiatives by both government and private sector businesses to invest in information and communication technologies to find sustainable solutions to the growing issues.”
- The authors explore the parallel advancement of the Internet of Things (IoT), which “envisions to connect billions of sensors to the Internet and expects to use them for efficient and effective resource management in Smart Cities.”
- The paper proposes the sensing as a service model “as a solution based on IoT infrastructure.” The sensing as a service model consists of four conceptual layers: “(i) sensors and sensor owners; (ii) sensor publishers (SPs); (iii) extended service providers (ESPs); and (iv) sensor data consumers.” They go on to describe how this model would work in the areas of waste management, smart agriculture and environmental management.
Privacy, Big Data, and the Public Good: Frameworks for Engagement. Edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum; Cambridge University Press, 2014. http://bit.ly/UoGRca.
- This book focuses on the legal, practical, and statistical approaches for maximizing the use of massive datasets while minimizing information risk.
- “Big data” is more than a straightforward change in technology. It poses deep challenges to our traditions of notice and consent as tools for managing privacy. Because our new tools of data science can make it all but impossible to guarantee anonymity in the future, the authors question whether it is possible to truly give informed consent, when we cannot, by definition, know what the risks are from revealing personal data either for individuals or for society as a whole.
- Based on their experience building large data collections, the authors discuss some of the best practical ways to provide access while protecting confidentiality. What have we learned about effective engineered controls? About effective access policies? About designing data systems that reinforce – rather than counter – access policies? They also explore the business, legal, and technical standards necessary for a new deal on data.
- Since the data generating process or the data collection process is not necessarily well understood for big data streams, the authors discuss what statistics can tell us about how to make the greatest scientific use of this data. They also explore the shortcomings of current disclosure limitation approaches and whether we can quantify the extent of privacy loss.
Schaffers, Hans, Nicos Komninos, Marc Pallot, Brigitte Trousse, Michael Nilsson, and Alvaro Oliveira. “Smart Cities and the Future Internet: Towards Cooperation Frameworks for Open Innovation.” In The Future Internet, edited by John Domingue, Alex Galis, Anastasius Gavras, Theodore Zahariadis, Dave Lambert, Frances Cleary, Petros Daras, et al., 431–446. Lecture Notes in Computer Science 6656. Springer Berlin Heidelberg, 2011. http://bit.ly/16ytKoT.
- This paper “explores ‘smart cities’ as environments of open and user-driven innovation for experimenting and validating Future Internet-enabled services.”
- The authors examine several smart city projects to illustrate the central role of users in defining smart services and the importance of participation. They argue that, “Two different layers of collaboration can be distinguished. The first layer is collaboration within the innovation process. The second layer concerns collaboration at the territorial level, driven by urban and regional development policies aiming at strengthening the urban innovation systems through creating effective conditions for sustainable innovation.”
Suciu, G., A. Vulpe, S. Halunga, O. Fratu, G. Todoran, and V. Suciu. “Smart Cities Built on Resilient Cloud Computing and Secure Internet of Things.” In 2013 19th International Conference on Control Systems and Computer Science (CSCS), 513–518, 2013. http://bit.ly/16wfNgv.
- This paper proposes “a new platform for using cloud computing capacities for provision and support of ubiquitous connectivity and real-time applications and services for smart cities’ needs.”
- The authors present a “framework for data procured from highly distributed, heterogeneous, decentralized, real and virtual devices (sensors, actuators, smart devices) that can be automatically managed, analyzed and controlled by distributed cloud-based services.”
Townsend, Anthony. Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. W. W. Norton & Company, 2013.
- In this book, Townsend illustrates how “cities worldwide are deploying technology to address both the timeless challenges of government and the mounting problems posed by human settlements of previously unimaginable size and complexity.”
- He also considers “the motivations, aspirations, and shortcomings” of the many stakeholders involved in the development of smart cities, and poses a new civics to guide these efforts.
- He argues that smart cities are made smart not by the various soon-to-be-obsolete technologies built into their infrastructure, but by how citizens use these ever-changing technologies to be “human-centered, inclusive and resilient.”
To stay current on recent writings and developments on Urban Analytics, please subscribe to the GovLab Digest.
Did we miss anything? Please submit reading recommendations to [email protected] or in the comments below.
Predicting crime, LAPD-style
The Guardian: “The Los Angeles Police Department, like many urban police forces today, is both heavily armed and thoroughly computerised. The Real-Time Analysis and Critical Response Division in downtown LA is its central processor. Rows of crime analysts and technologists sit before a wall covered in video screens stretching more than 10 metres wide. Multiple news broadcasts are playing simultaneously, and a real-time earthquake map is tracking the region’s seismic activity. Half-a-dozen security cameras are focused on the Hollywood sign, the city’s icon. In the centre of this video menagerie is an oversized satellite map showing some of the most recent arrests made across the city – a couple of burglaries, a few assaults, a shooting.
On a slightly smaller screen the division’s top official, Captain John Romero, mans the keyboard and zooms in on a comparably micro-scale section of LA. It represents just 500 feet by 500 feet. Over the past six months, this sub-block section of the city has seen three vehicle burglaries and two property burglaries – an atypical concentration. And, according to a new algorithm crunching crime numbers in LA and dozens of other cities worldwide, it’s a sign that yet more crime is likely to occur right here in this tiny pocket of the city.
The algorithm at play is performing what’s commonly referred to as predictive policing. Using years – and sometimes decades – worth of crime reports, the algorithm analyses the data to identify areas with high probabilities for certain types of crime, placing little red boxes on maps of the city that are streamed into patrol cars. “Burglars tend to be territorial, so once they find a neighbourhood where they get good stuff, they come back again and again,” Romero says. “And that assists the algorithm in placing the boxes.”
Romero likens the process to an amateur fisherman using a fish finder device to help identify where fish are in a lake. An experienced fisherman would probably know where to look simply by the fish species, time of day, and so on. “Similarly, a really good officer would be able to go out and find these boxes. This kind of makes the average guys’ ability to find the crime a little bit better.”
Predictive policing is just one tool in this new, tech-enhanced and data-fortified era of fighting and preventing crime. As the ability to collect, store and analyse data becomes cheaper and easier, law enforcement agencies all over the world are adopting techniques that harness the potential of technology to provide more and better information. But while these new tools have been welcomed by law enforcement agencies, they’re raising concerns about privacy, surveillance and how much power should be given over to computer algorithms.
P Jeffrey Brantingham is a professor of anthropology at UCLA who helped develop the predictive policing system that is now licensed to dozens of police departments under the brand name PredPol. “This is not Minority Report,” he’s quick to say, referring to the science-fiction story often associated with PredPol’s technique and proprietary algorithm. “Minority Report is about predicting who will commit a crime before they commit it. This is about predicting where and when crime is most likely to occur, not who will commit it.”…”
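The “little red boxes” described in the article can be pictured with a very rough sketch. The code below is not PredPol's method, which is proprietary. It is a naive stand-in that simply counts past incidents per 500-by-500-foot cell and flags cells at or above a threshold. The coordinates (planar, in feet), the `cell_of` and `hotspots` helpers, and the threshold are all assumptions made for illustration.

```python
from collections import Counter

CELL_FEET = 500  # box size mentioned in the article


def cell_of(x_feet, y_feet, size=CELL_FEET):
    """Map a planar coordinate in feet to the (column, row) index of its grid cell."""
    return (int(x_feet // size), int(y_feet // size))


def hotspots(incidents, threshold):
    """Return the cells whose incident count is at least `threshold`."""
    counts = Counter(cell_of(x, y) for x, y in incidents)
    return {cell: count for cell, count in counts.items() if count >= threshold}


# Hypothetical incident locations: five fall in one cell, two are isolated.
incidents = [
    (120, 80), (450, 499), (30, 310), (260, 40), (499, 0),
    (900, 1200), (2600, 75),
]

print(hotspots(incidents, threshold=5))
```

In this made-up data, only the cell at index (0, 0) holds five incidents, which mirrors the three vehicle burglaries and two property burglaries Romero points to. A simple count like this ignores time of day, crime type and recency, all of which the article suggests matter in practice.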
Privacy and Open Government
Paper by Teresa Scassa in Future Internet: “The public-oriented goals of the open government movement promise increased transparency and accountability of governments, enhanced citizen engagement and participation, improved service delivery, economic development and the stimulation of innovation. In part, these goals are to be achieved by making more and more government information public in reusable formats and under open licences. This paper identifies three broad privacy challenges raised by open government. The first is how to balance privacy with transparency and accountability in the context of “public” personal information. The second challenge flows from the disruption of traditional approaches to privacy based on a collapse of the distinctions between public and private sector actors. The third challenge is that of the potential for open government data—even if anonymized—to contribute to the big data environment in which citizens and their activities are increasingly monitored and profiled.”