Index: Crime and Criminal Justice Data


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on crime and criminal justice data and was originally published in 2015.

This index provides information about the types of crime and criminal justice data collected, shared and used in the United States. Because data related to the criminal justice system is well known to be often unreliable, or simply missing, this index also highlights some of the issues that stand in the way of accessing useful and in-demand statistics.

Data Collections: National Crime Statistics

  • Number of incident-based crime datasets created by the Federal Bureau of Investigation (FBI): 2
    • Number of U.S. Statistical Agencies: 13
    • How many of those are focused on criminal justice: 1, the Bureau of Justice Statistics (BJS)
    • Number of data collections focused on criminal justice the BJS produces: 61
    • Number of federal-level APIs available for crime or criminal justice data: 1, the National Crime Victimization Survey (NCVS).
    • Frequency of the NCVS: annually
  • Number of Statistical Analysis Centers (SACs), organizations that are essentially clearinghouses for crime and criminal justice data for each state, the District of Columbia, Puerto Rico and the Northern Mariana Islands: 53

Open data, data use and the impact of those efforts

  • Number of datasets that are returned when “criminal justice” is searched for on Data.gov: 417, including federal-, state- and city-level datasets
  • Number of datasets that are returned when “crime” is searched for on Data.gov: 281
  • Percentage by which public complaints dropped after officers started wearing body cameras, according to a study done in Rialto, Calif.: 88
  • Percentage by which reported incidents of officer use of force fell after officers started wearing body cameras, according to a study done in Rialto, Calif.: 5
  • Percentage by which crime decreased during an experiment in predictive policing in Shreveport, LA: 35
  • Number of crime data sets made available by the Seattle Police Department – generally seen as a leader in police data innovation – on the Seattle.gov website: 4
    • Major crime stats by category in aggregate
    • Crime trend reports
    • Precinct data by beat
    • State sex offender database
  • Number of datasets mapped by the Seattle Police Department: 2
    • 911 incidents
    • Police reports
  • Number of states where risk assessment tools must be used in pretrial proceedings to help determine whether an offender is released from jail before a trial: at least 11.

Police Data

    • Number of federally mandated databases that collect information about officer use of force or officer involved shootings, nationwide: 0
    • The year a crime bill was passed that called for data on excessive force to be collected for research and statistical purposes, but has never been funded: 1994
    • Number of police departments that committed to being a part of the White House’s Police Data Initiative: 21
    • Percentage of police departments surveyed in 2013 by the Office of Community Oriented Policing Services within the Department of Justice that were not using body cameras, and therefore not collecting body camera data: 75

The criminal justice system

  • Parts of the criminal justice system where data about an individual can be created or collected: at least 6
    • Entry into the system (arrest)
    • Prosecution and pretrial
    • Sentencing
    • Corrections
    • Probation/parole
    • Recidivism

Sources

  • Crime Mapper. Philadelphia Police Department. Accessed August 24, 2014.

Selected Readings on Data Governance


Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data governance was originally published in 2015.

Context
The field of Data Collaboratives is premised on the idea that sharing and opening up private sector datasets has great – and as yet untapped – potential for promoting social good. At the same time, the potential of data collaboratives depends on the level of societal trust in the exchange, analysis and use of the data. Strong data governance frameworks are essential to ensure responsible data use. Without such governance regimes, the emergent data ecosystem will be hampered and the (perceived) risks will dominate the (perceived) benefits. Further, without adopting a human-centered approach to the design of data governance frameworks, including iterative prototyping and careful consideration of the experience, the responses may fail to be flexible and targeted to real needs.

Annotated Selected Readings List (in alphabetical order)

Better Place Lab, “Privacy, Transparency and Trust.” Mozilla, 2015. Available from: http://www.betterplace-lab.org/privacy-report.

  • This report looks specifically at the risks involved in the social sector having access to datasets, and the main risks development organizations should focus on to develop a responsible data use practice.
  • Focusing on five specific countries (Brazil, China, Germany, India and Indonesia), the report displays specific country profiles, followed by a comparative analysis centering around the topics of privacy, transparency, online behavior and trust.
  • Some of the key findings mentioned are:
    • A general concern about privacy, with cultural differences influencing conceptions of what privacy is.
    • Cultural differences determining how transparency is perceived, and how much value is attached to achieving it.
    • To build trust, individuals need to feel a personal connection or get a personal recommendation – it is hard to build trust regarding automated processes.

Montjoye, Yves-Alexandre de; Kendall, Jake; and Kerry, Cameron F. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, 2015. Available from: http://www.brookings.edu/research/papers/2014/11/12-enabling-humanitarian-use-mobile-phone-data.

  • Focusing in particular on mobile phone data, this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
  • Key takeaways are the following recommendations for using data for social good:
    • Engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.
    • Accepting that no framework for maximizing data for the public good will offer perfect protection for privacy, but there must be a balanced application of privacy concerns against the potential for social good.
    • Establishing systems and processes for recognizing trusted third-parties and systems to manage datasets, enable detailed audits, and control the use of data so as to combat the potential for data abuse and re-identification of anonymous data.
    • Simplifying the process among developing governments with regard to the collection and use of mobile phone metadata for research and public good purposes.

Center for Democracy and Technology, “Health Big Data in the Commercial Context.” Center for Democracy and Technology, 2015. Available from: https://cdt.org/insight/health-big-data-in-the-commercial-context/.

  • Focusing particularly on the privacy issues related to using data generated by individuals, this paper explores the overlap in privacy questions this field has with other data uses.
  • The authors note that although the Health Insurance Portability and Accountability Act (HIPAA) has proven a successful approach in ensuring accountability for health data, most of these standards do not apply to developers of the new technologies used to collect these new data sets.
  • For customer-facing technologies not covered by HIPAA, the paper proposes an alternative framework for considering privacy issues. The framework is based on the Fair Information Practice Principles and three rounds of stakeholder consultations.

Centre for Information Policy Leadership, “A Risk-based Approach to Privacy: Improving Effectiveness in Practice.” Centre for Information Policy Leadership, Hunton & Williams LLP, 2015. Available from: https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/white_paper_1-a_risk_based_approach_to_privacy_improving_effectiveness_in_practice.pdf.

  • This white paper is part of a project aiming to explain what is often referred to as a new, risk-based approach to privacy, and the development of a privacy risk framework and methodology.
  • With the pace of technological progress often outstripping the capabilities of privacy officers to keep up, this method aims to offer the ability to approach privacy matters in a structured way, assessing privacy implications from the perspective of possible negative impact on individuals.
  • With the intended outcomes of the project being “materials to help policy-makers and legislators to identify desired outcomes and shape rules for the future which are more effective and less burdensome”, insights from this paper might also feed into the development of innovative governance mechanisms aimed specifically at preventing individual harm.

Centre for Information Policy Leadership, “Data Governance for the Evolving Digital Market Place”, Centre for Information Policy Leadership, Hunton & Williams LLP, 2011. Available from: http://www.huntonfiles.com/files/webupload/CIPL_Centre_Accountability_Data_Governance_Paper_2011.pdf.

  • This paper argues that as a result of the proliferation of large scale data analytics, new models governing data inferred from society will shift responsibility to the side of organizations deriving and creating value from that data.
  • Acknowledging the challenge corporations face in enabling agile and innovative data use, the paper notes that “In exchange for increased corporate responsibility, accountability [and the governance models it mandates, ed.] allows for more flexible use of data.”
  • Proposed as a means to shift responsibility to the side of data-users, the accountability principle has been researched by a worldwide group of policymakers. Tracing the history of the accountability principle, the paper argues that it “(…) requires that companies implement programs that foster compliance with data protection principles, and be able to describe how those programs provide the required protections for individuals.”
  • The following essential elements of accountability are listed:
    • Organisation commitment to accountability and adoption of internal policies consistent with external criteria
    • Mechanisms to put privacy policies into effect, including tools, training and education
    • Systems for internal, ongoing oversight and assurance reviews and external verification
    • Transparency and mechanisms for individual participation
    • Means of remediation and external enforcement

Crawford, Kate; Schultz, Jason. “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harm.” NYU School of Law, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2325784&download=yes.

  • Considering the privacy implications of large-scale analysis of numerous data sources, this paper proposes the implementation of a ‘procedural data due process’ mechanism to arm data subjects against potential privacy intrusions.
  • The authors acknowledge that some privacy protection structures already include similar mechanisms. However, due to the “inherent analytical assumptions and methodological biases” of big data systems, the authors argue for a more rigorous framework.

Letouzé, Emmanuel; and Vinck, Patrick. “The Ethics and Politics of Call Data Analytics.” DataPop Alliance, 2015. Available from: http://static1.squarespace.com/static/531a2b4be4b009ca7e474c05/t/54b97f82e4b0ff9569874fe9/1421442946517/WhitePaperCDRsEthicFrameworkDec10-2014Draft-2.pdf.

  • Focusing on the use of Call Detail Records (CDRs) for social good in development contexts, this whitepaper explores both the potential of these datasets – in part by detailing recent successful efforts in the space – and political and ethical constraints to their use.
  • Drawing from the Menlo Report Ethical Principles Guiding ICT Research, the paper explores how these principles might be unpacked to inform an ethics framework for the analysis of CDRs.

Data for Development External Ethics Panel, “Report of the External Ethics Review Panel.” Orange, 2015. Available from: http://www.d4d.orange.com/fr/content/download/43823/426571/version/2/file/D4D_Challenge_DEEP_Report_IBE.pdf.

  • This report presents the findings of the external expert panel overseeing the Orange Data for Development Challenge.
  • Several types of issues faced by the panel are described, along with the various ways in which the panel dealt with those issues.

Federal Trade Commission Staff Report, “Mobile Privacy Disclosures: Building Trust Through Transparency.” Federal Trade Commission, 2013. Available from: www.ftc.gov/os/2013/02/130201mobileprivacyreport.pdf.

  • This report looks at ways to address privacy concerns regarding mobile phone data use. Specific advice is provided for the following actors:
    • Platforms, or operating systems providers
    • App developers
    • Advertising networks and other third parties
    • App developer trade associations, along with academics, usability experts and privacy researchers

Mirani, Leo. “How to use mobile phone data for good without invading anyone’s privacy.” Quartz, 2015. Available from: http://qz.com/398257/how-to-use-mobile-phone-data-for-good-without-invading-anyones-privacy/.

  • This article considers the privacy implications of using call detail records for social good, and ways to mitigate risks of privacy intrusion.
  • Taking the example of the Orange D4D challenge and the anonymization strategy that was employed there, the article describes how classic ‘anonymization’ is often not enough. It then lists further measures that can be taken to ensure adequate privacy protection.

Bernholz, Lucy. “Several Examples of Digital Ethics and Proposed Practices.” Stanford Ethics of Data conference, 2014. Available from: http://www.scribd.com/doc/237527226/Several-Examples-of-Digital-Ethics-and-Proposed-Practices.

  • This list of readings prepared for Stanford’s Ethics of Data conference lists some of the leading available literature regarding ethical data use.

Abrams, Martin. “A Unified Ethical Frame for Big Data Analysis.” The Information Accountability Foundation, 2014. Available from: http://www.privacyconference2014.org/media/17388/Plenary5-Martin-Abrams-Ethics-Fundamental-Rights-and-BigData.pdf.

  • Going beyond privacy, this paper discusses the following elements as central to developing a broad framework for data analysis:
    • Beneficial
    • Progressive
    • Sustainable
    • Respectful
    • Fair

Lane, Julia; Stodden, Victoria; Bender, Stefan; and Nissenbaum, Helen. “Privacy, Big Data and the Public Good.” Cambridge University Press, 2014. Available from: http://www.dataprivacybook.org.

  • This book treats the privacy issues surrounding the use of big data for promoting the public good.
  • The questions being asked include the following:
    • What are the ethical and legal requirements for scientists and government officials seeking to serve the public good without harming individual citizens?
    • What are the rules of engagement?
    • What are the best ways to provide access while protecting confidentiality?
    • Are there reasonable mechanisms to compensate citizens for privacy loss?

Richards, Neil M.; and King, Jonathan H. “Big Data Ethics.” Wake Forest Law Review, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384174.

  • This paper describes the growing impact of big data analytics on society, and argues that because of this impact, a set of ethical principles to guide data use is called for.
  • The four proposed themes are: privacy, confidentiality, transparency and identity.
  • Finally, the paper discusses how big data can be integrated into society, going into multiple facets of this integration, including the law, roles of institutions and ethical principles.

OECD, “OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data”. Available from: http://www.oecd.org/sti/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm.

  • A globally used set of principles to inform thought about handling personal data, the OECD privacy guidelines serve as one of the leading standards for informing privacy policies and data governance structures.
  • The basic principles of national application are the following:
    • Collection Limitation Principle
    • Data Quality Principle
    • Purpose Specification Principle
    • Use Limitation Principle
    • Security Safeguards Principle
    • Openness Principle
    • Individual Participation Principle
    • Accountability Principle

The White House Big Data and Privacy Working Group, “Big Data: Seizing Opportunities, Preserving Values.” White House, 2014. Available from: https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf.

  • Documenting the findings of the White House big data and privacy working group, this report lists, among others, the following key recommendations regarding data governance:
    • Bringing greater transparency to the data services industry
    • Stimulating international conversation on big data, with multiple stakeholders
    • With regard to educational data: ensuring data is used for the purpose for which it was collected
    • Paying attention to the potential for big data to facilitate discrimination, and expanding technical understanding to stop discrimination

Hoffman, William. “Pathways for Progress.” World Economic Forum, 2015. Available from: http://www3.weforum.org/docs/WEFUSA_DataDrivenDevelopment_Report2015.pdf.

  • This paper treats, among other issues, the lack of well-defined and balanced governance mechanisms as one of the key obstacles preventing corporate sector data in particular from being shared in a controlled space.
  • An approach that balances the benefits against the risks of large scale data usage in a development context, building trust among all stakeholders in the data ecosystem, is viewed as key.
  • Furthermore, this whitepaper notes that new governance models are required not just by the growing amount of data, greater analytical capacity and more refined methods for analysis. The current “super-structure” of information flows between institutions is also seen as one of the key reasons to develop alternatives to the current – outdated – approaches to data governance.

Selected Readings on Economic Impact of Open Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of open data was originally published in 2014.

Open data is publicly available data – often released by governments, scientists, and occasionally private companies – that is made available for anyone to use, in a machine-readable format, free of charge. Considerable attention has been devoted to the economic potential of open data for businesses and other organizations, and it is now widely accepted that open data plays an important role in spurring innovation, growth, and job creation. From new business models to innovation in local governance, open data is being quickly adopted as a valuable resource at many levels.

Measuring and analyzing the economic impact of open data in a systematic way is challenging, and governments as well as other providers of open data seek to provide access to the data in a standardized way. As governmental transparency increases and open data changes business models and activities in many economic sectors, it is important to understand best practices for releasing and using non-proprietary, public information. Costs, social challenges, and technical barriers also influence the economic impact of open data.

These selected readings are intended as a first step toward answering whether, and how, we can determine if opening data spurs economic impact.

Annotated Selected Reading List (in alphabetical order)

Bonina, Carla. New Business Models and the Values of Open Data: Definitions, Challenges, and Opportunities. NEMODE 3K – Small Grants Call 2013. http://bit.ly/1xGf9oe

  • In this paper, Dr. Carla Bonina provides an introduction to open data and open data business models, evaluating their potential economic value and identifying future challenges for the effectiveness of open data, such as personal data and privacy, the emerging data divide, and the costs of collecting, producing and releasing open (government) data.

Carpenter, John and Phil Watts. Assessing the Value of OS OpenData™ to the Economy of Great Britain – Synopsis. June 2013. Accessed July 25, 2014. http://bit.ly/1rTLVUE

  • John Carpenter and Phil Watts of Ordnance Survey undertook a study to examine the economic impact of open data on the economy of Great Britain. Using a variety of methods such as case studies, interviews, download analysis, adoption rates, impact calculation, and CGE modeling, the authors estimate that the OS OpenData initiative will deliver a net increase in GDP of £13 million to £28.5 million for Great Britain in 2013.

Capgemini Consulting. The Open Data Economy: Unlocking Economic Value by Opening Government and Public Data. Capgemini Consulting. Accessed July 24, 2014. http://bit.ly/1n7MR02

  • This report explores how governments are leveraging open data for economic benefits. Using a comparative approach, the authors study open data from organizational, technological, social and political perspectives. The study highlights the potential of open data to drive profit by increasing the effectiveness of benchmarking and other data-driven business strategies.

Deloitte. Open Growth: Stimulating Demand for Open Data in the UK. Deloitte Analytics. December 2012. Accessed July 24, 2014. http://bit.ly/1oeFhks

  • This early paper on open data by Deloitte uses case studies and statistical analysis on open government data to create models of businesses using open data. They also review the market supply and demand of open government data in emerging sectors of the economy.

Gruen, Nicholas, John Houghton and Richard Tooth. Open for Business: How Open Data Can Help Achieve the G20 Growth Target. Accessed July 24, 2014. http://bit.ly/UOmBRe

  • This report highlights the potential economic value of the open data agenda in Australia and the G20. The report provides an initial literature review on the economic value of open data, a set of case studies on the economic value of open data, and a set of recommendations for how open data can help the G20 and Australia achieve target objectives in the areas of trade, finance, fiscal and monetary policy, anti-corruption, employment, energy, and infrastructure.

Heusser, Felipe I. Understanding Open Government Data and Addressing Its Impact (draft version). World Wide Web Foundation. http://bit.ly/1o9Egym

  • The World Wide Web Foundation, in collaboration with IDRC, has begun a research network to explore the impacts of open data in developing countries. In addition to the Web Foundation and IDRC, the network includes the Berkman Center for Internet and Society at Harvard, the Open Development Technology Alliance and Practical Participation.

Howard, Alex. San Francisco Looks to Tap Into the Open Data Economy. O’Reilly Radar: Insight, Analysis, and Reach about Emerging Technologies. October 19, 2012. Accessed July 24, 2014. http://oreil.ly/1qNRt3h

  • Alex Howard points to San Francisco as one of the first municipalities in the United States to embrace an open data platform. He outlines how open data has driven innovation in local governance. Moreover, he discusses the potential impact of open data on job creation and government technology infrastructure in the City and County of San Francisco.

Huijboom, Noor and Tijs Van den Broek. Open Data: An International Comparison of Strategies. European Journal of ePractice. March 2011. Accessed July 24, 2014. http://bit.ly/1AE24jq

  • This article examines five countries and their open data strategies, identifying key features, main barriers, and drivers of progress for open data programs. The authors outline the key challenges facing European and other national open data policies, highlighting the emerging role open data initiatives are playing in political and administrative agendas around the world.

Manyika, J., Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi. Open Data: Unlocking Innovation and Performance with Liquid Information. McKinsey Global Institute. October 2013. Accessed July 24, 2014. http://bit.ly/1lgDX0v

  • This research focuses on quantifying the potential value of open data in seven “domains” in the global economy: education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance.

Moore, Alida. Congressional Transparency Caucus: How Open Data Creates Jobs. April 2, 2014. Accessed July 30, 2014. Socrata. http://bit.ly/1n7OJpp

  • Socrata provides a summary of the March 24th briefing of the Congressional Transparency Caucus on the need to increase government transparency through adopting open data initiatives. They include key takeaways from the panel discussion, as well as their role in making open data available for businesses.

Stott, Andrew. Open Data for Economic Growth. The World Bank. June 25, 2014. Accessed July 24, 2014. http://bit.ly/1n7PRJF

  • In this report, the World Bank examines the evidence for the economic potential of open data, holding that the potential is quite large despite variation in the published estimates and methodological difficulties in assessing it. The report provides five archetypes of businesses using open data, along with recommendations for governments trying to maximize economic growth from open data.

Selected Readings on Sentiment Analysis


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.

Sentiment Analysis is a field of Computer Science that uses techniques from natural language processing, computational linguistics, and machine learning to predict subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although it is technically a subfield focusing on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).

The rise of Web 2.0 and increased information flow have led to increased interest in Sentiment Analysis, especially as applied to social networks and media. Events causing large spikes in media activity, such as the 2012 Presidential Election Debates, are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.

Annotated Selected Reading List (in alphabetical order)

Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP

  • Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
  • Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature.
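
The comparison described above, how often hedges appear in one kind of text versus another, can be approximated by counting hedge cue words per 1,000 tokens. The following is a minimal sketch and not the authors' detector: the cue list HEDGE_CUES, the function hedge_rate and both sample sentences are assumptions invented for illustration.

```python
import re

# Hypothetical, deliberately tiny cue list (an assumption for illustration);
# real hedge detectors rely on annotated corpora and trained classifiers.
HEDGE_CUES = {"may", "might", "could", "suggest", "suggests",
              "possibly", "perhaps", "likely", "appear", "appears"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def hedge_rate(text):
    """Return the number of hedge cues per 1,000 tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in HEDGE_CUES)
    return 1000.0 * hits / len(tokens)


# Invented sample sentences, not drawn from the paper's corpora.
popular = "GM crops might possibly harm bees, and this could be serious."
scientific = "We measured pollen counts in forty hives over two seasons."

print(round(hedge_rate(popular), 1))
print(round(hedge_rate(scientific), 1))
```

A word-list count like this ignores context (for example, "may" as a month), which is one reason the paper treats hedge detection as a research problem rather than a lookup.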

Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv

  • This project aims to expand on existing solutions used for automatic sentiment analysis on text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results for enhancing the ease of understanding the debates and for showing underlying trends. Finally, it evaluates proposed techniques on an existing debate system for social networking.

Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm

  • In this paper, the authors propose a method for the task of identifying the general positions of users in online debates, i.e., support or oppose the main topic of an online debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their remarks within the debate. The supporting or opposing remarks are made by directly replying to the opinion, or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
  • A prior study has shown that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy for the identification task than methods based solely on the contents of the remarks. In this paper, it is shown that incorporating the textual content of the remarks into the link-based method can yield higher accuracy in the identification task.

Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD

  • This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn

  • Social networking sites provide users a virtual community interaction platform to share their thoughts, life experiences and opinions. Online debate forums are one such platform where people can take a stance and argue in support or opposition of debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
  • This paper aims to summarize online debates by extracting highly topic relevant and sentiment rich sentences. The proposed approach takes into account topic relevant, document relevant and sentiment based features to capture topic opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics and a software package for comparing an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system. This system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic directed sentiment features are most important to generate effective debate summaries.
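To make the ROUGE evaluation mentioned above more concrete, here is a minimal sketch of ROUGE-1, the simplest variant, which scores unigram overlap between a candidate summary and a human reference. It is not the official ROUGE package and not the evaluation setup used by Ranade et al.; the function name `rouge_1`, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between a candidate and a reference summary.

    Simplified sketch: lowercases and splits on whitespace, with no
    stemming or stop-word removal (the real toolkit offers both).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts only as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical system output and human-written reference.
scores = rouge_1(
    "the debate favors the ban",
    "the debate supports the ban on plastic",
)
print(scores)
```

Recall is the share of reference words that the candidate recovers, which is why ROUGE is described as recall-oriented; precision and F1 are reported alongside it to penalize overly long summaries.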

Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx

  • Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the authors describe a new possible domain for argumentation mining: debates in open online collaboration communities.
  • Based on their experience with manual annotation of arguments in debates, the authors propose argumentation mining as the basis for three kinds of support tools, for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.

Urban Analytics (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. In this edition, we explore the literature on Urban Analytics. To suggest additional readings on this or any other topic, please email biblio@thegovlab.org.

Data and its uses for Governance

Urban Analytics places better information in the hands of citizens as well as government officials to empower people to make more informed choices. Today, we are able to gather real-time information about traffic, pollution, noise, and environmental and safety conditions by culling data from a range of tools: from the low-cost sensors in mobile phones to more robust monitoring tools installed in our environment. With data collected and combined from the built, natural and human environments, we can develop more robust predictive models and use those models to make policy smarter.

With the computing power to transmit and store the data from these sensors, and the tools to translate raw data into meaningful visualizations, we can identify problems as they happen, design new strategies for city management, and target the application of scarce resources where they are most needed.

Annotated Selected Reading List (in alphabetical order)
Amini, L., E. Bouillet, F. Calabrese, L. Gasparini, and O. Verscheure. “Challenges and Results in City-scale Sensing.” In IEEE Sensors, 59–61, 2011. http://bit.ly/1doodZm.

  • This paper examines “how city requirements map to research challenges in machine learning, optimization, control, visualization, and semantic analysis.”
  • The authors raise several research challenges, including how to extract accurate information when the data is noisy and sparse; how to represent findings from digital pervasive technologies; and how people interact with one another and their environment.

Batty, M., K. W. Axhausen, F. Giannotti, A. Pozdnoukhov, A. Bazzani, M. Wachowicz, G. Ouzounis, and Y. Portugali. “Smart Cities of the Future.” The European Physical Journal Special Topics 214, no. 1 (November 1, 2012): 481–518. http://bit.ly/HefbjZ.

  • This paper explores the goals and research challenges involved in the development of smart cities that merge ICT with traditional infrastructures through digital technologies.
  • The authors put forth several research objectives, including: 1) to explore the notion of the city as a laboratory for innovation; 2) to develop technologies that ensure equity, fairness and realize a better quality of city life; and 3) to develop technologies that ensure informed participation and create shared knowledge for democratic city governance.
  • The paper also examines several contemporary smart city initiatives, expected paradigm shifts in the field, benefits, risks and impacts.

Budde, Paul. “Smart Cities of Tomorrow.” In Cities for Smart Environmental and Energy Futures, edited by Stamatina Th Rassia and Panos M. Pardalos, 9–20. Energy Systems. Springer Berlin Heidelberg, 2014. http://bit.ly/17MqPZW.

  • This paper examines the components and strategies involved in the creation of smart cities featuring “cohesive and open telecommunication and software architecture.”
  • In their study of smart cities, the authors examine smart and renewable energy; next-generation networks; smart buildings; smart transport; and smart government.
  • They conclude that for the development of smart cities, information and communication technology (ICT) is needed to build more horizontal collaborative structures, useful data must be analyzed in real time and people and/or machines must be able to make instant decisions related to social and urban life.

Cardone, G., L. Foschini, P. Bellavista, A. Corradi, C. Borcea, M. Talasila, and R. Curtmola. “Fostering Participaction in Smart Cities: a Geo-social Crowdsensing Platform.” IEEE Communications Magazine 51, no. 6 (2013): 112–119. http://bit.ly/17iJ0vZ.

  • This article examines “how and to what extent the power of collective although imprecise intelligence can be employed in smart cities.”
  • To tackle problems of managing the crowdsensing process, this article proposes a “crowdsensing platform with three main original technical aspects: an innovative geo-social model to profile users along different variables, such as time, location, social interaction, service usage, and human activities; a matching algorithm to autonomously choose people to involve in participActions and to quantify the performance of their sensing; and a new Android-based platform to collect sensing data from smart phones, automatically or with user help, and to deliver sensing/actuation tasks to users.”

Chen, Chien-Chu. “The Trend towards ‘Smart Cities.’” International Journal of Automation and Smart Technology. June 1, 2014. http://bit.ly/1jOOaAg.

  • In this study, Chen explores the ambitions, prevalence and outcomes of a variety of smart cities, organized into five categories:
    • Transportation-focused smart cities
    • Energy-focused smart cities
    • Building-focused smart cities
    • Water-resources-focused smart cities
    • Governance-focused smart cities
  • The study finds that the “Asia Pacific region accounts for the largest share of all smart city development plans worldwide, with 51% of the global total. Smart city development plans in the Asia Pacific region tend to be energy-focused smart city initiatives, aimed at easing the pressure on energy resources that will be caused by continuing rapid urbanization in the future.”
  • North America, on the other hand, is generally more geared toward energy-focused smart city development plans. “In North America, there has been a major drive to introduce smart meters and smart electric power grids, integrating the electric power sector with information and communications technology (ICT) and replacing obsolete electric power infrastructure, so as to make cities’ electric power systems more reliable (which in turn can help to boost private-sector investment, stimulate the growth of the ‘green energy’ industry, and create more job opportunities).”
  • Looking to Taiwan as an example, Chen argues that, “Cities in different parts of the world face different problems and challenges when it comes to urban development, making it necessary to utilize technology applications from different fields to solve the unique problems that each individual city has to overcome; the emphasis here is on the development of customized solutions for smart city development.”

Domingo, A., B. Bellalta, M. Palacin, M. Oliver and E. Almirall. “Public Open Sensor Data: Revolutionizing Smart Cities.” Technology and Society Magazine, IEEE 32, No. 4. Winter 2013. http://bit.ly/1iH6ekU.

  • In this article, the authors explore the “enormous amount of information collected by sensor devices” that allows for “the automation of several real-time services to improve city management by using intelligent traffic-light patterns during rush hour, reducing water consumption in parks, or efficiently routing garbage collection trucks throughout the city.”
  • They argue that, “To achieve the goal of sharing and open data to the public, some technical expertise on the part of citizens will be required. A real environment – or platform – will be needed to achieve this goal.” They go on to introduce a variety of “technical challenges and considerations involved in building an Open Sensor Data platform,” including:
    • Scalability
    • Reliability
    • Low latency
    • Standardized formats
    • Standardized connectivity
  • The authors conclude that, despite incredible advancements in urban analytics and open sensing in recent years, “Today, we can only imagine the revolution in Open Data as an introduction to a real-time world mashup with temperature, humidity, CO2 emission, transport, tourism attractions, events, water and gas consumption, politics decisions, emergencies, etc., and all of this interacting with us to help improve the future decisions we make in our public and private lives.”

Harrison, C., B. Eckman, R. Hamilton, P. Hartswick, J. Kalagnanam, J. Paraszczak, and P. Williams. “Foundations for Smarter Cities.” IBM Journal of Research and Development 54, no. 4 (2010): 1–16. http://bit.ly/1iha6CR.

  • This paper describes the information technology (IT) foundation and principles for Smarter Cities.
  • The authors introduce three foundational concepts of smarter cities: instrumented, interconnected and intelligent.
  • They also describe some of the major needs of contemporary cities, and conclude that creating the Smarter City implies capturing and accelerating flows of information both vertically and horizontally.

Hernández-Muñoz, José M., Jesús Bernat Vercher, Luis Muñoz, José A. Galache, Mirko Presser, Luis A. Hernández Gómez, and Jan Pettersson. “Smart Cities at the Forefront of the Future Internet.” In The Future Internet, edited by John Domingue, Alex Galis, Anastasius Gavras, Theodore Zahariadis, Dave Lambert, Frances Cleary, Petros Daras, et al., 447–462. Lecture Notes in Computer Science 6656. Springer Berlin Heidelberg, 2011. http://bit.ly/HhNbMX.

  • This paper explores how the “Internet of Things (IoT) and Internet of Services (IoS), can become building blocks to progress towards a unified urban-scale ICT platform transforming a Smart City into an open innovation platform.”
  • The authors examine the SmartSantander project to argue that, “the different stakeholders involved in the smart city business is so big that many non-technical constraints must be considered (users, public administrations, vendors, etc.).”
  • The authors also discuss the need for infrastructures at, for instance, the European level to support realistic, large-scale, experimentally driven research.

Hoon-Lee, Jung, Marguerite Gong Hancock, Mei-Chih Hu. “Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco.” Technological Forecasting and Social Change. October 3, 2013. http://bit.ly/1rzID5v.

  • In this study, the authors aim to “shed light on the process of building an effective smart city by integrating various practical perspectives with a consideration of smart city characteristics taken from the literature.”
  • They propose a conceptual framework based on case studies from Seoul and San Francisco built around the following dimensions:
    • Urban openness
    • Service innovation
    • Partnerships formation
    • Urban proactiveness
    • Smart city infrastructure integration
    • Smart city governance
  • The authors conclude with a summary of research findings featuring “8 stylized facts”:
    • Movement towards more interactive services engaging citizens;
    • Open data movement facilitates open innovation;
    • Diversifying service development: exploit or explore?
    • How to accelerate adoption: top-down public driven vs. bottom-up market driven partnerships;
    • Advanced intelligent technology supports new value-added smart city services;
    • Smart city services combined with robust incentive systems empower engagement;
    • Multiple device & network accessibility can create network effects for smart city services;
    • Centralized leadership implementing a comprehensive strategy boosts smart initiatives.

Kamel Boulos, Maged N. and Najeeb M. Al-Shorbaji. “On the Internet of Things, smart cities and the WHO Healthy Cities.” International Journal of Health Geographics 13, No. 10. 2014. http://bit.ly/Tkt9GA.

  • In this article, the authors give a “brief overview of the Internet of Things (IoT) for cities, offering examples of IoT-powered 21st century smart cities, including the experience of the Spanish city of Barcelona in implementing its own IoT-driven services to improve the quality of life of its people through measures that promote an eco-friendly, sustainable environment.”
  • The authors argue that one of the central needs for harnessing the power of the IoT and urban analytics is for cities to “involve and engage its stakeholders from a very early stage (city officials at all levels, as well as citizens), and to secure their support by raising awareness and educating them about smart city technologies, the associated benefits, and the likely challenges that will need to be overcome (such as privacy issues).”
  • They conclude that, “The Internet of Things is rapidly gaining a central place as key enabler of the smarter cities of today and the future. Such cities also stand better chances of becoming healthier cities.”

Keller, Sallie Ann, Steven E. Koonin, and Stephanie Shipp. “Big Data and City Living – What Can It Do for Us?” Significance 9, no. 4 (2012): 4–7. http://bit.ly/166W3NP.

  • This article provides a short introduction to Big Data, its importance, and the ways in which it is transforming cities. After an overview of the social benefits of big data in an urban context, the article examines its challenges, such as privacy concerns and institutional barriers.
  • The authors recommend that new approaches to making data available for research are needed that do not violate the privacy of entities included in the datasets. They believe that balancing privacy and accessibility issues will require new government regulations and incentives.

Kitchin, Rob. “The Real-Time City? Big Data and Smart Urbanism.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, July 3, 2013. http://bit.ly/1aamZj2.

  • This paper focuses on “how cities are being instrumented with digital devices and infrastructure that produce ‘big data’ which enable real-time analysis of city life, new modes of technocratic urban governance, and a re-imagining of cities.”
  • The paper details “a number of projects that seek to produce a real-time analysis of the city and provides a critical reflection on the implications of big data and smart urbanism.”

Mostashari, A., F. Arnold, M. Maurer, and J. Wade. “Citizens as Sensors: The Cognitive City Paradigm.” In 2011 8th International Conference Expo on Emerging Technologies for a Smarter World (CEWIT), 1–5, 2011. http://bit.ly/1fYe9an.

  • This paper argues that “implementing sensor networks are a necessary but not sufficient approach to improving urban living.”
  • The authors introduce the concept of the “Cognitive City” – a city that can not only operate more efficiently due to networked architecture, but can also learn to improve its service conditions, by planning, deciding and acting on perceived conditions.
  • Based on this conceptualization of a smart city as a cognitive city, the authors propose “an architectural process approach that allows city decision-makers and service providers to integrate cognition into urban processes.”

Oliver, M., M. Palacin, A. Domingo, and V. Valls. “Sensor Information Fueling Open Data.” In Computer Software and Applications Conference Workshops (COMPSACW), 2012 IEEE 36th Annual, 116–121, 2012. http://bit.ly/HjV4jS.

  • This paper introduces the concept of sensor networks as a key component in the smart cities framework, and shows how real-time data provided by different city network sensors enrich Open Data portals and require a new architecture to deal with massive amounts of continuously flowing information.
  • The authors’ main conclusion is that by providing a framework to build new applications and services using public static and dynamic data that promote innovation, a real-time open sensor network data platform can have several positive effects for citizens.

Perera, Charith, Arkady Zaslavsky, Peter Christen and Dimitrios Georgakopoulos. “Sensing as a service model for smart cities supported by Internet of Things.” Transactions on Emerging Telecommunications Technologies 25, Issue 1. January 2014. http://bit.ly/1qJLDP9.

  • This paper looks into the “enormous pressure towards efficient city management” that has “triggered various Smart City initiatives by both government and private sector businesses to invest in information and communication technologies to find sustainable solutions to the growing issues.”
  • The authors explore the parallel advancement of the Internet of Things (IoT), which “envisions to connect billions of sensors to the Internet and expects to use them for efficient and effective resource management in Smart Cities.”
  • The paper proposes the sensing as a service model “as a solution based on IoT infrastructure.” The sensing as a service model consists of four conceptual layers: “(i) sensors and sensor owners; (ii) sensor publishers (SPs); (iii) extended service providers (ESPs); and (iv) sensor data consumers.” They go on to describe how this model would work in the areas of waste management, smart agriculture and environmental management.

Privacy, Big Data, and the Public Good: Frameworks for Engagement. Edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum; Cambridge University Press, 2014. http://bit.ly/UoGRca.

  • This book focuses on the legal, practical, and statistical approaches for maximizing the use of massive datasets while minimizing information risk.
  • “Big data” is more than a straightforward change in technology. It poses deep challenges to our traditions of notice and consent as tools for managing privacy. Because our new tools of data science can make it all but impossible to guarantee anonymity in the future, the authors question whether it is possible to truly give informed consent, when we cannot, by definition, know what the risks are from revealing personal data either for individuals or for society as a whole.
  • Based on their experience building large data collections, the authors discuss some of the best practical ways to provide access while protecting confidentiality. What have we learned about effective engineered controls? About effective access policies? About designing data systems that reinforce – rather than counter – access policies? They also explore the business, legal, and technical standards necessary for a new deal on data.
  • Since the data generating process or the data collection process is not necessarily well understood for big data streams, the authors discuss what statistics can tell us about how to make the greatest scientific use of this data. They also explore the shortcomings of current disclosure limitation approaches and whether we can quantify the extent of privacy loss.

Schaffers, Hans, Nicos Komninos, Marc Pallot, Brigitte Trousse, Michael Nilsson, and Alvaro Oliveira. “Smart Cities and the Future Internet: Towards Cooperation Frameworks for Open Innovation.” In The Future Internet, edited by John Domingue, Alex Galis, Anastasius Gavras, Theodore Zahariadis, Dave Lambert, Frances Cleary, Petros Daras, et al., 431–446. Lecture Notes in Computer Science 6656. Springer Berlin Heidelberg, 2011. http://bit.ly/16ytKoT.

  • This paper “explores ‘smart cities’ as environments of open and user-driven innovation for experimenting and validating Future Internet-enabled services.”
  • The authors examine several smart city projects to illustrate the central role of users in defining smart services and the importance of participation. They argue that, “Two different layers of collaboration can be distinguished. The first layer is collaboration within the innovation process. The second layer concerns collaboration at the territorial level, driven by urban and regional development policies aiming at strengthening the urban innovation systems through creating effective conditions for sustainable innovation.”

Suciu, G., A. Vulpe, S. Halunga, O. Fratu, G. Todoran, and V. Suciu. “Smart Cities Built on Resilient Cloud Computing and Secure Internet of Things.” In 2013 19th International Conference on Control Systems and Computer Science (CSCS), 513–518, 2013. http://bit.ly/16wfNgv.

  • This paper proposes “a new platform for using cloud computing capacities for provision and support of ubiquitous connectivity and real-time applications and services for smart cities’ needs.”
  • The authors present a “framework for data procured from highly distributed, heterogeneous, decentralized, real and virtual devices (sensors, actuators, smart devices) that can be automatically managed, analyzed and controlled by distributed cloud-based services.”

Townsend, Anthony. Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. W. W. Norton & Company, 2013.

  • In this book, Townsend illustrates how “cities worldwide are deploying technology to address both the timeless challenges of government and the mounting problems posed by human settlements of previously unimaginable size and complexity.”
  • He also considers “the motivations, aspirations, and shortcomings” of the many stakeholders involved in the development of smart cities, and poses a new civics to guide these efforts.
  • He argues that smart cities are not made smart by the various, soon-to-be-obsolete technologies built into their infrastructure, but by how citizens use these ever-changing technologies to be “human-centered, inclusive and resilient.”

To stay current on recent writings and developments on Urban Analytics, please subscribe to the GovLab Digest.

Index: Privacy and Security


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on privacy and security and was originally published in 2014.

Globally

  • Percentage of people who feel the Internet is eroding their personal privacy: 56%
  • Internet users who feel comfortable sharing personal data with an app: 37%
  • Number of users who consider it important to know when an app is gathering information about them: 70%
  • How many people in the online world use privacy tools to disguise their identity or location: 28%, or 415 million people
  • Country with the highest penetration of general anonymity tools among Internet users: Indonesia, where 42% of users surveyed use proxy servers
  • Percentage of China’s online population that disguises their online location to bypass governmental filters: 34%

In the United States

Over the Years

  • In 1996, percentage of the American public who were categorized as having “high privacy concerns”: 25%
    • Those with “Medium privacy concerns”: 59%
    • Those who were unconcerned with privacy: 16%
  • In 1998, number of computer users concerned about threats to personal privacy: 87%
  • In 2001, those who reported “medium to high” privacy concerns: 88%
  • Individuals who are unconcerned about privacy: 18% in 1990, down to 10% in 2004
  • How many online American adults are more concerned about their privacy in 2014 than they were a year ago, indicating rising privacy concerns: 64%
  • Number of respondents in 2012 who believe they have control over their personal information: 35%, a downward trend for seven years
  • How many respondents in 2012 continue to perceive privacy and the protection of their personal information as very important or important to the overall trust equation: 78%, an upward trend for seven years
  • How many consumers in 2013 trust that their bank is committed to ensuring the privacy of their personal information is protected: 35%, down from 48% in 2004

Privacy Concerns and Beliefs

  • How many Internet users worry about their privacy online: 92%
    • Those who report that their level of concern has increased from 2013 to 2014: 7 in 10
    • How many are at least sometimes worried when shopping online: 93%, up from 89% in 2012
    • Those who have some concerns when banking online: 90%, up from 86% in 2012
  • Number of Internet users who are worried about the amount of personal information about them online: 50%, up from 33% in 2009
    • Those who report that their photograph is available online: 66%
      • Their birthdate: 50%
      • Home address: 30%
      • Cell number: 24%
      • A video: 21%
      • Political affiliation: 20%
  • Consumers who are concerned about companies tracking their activities: 58%
    • Those who are concerned about the government tracking their activities: 38%
  • How many users surveyed felt that the National Security Agency (NSA) overstepped its bounds in light of recent NSA revelations: 44%
  • Respondents who are comfortable with advertisers using their web browsing history to tailor advertisements as long as it is not tied to any other personally identifiable information: 36%, up from 29% in 2012
  • Percentage of voters who do not want political campaigns to tailor their advertisements based on their interests: 86%
  • Percentage of respondents who do not want news tailored to their interests: 56%
  • Percentage of users who are worried that their information will be stolen by hackers: 75%
    • Those who are worried about companies tracking their browsing history for targeted advertising: 54%
  • How many consumers say they do not trust businesses with their personal information online: 54%
  • Top 3 most trusted companies for privacy identified by consumers from across 25 different industries in 2012: American Express, Hewlett Packard and Amazon
    • Most trusted industries for privacy: Healthcare, Consumer Products and Banking
    • Least trusted industries for privacy: Internet and Social Media, Non-Profits and Toys
  • Respondents who admit to sharing their personal information with companies they did not trust in 2012 for reasons such as convenience when making a purchase: 63%
  • Percentage of users who say they prefer free online services supported by targeted ads: 61%
    • Those who prefer paid online services without targeted ads: 33%
  • How many Internet users believe that it is not possible to be completely anonymous online: 59%
    • Those who believe complete online anonymity is still possible: 37%
    • Those who say people should have the ability to use the Internet anonymously: 59%
  • Percentage of Internet users who believe that current laws are not good enough in protecting people’s privacy online: 68%
    • Those who believe current laws provide reasonable protection: 24%

Security Related Issues

  • How many have had an email or social networking account compromised or taken over without permission: 21%
  • Those who have been stalked or harassed online: 12%
  • Those who think the federal government should do more to act against identity theft: 74%
  • Consumers who agree that they will avoid doing business with companies who they do not believe protect their privacy online: 89%
    • Among 65+ year old consumers: 96%

Privacy-Related Behavior

  • How many mobile phone users have decided not to install an app after discovering the amount of information it collects: 54%
  • Number of Internet users who have taken steps to remove or mask their digital footprint (including clearing cookies, encrypting emails, and using virtual networks to mask their IP addresses): 86%
  • Those who have set their browser to disable cookies: 65%
  • Number of users who have not allowed a service to remember their credit card information: 73%
  • Those who have chosen to block an app from accessing their location information: 53%
  • How many have signed up for a two-step sign-in process: 57%
  • Percentage of Gen-X (33-48 year olds) and Millennials (18-32 year olds) who say they never change their passwords or only change them when forced to: 41%
    • How many report using a unique password for each site and service: 4 in 10
    • Those who use the same password everywhere: 7%

Open Data (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. We start our series with a focus on Open Data. To suggest additional readings on this or any other topic, please email biblio@thegovlab.org.

Data and its uses for Governance

Open data refers to data that is publicly available for anyone to use and which is licensed in a way that allows for its re-use. The common requirement that open data be machine-readable means not only that data is distributed via the Internet in a digitized form, but also that it can be processed by computers through automation, ensuring both wide dissemination and ease of re-use. Much of the focus of the open data advocacy community is on government data and government-supported research data. For example, in May 2013, the US Open Data Policy defined open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users, and consistent with a number of principles focused on availability, accessibility and reusability.
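As a small illustration of what machine-readability buys in practice, the sketch below parses a published table and aggregates it without any manual transcription. The dataset, its column names (`ward`, `open_requests`) and its values are hypothetical, invented for this example; a real portal would serve such a file over the Internet rather than from an inline string.

```python
import csv
import io

# Hypothetical open dataset: open service requests per city ward,
# inlined here so the example runs without network access.
raw = """ward,open_requests
1,12
2,7
3,21
"""

# Because the file is structured (CSV with a header row), a program can
# read it directly; a scanned PDF of the same table could not be used this way.
rows = list(csv.DictReader(io.StringIO(raw)))

total = sum(int(r["open_requests"]) for r in rows)
busiest = max(rows, key=lambda r: int(r["open_requests"]))["ward"]

print(f"Total open requests: {total}; busiest ward: {busiest}")
```

The same few lines would work unchanged on a much larger file, which is the sense in which machine-readable formats support automated processing and easy re-use.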

Annotated Selected Reading List (in alphabetical order)
Fox, Mark S. “City Data: Big, Open and Linked.” Working Paper, Enterprise Integration Laboratory (2013). http://bit.ly/1bFr7oL.

  • This paper examines concepts that underlie Big City Data using data from multiple cities as examples. It begins by explaining the concepts of Open, Unified, Linked, and Grounded data, which are central to the Semantic Web. Fox then explores Big Data as an extension of Data Analytics, and provides case examples of good data analytics in cities.
  • Fox concludes that we can develop the tools that will enable anyone to analyze data, both big and small, by adopting the principles of the Semantic Web:
    • Data being openly available over the internet,
    • Data being unifiable using common vocabularies,
    • Data being linkable using Internationalized Resource Identifiers (IRIs),
    • Data being accessible using a common data structure, namely triples,
    • Data being semantically grounded using Ontologies.
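
The principles above can be illustrated with a minimal sketch. It is not taken from Fox's paper: the IRIs under example.org, the vocabulary terms and the records are all hypothetical, chosen only to show how two cities' data become mergeable once they share a vocabulary and a triple structure.

```python
# Hypothetical IRIs under example.org; none of these identify real datasets.
VOCAB = "http://example.org/vocab/"
CITY_A = "http://example.org/city-a/"
CITY_B = "http://example.org/city-b/"

# Each fact is a (subject, predicate, object) triple.
city_a_triples = [
    (CITY_A + "request/1", VOCAB + "type", VOCAB + "ServiceRequest"),
    (CITY_A + "request/1", VOCAB + "status", "open"),
]
city_b_triples = [
    (CITY_B + "request/7", VOCAB + "type", VOCAB + "ServiceRequest"),
    (CITY_B + "request/7", VOCAB + "status", "closed"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Because both cities use the same vocabulary, merging is plain concatenation.
merged = city_a_triples + city_b_triples

# One query now spans both cities.
requests = match(merged, predicate=VOCAB + "type", obj=VOCAB + "ServiceRequest")
open_subjects = [s for (s, _, _) in match(merged, predicate=VOCAB + "status", obj="open")]
```

In practice a dedicated triple store and a published ontology would replace the Python lists and the made-up vocabulary; the sketch only shows why a shared structure makes cross-city analysis simple.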

Foulonneau, Muriel, Sébastien Martin, and Slim Turki. “How Open Data Are Turned into Services?” In Exploring Services Science, edited by Mehdi Snene and Michel Leonard, 31–39. Lecture Notes in Business Information Processing 169. Springer International Publishing, 2014. http://bit.ly/1fltUmR.

  • In this chapter, the authors argue that, considering the important role the development of new services plays as a motivation for open data policies, the impact of new services created through open data should play a more central role in evaluating the success of open data initiatives.
  • Foulonneau, Martin and Turki argue that the following metrics should be considered when evaluating the success of open data initiatives: “the usage, audience, and uniqueness of the services, according to the changes it has entailed in the public institutions that have open their data…the business opportunity it has created, the citizen perception of the city…the modification to particular markets it has entailed…the sustainability of the services created, or even the new dialog created with citizens.”

Goldstein, Brett, and Lauren Dyson. Beyond Transparency: Open Data and the Future of Civic Innovation. 1st edition. (Code for America Press: 2013). http://bit.ly/15OAxgF.

  • This “cross-disciplinary survey of the open data landscape” features stories from practitioners in the open data space, including Michael Flowers, Brett Goldstein, Emer Coleman and many others, discussing what they’ve accomplished with open civic data. The book “seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving.”
  • The book’s editors seek to accomplish the following objectives:
    • Help local governments learn how to start an open data program
    • Spark discussion on where open data will go next
    • Help community members outside of government better engage with the process of governance
    • Lend a voice to many aspects of the open data community.
  • The book is broken into five sections: Opening Government Data, Building on Open Data, Understanding Open Data, Driving Decisions with Data and Looking Ahead.

Granickas, Karolis. “Understanding the Impact of Releasing and Re-using Open Government Data.” European Public Sector Information Platform, ePSIplatform Topic Report No. 2013/08, (2013). http://bit.ly/GU0Nx4.

  • This paper examines the impact of open government data by exploring the latest research in the field, with an eye toward enabling an environment for open data, as well as identifying the benefits of open government data and its political, social, and economic impacts.
  • Granickas concludes that to maximize the benefits of open government data: a) further research is required that structures and measures potential benefits of open government data; b) “government should pay more attention to creating feedback mechanisms between policy implementers, data providers and data-re-users”; c) “finding a balance between demand and supply requires mechanisms of shaping demand from data re-users and also demonstration of data inventory that governments possess”; and lastly, d) “open data policies require regular monitoring.”

Gurin, Joel. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation, (New York: McGraw-Hill, 2014). http://amzn.to/1flubWR.

  • In this book, GovLab Senior Advisor and Open Data 500 director Joel Gurin explores the broad realized and potential benefit of Open Data, and how, “unlike Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.”
  • The book provides “an essential guide to understanding all kinds of open databases – business, government, science, technology, retail, social media, and more – and using those resources to your best advantage.”
  • In particular, Gurin discusses a number of applications of Open Data with very real potential benefits:
    • “Hot Startups: turn government data into profitable ventures;
    • Savvy Marketing: understanding how reputational data drives your brand;
    • Data-Driven Investing: apply new tools for business analysis;
    • Consumer Information: connect with your customers using smart disclosure;
    • Green Business: use data to bet on sustainable companies;
    • Fast R&D: turn the online world into your research lab;
    • New Opportunities: explore open fields for new businesses.”

Jetzek, Thorhildur, Michel Avital, and Niels Bjørn-Andersen. “Generating Value from Open Government Data.” Thirty Fourth International Conference on Information Systems, 5. General IS Topics 2013. http://bit.ly/1gCbQqL.

  • In this paper, the authors “developed a conceptual model portraying how data as a resource can be transformed to value.”
  • Jetzek, Avital and Bjørn-Andersen propose a conceptual model featuring four Enabling Factors (openness, resource governance, capabilities and technical connectivity) acting on four Value Generating Mechanisms (efficiency, innovation, transparency and participation) leading to the impacts of Economic and Social Value.
  • The authors argue that their research supports that “all four of the identified mechanisms positively influence value, reflected in the level of education, health and wellbeing, as well as the monetary value of GDP and environmental factors.”

Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly (2013). http://bit.ly/1ewIZnk.

  • This paper uses the Chicago open data project to explore the “empowering potential of an open data phenomenon at the local level as a platform useful for promotion of civic engagement projects and provide a framework for future research and hypothesis testing.”
  • Kassen argues that “open data-driven projects offer a new platform for proactive civic engagement” wherein governments can harness “the collective wisdom of the local communities, their knowledge and visions of the local challenges, governments could react and meet citizens’ needs in a more productive and cost-efficient manner.”
  • The paper highlights the need for independent IT developers to network in order for this trend to continue, as well as the importance of the private sector in “overall diffusion of the open data concept.”

Keen, Justin, Radu Calinescu, Richard Paige, and John Rooksby. “Big data + politics = open data: The case of health care data in England.” Policy and Internet 5 (2), (2013): 228–243. http://bit.ly/1i231WS.

  • This paper examines the assumptions regarding open datasets, technological infrastructure and access, using healthcare systems as a case study.
  • The authors specifically address two assumptions surrounding enthusiasm about Big Data in healthcare: the assumption that healthcare datasets and technological infrastructure are up to the task, and the assumption of access to this data from outside the healthcare system.
  • By using the National Health Service in England as an example, the authors identify data, technology, and information governance challenges. They argue that “public acceptability of third party access to detailed health care datasets is, at best, unclear,” and that the prospects of Open Data depend on Open Data policies, which are inherently political, and the government’s assertion of property rights over large datasets. Thus, they argue that the “success or failure of Open Data in the NHS may turn on the question of trust in institutions.”

Kulk, Stefan and Bastiaan Van Loenen. “Brave New Open Data World?” International Journal of Spatial Data Infrastructures Research, May 14, 2012. http://bit.ly/15OAUYR.

  • This paper examines the evolving tension between the open data movement and the European Union’s privacy regulations, especially the Data Protection Directive.
  • The authors argue, “Technological developments and the increasing amount of publicly available data are…blurring the lines between non-personal and personal data. Open data may not seem to be personal data on first glance especially when it is anonymised or aggregated. However, it may become personal by combining it with other publicly available data or when it is de-anonymised.”
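
The combination risk the authors describe can be sketched in a few lines. This is an illustration only, not the authors' method: the records are invented toy data, and the choice of postcode, birth year and sex as linking fields is an assumption made for the example.

```python
# Toy, invented records: no real people or datasets are represented.
anonymised_health = [
    {"postcode": "1011", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "1012", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Person A", "postcode": "1011", "birth_year": 1975, "sex": "F"},
    {"name": "Person B", "postcode": "1012", "birth_year": 1980, "sex": "M"},
    {"name": "Person C", "postcode": "1012", "birth_year": 1980, "sex": "M"},
]

# Assumed linking fields shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, register):
    """Link rows whose quasi-identifiers match exactly one named person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for row in anonymised:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match makes the row personal data
            matches[candidates[0]] = row["diagnosis"]
    return matches


linked = reidentify(anonymised_health, public_register)
```

Here the first health record links to exactly one named person, while the second stays ambiguous because two people share the same attribute combination. Neither dataset is personal on its own terms, yet the join makes one record personal.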

Kundra, Vivek. “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect.” Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College: Discussion Paper Series, January 2012, http://hvrd.me/1fIwsjR.

  • In this paper, Vivek Kundra, the first Chief Information Officer of the United States, explores the growing impact of open data, and argues that, “In the information economy, data is power and we face a choice between democratizing it and holding on to it for an asymmetrical advantage.”
  • Kundra offers four specific recommendations to maximize the impact of open data:
    • Citizens and NGOs must demand open data in order to fight government corruption, improve accountability and government services;
    • Governments must enact legislation to change the default setting of government to open, transparent and participatory;
    • The press must harness the power of the network effect through strategic partnerships and crowdsourcing to cut costs and provide better insights; and
    • Venture capitalists should invest in startups focused on building companies based on public sector data.

Noveck, Beth Simone and Daniel L. Goroff. “Information for Impact: Liberating Nonprofit Sector Data.” The Aspen Institute Philanthropy & Social Innovation Publication Number 13-004. 2013. http://bit.ly/WDxd7p.

  • This report is focused on “obtaining better, more usable data about the nonprofit sector,” which encompasses, as of 2010, “1.5 million tax-exempt organizations in the United States with $1.51 trillion in revenues.”
  • Toward that goal, the authors propose liberating data from the Form 990, an Internal Revenue Service form that “gathers and publishes a large amount of information about tax-exempt organizations,” including information related to “governance, investments, and other factors not directly related to an organization’s tax calculations or qualifications for tax exemption.”
  • The authors recommend a two-track strategy: “Pursuing the longer-term goal of legislation that would mandate electronic filing to create open 990 data, and pursuing a shorter-term strategy of developing a third party platform that can demonstrate benefits more immediately.”

Robinson, David G., Harlan Yu, William P. Zeller, and Edward W. Felten, “Government Data and the Invisible Hand.” Yale Journal of Law & Technology 11 (2009), http://bit.ly/1c2aDLr.

  • This paper proposes a new approach to online government data that “leverages both the American tradition of entrepreneurial self-reliance and the remarkable low-cost flexibility of contemporary digital technology.”
  • “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data.”

Ubaldi, Barbara. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Papers on Public Governance. Paris: Organisation for Economic Co-operation and Development, May 27, 2013. http://bit.ly/15OB6qP.

  • This working paper from the OECD seeks to provide an all-encompassing look at the principles, concepts and criteria framing open government data (OGD) initiatives.
  • Ubaldi also analyzes a variety of challenges to implementing OGD initiatives, including policy, technical, economic and financial, organizational, cultural and legal impediments.
  • The paper also proposes a methodological framework for evaluating OGD Initiatives in OECD countries, with the intention of eventually “developing a common set of metrics to consistently assess impact and value creation within and across countries.”

Worthy, Ben. “David Cameron’s Transparency Revolution? The Impact of Open Data in the UK.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, November 29, 2013. http://bit.ly/NIrN6y.

  • In this article, Worthy “examines the impact of the UK Government’s Transparency agenda, focusing on the publication of spending data at local government level. It measures the democratic impact in terms of creating transparency and accountability, public participation and everyday information.”
  • Worthy’s findings, based on surveys of local authorities, interviews and FOI requests, are disappointing. He finds that:
    • Open spending data has led to some government accountability, but largely from those already monitoring government, not regular citizens.
    • Open Data has not led to increased participation, “as it lacks the narrative or accountability instruments to fully bring such effects.”
    • It has also not “created a new stream of information to underpin citizen choice, though new innovations offer this possibility. The evidence points to third party innovations as the key.”
  • Despite these initial findings, “Interviewees pointed out that Open Data holds tremendous opportunities for policy-making. Joined up data could significantly alter how policy is made and resources targeted. From small scale issues e.g. saving money through prescriptions to targeting homelessness or health resources, it can have a transformative impact.”

Zuiderwijk, Anneke, Marijn Janssen, Sunil Choenni, Ronald Meijer and Roexsana Sheikh Alibaks. “Socio-technical Impediments of Open Data.” Electronic Journal of e-Government 10, no. 2 (2012). http://bit.ly/17yf4pM.

  • This paper seeks to identify the socio-technical impediments to open data impact based on a review of the open data literature, as well as workshops and interviews.
  • The authors discovered 118 impediments across ten categories: 1) availability and access; 2) find-ability; 3) usability; 4) understandability; 5) quality; 6) linking and combining data; 7) comparability and compatibility; 8) metadata; 9) interaction with the data provider; and 10) opening and uploading.

Zuiderwijk, Anneke and Marijn Janssen. “Open Data Policies, Their Implementation and Impact: A Framework for Comparison.” Government Information Quarterly 31, no. 1 (January 2014): 17–29. http://bit.ly/1bQVmYT.

  • In this article, Zuiderwijk and Janssen argue that “currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research [is being] done on the issues that are covered by open data policies, their intent and actual impact.”
  • With this evaluation deficit in mind, the authors propose a new framework for comparing open data policies at different government levels using the following elements for comparison:
    • Policy environment and context, such as level of government organization and policy objectives;
    • Policy content (input), such as types of data not publicized and technical standards;
    • Performance indicators (output), such as benefits and risks of publicized data; and
    • Public values (impact).


Selected Readings on Personal Data: Security and Use


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of personal data was originally published in 2014.

Advances in technology have greatly increased the potential for policymakers to utilize the personal data of large populations for the public good. However, the proliferation of vast stores of useful data has also given rise to a variety of legislative, political, and ethical concerns surrounding the privacy and security of citizens’ personal information, both in terms of collection and usage. Challenges regarding the governance and regulation of personal data must be addressed in order to assuage individuals’ concerns regarding the privacy, security, and use of their personal information.

Annotated Selected Reading List (in alphabetical order)

Cavoukian, Ann. “Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control.” Privacy by Design, October 15, 2013. https://bit.ly/2S00Yfu.

  • In this paper, Cavoukian describes the Personal Data Ecosystem (PDE), an “emerging landscape of companies and organizations that believe individuals should be in control of their personal data, and make available a growing number of tools and technologies to enable this control.” She argues that, “The right to privacy is highly compatible with the notion of PDE because it enables the individual to have a much greater degree of control – “Radical Control” – over their personal information than is currently possible today.”
  • To ensure that the PDE reaches its privacy-protection potential, Cavoukian argues that it must practice The 7 Foundational Principles of Privacy by Design:
    • Proactive not Reactive; Preventative not Remedial
    • Privacy as the Default Setting
    • Privacy Embedded into Design
    • Full Functionality – Positive-Sum, not Zero-Sum
    • End-to-End Security – Full Lifecycle Protection
    • Visibility and Transparency – Keep it Open
    • Respect for User Privacy – Keep it User-Centric

Kirkham, T., S. Winfield, S. Ravet, and S. Kellomaki. “A Personal Data Store for an Internet of Subjects.” In 2011 International Conference on Information Society (i-Society). 92–97. http://bit.ly/1alIGuT.

  • This paper examines various factors involved in the governance of personal data online, and argues for a shift from “current service-oriented applications where often the service provider is in control of the person’s data” to a person-centric architecture where the user is at the center of personal data control.
  • The paper delves into an “Internet of Subjects” concept of Personal Data Stores, and focuses on implementation of such a concept on personal data that can be characterized as either “By Me” or “About Me.”
  • The paper also presents examples of how a Personal Data Store model could allow users to both protect and present their personal data to external applications, affording them greater control.

OECD. The 2013 OECD Privacy Guidelines. 2013. http://bit.ly/166TxHy.

  • This report is indicative of the “important role in promoting respect for privacy as a fundamental value and a condition for the free flow of personal data across borders” played by the OECD for decades. The guidelines – revised in 2013 for the first time since being drafted in 1980 – are seen as “[t]he cornerstone of OECD work on privacy.”
  • The OECD framework is built around eight basic principles for personal data privacy and security:
    • Collection Limitation
    • Data Quality
    • Purpose Specification
    • Use Limitation
    • Security Safeguards
    • Openness
    • Individual Participation
    • Accountability

Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review 57, 1701 (2010). http://bit.ly/18Q5Mta.

  • This article explores the implications of the “astonishing ease” with which scientists have demonstrated the ability to “reidentify” or “deanonymize” supposedly anonymous personal information.
  • Rather than focusing exclusively on whether personal data is “anonymized,” Ohm offers five factors for governments and other data-handling bodies to use for assessing the risk of privacy harm: data-handling techniques, private versus public release, quantity, motive and trust.

Polonetsky, Jules and Omer Tene. “Privacy in the Age of Big Data: A Time for Big Decisions.” Stanford Law Review Online 64 (February 2, 2012): 63. http://bit.ly/1aeSbtG.

  • In this article, Tene and Polonetsky argue that, “The principles of privacy and data protection must be balanced against additional societal values such as public health, national security and law enforcement, environmental protection, and economic efficiency. A coherent framework would be based on a risk matrix, taking into account the value of different uses of data against the potential risks to individual autonomy and privacy.”
  • To achieve this balance, the authors believe that, “policymakers must address some of the most fundamental concepts of privacy law, including the definition of ‘personally identifiable information,’ the role of consent, and the principles of purpose limitation and data minimization.”

Shilton, Katie, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun. “Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing.” TPRC, 2009. http://bit.ly/18gh8SN.

  • This article argues that the Codes of Fair Information Practice, which have served as a model for data privacy for decades, do not take into account a world of distributed data collection, nor the realities of data mining and easy, almost uncontrolled, dissemination.
  • The authors suggest “expanding the Codes of Fair Information Practice to protect privacy in this new data reality. An adapted understanding of the Codes of Fair Information Practice can promote individuals’ engagement with their own data, and apply not only to governments and corporations, but software developers creating the data collection programs of the 21st century.”
  • In order to achieve this change in approach, the paper discusses three foundational design principles: primacy of participants, data legibility, and engagement of participants throughout the data life cycle.

Selected Readings on Big Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of big data was originally published in 2014.

Big Data refers to the wide-scale collection, aggregation, storage, analysis and use of data. Government is increasingly in control of a massive amount of raw data that, when analyzed and put to use, can lead to new insights on everything from public opinion to environmental concerns. The burgeoning literature on Big Data argues that it generates value by: creating transparency; enabling experimentation to discover needs, expose variability, and improve performance; segmenting populations to customize actions; replacing/supporting human decision making with automated algorithms; and innovating new business models, products and services. The insights drawn from data analysis can also be visualized in a manner that passes along relevant information, even to those without the tech savvy to understand the data on its own terms (see The GovLab Selected Readings on Data Visualization).

Annotated Selected Reading List (in alphabetical order)

Australian Government Information Management Office. The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report. August 2013. http://bit.ly/17hs2xY.

  • This Big Data Strategy, produced for Australian Government senior executives responsible for delivering services and developing policy, aims to impress on government officials that the key to increasing the value of big data held by government is the effective use of analytics. Essentially, “the value of big data lies in [our] ability to extract insights and make better decisions.”
  • The strategy positions big data as a national asset that can be used to “streamline service delivery, create opportunities for innovation, identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations.”

Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, Communications and Society Program, 2010. http://bit.ly/1a3hBIA.

  • This report captures insights from the 2009 Roundtable exploring uses of Big Data in a number of important consumer-behavior and policy contexts.
  • The report concludes that, “Big Data presents many exciting opportunities to improve modern society. There are incalculable opportunities to make scientific research more productive, and to accelerate discovery and innovation. People can use new tools to help improve their health and well-being, and medical care can be made more efficient and effective. Government, too, has a great stake in using large databases to improve the delivery of government services and to monitor for threats to national security.”
  • However, “Big Data also presents many formidable challenges to government and citizens precisely because data technologies are becoming so pervasive, intrusive and difficult to understand. How shall society protect itself against those who would misuse or abuse large databases? What new regulatory systems, private-law innovations or social practices will be capable of controlling anti-social behaviors–and how should we even define what is socially and legally acceptable when the practices enabled by Big Data are so novel and often arcane?”

Boyd, Danah and Kate Crawford. “Six Provocations for Big Data.” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. September 2011. http://bit.ly/1jJstmz.

  • In this paper, Boyd and Crawford raise challenges to unchecked assumptions and biases regarding big data. The paper makes a number of assertions about the “computational culture” of big data and pushes back against those who consider big data to be a panacea.
  • The authors’ provocations for big data are:
    • Automating Research Changes the Definition of Knowledge
    • Claims to Objectivity and Accuracy are Misleading
    • Big Data is not always Better Data
    • Not all Data is Equivalent
    • Just Because it is accessible doesn’t make it ethical
    • Limited Access to Big Data creates New Digital Divide

The Economist Intelligence Unit. Big Data and the Democratisation of Decisions. October 2012. http://bit.ly/17MpH8L.

  • This report from the Economist Intelligence Unit focuses on the positive impact of big data adoption in the private sector, but its insights can also be applied to the use of big data in governance.
  • The report argues that innovation can be spurred by democratizing access to data, allowing a diversity of stakeholders to “tap data, draw lessons and make business decisions,” which in turn helps companies and institutions respond to new trends and intelligence at varying levels of decision-making power.

Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey & Company. May 2011. http://bit.ly/18Q5CSl.

  • This report argues that big data “will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus,” and that “leaders in every sector will have to grapple with the implications of big data.”
  • The report offers five broad ways in which using big data can create value:
    • First, big data can unlock significant value by making information transparent and usable at much higher frequency.
    • Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.
    • Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
    • Fourth, sophisticated analytics can substantially improve decision-making.
    • Finally, big data can be used to improve the development of the next generation of products and services.

The Partnership for Public Service and the IBM Center for The Business of Government. “From Data to Decisions II: Building an Analytics Culture.” October 17, 2012. https://bit.ly/2EbBTMg.

  • This report discusses strategies for better leveraging data analysis to aid decision-making. The authors argue that, “Organizations that are successful at launching or expanding analytics program…systematically examine their processes and activities to ensure that everything they do clearly connects to what they set out to achieve, and they use that examination to pinpoint weaknesses or areas for improvement.”
  • While the report features many strategies for government decision-makers, the central recommendation is that, “leaders incorporate analytics as a way of doing business, making data-driven decisions transparent and a fundamental approach to day-to-day management. When an analytics culture is built openly, and the lessons are applied routinely and shared widely, an agency can embed valuable management practices in its DNA, to the mutual benefit of the agency and the public it serves.”

TechAmerica Foundation’s Federal Big Data Commission. “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” 2013. http://bit.ly/1aalUrs.

  • This report presents key big data imperatives that government agencies must address, the challenges and the opportunities posed by the growing volume of data and the value Big Data can provide. The discussion touches on the value of big data to businesses and organizational mission, presents case study examples of big data applications, technical underpinnings and public policy applications.
  • The authors argue that new digital information, “effectively captured, managed and analyzed, has the power to change every industry including cyber security, healthcare, transportation, education, and the sciences.” To ensure that this opportunity is realized, the report proposes a detailed big data strategy framework with the following steps: define, assess, plan, execute and review.

World Economic Forum. “Big Data, Big Impact: New Possibilities for International Development.” 2012. http://bit.ly/17hrTKW.

  • This report examines the potential for channeling the “flood of data created every day by the interactions of billions of people using computers, GPS devices, cell phones, and medical devices” into “actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of low-income populations.”
  • The report argues that, “To realise the mutual benefits of creating an environment for sharing mobile-generated data, all ecosystem actors must commit to active and open participation. Governments can take the lead in setting policy and legal frameworks that protect individuals and require contractors to make their data public. Development organisations can continue supporting governments and demonstrating both the public good and the business value that data philanthropy can deliver. And the private sector can move faster to create mechanisms for the sharing data that can benefit the public.”

Selected Readings on Data Visualization


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data visualization was originally published in 2013.

Data visualization is a response to the ever-increasing amount of information in the world. With big data, informatics and predictive analytics, we have an unprecedented opportunity to revolutionize policy-making. Yet data by itself can be overwhelming. New tools and techniques for visualizing information can help policymakers clearly articulate insights drawn from data. Moreover, the rise of open data is enabling those outside of government to create informative and visually arresting representations of public information that can be used to support decision-making by those inside or outside governing institutions.

Annotated Selected Reading List (in alphabetical order)

Duke, D.J., K.W. Brodlie, D.A. Duce and I. Herman. “Do You See What I Mean? [Data Visualization].” IEEE Computer Graphics and Applications 25, no. 3 (2005): 6–9. http://bit.ly/1aeU6yA.

  • In this paper, the authors argue that a more systematic ontology for data visualization is needed to ensure the successful communication of meaning. “Visualization begins when someone has data that they wish to explore and interpret; the data are encoded as input to a visualization system, which may in its turn interact with other systems to produce a representation. This is communicated back to the user(s), who have to assess this against their goals and knowledge, possibly leading to further cycles of activity. Each phase of this process involves communication between two parties. For this to succeed, those parties must share a common language with an agreed meaning.”
  • The authors “believe that now is the right time to consider an ontology for visualization,” and “as visualization move from just a private enterprise involving data and tools owned by a research team into a public activity using shared data repositories, computational grids, and distributed collaboration…[m]eaning becomes a shared responsibility and resource. Through the Semantic Web, there is both the means and motivation to develop a shared picture of what we see when we turn and look within our own field.”

Friendly, Michael. “A Brief History of Data Visualization.” In Handbook of Data Visualization, 15–56. Springer Handbooks Comp.Statistics. Springer Berlin Heidelberg, 2008. http://bit.ly/17fM1e9.

  • In this paper, Friendly explores the “deep roots” of modern data visualization. “These roots reach into the histories of the earliest map making and visual depiction, and later into thematic cartography, statistics and statistical graphics, medicine and other fields. Along the way, developments in technologies (printing, reproduction), mathematical theory and practice, and empirical observation and recording enabled the wider use of graphics and new advances in form and content.”
  • Just as the general visualization of data is far from a new practice, Friendly shows that the graphical representation of government information has a similarly long history. “The collection, organization and dissemination of official government statistics on population, trade and commerce, social, moral and political issues became widespread in most of the countries of Europe from about 1825 to 1870. Reports containing data graphics were published with some regularity in France, Germany, Hungary and Finland, and with tabular displays in Sweden, Holland, Italy and elsewhere.”

Graves, Alvaro and James Hendler. “Visualization Tools for Open Government Data.” In Proceedings of the 14th Annual International Conference on Digital Government Research, 136–145. Dg.o ’13. New York, NY, USA: ACM, 2013. http://bit.ly/1eNSoXQ.

  • In this paper, the authors argue that, “there is a gap between current Open Data initiatives and an important part of the stakeholders of the Open Government Data Ecosystem.” As it stands, “there is an important portion of the population who could benefit from the use of OGD but who cannot do so because they cannot perform the essential operations needed to collect, process, merge, and make sense of the data. The reasons behind these problems are multiple, the most critical one being a fundamental lack of expertise and technical knowledge. We propose the use of visualizations to alleviate this situation. Visualizations provide a simple mechanism to understand and communicate large amounts of data.”
  • The authors also describe a prototype of a tool to create visualizations based on OGD with the following capabilities:
    • Facilitate visualization creation
    • Exploratory mechanisms
    • Viralization and sharing
    • Repurpose of visualizations

Hidalgo, César A. “Graphical Statistical Methods for the Representation of the Human Development Index and Its Components.” United Nations Development Programme Human Development Reports, September 2010. http://bit.ly/166TKur.

  • In this paper for the United Nations Development Programme, Hidalgo argues that “graphical statistical methods could be used to help communicate complex data and concepts through universal cognitive channels that are heretofore underused in the development literature.”
  • To support his argument, representations are provided that “show how graphical methods can be used to (i) compare changes in the level of development experienced by countries (ii) make it easier to understand how these changes are tied to each one of the components of the Human Development Index (iii) understand the evolution of the distribution of countries according to HDI and its components and (iv) teach and create awareness about human development by using iconographic representations that can be used to graphically narrate the story of countries and regions.”

Stowers, Genie. “The Use of Data Visualization in Government.” IBM Center for The Business of Government, Using Technology Series, 2013. http://bit.ly/1aame9K.

  • This report seeks “to help public sector managers understand one of the more important areas of data analysis today — data visualization. Data visualizations are more sophisticated, fuller graphic designs than the traditional spreadsheet charts, usually with more than two variables and, typically, incorporating interactive features.”
  • Stowers also offers numerous examples of “visualizations that include geographical and health data, or population and time data, or financial data represented in both absolute and relative terms — and each communicates more than simply the data that underpin it. In addition to these many examples of visualizations, the report discusses the history of this technique, and describes tools that can be used to create visualizations from many different kinds of data sets.”