Melissa Jun Rowley at the Toolbox: “Though democratic governments are of the people, by the people, and for the people, it often seems that our only input is electing officials who pass laws on our behalf. After all, I don’t know many people who attend town hall meetings these days. But the evolution of technology has given citizens a new way to participate. Governments are using technology to include as many voices from their communities as possible in civic decisions and activities. Here are three examples.
Raleigh, NC
Raleigh, North Carolina’s open government initiative is a great example of passive citizen engagement. By following an open source strategy, Open Raleigh has made city data available to the public. Citizens then use the data in a myriad of ways, from simply visualizing daily crime in their city, to creating an app that lets users navigate and interactively utilize the city’s greenway system. A minimal sketch of that kind of reuse follows below.
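As a rough illustration of the kind of reuse Open Raleigh enables, here is a hedged sketch that pulls a city-published crime dataset and counts reported incidents per day. The URL and the reported_date column name are placeholders, not the actual Open Raleigh endpoints or schema.

```python
# Hedged sketch: summarizing an open crime dataset from a city data portal.
# The URL and column names below are illustrative placeholders, not the
# real Open Raleigh endpoints or schema.
import pandas as pd

CRIME_CSV = "https://data.example.gov/crime/rows.csv"  # placeholder URL

def daily_crime_counts(csv_url: str) -> pd.Series:
    """Return the number of reported incidents per day."""
    df = pd.read_csv(csv_url, parse_dates=["reported_date"])  # assumed column name
    return df.groupby(df["reported_date"].dt.date).size()

if __name__ == "__main__":
    counts = daily_crime_counts(CRIME_CSV)
    print(counts.tail(7))  # incidents per day for the most recent week
```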
Fort Smith, AR
Using MindMixer, Fort Smith, Arkansas, has created an online forum for residents to discuss the city’s comprehensive plan, effectively putting the community’s future in the hands of the community itself. Citizens are invited to share their own ideas, vote on ideas submitted by others, and engage with city officials who are “listening” to the conversation on the site.
Seattle, WA
Being a tech town, it’s no surprise that Seattle is using social media as a citizen engagement tool. The Seattle Police Department (SPD) uses a variety of social media tools to reach the public. In 2012, the department launched a first-of-its-kind hyper-local Twitter initiative. A police scanner for the Twitter generation, Tweets by Beat provides Twitter feeds of police dispatches in each of Seattle’s 51 police beats so that residents can find out what is happening right on their block.
In addition to Twitter and Facebook, SPD created a Tumblr to, in their own words, “show you your police department doing police-y things in your city.” In a nutshell, the department’s Tumblr serves as an extension of their other social media outlets.”
"Natural Cities" Emerge from Social Media Location Data
Emerging Technology From the arXiv: “Nobody agrees on how to define a city. But the emergence of “natural cities” from social media data sets may change that, say computational geographers…
A city is a large, permanent human settlement. But try and define it more carefully and you’ll soon run into trouble. A settlement that qualifies as a city in Sweden may not qualify in China, for example. And the reasons why one settlement is classified as a town while another as a city can sometimes seem almost arbitrary.
City planners know this problem well. They tend to define cities by administrative, legal or even historical boundaries that have little logic to them. Indeed, the same city can sometimes be defined in several different ways.
That causes all kinds of problems from counting the total population to working out who pays for the upkeep of the place. Which definition do you use?
Now help may be at hand thanks to the work of Bin Jiang and Yufan Miao at the University of Gävle in Sweden. These guys have found a way to use people’s locations recorded by social media to define the boundaries of so-called natural cities, which have a close resemblance to real cities in the US.
Jiang and Miao began with a dataset from the Brightkite social network, which was active between 2008 and 2010. The site encouraged users to log in with their location details so that they could see other users nearby. So the dataset consists of almost 3 million locations in the US and the dates on which they were logged.
To start off, Jiang and Miao simply placed a dot on a map at the location of each login. They then connected these dots to their neighbours to form triangles that end up covering the entire mainland US.
Next, they calculated the size of each triangle on the map and plotted this size distribution, which turns out to follow a power law. So there are lots of tiny triangles but only a few large ones.
Finally, they calculated the average size of the triangles and then coloured in all those that were smaller than average. The coloured areas are “natural cities”, say Jiang and Miao.
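For readers who want to see the procedure in code, here is a minimal sketch on synthetic points. It uses a Delaunay triangulation as a stand-in for “connecting each dot to its neighbours” and random coordinates in place of the Brightkite check-ins; it is not the authors’ implementation.

```python
# Sketch of the triangle-based "natural cities" procedure described above,
# run on synthetic points rather than the Brightkite login locations.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((3000, 2))        # stand-in for login coordinates (lon, lat)

tri = Delaunay(points)                # connect each point to its neighbours
a, b, c = (points[tri.simplices[:, i]] for i in range(3))

# Area of each triangle via the cross-product (shoelace) formula.
areas = 0.5 * np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                     - (c[:, 0] - a[:, 0]) * (b[:, 1] - a[:, 1]))

# Keep ("colour in") only the triangles smaller than the mean; their union
# approximates the dense regions that form the natural cities.
small = tri.simplices[areas < areas.mean()]
print(f"{len(small)} of {len(areas)} triangles fall below the mean area")
```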
It’s easy to imagine that the resulting map of triangles is of little value. But to the evident surprise of the researchers, it produces a pretty good approximation of the cities in the US. “We know little about why the procedure works so well but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities,” they say.
That’s handy because it suddenly gives city planners a way to study and compare cities on a level playing field. It allows them to see how cities evolve and change over time too. And it gives them a way to analyse how cities in different parts of the world differ.
Of course, Jiang and Miao will want to find out why this approach reveals city structures in this way. That’s still something of a puzzle but the answer itself may provide an important insight into the nature of cities (or at least into the nature of this dataset).
A few days ago, this blog wrote about how a new science of cities is emerging from the analysis of big data. This is another example; expect to see more.
Ref: http://arxiv.org/abs/1401.6756 : The Evolution of Natural Cities from the Perspective of Location-Based Social Media”
New format of congressional documents creates transparency, opportunity
“This is the result of an effective ongoing collaboration between the Library and GPO to provide legislative information in modern, widely used formats,” said Librarian of Congress James Billington. “We are pleased the popular bill summaries, which provide objective descriptions of complex legislative text, are now available through the Federal Digital System.”
The movement for more transparent government data gained traction with the House Appropriations Committee and its support of the task force on bulk data established by the House.
House bills, the Federal Register, the Code of Federal Regulations, and various executive branch documents are currently available in downloadable XML format, but this latest development is different. These bill summaries are prepared by the Library of Congress’s Congressional Research Service; they describe the key provisions of a piece of legislation and explain the potential implications the legislation may have for current federal programs and laws.”
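To give a sense of what working with the downloadable XML might look like, here is a hedged sketch that extracts bill identifiers and summary text from a locally saved file. The element names and the filename are illustrative guesses, not the actual CRS/FDsys schema.

```python
# Hedged sketch: pulling bill identifiers and summary text out of a
# downloaded bill-summary XML file. Element names are assumptions, not
# the real CRS/FDsys schema.
import xml.etree.ElementTree as ET

def read_summaries(path):
    """Return (bill identifier, summary text) pairs from one XML file."""
    root = ET.parse(path).getroot()
    pairs = []
    for item in root.iter("item"):                        # assumed element name
        measure = item.findtext("measure", default="?")   # assumed element name
        text = item.findtext("summary-text", default="")  # assumed element name
        pairs.append((measure, text.strip()))
    return pairs

if __name__ == "__main__":
    for measure, text in read_summaries("bill-summaries.xml"):  # example filename
        print(measure, text[:80])
```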
Open data: Strategies for impact
Important though these considerations are, they miss what should be an obvious and more profound alternative.
Right now, organisations like DataKind™ and Periscopic, and many other entrepreneurs, innovators and established social enterprises that use open data, see things differently. They are using these straplines to shake up the status quo, to demonstrate that data-driven businesses can do well by doing good.
And it’s the confluence of the many national and international open data initiatives, and the growing number of technically able, socially responsible organisations that provide the opportunity for social as well as economic growth. The World Wide Web Foundation now estimates that there are over 370 open data initiatives around the world. Collectively, and through portals such as Quandl and datacatalogs.org, these initiatives have made a staggering quantity of data available – in excess of eight million data sets. In addition, several successful and data-rich companies are entering into a new spirit of philanthropy – by donating their data for the public good. There’s no doubt that opening up data signals a new willingness by governments and businesses all over the world to engage with their citizens and customers in a new and more transparent way.
The challenge, though, is ensuring that these popular national and international open data initiatives are cohesive and impactful. And that the plans drawn up by public sector bodies to release specific data sets are based on the potential the data has to achieve a beneficial outcome, not – or, at least, not solely – based on the cost or ease of publication. Despite the best of intentions, only a relatively small proportion of open data sets now available has the latent potential to create significant economic or social impact. In our push to open up data and government, it seems that we may have fallen into the trap of believing the ends are the same as the means; that effect is the same as cause…”
How Open Data Are Turned into Services?
New Paper by Muriel Foulonneau, Sébastien Martin, Slim Turki: “The Open Data movement has mainly been a data provision movement. The release of Open Data is usually motivated by (i) government transparency (citizen access to government data), (ii) the development of services by third parties for the benefit of citizens and companies (typically smart city approach), or (iii) the development of new services that stimulate the economy. The success of the Open Data movement and its return on investment should therefore be assessed among other criteria by the number and impact of the services created based on those data. In this paper, we study the development of services based on open data and means to make the data opening process more effective.”
Give the Data to the People
Harlan Krumholz in the New York Times: “LAST week, Johnson & Johnson announced that it was making all of its clinical trial data available to scientists around the world. It has hired my group, the Yale University Open Data Access Project, or YODA, to fully oversee the release of the data. Everything in the company’s clinical research vaults, including unpublished raw data, will be available for independent review.
This is an extraordinary donation to society, and a reversal of the industry’s traditional tendency to treat data as an asset that would lose value if exposed to public scrutiny.
Today, more than half of the clinical trials in the United States, including many sponsored by academic and governmental institutions, are not published within two years of their completion. Often they are never published at all. The unreported results, not surprisingly, are often those in which a drug failed to perform better than a placebo. As a result, evidence-based medicine is, at best, based on only some of the evidence. One of the most troubling implications is that full information on a drug’s effects may never be discovered or released.
Even when studies are published, the actual data are usually not made available. End users of research — patients, doctors and policy makers — are implicitly told by a single group of researchers to “take our word for it.” They are often forced to accept the report without the prospect of other independent scientists’ reproducing the findings — a violation of a central tenet of the scientific method.
To be fair, the decision to share data is not easy. Companies worry that their competitors will benefit, that lawyers will take advantage, that incompetent scientists will misconstrue the data and come to mistaken conclusions. Researchers feel ownership of the data and may be reluctant to have others use it. So Johnson & Johnson, as well as companies like GlaxoSmithKline and Medtronic that have made more cautious moves toward transparency, deserve much credit. The more we share data, however, the more we find that many of these problems fail to materialize….
This program doesn’t mean that just anyone can gain access to the data without disclosing how they intend to use it. We require those who want the data to submit a proposal and identify their research team, funding and any conflicts of interest. They have to complete a short course on responsible conduct and sign an agreement that restricts them to their proposed research question. Most important, they must agree to share whatever they find. And we exclude applicants who seek data for commercial or legal purposes. Our intent is not to be tough gatekeepers, but to ensure that the data are used in a transparent way and contribute to overall scientific knowledge.
There are many benefits to this kind of sharing. It honors the contributions of the subjects and scientists who participated in the research. It is proof that an organization, whether it is part of industry or academia, wants to play a role as a good global citizen. It demonstrates that the organization has nothing to hide. And it enables scientists to use the data to learn new ways to help patients. Such an approach can even teach a company like Johnson & Johnson something it didn’t know about its own products.
For the good of society, this is a breakthrough that should be replicated throughout the research world.”
Why SayIt is (partly) a statement about the future of Open Data
Tom Steinberg from mySociety: “This is where SayIt comes in, as an example of a relatively low-cost approach to making sure that the next generation of government IT systems do produce Open Data.
SayIt is a newly launched open source tool for publishing transcripts of trials, debates, interviews and so on. It publishes them online in a way that matches modern expectations about how stuff should work on the web – responsive, searchable and so on. It’s being built as a Poplus Component, which means it’s part of an international network of groups collaborating on shared technologies. Here’s JK Rowling being interviewed, published via SayIt.
But how does this little tool relate to the business of getting governments to release more Open Data? Well, SayIt isn’t just about publishing data, it’s about making it too – in a few months we’ll be sharing an authoring interface for making new transcripts from whatever source a user has access to.
We hope that having iterated and improved this authoring interface, SayIt can become the tool of choice for public sector transcribers, replacing whatever tool they use today (almost certainly Word). Then, if they use SayIt instead of Word to make a transcript, it will produce new, instantly-online Open Data every time they use it….
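To make the “born open” idea concrete, here is a purely illustrative sketch of a transcript captured as structured data (speakers, speeches, a date) rather than as a Word document. The field names are hypothetical and are not SayIt’s actual schema or API.

```python
# Illustrative only: a transcript authored as structure rather than as a
# Word file. The field names are hypothetical, not SayIt's schema.
import json

transcript = {
    "title": "Example council hearing",   # hypothetical hearing
    "date": "2014-02-03",
    "speeches": [
        {"speaker": "Chair", "text": "The meeting will come to order."},
        {"speaker": "Witness", "text": "Thank you. I have a short statement."},
    ],
}

# Because the transcript is structured from the start, publishing it as
# open data is simply a matter of serialising it.
print(json.dumps(transcript, indent=2))
```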
But we can’t expect the public sector to use a tool like SayIt to make new Open Data unless it is cheaper, better and less burdensome than whatever they’re using now. We can’t – quite simply – expect to sell government procurement officers a new product mainly on the virtues of Open Data. This means the tough task of persuading government employees that there is a new tool that is head-and-shoulders better than Excel or Word for certain purposes: formidable, familiar products that are much better than their critics like to let on.
So in order for SayIt to replace the current tools used by any current transcriber, it’s going to have to be really, really good. And really trustworthy. And it’s going to have to be well marketed. And that’s why we’ve chosen to build SayIt as an international, open source collaboration – as a Poplus Component. Because we think that without the billions of dollars it takes to compete with Microsoft, our best hope is to develop very narrow tools that do 0.01% of what Word does, but which do that one thing really really well. And our key strategic advantage, other than the trust that comes with Open Source and Open Standards, is the energy of the global civic hacking and government IT reform sector. SayIt is far more likely to succeed if it has ideas and inputs from contributors from around the world.
Regardless of whether or not SayIt ever succeeds in penetrating inside governments, this post is about an idea that such an approach represents. The idea is that people can advance the Open Data agenda not just by lobbying, but also by building and popularising tools that mean that data is born open in the first place. I hope this post will encourage more people to work on such tools, either on your own, or via collaborations like Poplus.”
Selected Readings on Personal Data: Security and Use
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of personal data was originally published in 2014.
Advances in technology have greatly increased the potential for policymakers to utilize the personal data of large populations for the public good. However, the proliferation of vast stores of useful data has also given rise to a variety of legislative, political, and ethical concerns surrounding the privacy and security of citizens’ personal information, both in terms of collection and usage. Challenges regarding the governance and regulation of personal data must be addressed in order to assuage individuals’ concerns regarding the privacy, security, and use of their personal information.
Selected Reading List (in alphabetical order)
- Ann Cavoukian – Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control – a paper describing the emerging framework of technologies enabling individuals to hold greater control of their data.
- T. Kirkham, S. Winfield, S. Ravet and S. Kellomaki – A Personal Data Store for an Internet of Subjects – a paper arguing for a shift from the current service-oriented data control architecture to a system centered on individuals controlling their data.
- OECD – The 2013 OECD Privacy Guidelines – a privacy framework built around eight core principles.
- Paul Ohm – Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization – a paper revealing the ease with which researchers have been able to reattach supposedly anonymized personal data to individuals.
- Jules Polonetsky and Omer Tene – Privacy in the Age of Big Data: A Time for Big Decisions – a paper proposing the development of a risk-value matrix for personal data in the big data era.
- Katie Shilton, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun – Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing – a paper arguing for a reimagined system for protecting the privacy of personal data, moving past the out-of-date Codes of Fair Information Practice.
Annotated Selected Reading List (in alphabetical order)
Cavoukian, Ann. “Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control.” Privacy by Design, October 15, 2013. https://bit.ly/2S00Yfu.
- In this paper, Cavoukian describes the Personal Data Ecosystem (PDE), an “emerging landscape of companies and organizations that believe individuals should be in control of their personal data, and make available a growing number of tools and technologies to enable this control.” She argues that, “The right to privacy is highly compatible with the notion of PDE because it enables the individual to have a much greater degree of control – “Radical Control” – over their personal information than is currently possible today.”
- To ensure that the PDE reaches its privacy-protection potential, Cavoukian argues that it must practice The 7 Foundational Principles of Privacy by Design:
- Proactive not Reactive; Preventative not Remedial
- Privacy as the Default Setting
- Privacy Embedded into Design
- Full Functionality – Positive-Sum, not Zero-Sum
- End-to-End Security – Full Lifecycle Protection
- Visibility and Transparency – Keep it Open
- Respect for User Privacy – Keep it User-Centric
Kirkham, T., S. Winfield, S. Ravet, and S. Kellomaki. “A Personal Data Store for an Internet of Subjects.” In 2011 International Conference on Information Society (i-Society). 92–97. http://bit.ly/1alIGuT.
- This paper examines various factors involved in the governance of personal data online, and argues for a shift from “current service-oriented applications where often the service provider is in control of the person’s data” to a person-centric architecture where the user is at the center of personal data control.
- The paper delves into an “Internet of Subjects” concept of Personal Data Stores, and focuses on implementation of such a concept on personal data that can be characterized as either “By Me” or “About Me.”
- The paper also presents examples of how a Personal Data Store model could allow users to both protect and present their personal data to external applications, affording them greater control.
OECD. The 2013 OECD Privacy Guidelines. 2013. http://bit.ly/166TxHy.
- This report is indicative of the “important role in promoting respect for privacy as a fundamental value and a condition for the free flow of personal data across borders” played by the OECD for decades. The guidelines – revised in 2013 for the first time since being drafted in 1980 – are seen as “[t]he cornerstone of OECD work on privacy.”
- The OECD framework is built around eight basic principles for personal data privacy and security:
- Collection Limitation
- Data Quality
- Purpose Specification
- Use Limitation
- Security Safeguards
- Openness
- Individual Participation
- Accountability
Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review 57, 1701 (2010). http://bit.ly/18Q5Mta.
- This article explores the implications of the “astonishing ease” with which scientists have demonstrated the ability to “reidentify” or “deanonymize” supposedly anonymous personal information.
- Rather than focusing exclusively on whether personal data is “anonymized,” Ohm offers five factors for governments and other data-handling bodies to use for assessing the risk of privacy harm: data-handling techniques, private versus public release, quantity, motive and trust.
Polonetsky, Jules and Omer Tene. “Privacy in the Age of Big Data: A Time for Big Decisions.” Stanford Law Review Online 64 (February 2, 2012): 63. http://bit.ly/1aeSbtG.
- In this article, Tene and Polonetsky argue that, “The principles of privacy and data protection must be balanced against additional societal values such as public health, national security and law enforcement, environmental protection, and economic efficiency. A coherent framework would be based on a risk matrix, taking into account the value of different uses of data against the potential risks to individual autonomy and privacy.”
- To achieve this balance, the authors believe that, “policymakers must address some of the most fundamental concepts of privacy law, including the definition of ‘personally identifiable information,’ the role of consent, and the principles of purpose limitation and data minimization.”
Shilton, Katie, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun. “Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing.” TPRC, 2009. http://bit.ly/18gh8SN.
- This article argues that the Codes of Fair Information Practice, which have served as a model for data privacy for decades, do not take into account a world of distributed data collection, nor the realities of data mining and easy, almost uncontrolled, dissemination.
- The authors suggest “expanding the Codes of Fair Information Practice to protect privacy in this new data reality. An adapted understanding of the Codes of Fair Information Practice can promote individuals’ engagement with their own data, and apply not only to governments and corporations, but software developers creating the data collection programs of the 21st century.”
- In order to achieve this change in approach, the paper discusses three foundational design principles: primacy of participants, data legibility, and engagement of participants throughout the data life cycle.
Big Data, Privacy, and the Public Good
Forthcoming book and website by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum (editors): “The overarching goal of the book is to identify ways in which vast new sets of data on human beings can be collected, integrated, and analysed to improve evidence based decision making while protecting confidentiality. …
Massive amounts of new data on human beings can now be accessed and analyzed. Much has been made of the many uses of such data for pragmatic purposes, including selling goods and services, winning political campaigns, and identifying possible terrorists. Yet “big data” can also be harnessed to serve the public good: scientists can use new forms of data to do research that improves the lives of human beings, federal, state and local governments can use data to improve services and reduce taxpayer costs, and public organizations can use information to advocate for public causes.
Much has also been made of the privacy and confidentiality issues associated with access. A survey of statisticians at the 2013 Joint Statistical Meeting found that the majority thought consumers should worry about privacy issues, and that an ethical framework should be in place to guide data scientists. Yet there are many unanswered questions. What are the ethical and legal requirements for scientists and government officials seeking to serve the public good without harming individual citizens? What are the rules of engagement? What are the best ways to provide access while protecting confidentiality? Are there reasonable mechanisms to compensate citizens for privacy loss?
The goal of this book is to answer some of these questions. The book’s authors paint an intellectual landscape that includes the legal, economic and statistical context necessary to frame the many privacy issues, including the value to the public of data access. The authors also identify core practical approaches that use new technologies to simultaneously maximize the utility of data access while minimizing information risk. As is appropriate for such a new and evolving field, each chapter also identifies important questions that require future research.
The work in this book is also intended to be accessible to an audience broader than the academy. In addition to informing the public, we hope that the book will be useful to people trying to provide data access but protect confidentiality in their roles as data custodians for federal, state and local agencies, or as decision makers on institutional review boards.”
Visual Insights: A Practical Guide to Making Sense of Data
New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”
Also check out the Information Visualization MOOC at http://ivmooc.cns.iu.edu/