Smithsonian turns to crowdsourcing for massive digitization project


PandoDaily: “There are 5 million plant specimens in the US Herbarium at the Natural History Museum’s Botany Department, one of the most extensive collections of plant life in the world. They all have labels. But only 1.3 million of those labels can be read by computers. That’s where you come in.

Jason Shen and Sarah Allen, a pair of Presidential Innovation Fellows working with the Smithsonian Institute to improve its open data initiatives, have gone all Mechanical Turk on the esteemed knowledge network.

In a pilot project that is serving as a test run for other large Smithsonian scientific collections – accounting for a total of about 126 million specimens – the innovation fellows are crowdsourcing the transcription of scanned images of the labels.

To get involved, you don’t need to commit to a certain number of hours, or make yourself available at specific times. You just log into the Smithsonian’s recently established transcription site, select a project to work on, and start transcribing. Different volunteers can work on the same project at different times. When you’ve done your bit, you submit it for review, at which point a different volunteer comes in to check to see that you’ve done the transcription correctly.

So, for instance, you might get to look at specimens collected by Martin W. Gorman on his 1902 expedition to Alaska’s Lake Iliamna Region, and read his thoughts on his curious findings. If you’re the type to get excited by a bit of vintage Potentilla fruticosa, then this is your Disneyland.

It’s the sort of crowdsourcing initiative that has been going on for years in other corners of the Internet, but the Smithsonian is only just getting going. It has long thought of itself as a passer-on of knowledge – its mission is “the increase and diffusion of knowledge” – with the public as inherent recipients rather than contributors, so the “let’s get everyone to help us with this gargantuan task” mentality has not been its default position. It does rely on a lot of volunteers to lead tours and maintain back rooms, and the like, but organizing knowledge is another thing…

Shen and Allen quietly launched the Smithsonian Transcription Center in August as part of a wider effort to digitize all of the Institute’s collections. The Herbarium effort is one of the most significant to date, but other projects have ranged from field notes of bird observations to letters written between 20th-century American artists. More than 1,400 volunteers have contributed to the projects to date, accounting for more than 18,000 transcriptions.”

Transparency 2.0: The Fundamentals of Online Open Government


White Paper by Granicus: “Open government is about building transparency, trust, and engagement with the public. Today, with 80% of the North American public on the Internet, it is becoming increasingly clear that building open government starts online. Transparency 2.0 not only provides public information, but also develops civic engagement, opens the decision-making process online, and takes advantage of today’s technology trends.
While open data comprised much of what online transparency used to be, today, government agencies have expanded openness to include public records, legislative data, decision-making workflow, and citizen ideation and feedback.
This paper outlines the principles of Transparency 2.0, the fundamentals and best practices for creating the most advanced and comprehensive online open government that over a thousand state, federal, and local government agencies are now using to reduce information requests, create engagement, and improve efficiency.”

13 ways to unlock the potential of open government


The Guardian: “Nine experts offer their thoughts on making open data initiatives work for all citizens…
Tiago Peixoto, open government specialist, The World Bank, Washington DC, US. @participatory
Open data is an enabler – not a guarantee – of good participation: Participation implies creating legitimate channels of communication between citizens and governments, and opening up data does not create that channel. We need to consider which structures enable us to know about citizens’ needs and preferences.
Both governments and civil society are responsible for connecting governments to the people: If we assume institutional or regulatory reforms are needed, then clearly governments (at both the legislative and executive level) should take a big part of the responsibility. After that, it is civil society’s role (and individual citizens) to further promote and strengthen those institutions….
Ben Taylor, open data consultant, Twaweza, UK and Tanzania. @mtega
We need to put people before data: The OGP Summit raised some interesting questions on open data and open government in developing countries. In a particular session discussing how to harness data to drive citizen engagement, the consensus was that this was the wrong way around. It should instead be reversed, putting the real, everyday needs of citizens first, and then asking how we can use data to help meet these.
Open government is not all about technology: Often people assume that open government means technology, but I think that’s wrong. For me, open government is a simple idea: it’s about making the nuts and bolts of how government works visible to citizens. Even open data isn’t always just about technology, for example postings on noticeboards and in newspapers are also valuable. Technology has a lot to offer, but it has limitations as well…
Juan M Casanueva, director, SocialTIC, Mexico City, Mexico. @jm_casanueva
Closed working cultures stifle open government initiatives: It is interesting to think about why governments struggle to open up. While closed systems tend to foster corruption and other perverse practices, most government officials also follow a pre-established closed culture that has become ingrained in their working practices. There are sometimes few incentives and high risks for government officials who want to make a career in the public service, and some also lack the capacity to handle technology and citizen involvement. It is very interesting to see government officials who overcome these challenges actually benefiting politically from doing innovative citizen-centered actions. Unfortunately, that is too much of a risk at higher levels of government.
NGOs in Mexico are leading the way with access to information and citizen involvement: Sonora Ciudadana recently opened the state’s health payroll and approached the public staff so that they could compare what they earn with the state expense reports. Pacto por Juarez has created grassroots transparency and accountability schools and even has a bus tour that goes around the city explaining the city’s budget and how it is being spent….”

Index: Trust in Institutions


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on trust in institutions and was originally published in 2013.

Trust in Government

  • How many of the global public feel that their governments listen to them: 17%
  • How much of the global population trusts in institutions: almost half 
  • The number of Americans who trust institutions: less than half
  • How many people globally believe that business leaders and government officials will tell the truth when confronted with a difficult issue: Less than one-fifth
  • The average level of confidence amongst citizens in 25 OECD countries:
    • In national government: 40%, down from 45% in 2007
    • In financial institutions: 43%
    • In public services such as local police and healthcare: 72% and 71% respectively

Executive Government

  • How many Americans trust the government in Washington to do what is right “just about always or most of the time” in September 2013: 19%
  • Those who trust the “men and women … who either hold or are running for public office”: 46%
  • Number of Americans who express a great deal or fair amount of trust in:
    • Local government: 71%
    • State government: 62%
    • Federal government: 52%
  • How many Americans trust in the ability of “the American people” to make judgments about political issues facing the country:  61%, declining every year since 2009
  • Those who have trust and confidence in the federal government’s ability to handle international problems: 49%
  • Number of Americans who feel “angry” at the federal government: 3 in 10, all-time high since first surveyed in 1997

Congress

  • Percentage of Americans who say “the political system can work fine, it’s the members of Congress that are the problem” in October 2013: 58%
  • Following the government shutdown, number of Americans who stated that Congress would work better if nearly every member was replaced next year: nearly half
  • Those who think that even an entire overhaul of Congress would not make much difference: 4 in 10 
  • Those who think that “most members of Congress have good intentions, it’s the political system that is broken” in October 2013: 32%

Trust in Media

  • Global trust in media (traditional, social, hybrid, owned, online search): 57% and rising
  • The percentage of Americans who say they have “a great deal or fair amount of trust and confidence in the mass media”: 44% – the lowest level since first surveyed in 1997
  • How many Americans see the mass media as too liberal: 46%
    • As too conservative: 13%
    • As “just about right”: 37%
  • The number of Americans who see the press as fulfilling the role of political watchdog and believe press criticism of political leaders keeps them from doing things that should not be done: 68%
  • The proportion of Americans who have “only a little/not at all” level of trust in Facebook to protect privacy and personal information: three in four
    • In Google: 68%
    • In their cell phone provider: 63%

Trust in Industry

  • Global trust in business: 58%
  • How much of the global public trusts financial institutions: 50%
  • Proportion of the global public who consider themselves informed about the banking scandals: more than half
  • Of those, how many Americans report they now trust banks less: almost half
  • Number of respondents globally who say they trust tech companies to do what’s right: 77%, most trusted industry
  • Number of consumers across eight markets who were “confident” or “somewhat confident” that the tech sector can provide long-term solutions to meet the world’s toughest challenges: 76%

What future do you want? Commission invites votes on what Europe could look like in 2050 to help steer future policy and research planning


European Commission – MEMO: “Vice-President Neelie Kroes, responsible for the Digital Agenda, is inviting people to join a voting and ranking process on 11 visions of what the world could look like in 20-40 years. The Commission is seeking views on living and learning, leisure and working in Europe in 2050, to steer long-term policy or research planning.
The visions have been gathered over the past year through the Futurium, an online debate platform that allows policymakers to not only consult citizens, but to collaborate and “co-create” with them, and at events throughout Europe. Thousands of thinkers – from high school students, to the Erasmus Students Network; from entrepreneurs and internet pioneers to philosophers and university professors, have engaged in a collective inquiry – a means of crowd-sourcing what our future world could look like.
Eleven over-arching themes have been drawn together from more than 200 ideas for the future. From today, everyone is invited to join the debate and offer their rating and rankings of the various ideas. The results of the feedback will help the European Commission make better decisions about how to fund projects and ideas that both shape the future and get Europe ready for that future….
The Futurium is a foresight project run by DG CONNECT, based on an open source approach. It develops visions of society, technologies, attitudes and trends in 2040-2050 and uses these, for example, as potential blueprints for future policy choices or EU research and innovation funding priorities.
It is an online platform developed to capture emerging trends and enable interested citizens to co-create compelling visions of the futures that matter to them.

This crowd-sourcing approach provides useful insights on:

  1. vision: where people want to go, how desirable and likely are the visions posted on the platform;
  2. policy ideas: what should ideally be done to realise the futures; the possible impacts and plausibility of policy ideas;
  3. evidence: scientific and other evidence to support the visions and policy ideas.

….
Connecting policy making to people: in an increasingly connected society, online outreach and engagement is an essential response to the growing demand for participation, helping to capture new ideas and to broaden the legitimacy of the policy making process (IP/10/1296). The Futurium is an early prototype of a more general policy-making model described in the paper “The Futurium—a Foresight Platform for Evidence-Based and Participatory Policymaking”.

The Futurium was developed to lay the groundwork for future policy proposals which could be considered by the European Parliament and the European Commission under their new mandates as of 2014. But the Futurium’s open, flexible architecture makes it easily adaptable to any policy-making context, where thinking ahead, stakeholder participation and scientific evidence are needed.”

Concerns about opening up data, and responses which have proved effective


Google doc by Christopher Gutteridge, University of Southampton and Alexander Dutton, University of Oxford:  “This document is inspired by the open data excuses bingo card. Someone asked for what responses have proved effective. This document is a work in progress based on our experience. Carly Strasser has also written at the Data Pub blog about these issues from an Open Science and research data perspective. You may also be interested in How to make a business case for open data, published by the ODI.
We’ll get spam…
Terrorists might use the data…
People will contact us to ask about stuff…
People will misinterpret the data…
It’s too big…
It’s not very interesting…
We might want to use it in a research paper…
There’s no API to that system…
We’re worried about the Data Protection Act…
We’re not sure that we own it…
I don’t mind making it open, but I worry someone else might object…
It’s too complicated…
Our data is embarrassingly bad…
It’s not a priority and we’re busy…
Our lawyers want to make a custom license…
It changes too quickly…
There’s already a project in progress which sounds similar…
Some of what you asked for is confidential…
I don’t own the data, so can’t give you permission…
We don’t have that data…
That data is already published via (external organisation X)….
We can’t provide that dataset because one part is not possible…
What if something breaks and the open version becomes out of date?…
We can’t see the benefit…
What if we want to sell access to this data…?
If we publish this data, people might sue us…
We want people to come direct to us so we know why they want the data…

Findings from the emerging field of Transparency Research


Tiago Peixoto: “HEC Paris has just hosted the 3rd Global Conference on Transparency Research, and they have made the list of accepted papers available. …
As one goes through the papers,  it is clear that unlike most of the open government space, when it comes to research, transparency is treated less as a matter of technology and formats and more as a matter of social and political institutions.  And that is a good thing.”
This year’s papers are listed below:

What's Different that Makes Open Data an Infrastructure?


Article by Christopher Thomas: “It wasn’t too long ago that governments remained pretty guarded with their data. It really did not matter who the data steward was, as each discipline had its “reasons” for keeping data out of the hands of others…
The mapping and GIS industry was no stranger to the resistance to open data.  However the concerns were slightly different than the governments’ concerns.  Perhaps this was due to the time, effort, and money required to develop the data by staff.  Mapping and GIS fought a valiant battle that this data was not information subject to the Freedom of Information Act, but rather an asset subject to different rules of funding and cost recovery.
Recently, attitudes have been changing as mapping and GIS data are being looked at as more of an infrastructure, because governments now see the importance of including it as part of their daily operation. …
So what’s different today? Well, governments can avoid data dumps that leave important members of your team wondering how the data is being used. Or better yet, wondering how many times your old data has been exchanged and used without new or updated data being considered. You later learn that someone has used your old data on a project that has come back to haunt you. The major difference today is that there is an ability to extend this infrastructure as a web service. If you publish current data on websites or portals, data can now be downloaded for use in various products or connected to apps. As Mark Headd, chief data officer for the city of Philadelphia puts it, “web services are the ‘secret sauce’ to open data.” Governments can simply extend map and GIS data for adoption by business startups and civic hackers, for example, with the confidence that current data is being used.”

Open Government and Its Constraints


Blog entry by Panthea Lee: “Open government” is everywhere. Search the term and you’ll find OpenGovernment.org, OpenTheGovernment.org, Open Government Initiative, Open Gov Hub and the Open Gov Foundation; you’ll find open government initiatives for New York City, Boston, Kansas, Virginia, Tennessee and the list goes on; you’ll find dedicated open government plans for the White House, State Department, USAID, Treasury, Justice Department, Commerce, Energy and just about every other major federal agency. Even the departments of Defense and Homeland Security are in on open government.
And that’s just in the United States.
There is Open Government Africa, Open Government in the EU and Open Government Data. The World Bank has an Open Government Data Toolkit and recently announced a three-year initiative to help developing countries leverage open data. And this week, over 1,000 delegates from over 60 countries are in London for the annual meeting of the Open Government Partnership, which has grown from 8 to 60 member states in just two years….
Many of us have no consensus or clarity on just what exactly “open government” is, what we hope to achieve from it or how to measure our progress. Too often, our initiatives are designed through the narrow lenses of our own biases and without a concrete understanding of those they are intended for, both those in and out of government.
If we hope to realize the promise of more open governments, let’s be clear about the barriers we face so that we may start to overcome them.
Barrier 1: “Open Gov” is…?
Open government is… not new, for starters….
Barrier 2: Open Gov is Not Inclusive
The central irony of open government is that it’s often not “open” at all….
Barrier 3: Open Gov Lacks Empathy
Open government practitioners love to speak of “the citizen” and “the government.” But who exactly are these people? Too often, we don’t really know. We are builders, makers and creators with insufficient understanding of whom we are building, making and creating for… On the flip side, who do we mean by “the government?” And why, gosh darn it, is it so slow to innovate? Simply put, “the government” is comprised of individual people working in environments that are not conducive to innovation….
For open government to realize its potential, we must overcome these barriers.”

Selected Readings on Linked Data and the Semantic Web


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of linked data and the semantic web was originally published in 2013.

Linked Data and the Semantic Web movement are seeking to make our growing body of digital knowledge and information more interconnected, searchable, machine-readable and useful. The concept was first introduced by the W3C; Sir Tim Berners-Lee, Christian Bizer and Tom Heath define Linked Data as “data published to the Web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and can in turn be linked to from external datasets.” In other words, Linked Data and the Semantic Web seek to do for data what the Web did for documents. Additionally, the evolving capability of linking together different forms of data is fueling the potentially transformative rise of social machines – “processes in which the people do the creative work and the machine does the administration.”
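To make the definition above concrete, here is a minimal sketch in plain Python of what "linked to other external data sets" can mean in practice. It is an illustration only, not how any real Linked Data toolkit is implemented: the `example.org` URIs, the `sameAs`-style link predicate, both datasets and the helper names `index_by_subject` and `describe` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical vocabulary and identifiers; none of these URIs are real datasets.
NAME = "http://example.org/vocab/name"
BUDGET = "http://example.org/vocab/budget"
SAME_AS = "http://example.org/vocab/sameAs"

CITY_AGENCY = "http://example.org/city/agency/42"
STATE_ORG = "http://example.org/state/org/parks"

# Each statement is a (subject, predicate, object) triple.
city_dataset = [
    (CITY_AGENCY, NAME, "Department of Parks"),
    (CITY_AGENCY, SAME_AS, STATE_ORG),  # link out to an external dataset
]
state_dataset = [
    (STATE_ORG, BUDGET, "1200000"),
]


def index_by_subject(*datasets):
    """Merge datasets into a lookup of subject URI -> [(predicate, object)]."""
    index = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            index[subject].append((predicate, obj))
    return index


def describe(uri, index, link_predicate, seen=None):
    """Collect facts about a URI, following link_predicate into other datasets."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against circular links
        return []
    seen.add(uri)
    facts = []
    for predicate, obj in index.get(uri, []):
        facts.append((predicate, obj))
        if predicate == link_predicate:
            facts.extend(describe(obj, index, link_predicate, seen))
    return facts


index = index_by_subject(city_dataset, state_dataset)
facts = describe(CITY_AGENCY, index, SAME_AS)
```

Because both datasets identify things by URI and the city record points at the state record, a program starting from the city agency can also reach the budget figure published separately by the state. That is the sense in which the data, and not only the documents, become linked.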

Annotated Selected Reading List (in alphabetical order)

Alani, Harith, David Dupplaw, John Sheridan, Kieron O’Hara, John Darlington, Nigel Shadbolt, and Carol Tullo. “Unlocking the Potential of Public Sector Information with Semantic Web Technology,” 2007. http://bit.ly/17fMbCt.

  • This paper explores the potential of using Semantic Web technology to increase the value of public sector information already in existence.
  • The authors note that, while “[g]overnments often hold very rich data and whilst much of this information is published and available for re-use by others, it is often trapped by poor data structures, locked up in legacy data formats or in fragmented databases. One of the great benefits that Semantic Web (SW) technology offers is facilitating the large scale integration and sharing of distributed data sources.”
  • They also argue that Linked Data and the Semantic Web are growing in use and visibility in other sectors, but government has been slower to adapt: “The adoption of Semantic Web technology to allow for more efficient use of data in order to add value is becoming more common where efficiency and value-added are important parameters, for example in business and science. However, in the field of government there are other parameters to be taken into account (e.g. confidentiality), and the cost-benefit analysis is more complex.” In spite of that complexity, the authors’ work “was intended to show that SW technology could be valuable in the governmental context.”

Berners-Lee, Tim, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American 284, no. 5 (2001): 28–37. http://bit.ly/Hhp9AZ.

  • In this article, Sir Tim Berners-Lee, James Hendler and Ora Lassila introduce the Semantic Web, “a new form of Web content that is meaningful to computers [and] will unleash a revolution of new possibilities.”
  • The authors argue that the evolution of linked data and the Semantic Web “lets anyone express new concepts that they invent with minimal effort. Its unifying logical language will enable these concepts to be progressively linked into a universal Web. This structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together.”

Bizer, Christian, Tom Heath, and Tim Berners-Lee. “Linked Data – The Story So Far.” International Journal on Semantic Web and Information Systems (IJSWIS) 5, no. 3 (2009): 1–22. http://bit.ly/HedpPO.

  • In this paper, the authors take stock of Linked Data’s challenges, potential and successes close to a decade after its introduction. They build their argument for increasingly linked data by referring to the incredible value creation of the Web: “Despite the inarguable benefits the Web provides, until recently the same principles that enabled the Web of documents to flourish have not been applied to data.”
  • The authors expect that “Linked Data will enable a significant evolutionary step in leading the Web to its full potential” if a number of research challenges can be adequately addressed, both technical, like interaction paradigms and data fusion; and non-technical, like licensing, quality and privacy.

Ding, Li, Dominic DiFranzo, Sarah Magidson, Deborah L. McGuinness, and Jim Hendler. Data-Gov Wiki: Towards Linked Government Data, n.d. http://bit.ly/1h3ATHz.

  • In this paper, the authors “investigate the role of Semantic Web technologies in converting, enhancing and using linked government data” in the context of Data-gov Wiki, a project that attempts to integrate datasets found at Data.gov into the Linking Open Data (LOD) cloud.
  • The paper features discussion and “practical strategies” based on four key issue areas: Making Government Data Linkable, Linking Government Data, Supporting the Use of Linked Government Data and Preserving Knowledge Provenance.

Kalampokis, Evangelos, Michael Hausenblas, and Konstantinos Tarabanis. “Combining Social and Government Open Data for Participatory Decision-Making.” In Electronic Participation, edited by Efthimios Tambouris, Ann Macintosh, and Hans de Bruijn, 36–47. Lecture Notes in Computer Science 6847. Springer Berlin Heidelberg, 2011. http://bit.ly/17hsj4a.

  • This paper presents a proposed data architecture for “supporting participatory decision-making based on the integration and analysis of social and government data.” The authors believe that their approach will “(i) allow decision makers to understand and predict public opinion and reaction about specific decisions; and (ii) enable citizens to inadvertently contribute in decision-making.”
  • The proposed approach, “based on the use of the linked data paradigm,” draws on subjective social data and objective government data in two phases: Data Collection and Filtering and Data Analysis. “The aim of the former phase is to narrow social data based on criteria such as the topic of the decision and the target group that is affected by the decision. The aim of the latter phase is to predict public opinion and reactions using independent variables related to both subjective social and objective government data.”

Rady, Kaiser. Publishing the Public Sector Legal Information in the Era of the Semantic Web. SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, 2012. http://bit.ly/17fMiOp.

  • Following an EU directive calling for the release of public sector information by member states, this study examines the “uniqueness” of creating and publishing primary legal source documents on the web and highlights “the most recent technological strategy used to structure, link and publish data online (the Semantic Web).”
  • Rady argues for public sector legal information to be published as “open-linked-data in line with the new approach for the web.” He believes that if data is created and published in this form, “the data will be more independent from devices and applications and could be considered as a component of [a] big information system. That because, it will be well-structured, classified and has the ability to be used and utilized in various combinations to satisfy specific user requirements.”

Shadbolt, Nigel, Kieron O’Hara, Tim Berners-Lee, Nicholas Gibbins, Hugh Glaser, Wendy Hall, and m.c. schraefel. “Linked Open Government Data: Lessons from Data.gov.uk.” IEEE Intelligent Systems 27, no. 3 (May 2012): 16–24. http://bit.ly/1cgdH6R.

  • In this paper, the authors view Open Government Data (OGD) as an “opportunity and a challenge for the LDW [Linked Data Web]. The opportunity is to grow by linking with PSI [Public Sector Information] – real-world, useful information with good provenance. The challenge is to manage the sudden influx of heterogeneous data, often with minimal semantics and structure, tailored to highly specific task contexts.”
  • As the linking of OGD continues, the authors argue that, “Releasing OGD is not solely a technical problem, although it presents technical challenges. OGD is not a rigid government IT specification, but it demands productive dialogue between data providers, users, and developers. We should expect a ‘perpetual beta,’ in which best practice, technical development, innovative use of data, and citizen-centric politics combine to drive data-release programs.”
  • Despite challenges, the authors believe that, “Integrating OGD onto the LDW will vastly increase the scope and richness of the LDW. A reciprocal benefit is that the LDW will provide additional resources and context to enrich OGD. Here, we see the network effect in action, with resources mutually adding value to one another.”

Vitale, Michael, Anni Rowland-Campbell, Valentina Cardo, and Peter Thompson. “The Implications of Government as a ‘Social Machine’ for Making and Implementing Market-based Policy.” Intersticia, September 2013. http://bit.ly/HhMzqD.

  • This report from the Australia and New Zealand School of Government (ANZSOG) explores the concept of government as a social machine. The authors draw on the definition of a social machine proposed by Sir Nigel Shadbolt et al. – a system where “human and computational intelligence coalesce in order to achieve a given purpose” – to describe a “new approach to the relationship between citizens and government, facilitated by technological systems which are increasingly becoming intuitive, intelligent and ‘social.'”
  • The authors argue that beyond providing more and varied data to government, the evolving concept of government as a social machine has the potential to alter power dynamics, address the growing lack of trust in public institutions and facilitate greater public involvement in policy-making.