From binoculars to big data: Citizen scientists use emerging technology in the wild


Interview by Rebecca Kondos: “For years, citizen scientists have trekked through local fields, rivers, and forests to observe, measure, and report on species and habitats with notebooks, binoculars, butterfly nets, and cameras in hand. It’s a slow process, and the gathered data isn’t easily shared. It’s a system that has worked to some degree, but one that’s in need of a technology and methodology overhaul.

Thanks to the team behind Wildme.org and their Wildbook software, both citizen and professional scientists are becoming active participants in using AI, computer vision, and big data. Wildbook is working to transform the data collection process, and citizen scientists who use the software have more transparency into conservation research and the impact it’s making. As a result, engagement levels have increased; scientists can more easily share their work; and, most important, endangered species like the whale shark benefit.

In this interview, Colin Kingen, a software engineer for Wildbook (with assistance from his colleagues Jason Holmberg and Jon Van Oast), discusses Wildbook’s work, explains classic problems in field observation science, and shares how Wildbook is working to solve some of the big problems that have plagued wildlife research. He also addresses something I’ve wondered about: why isn’t there an “uberdatabase” to share the work of scientists across all global efforts? The work Kingen and his team are doing exemplifies what can be accomplished when computer scientists with big hearts apply their talents to saving wildlife….(More)”.

Government initiative offers Ghanaians chance for greater participation


Springwise: “Openness and transparency are key ingredients in building an accountable and effective democratic government. An “open” government is transparent, accessible to anyone, anytime, anywhere; and is responsive to new ideas and demands. The key to this is providing access to accurate data to all citizens. However, in many countries, a low rate of citizen participation and involvement has led to poor accountability from government officials. In Ghana, a new project, TransGov, is developing innovative tools to foster participation in local governance of marginalised groups, and improve government accountability to those who need it most.

TransGov’s research found that many Ghanaians were not aware of the status of local development projects, and this has led to general public apathy, with people feeling they had no influence on getting the government to work for them. TransGov created a platform to enhance the disclosure and dissemination of information and to create ways for citizens to engage with the local leaders in their communities. The TransGov platform allows all citizens to track the progress of government projects in their area and to publish information about those projects. TransGov has four integrated channels (a website, a mobile app, interactive voice response (IVR) technology, and SMS) to allow the participation of people from a wide range of socio-economic backgrounds.

The organization has recently partnered with the government-sponsored Ghana Open Data Initiative to share resources, tools, and research, and to hold workshops and seminars. The partnership is intended to strengthen the ability of various government agencies to collect and manage data for public use. The hope is that making this information more accessible will help create more business opportunities and drive innovation, as well as increase democratic participation. We have seen this in educational radio broadcasts in Cairo subways and an app that allows citizen feedback on city development….(More)”.

How Africa’s Data Revolution Can Deliver Sustainable Development Outcomes


Donald Mogeni at Huffington Post: “…As a demonstration of this political will, several governments in Africa are blazing the trail in numerous ways. For instance, the Government of Senegal now considers investment in data as important as investment in physical infrastructure such as roads. In Ghana and Sierra Leone, more policy-makers and legislators are now using data to inform their work and to ensure that planning is continuously evidence-based.

Despite these progressive developments, several cautionary statements are worth noting. Firstly, data is not a silver bullet for present development challenges. To be transformative, the use of data and evidence must include political agency and citizen mobilization. Thus, while data may highlight important development cleavages, it may not guarantee change if not used appropriately within the various political contexts. ‘Everyone Counts’, a new global initiative by CARE, KWANTU and World Vision (that was also showcased in the meeting) seeks to contribute to this agenda.

Secondly, there is a need for data ‘experts’ to move beyond the chronic obsession with big numbers to ensure greater inclusion of marginalised and vulnerable segments of the population. Achieving this will require a ‘business unusual’ approach that devises better data collection methodologies and technologies, which must collect more and better data than ever before. This ‘new’ data should then be used together with administrative and open data to ensure that ‘no one is left behind’.

Thirdly, the utility of citizen-generated data is still contentious – especially within state institutions. Increasing the value of this data must therefore involve standardization of data collection tools and methodologies across the board (to the extent possible), making consideration for ethical approvals, subjecting this data to quality audits and triangulation, as well as adhering to quality assurance standards.

Fourthly, the emergence of various data communities within African countries has made the roles of National Statistical Offices (NSOs) in the data ecosystem even more crucial. However, significant capacity and technical disparities exist between the various NSOs in Africa. To realise the potential of data and statistics in achieving sustainable development outcomes, the financial and human capacities of these institutions must be enhanced….(More)”.

Formalised data citation practices would encourage more authors to make their data available for reuse


 Hyoungjoo Park and Dietmar Wolfram at the LSE Impact Blog: “Today’s researchers work in a heavily data-intensive and collaborative environment in order to further scientific discovery across and within fields. It is becoming routine for researchers (i.e. authors and data publishers) to submit their research data, such as datasets, biological samples in biomedical fields, and computer code, as supplementary information in order to comply with data sharing requirements of major funding agencies, high-profile journals, and data journals. This is part of open science, where data and any publication products are expected to be made available to anyone interested.

Given that researchers benefit from publicly shared data through data reuse in their own research, researchers who provide access to data should be acknowledged for their contributions, much in the same way that authors are recognised for their research publications through citation. Researchers who use shared data or other shared research products (e.g. open access software, tissue cultures) should also acknowledge the providers of these resources through formal citation. At present, data citation is not widely practised in most disciplines and as an object of study remains largely overlooked….

We found that data citations appear in the references section of an article less frequently than in the main text, making it difficult to identify the reward and credit for data authors (i.e. data sharers). Consistent data citation formats could not be found. Current data citation practices do not (yet) benefit data sharers. Also, data citation was sometimes located in the supplementary information, outside of the references. Data that had been reused was often not acknowledged in the reference lists, but was rather hidden in the representation of data (e.g. tables, figures, images, graphs, and other elements), which may be a consequence of the fact that data citation practices are not yet common in scholarly communications.

Ongoing challenges remain in identifying and documenting data citation. First, the practice of informal data citation presents a challenge for accurately documenting data citation. …

Second, data recitation by one or more co-authors of earlier studies (i.e. self-citation) is common, which reduces the broader impact of data sharing by limiting much of the reuse to the original authors…

Third, currently indexed data citations may not include rapidly advancing areas, such as in the hard sciences or computer engineering, because approximately 90% of indexed works were associated with journal articles…

Fourth, the number of authors associated with shared datasets raises questions of the ownership of and responsibility for a collective work, although some journals require one author to be responsible for the data used in the study…(More)”. (See also “An examination of research data sharing and re-use: implications for data citation practice”, published in Scientometrics.)

Politicizing Digital Space: Theory, the Internet, and Renewing Democracy


Book by Trevor Garrison Smith: “The objective of this book is to outline how a radically democratic politics can be reinvigorated in theory and practice through the use of the internet. The author argues that politics in its proper sense can be distinguished from anti-politics by analyzing the configuration of public space, subjectivity, participation, and conflict. Each of these terrains can be configured in a more or less political manner, though the contemporary status quo heavily skews them towards anti-political configuration.

Using this understanding of what exactly politics entails, this book considers how the internet can both help and hinder efforts to move each area in a more political direction. By explicitly interpreting contemporary theories of the political in terms of the internet, this analysis avoids the twin traps of both technological determinism and technological cynicism.

Raising awareness of what the word ‘politics’ means, the author develops theoretical work by Arendt, Rancière, Žižek and Mouffe to present a clear and coherent view of how, in theory, politics can be digitized and, alternatively, how the internet can be deployed in the service of truly democratic politics…(More)”.

Uber Releases Open Source Project for Differential Privacy


Katie Tezapsidis at Uber Security: “Data analysis helps Uber continuously improve the user experience by preventing fraud, increasing efficiency, and providing important safety features for riders and drivers. Data gives our teams timely feedback about what we’re doing right and what needs improvement.

Uber is committed to protecting user privacy and we apply this principle throughout our business, including our internal data analytics. While Uber already has technical and administrative controls in place to limit who can access specific databases, we are adding additional protections governing how that data is used — even in authorized cases.

We are excited to give a first glimpse of our recent work on these additional protections with the release of a new open source tool, which we’ll introduce below.

Background: Differential Privacy

Differential privacy is a formal definition of privacy and is widely recognized by industry experts as providing strong and robust privacy assurances for individuals. In short, differential privacy allows general statistical analysis without revealing information about a particular individual in the data. Results do not even reveal whether any individual appears in the data. For this reason, differential privacy provides an extra layer of protection against re-identification attacks as well as attacks using auxiliary data.

Differential privacy can provide high accuracy results for the class of queries Uber commonly uses to identify statistical trends. Consequently, differential privacy allows us to calculate aggregations (averages, sums, counts, etc.) of elements like groups of users or trips on the platform without exposing information that could be used to infer details about a specific user or trip.

Differential privacy is enforced by adding noise to a query’s result, but some queries are more sensitive to the data of a single individual than others. To account for this, the amount of noise added must be tuned to the sensitivity of the query, which is defined as the maximum change in the query’s output when an individual’s data is added to or removed from the database.

As part of their job, a data analyst at Uber might need to know the average trip distance in a particular city. A large city, like San Francisco, might have hundreds of thousands of trips with an average distance of 3.5 miles. If any individual trip is removed from the data, the average remains close to 3.5 miles. This query therefore has low sensitivity, and thus requires less noise to enable each individual to remain anonymous within the crowd.

Conversely, the average trip distance in a smaller city with far fewer trips is more influenced by a single trip and may require more noise to provide the same degree of privacy. Differential privacy defines the precise amount of noise required given the sensitivity.
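The relationship between sensitivity and noise described above can be sketched in a few lines of Python. This is an illustration only, not Uber's tool and not Elastic Sensitivity: it uses the standard Laplace mechanism, in which noise is drawn with scale equal to sensitivity divided by a privacy parameter epsilon. The function names, the epsilon value, the 50-mile cap on trip distance, and the treatment of the trip count as public are all assumptions made for the example; none of them come from the article.

```python
import random


def mean_sensitivity(num_trips, max_distance):
    """Upper bound on how far the average can move when one trip is removed.

    Assumes every trip distance lies in [0, max_distance] (a hypothetical cap)
    and that the number of trips is treated as public.
    """
    return max_distance / num_trips


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(distances, max_distance, epsilon, rng):
    """Return the average trip distance plus noise tuned to its sensitivity."""
    # Clip each trip so the assumed bound actually holds for the data.
    clipped = [min(max(d, 0.0), max_distance) for d in distances]
    true_mean = sum(clipped) / len(clipped)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = mean_sensitivity(len(clipped), max_distance) / epsilon
    return true_mean + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    large_city = [3.5] * 200_000  # many trips: low sensitivity, little noise
    small_city = [3.5] * 200      # few trips: higher sensitivity, more noise
    for name, trips in [("large city", large_city), ("small city", small_city)]:
        sens = mean_sensitivity(len(trips), 50.0)
        result = noisy_mean(trips, 50.0, 1.0, rng)
        print(f"{name}: sensitivity={sens:.5f}, noisy average={result:.3f}")
```

With the same cap and the same epsilon, the large city's sensitivity is a thousand times smaller than the small city's, so its reported average stays much closer to the true value. That is the point the two city examples make.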

A major challenge for practical differential privacy is how to efficiently compute the sensitivity of a query. Existing methods lack sufficient support for the features used in Uber’s queries and many approaches require replacing the database with a custom runtime engine. Uber uses many different database engines and replacing these databases is infeasible. Moreover, custom runtimes cannot meet Uber’s demanding scalability and performance requirements.

Introducing Elastic Sensitivity

To address these challenges we adopted Elastic Sensitivity, a technique developed by security researchers at the University of California, Berkeley for efficiently calculating the sensitivity of a query without requiring changes to the database. The full technical details of Elastic Sensitivity are described here.

Today, we are excited to share a tool developed in collaboration with these researchers to calculate Elastic Sensitivity for SQL queries. The tool is available now on GitHub. It is designed to integrate easily with existing data environments and support additional state-of-the-art differential privacy mechanisms, which we plan to share in the coming months….(More)”.

Political Inequality in Affluent Democracies


 for the SSRC: “A key characteristic of a democracy,” according to Robert Dahl, is “the continuing responsiveness of the government to the preferences of its citizens, considered as political equals.” Much empirical research over the past half century, most of it focusing on the United States, has examined the relationship between citizens’ policy preferences and the policy choices of elected officials. According to Robert Shapiro, this research has generated “evidence for strong effects of public opinion on government policies,” providing “a sanguine picture of democracy at work.”

In recent years, however, scholars of American politics have produced striking evidence that the apparent “strong effects” of aggregate public opinion in these studies mask severe inequalities in responsiveness. As Martin Gilens put it, “The American government does respond to the public’s preferences, but that responsiveness is strongly tilted toward the most affluent citizens. Indeed, under most circumstances, the preferences of the vast majority of Americans appear to have essentially no impact on which policies the government does or doesn’t adopt.”

One possible interpretation of these findings is that the American political system is anomalous in its apparent disregard for the preferences of middle-class and poor people. In that case, the severe political inequality documented there would presumably be accounted for by distinctive features of the United States, such as its system of private campaign finance, its weak labor unions, or its individualistic political culture. But, what if severe political inequality is endemic in affluent democracies? That would suggest that fiddling with the political institutions of the United States to make them more like Denmark’s (or vice versa) would be unlikely to bring us significantly closer to satisfying Dahl’s standard of democratic equality. We would be forced to conclude either that Dahl’s standard is fundamentally misguided or that none of the political systems commonly identified as democratic comes anywhere close to meriting that designation.

Analyzing policy responsiveness

To address this question, I have attempted to test the extent to which policymakers in a variety of affluent democracies respond to the preferences of their citizens considered as political equals. My analyses focus on the relationship between public opinion and government spending on social welfare programs, including pensions, health, education, and unemployment benefits. These programs represent a major share of government spending in every affluent democracy and, arguably, an important source of public well-being. Moreover, social spending figures prominently in the comparative literature on the political impact of public opinion in affluent democracies, with major scholarly works suggesting that it is significantly influenced by citizens’ preferences.

My analyses employ data on citizens’ views about social spending and the welfare state from three major cross-national survey projects—the International Social Survey Programme (ISSP), the World Values Survey (WVS), and the European Values Survey (EVS). In combination, these three sources provide relevant opinion data from 160 surveys conducted between 1985 and 2012 in 30 countries, including most of the established democracies of Western Europe and the English-speaking world and some newer democracies in Eastern Europe, Latin America, and Asia. I examine shifts in (real per capita) social spending in the two years following each survey. Does greater public enthusiasm for the welfare state lead to increases in social spending, other things being equal? And, more importantly here, do the views of low-income people have the same apparent influence on policy as the views of affluent people?…(More)”.

Open Data Blueprint


ODX Canada: “In Canada, the open data environment should be viewed as a supply chain. The movement of open data from producers to consumers involves many different organizations, people, activities, projects and initiatives, all of which work together to push out a final product. Naturally, if there is a break or hurdle in this supply chain, it doesn’t work efficiently. A fundamental hurdle highlighted by companies across the country was the inability to scale their business at the provincial, national and international levels.

This blueprint aims to address the challenges Canadian entrepreneurs are facing by encouraging municipalities to launch open data initiatives. By sharing best practices, we hope to encourage the accessibility of datasets within existing jurisdictions. The structured recommendations in this Open Data Blueprint are based on feedback and best practices seen in major cities across Canada collected through ODX’s primary research….(More)”

(Read more about the OD150 initiative here)

Principles and Practices for a Federal Statistical Agency


National Academies of Sciences Report: “Publicly available statistics from government agencies that are credible, relevant, accurate, and timely are essential for policy makers, individuals, households, businesses, academic institutions, and other organizations to make informed decisions. Even more, the effective operation of a democratic system of government depends on the unhindered flow of statistical information to its citizens.

In the United States, federal statistical agencies in cabinet departments and independent agencies are the governmental units whose principal function is to compile, analyze, and disseminate information for such statistical purposes as describing population characteristics and trends, planning and monitoring programs, and conducting research and evaluation. The work of these agencies is coordinated by the U.S. Office of Management and Budget. Statistical agencies may acquire information not only from surveys or censuses of people and organizations, but also from such sources as government administrative records, private-sector datasets, and Internet sources that are judged of suitable quality and relevance for statistical use. They may conduct analyses, but they do not advocate policies or take partisan positions. Statistical purposes for which they provide information relate to descriptions of groups and exclude any interest in or identification of an individual person, institution, or economic unit.

Four principles are fundamental for a federal statistical agency: relevance to policy issues, credibility among data users, trust among data providers, and independence from political and other undue external influence. Principles and Practices for a Federal Statistical Agency: Sixth Edition presents and comments on these principles as they’ve been impacted by changes in laws, regulations, and other aspects of the environment of federal statistical agencies over the past 4 years….(More)”.

Lessons from Airbnb and Uber to Open Government as a Platform


Interview by Marquis Cabrera with Sangeet Paul Choudary: “…Platform companies have a very strong core built around data, machine learning, and a central infrastructure. But they rapidly innovate around it to try and test new things in the market and that helps them open themselves for further innovation in the ecosystem. Governments can learn to become more modular and more agile, the way platform companies are. Modularity in architecture is a very fundamental part of being a platform company; both in terms of your organizational architecture, as well as your business model architecture.

The second thing that governments can learn from a platform company is that successful platform companies are created with intent. They are not created by just opening out what you have available. If you look at the current approach of applying platform thinking in government, a common approach is just to take data and open it out to the world. However, successful platform companies first create a shaping strategy to shape out and craft a direction of vision for the ecosystem in terms of what they can achieve by being on the platform. They then provision the right tools and services that serve the vision to enable success for the ecosystem. And only then do they open up their infrastructure. It’s really important that you craft the right shaping strategy and use that to define the right tools and services before you start pursuing a platform implementation.

In my work with governments, I regularly find myself stressing the importance of thinking as a market maker rather than as a service provider. Governments have always been market makers but when it comes to technology, they often take the service provider approach.

In your book, you used San Francisco City Government and Data.gov as examples of infusing platform thinking in government. But what are some global examples of governments, countries infusing platform thinking around the world?

One of the best examples is from my home country Singapore, which has been at the forefront of converting the nation into a platform. It has now been pursuing platform strategy both overall as a nation by building a smart nation platform, and also within verticals. If you look particularly at mobility and transportation, it has worked to create a central core platform and then build greater autonomy around how mobility and transportation works in the country. Other good examples of governments applying this are Dubai, South Korea, Barcelona; they are all countries and cities that have applied the concept of platforms very well to create a smart nation platform. India is another example that is applying platform thinking with the creation of the India stack, though the implementation could benefit from better platform governance structures and a more open regulation around participation….(More)”.