Data Science ethics


Gov.uk blog: “If Tesco knows day-to-day how poorly the nation is, how can Government access  similar  insights so it can better plan health services? If Airbnb can give you a tailored service depending on your tastes, how can Government provide people with the right support to help them back into work in a way that is right for them? If companies are routinely using social media data to get feedback from their customers to improve their services, how can Government also use publicly available data to do the same?

Data science allows us to use new types of data and powerful tools to analyse this more quickly and more objectively than any human could. It can put us in the vanguard of policymaking – revealing new insights that leads to better and more tailored interventions. And  it can help reduce costs, freeing up resource to spend on more serious cases.

But some of these data uses and machine-learning techniques are new and still relatively untested in Government. Of course, we operate within legal frameworks such as the Data Protection Act and Intellectual Property law. These are flexible but don’t always talk explicitly about the new challenges data science throws up. For example, how are you to explain the decision making process of a deep learning black box algorithm? And if you were able to, how would you do so in plain English and not a row of 0s and 1s?

We want data scientists to feel confident to innovate with data, alongside  the policy makers and operational staff who make daily decisions on the data that the analysts provide –. That’s why we are creating an ethical framework which brings together the relevant parts of the law and ethical considerations into a simple document that helps Government officials decide what it can do and what it should do. We have a moral responsibility to maximise the use of data – which is never more apparent than after incidents of abuse or crime are left undetected – as well as to pay heed to the potential risks of these new tools. The guidelines are draft and not formal government policy, but we want to share them more widely in order to help iterate and improve them further….

So what’s in the framework? There is more detail in the fuller document, but it is based around six key principles:

  1. Start with a clear user need and public benefit: this will help you justify the level of data sensitivity and method you use
  2. Use the minimum level of data necessary to fulfill the public benefit: there are many techniques for doing so, such as de-identification, aggregation or querying against data
  3. Build robust data science models: the model is only as good as the data it contains and while machines are less biased than humans they can get it wrong. It’s critical to be clear about the confidence of the model and think through unintended consequences and biases contained within the data
  4. Be alert to public perceptions: put simply, what would a normal person on the street think about the project?
  5. Be as open and accountable as possible: Transparency is the antiseptic for unethical behavior. Aim to be as open as possible (with explanations in plain English), although in certain public protection cases the ability to be transparent will be constrained.
  6. Keep data safe and secure: this is not restricted to data science projects but we know that the public are most concerned about losing control of their data….(More)”

Big Data in the Policy Cycle: Policy Decision Making in the Digital Era


Paper by Johann Höchtl et al in the Journal of Organizational Computing and Electronic Commerce: “Although of high relevance to political science, the interaction between technological change and political change in the era of Big Data remains somewhat of a neglected topic. Most studies focus on the concept of e-government and e-governance, and on how already existing government activities performed through the bureaucratic body of public administration could be improved by technology. This paper attempts to build a bridge between the field of e-governance and theories of public administration that goes beyond the service delivery approach that dominates a large part of e-government research. Using the policy cycle as a generic model for policy processes and policy development, a new look on how policy decision making could be conducted on the basis of ICT and Big Data is presented in this paper….(More)”

Citizenship, Social Media, and Big Data: Current and Future Research in the Social Sciences


Homero Gil de Zúñiga at Social Science Computer Review: “This special issue of the Social Science Computer Review provides a sample of the latest strategies employing large data sets in social media and political communication research. The proliferation of information communication technologies, social media, and the Internet, alongside the ubiquity of high-performance computing and storage technologies, has ushered in the era of computational social science. However, in no way does the use of “big data” represent a standardized area of inquiry in any field. This article briefly summarizes pressing issues when employing big data for political communication research. Major challenges remain to ensure the validity and generalizability of findings. Strong theoretical arguments are still a central part of conducting meaningful research. In addition, ethical practices concerning how data are collected remain an area of open discussion. The article surveys studies that offer unique and creative ways to combine methods and introduce new tools while at the same time address some solutions to ethical questions….(More)”

Five Studies: How Behavioral Science Can Help in International Development


 in Pacific Standard: “In 2012, there were 896 million people around the world—12.7 percent of the global population—living on less than two dollars a day. The World Food Programestimates that 795 million people worldwide don’t have enough food to “lead a healthy life”; 25 percent of people living in Sub-Saharan Africa are undernourished. Over three million children die every year thanks to poor nutrition, and hunger is the leading cause of death worldwide. In 2012, just three preventable diseases (pneumonia, diarrhea, and malaria) killed 4,600 children every day.

Last month, the World Bank announced the launch of the Global Insights Initiative (GINI). The initiative, which follows in the footsteps of so-called “nudge units” in the United Kingdom and United States, is the Bank’s effort to incorporate insights from the field of behavioral science into the design of international development programs; too often, those programs failed to account for how people behave in the real world. Development policy, according to the Bank’s 2015 World Development Report, is overdue for a “redesign based on careful consideration of human factors.” Researchers have applauded the announcement, but it raises an interesting question: What can nudges really accomplish in the face of the developing world’s overwhelming poverty and health-care deficits?

In fact, researchers have found that instituting small program changes, informed by a better understanding of people’s motivations and limitations, can have big effects on everything from savings rates to vaccination rates to risky sexual behavior. Here are five studies that demonstrate the benefits of bringing empirical social science into the developing world….(More)”

State of the Commons


Creative Commons: “Creative Commoners have known all along that collaboration, sharing, and cooperation are a driving force for human evolution. And so for many it will come as no surprise that in 2015 we achieved a tremendous milestone: over 1.1 billion CC licensed photos, videos, audio tracks, educational materials, research articles, and more have now been contributed to the shared global commons…..

Whether it’s open education, open data, science, research, music, video, photography, or public policy, we are putting sharing and collaboration at the heart of the Web. In doing so, we are much closer to realizing our vision: unlocking the full potential of the Internet to drive a new era of development, growth, and productivity.

I am proud to share with you our 2015 State of the Commons report, our best effort to measure the immeasurable scope of the commons by looking at the CC licensed content, along with content marked as public domain, that comprise the slice of the commons powered by CC tools. We are proud to be a leader in the commons movement, and we hope you will join us as we celebrate all we have accomplished together this year. ….Report at https://stateof.creativecommons.org/2015/”

Peering at Open Peer Review


at the Political Methodologist: “Peer review is an essential part of the modern scientific process. Sending manuscripts for others to scrutinize is such a widespread practice in academia that its importance cannot be overstated. Since the late eighteenth century, when the Philosophical Transactions of the Royal Society pioneered editorial review,1 virtually every scholarly outlet has adopted some sort of pre-publication assessment of received works. Although the specifics may vary, the procedure has remained largely the same since its inception: submit, receive anonymous criticism, revise, restart the process if required. A recent survey of APSA members indicates that political scientists overwhelmingly believe in the value of peer review (95%) and the vast majority of them (80%) think peer review is a useful tool to keep themselves up to date with cutting-edge research (Djupe 2015, 349). But do these figures suggest that journal editors can rest upon their laurels and leave the system as it is?

Not quite. A number of studies have been written about the shortcomings of peer review. The system has been criticised for being too slow (Ware 2008), conservative (Eisenhart 2002), inconsistent (Smith 2006; Hojat, Gonnella, and Caelleigh 2003), nepotist (Sandström and Hällsten 2008), biased against women (Wold and Wennerås 1997), affiliation (Peters and Ceci 1982), nationality (Ernst and Kienbacher 1991) and language (Ross et al. 2006). These complaints have fostered interesting academic debates (e.g. Meadows 1998; Weller 2001), but thus far the literature offers little practical advice on how to tackle peer review problems. One often overlooked aspect in these discussions is how to provide incentives for reviewers to write well-balanced reports. On the one hand, it is not uncommon for reviewers to feel that their work is burdensome and not properly acknowledged. Further, due to the anonymous nature of the reviewing process itself, it is impossible to give the referee proper credit for a constructive report. On the other hand, the reviewers’ right to full anonymity may lead to sub-optimal outcomes as referees can rarely be held accountable for being judgmental (Fabiato 1994).

Open peer review (henceforth OPR) is largely in line with this trend towards a more transparent political science. Several definitions of OPR have been suggested, including more radical ones such as allowing anyone to write pre-publication reviews (crowdsourcing) or by fully replacing peer review with post-publication comments (Ford 2013). However, I believe that by adopting a narrow definition of OPR – only asking referees to sign their reports – we can better accommodate positive aspects of traditional peer review, such as author blinding, into an open framework. Hence, in this text OPR is understood as a reviewing method where both referee information and their reports are disclosed to the public, while the authors’ identities are not known to the reviewers before manuscript publication.

How exactly would OPR increase transparency in political science? As noted by a number of articles on the topic, OPR creates incentives for referees to write insightful reports, or at least it has no adverse impact over the quality of reviews (DeCoursey 2006; Godlee 2002; Groves 2010; Pöschl 2012; Shanahan and Olsen 2014). In a study that used randomized trials to assess the effect of OPR in the British Journal of Psychiatry, Walsh et al. (2000) show that “signed reviews were of higher quality, were more courteous and took longer to complete than unsigned reviews.” Similar results were reported by McNutt et al. (1990, 1374), who affirm that “editors graded signers as more constructive and courteous […], [and] authors graded signers as fairer.” In the same vein, Kowalczuk et al. (2013) measured the difference in review quality in BMC Microbiology and BMC Infectious Diseases and stated that signers received higher ratings for their feedback on methods and for the amount of evidence they mobilised to substantiate their decisions. Van Rooyen and her colleagues ((1999; 2010)) also ran two randomized studies on the subject, and although they did not find a major difference in perceived quality of both types of review, they reported that reviewers in the treatment group also took significantly more time to evaluate the manuscripts in comparison with the control group. They also note authors broadly favored the open system against closed peer review.

Another advantage of OPR is that it offers a clear way for referees to highlight their specialized knowledge. When reviews are signed, referees are able to receive credit for their important, yet virtually unsung, academic contributions. Instead of just having a rather vague “service to profession” section in their CVs, referees can precise information about the topics they are knowledgeable about and which sort of advice they are giving to prospective authors. Moreover, reports assigned a DOI number can be shared as any other piece of scholarly work, which leads to an increase in the body of knowledge of our discipline and a higher number of citations to referees. In this sense, signed reviews can also be useful for universities and funding bodies. It is an additional method to assess the expert knowledge of a prospective candidate. As supervising skills are somewhat difficult to measure, signed reviews are a good proxy for an applicant’s teaching abilities.

OPR provides background to manuscripts at the time of publication (Ford 2015; Lipworth et al. 2011). It is not uncommon for a manuscript to take months, or even years, to be published in a peer-reviewed journal. In the meantime, the text usually undergoes several major revisions, but readers rarely, if ever, see this trial-and-error approach in action. With public reviews, everyone would be able to track the changes made in the original manuscript and understand how the referees improved the text before its final version. Hence, OPR makes the scientific exchange clear, provides useful background information to manuscripts and fosters post-publication discussions by the readership at large.

Signed and public reviews are also important pedagogical tools. OPR gives a rare glimpse of how academic research is actually conducted, making explicit the usual need for multiple iterations between the authors and the editors before an article appears in print. Furthermore, OPR can fill some of the gap in peer-review training for graduate students. OPR allows junior scholars to compare different review styles, understand what the current empirical or theoretical puzzles of their discipline are, and engage in post-publication discussions about topics in which they are interested (Ford 2015; Lipworth et al. 2011)….(More)”

Forging Trust Communities: How Technology Changes Politics


Book by Irene S. Wu: “Bloggers in India used social media and wikis to broadcast news and bring humanitarian aid to tsunami victims in South Asia. Terrorist groups like ISIS pour out messages and recruit new members on websites. The Internet is the new public square, bringing to politics a platform on which to create community at both the grassroots and bureaucratic level. Drawing on historical and contemporary case studies from more than ten countries, Irene S. Wu’s Forging Trust Communities argues that the Internet, and the technologies that predate it, catalyze political change by creating new opportunities for cooperation. The Internet does not simply enable faster and easier communication, but makes it possible for people around the world to interact closely, reciprocate favors, and build trust. The information and ideas exchanged by members of these cooperative communities become key sources of political power akin to military might and economic strength.

Wu illustrates the rich world history of citizens and leaders exercising political power through communications technology. People in nineteenth-century China, for example, used the telegraph and newspapers to mobilize against the emperor. In 1970, Taiwanese cable television gave voice to a political opposition demanding democracy. Both Qatar (in the 1990s) and Great Britain (in the 1930s) relied on public broadcasters to enhance their influence abroad. Additional case studies from Brazil, Egypt, the United States, Russia, India, the Philippines, and Tunisia reveal how various technologies function to create new political energy, enabling activists to challenge institutions while allowing governments to increase their power at home and abroad.

Forging Trust Communities demonstrates that the way people receive and share information through network communities reveals as much about their political identity as their socioeconomic class, ethnicity, or religion. Scholars and students in political science, public administration, international studies, sociology, and the history of science and technology will find this to be an insightful and indispensable work….(More)”

Creating Value through Open Data


Press Release: “Capgemini Consulting, the global strategy and transformation consulting arm of the Capgemini Group, today published two new reports on the state of play of Open Data in Europe, to mark the launch of the European Open Data Portal. The first report addresses “Open Data Maturity in Europe 2015: Insights into the European state of play” and the second focuses on “Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources.” The countries covered by these assessments include the EU28 countries plus Iceland, Liechtenstein, Norway, and Switzerland – commonly referred to as the EU28+ countries. The reports were requested by the European Commission within the framework of the Connecting Europe Facility program, supporting the deployment of European Open Data infrastructure.

Open Data refers to the information collected, produced or paid for by public bodies and can be freely used, modified and shared by anyone.. For the period 2016-2020, the direct market size for Open Data is estimated at EUR 325 billion for Europe. Capgemini’s study “Creating Value through Open Data” illustrates how Open Data can create economic value in multiple ways including increased market transactions, job creation from producing services and products based on Open Data, to cost savings and efficiency gains. For instance, effective use of Open Data could help save 629 million hours of unnecessary waiting time on the roads in the EU; and help reduce energy consumption by 16%. The accumulated cost savings for public administrations making use of Open Data across the EU28+ in 2020 are predicted to equal 1.7 bn EUR. Reaping these benefits requires reaching a high level of Open Data maturity.

In order to address the accessibility and the value of Open Data across European countries, the European Union has launched the Beta version of the European Data Portal. The Portal addresses the whole Data Value Chain, from data publishing to data re-use. Over 240,000 data sets are referenced on the Portal and 34 European countries. It offers seamless access to public data across Europe, with over 13 content categories to categorize data, ranging from health or education to transport or even science and justice. Anyone, citizens, businesses, journalists or administrations can search, access and re-use the full data collection. A wide range of data is available, from crime records in Helsinki, labor mobility in the Netherlands, forestry maps in France to the impact of digitization in Poland…..The study, “Open Data Maturity in Europe 2015: Insights into the European state of play”, uses two key indicators: Open Data Readiness and Portal Maturity. These indicators cover both the maturity of national policies supporting Open Data as well as an assessment of the features made available on national data portals. The study shows that the EU28+ have completed just 44% of the journey towards achieving full Open Data Maturity and there are large discrepancies across countries. A third of European countries (32%), recognized globally, are leading the way with solid policies, licensing norms, good portal traffic and many local initiatives and events to promote Open Data and its re-use….(More)”

Decoding the Future for National Security


George I. Seffers at Signal: “U.S. intelligence agencies are in the business of predicting the future, but no one has systematically evaluated the accuracy of those predictions—until now. The intelligence community’s cutting-edge research and development agency uses a handful of predictive analytics programs to measure and improve the ability to forecast major events, including political upheavals, disease outbreaks, insider threats and cyber attacks.

The Office for Anticipating Surprise at the Intelligence Advanced Research Projects Activity (IARPA) is a place where crystal balls come in the form of software, tournaments and throngs of people. The office sponsors eight programs designed to improve predictive analytics, which uses a variety of data to forecast events. The programs all focus on incidents outside of the United States, and the information is anonymized to protect privacy. The programs are in different stages, some having recently ended as others are preparing to award contracts.

But they all have one more thing in common: They use tournaments to advance the state of the predictive analytic arts. “We decided to run a series of forecasting tournaments in which people from around the world generate forecasts about, now, thousands of real-world events,” says Jason Matheny, IARPA’s new director. “All of our programs on predictive analytics do use this tournament style of funding and evaluating research.” The Open Source Indicators program used a crowdsourcing technique in which people across the globe offered their predictions on such events as political uprisings, disease outbreaks and elections.

The data analyzed included social media trends, Web search queries and even cancelled dinner reservations—an indication that people are sick. “The methods applied to this were all automated. They used machine learning to comb through billions of pieces of data to look for that signal, that leading indicator, that an event was about to happen,” Matheny explains. “And they made amazing progress. They were able to predict disease outbreaks weeks earlier than traditional reporting.” The recently completed Aggregative Contingent Estimation (ACE) program also used a crowdsourcing competition in which people predicted events, including whether weapons would be tested, treaties would be signed or armed conflict would break out along certain borders. Volunteers were asked to provide information about their own background and what sources they used. IARPA also tested participants’ cognitive reasoning abilities. Volunteers provided their forecasts every day, and IARPA personnel kept score. Interestingly, they discovered the “deep domain” experts were not the best at predicting events. Instead, people with a certain style of thinking came out the winners. “They read a lot, not just from one source, but from multiple sources that come from different viewpoints. They have different sources of data, and they revise their judgments when presented with new information. They don’t stick to their guns,” Matheny reveals. …

The ACE research also contributed to a recently released book, Superforecasting: The Art and Science of Prediction, according to the IARPA director. The book was co-authored, along with Dan Gardner, by Philip Tetlock, the Annenberg University professor of psychology and management at the University of Pennsylvania who also served as a principal investigator for the ACE program. Like ACE, the Crowdsourcing Evidence, Argumentation, Thinking and Evaluation program uses the forecasting tournament format, but it also requires participants to explain and defend their reasoning. The initiative aims to improve analytic thinking by combining structured reasoning techniques with crowdsourcing.

Meanwhile, the Foresight and Understanding from Scientific Exposition (FUSE) program forecasts science and technology breakthroughs….(More)”

Tech and Innovation to Re-engage Civic Life


Hollie Russon Gilman at the Stanford Social Innovation Review: “Sometimes even the best-intentioned policymakers overlook the power of people. And even the best-intentioned discussions on social impact and leveraging big data for the social sector can obscure the power of every-day people in their communities.

But time and time again, I’ve seen the transformative power of civic engagement when initiatives are structured well. For example, the other year I witnessed a high school student walk into a school auditorium one evening during Boston’s first-ever youth-driven participatory budgeting project. Participatory budgeting gives residents a structured opportunity to work together to identify neighborhood priorities, work in tandem with government officials to draft viable projects, and prioritize projects to fund. Elected officials in turn pledge to implement these projects and are held accountable to their constituents. Initially intrigued by an experiment in democracy (and maybe the free pizza), this student remained engaged over several months, because she met new members of her community; got to interact with elected officials; and felt like she was working on a concrete objective that could have a tangible, positive impact on her neighborhood.

For many of the young participants, ages 12-25, being part of a participatory budgeting initiative is the first time they are involved in civic life. Many were excited that the City of Boston, in collaboration with the nonprofit Participatory Budgeting Project, empowered young people with the opportunity to allocate $1 million in public funds. Through participating, young people gain invaluable civic skills, and sometimes even a passion that can fuel other engagements in civic and communal life.

This is just one example of a broader civic and social innovation trend. Across the globe, people are working together with their communities to solve seemingly intractable problems, but as diverse as those efforts are, there are also commonalities. Well-structured civic engagement creates the space and provides the tools for people to exert agency over policies. When citizens have concrete objectives, access to necessary technology (whether it’s postcards, trucks, or open data portals), and an eye toward outcomes, social change happens.

Using Technology to Distribute Expertise

Technology is allowing citizens around the world to participate in solving local, national, and global problems. When it comes to large, public bureaucracies, expertise is largely top-down and concentrated. Leveraging technology creates opportunities for people to work together in new ways to solve public problems. One way is through civic crowdfunding platforms like Citizinvestor.com, which cities can use to develop public sector projects for citizen support; several cities in Rhode Island, Oregon, and Philadelphia have successfully pooled citizen resources to fund new public works. Another way is through citizen science. Old Weather, a crowdsourcing project from the National Archives and Zooniverse, enrolls people to transcribe old British ship logs to identify climate change patterns. Platforms like these allow anyone to devote a small amount of time or resources toward a broader public good. And because they have a degree of transparency, people can see the progress and impact of their efforts. ….(More)”