Citicafe: conversation-based intelligent platform for citizen engagement


Paper by Amol Dumrewal et al. in the Proceedings of the ACM India Joint International Conference on Data Science and Management of Data: “Community civic engagement is a new and emerging trend in urban cities driven by the mission of developing responsible citizenship. The recognition of civic potential in every citizen goes a long way in creating sustainable societies. Technology is playing a vital role in helping this mission, and over the last couple of years a plethora of social media avenues for reporting civic issues have emerged. Sites like Twitter, Facebook, and other online portals help citizens report issues and register complaints. These complaints are analyzed by public services to help understand and, in turn, address these issues. However, once a complaint is registered, often no formal or informal feedback is given back to the citizens. This demotivates citizens and may deter them from registering further complaints. In addition, these sites offer citizens no holistic information about a neighborhood. It is useful for people to know whether similar complaints have been posted by others in the same area, the profile of all complaints, and how and when these complaints will be addressed.

In this paper, we create CitiCafe, a conversation-based platform for enhancing citizen engagement, front-ended by a virtual agent with a Twitter interface. The platform's back-end stores and processes information pertaining to civic complaints in a city. A Twitter-based conversation service allows citizens to correspond directly with CitiCafe via “tweets” and direct messages. The platform also helps citizens to (a) report problems and (b) gather information related to civic issues in different neighborhoods. This can also help, in the long run, to develop civic conversations among citizens and also between citizens and public services….(More)”.
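The paper describes CitiCafe only at a high level, but the basic conversational flow it outlines (classify an incoming tweet as either a complaint report or an information request, store complaints by neighbourhood, and reply with an acknowledgement or a summary) can be sketched roughly as below. This is purely an illustrative assumption on our part, not the authors' implementation; all names (handle_tweet, ComplaintStore, the keyword heuristic) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Complaint:
    user: str
    neighbourhood: str
    text: str
    reported_at: datetime = field(default_factory=datetime.utcnow)
    status: str = "registered"


class ComplaintStore:
    """Hypothetical back-end: keeps complaints grouped by neighbourhood."""

    def __init__(self):
        self._by_area = defaultdict(list)

    def register(self, complaint: Complaint) -> int:
        self._by_area[complaint.neighbourhood].append(complaint)
        return len(self._by_area[complaint.neighbourhood])

    def summary(self, neighbourhood: str) -> str:
        complaints = self._by_area[neighbourhood]
        open_count = sum(c.status == "registered" for c in complaints)
        return (f"{neighbourhood}: {len(complaints)} complaints on record, "
                f"{open_count} still open.")


# Crude stand-in for an intent classifier.
REPORT_KEYWORDS = {"broken", "overflowing", "pothole", "leak", "garbage", "streetlight"}


def handle_tweet(store: ComplaintStore, user: str, neighbourhood: str, text: str) -> str:
    """Route an incoming tweet to (a) problem reporting or (b) neighbourhood info."""
    words = set(text.lower().split())
    if words & REPORT_KEYWORDS:
        n = store.register(Complaint(user, neighbourhood, text))
        return (f"@{user} Thanks, your complaint is registered. "
                f"It is number {n} for {neighbourhood}.")
    return f"@{user} {store.summary(neighbourhood)}"


# Example exchange
store = ComplaintStore()
print(handle_tweet(store, "asha", "Indiranagar", "The streetlight on 12th Main is broken"))
print(handle_tweet(store, "ravi", "Indiranagar", "What civic issues are open near me?"))
```

A production system would presumably replace the keyword heuristic with a trained intent classifier and plug into Twitter's messaging APIs, but the feedback loop the authors emphasise (acknowledge, track, summarise) is visible even in this toy version.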

Public Scrutiny of Automated Decisions: Early Lessons and Emerging Methods


Research Report by Omidyar Network: “Automated decisions are increasingly part of everyday life, but how can the public scrutinize, understand, and govern them? To begin to explore this, Omidyar Network has, in partnership with Upturn, published Public Scrutiny of Automated Decisions: Early Lessons and Emerging Methods.

The report is based on an extensive review of computer and social science literature, a broad array of real-world attempts to study automated systems, and dozens of conversations with global digital rights advocates, regulators, technologists, and industry representatives. It maps out the landscape of public scrutiny of automated decision-making, both in terms of what civil society was or was not doing in this nascent sector and what laws and regulations were or were not in place to help regulate it.

Our aim in exploring this is three-fold:

1) We hope it will help civil society actors consider how much they have to gain in empowering the public to effectively scrutinize, understand, and help govern automated decisions;

2) We think it can start laying a policy framework for this governance, adding to the growing literature on the social and economic impact of such decisions; and

3) We’re optimistic that the report’s findings and analysis will inform other funders’ decisions in this important and growing field. (Read the full report here.)”

Six creative ways to engage the public in innovation policy


Tom Saunders at Nesta: “When someone decides to engage the public in a discussion about science or innovation, it usually involves booking a room, bringing a group of people together, giving them some information about a topical issue and then listening to their thoughts about it. After this, the organisers usually produce a report which they email to everyone they want to influence, or, if the dialogue was commissioned directly by a research funder or a public body, there is usually a response detailing how they are going to act on the views of the public.

What’s wrong with this standard format of public dialogue? Through our research into public engagement in innovation policy, we noticed a number of issues:

  • Almost all public engagement work is offline, with very little money spent on digital methods

  • Most dialogues are top-down, e.g. a research council decides that it needs to engage the public on a particular issue. They rarely come from citizens themselves

  • Most public dialogues are only open to a small number of hand-picked participants. No one else can take part, even if they want to

  • Few public engagement activities focus specifically on engaging with underrepresented groups…(More)”.

Data-Driven Regulation and Governance in Smart Cities


Chapter by Sofia Ranchordas and Abram Klop in A. Berlee, V. Mak and E. Tjong Tjin Tai (Eds), Research Handbook on Data Science and Law (Edward Elgar, 2018): “This paper discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness data-driven technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart cities throughout the world currently employ data science, big data, AI, the Internet of Things (‘IoT’), and predictive analytics to improve the efficiency of their services and decision-making.

Furthermore, this paper analyses the legal challenges of employing these technologies to influence or determine the content of local regulation and governance. It explores in particular three specific challenges: the disconnect between traditional administrative law frameworks and data-driven regulation and governance, the effects of the privatization of public services and citizen needs due to the growing outsourcing of smart cities technologies to private companies; and the limited transparency and accountability that characterizes data-driven administrative processes. This paper draws on a review of interdisciplinary literature on smart cities and offers illustrations of data-driven regulation and governance practices from different jurisdictions….(More)”.

Open data sharing and the Global South—Who benefits?


David Serwadda et al in Science: “A growing number of government agencies, funding organizations, and publishers are endorsing the call for increased data sharing, especially in biomedical research, many with an ultimate goal of open data. Open data is among the least restrictive forms of data sharing, in contrast to managed access mechanisms, which typically have terms of use and in some cases oversight by the data generators themselves. But despite an ethically sound rationale and growing support for open data sharing in many parts of the world, concerns remain, particularly among researchers in low- and middle-income countries (LMICs) in Africa, Latin America, and parts of Asia and the Middle East that comprise the Global South. Drawing on our perspective as researchers and ethicists working in the Global South, we see opportunities to improve community engagement, raise awareness, and build capacity, all toward improving research and data sharing involving researchers in LMICs…African scientists have expressed concern that open data compromises national ownership and reopens the gates for “parachute-research” (i.e., Northern researchers absconding with data to their home countries). Other LMIC researchers have articulated fears over free-riding scientists using the data collected by others for their own career advancement …(More)”

Do Academic Journals Favor Researchers from Their Own Institutions?


Yaniv Reingewertz and Carmela Lutmar at Harvard Business Review: “Are academic journals impartial? While many would suggest that academic journals work for the advancement of knowledge and science, we show this is not always the case. In a recent study, we find that two international relations (IR) journals favor articles written by authors who share the journal’s institutional affiliation. We term this phenomenon “academic in-group bias.”

In-group bias is a well-known phenomenon that is widely documented in the psychological literature. People tend to favor their group, whether it is their close family, their hometown, their ethnic group, or any other group affiliation. Before our study, the evidence regarding academic in-group bias was scarce, with only one study finding academic in-group bias in law journals. Studies from economics found mixed results. Our paper provides evidence of academic in-group bias in IR journals, showing that this phenomenon is not specific to law. We also provide tentative evidence which could potentially resolve the conflict in economics, suggesting that these journals might also exhibit in-group bias. In short, we show that academic in-group bias is general in nature, even if not necessarily large in scope….(More)”.

Data Collaboratives can transform the way civil society organisations find solutions


Stefaan G. Verhulst at Disrupt & Innovate: “The need for innovation is clear: The twenty-first century is shaping up to be one of the most challenging in recent history. From climate change to income inequality to geopolitical upheaval and terrorism: the difficulties confronting International Civil Society Organisations (ICSOs) are unprecedented not only in their variety but also in their complexity. At the same time, today’s practices and tools used by ICSOs seem stale and outdated. Increasingly, it is clear, we need not only new solutions but new methods for arriving at solutions.

Data will likely become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in just the last two years. We know that this data can help us understand the world in new ways and help us meet the challenges mentioned above. However, we need new data collaboration methods to help us extract the insights from that data.

UNTAPPED DATA POTENTIAL

For all of data’s potential to address public challenges, the truth remains that most data generated today is in fact collected by the private sector – including ICSOs, who are often collecting vast amounts of data themselves – such as, for instance, the International Committee of the Red Cross, which generates various (often sensitive) data related to humanitarian activities. This data, typically ensconced in tightly held databases to maintain competitive advantage or to protect against harmful intrusion, contains tremendous possible insights and avenues for innovation in how we solve public problems. But because of access restrictions and often limited data science capacity, its vast potential often goes untapped.

DATA COLLABORATIVES AS A SOLUTION

Data Collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas — including the private sector, government, and civil society — come together to exchange data and pool analytical expertise.

While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains. Importantly, several ICSOs have started to collaborate with others around their own data and that of the private and public sectors. For example:

  • Several civil society organisations, academics, and donor agencies are partnering in the Health Data Collaborative to improve the global data infrastructure necessary to make smarter global and local health decisions and to track progress against the Sustainable Development Goals (SDGs).
  • Additionally, the UN Office for the Coordination of Humanitarian Affairs (UNOCHA) built the Humanitarian Data Exchange (HDX), a platform for sharing humanitarian data from and for ICSOs – including Caritas, InterAction and others – donor agencies, national and international bodies, and other humanitarian organisations.

These are a few examples of Data Collaboratives that ICSOs are participating in. Yet, the potential for collaboration goes beyond these examples. Likewise, so do the concerns regarding data protection and privacy….(More)”.

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agriculture yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….
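The authors' point that "privacy will be limited by that high-dimensional space" can be made concrete with a toy simulation of our own (not from the paper): as the number of attributes recorded per unit grows, almost every record becomes unique, and uniqueness is what makes re-identification possible even without names attached.

```python
import random
from collections import Counter


def fraction_unique(n_records: int, n_attributes: int, levels: int = 4) -> float:
    """Share of records whose attribute vector occurs exactly once.

    Each record is a vector of `n_attributes` categorical values drawn
    uniformly from `levels` categories -- a crude stand-in for survey-style
    data about family units.
    """
    rng = random.Random(42)
    records = [
        tuple(rng.randrange(levels) for _ in range(n_attributes))
        for _ in range(n_records)
    ]
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / n_records


for d in (2, 5, 10, 20):
    print(f"{d:2d} attributes -> {fraction_unique(100_000, d):.1%} of records unique")
```

With four categorical levels per attribute, 100,000 records spread over 4^20 possible combinations are essentially all unique, which is why what we are allowed to do with such data matters as much as how it is stored.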

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.
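To make "a much more complex object" concrete, a single multimodal observation of the kind described (simultaneous fMRI and EEG) might be represented roughly as follows; this is our sketch, and the field names and array shapes are assumptions rather than anything specified in the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MultimodalObservation:
    """One event observed through two measurement devices at once."""
    subject_id: str
    fmri_volume: np.ndarray   # e.g. shape (64, 64, 40): a 3-D brain image
    eeg_trace: np.ndarray     # e.g. shape (32, 1000): 32 channels x 1000 samples
    stimulus_label: str


obs = MultimodalObservation(
    subject_id="S01",
    fmri_volume=np.zeros((64, 64, 40)),
    eeg_trace=np.zeros((32, 1000)),
    stimulus_label="auditory_tone",
)

# A single scalar summary would throw most of this away; the analysis has to
# model the intrinsic structure of each modality and how the two co-vary.
print(obs.fmri_volume.shape, obs.eeg_trace.shape)
```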

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….
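A small simulation of our own illustrates why "found" data strain classical assumptions: if the probability that an observation is captured at all depends on the value being measured, as when a platform logs only its most active users, the naive sample mean no longer estimates the population mean.

```python
import random
from statistics import mean

rng = random.Random(0)

# Population: some latent quantity of interest (e.g. daily activity level).
population = [rng.expovariate(1.0) for _ in range(200_000)]

# Designed sample: every unit has the same chance of inclusion.
designed = rng.sample(population, 5_000)

# "Found" sample: inclusion probability grows with the value itself,
# as when a platform only captures the users who post most often.
found = [x for x in population if rng.random() < min(1.0, x / 5)]

print(f"population mean      : {mean(population):.2f}")
print(f"designed-sample mean : {mean(designed):.2f}")
print(f"found-sample mean    : {mean(found):.2f}  (biased upwards)")
```

The designed sample recovers the population mean; the found sample does not, no matter how large it is.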

 Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

Data journalism and the ethics of publishing Twitter data


Matthew L. Williams at Data Driven Journalism: “Collecting and publishing data collected from social media sites such as Twitter are everyday practices for the data journalist. Recent findings from Cardiff University’s Social Data Science Lab question the practice of publishing Twitter content without seeking some form of informed consent from users beforehand. Researchers found that tweets collected around certain topics, such as those related to terrorism, political votes, changes in the law and health problems, create datasets that might contain sensitive content, such as extreme political opinion, grossly offensive comments, overly personal revelations and threats to life (both to oneself and to others). Handling these data in the process of analysis (such as classifying content as hateful and potentially illegal) and reporting has brought the ethics of using social media in social research and journalism into sharp focus.

Ethics is an issue that is becoming increasingly salient in research and journalism using social media data. The digital revolution has outpaced parallel developments in research governance and agreed good practice. Codes of ethical conduct that were written in the mid twentieth century are being relied upon to guide the collection, analysis and representation of digital data in the twenty-first century. Social media is particularly ethically challenging because of the open availability of the data (particularly from Twitter). Many platforms’ terms of service specifically state that users’ public data will be made available to third parties, and by accepting these terms users legally consent to this. However, researchers and data journalists must interpret and engage with these commercially motivated terms of service through a more reflexive lens, which implies a context-sensitive approach, rather than focusing only on the legally permissible uses of these data.

Social media researchers and data journalists have experimented with data from a range of sources, including Facebook, YouTube, Flickr, Tumblr and Twitter to name a few. Twitter is by far the most studied of all these networks. This is because Twitter differs from other networks, such as Facebook, that are organised around groups of ‘friends’, in that it is more ‘open’ and the data (in part) are freely available to researchers. This makes Twitter a more public digital space that promotes the free exchange of opinions and ideas. Twitter has become the primary space for online citizens to publicly express their reaction to events of national significance, and also the primary source of data for social science research into digital publics.

The Twitter streaming API provides three levels of data access: the free random 1% that provides ~5M tweets daily, and the random 10% and 100% (chargeable or free to academic researchers upon request). Datasets on social interactions of this scale, speed and ease of access have been hitherto unrealisable in the social sciences and journalism, and have led to a flood of journal articles and news pieces, many of which include tweets with full text content and author identity without informed consent. This is presumably because of Twitter’s ‘open’ nature, which leads to the assumption that ‘these are public data’ and that using them does not require the rigor and scrutiny of ethical oversight. Even when these data are scrutinised, journalists need little convincing by the ‘public data’ argument, given the lack of a framework to evaluate the potential harms to users. The Social Data Science Lab takes a more ethically reflexive approach to the use of social media data in social research, and carefully considers users’ perceptions, online context and the role of algorithms in estimating potentially sensitive user characteristics.
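As a rough illustration of the collection step described above, and of the kind of pseudonymisation the Lab's survey findings point towards, a researcher might consume the free sample stream while hashing author identifiers instead of storing them in the clear. The sketch below assumes the tweepy client library and Twitter's v2 streaming endpoint (the access tiers have changed since the article was written), and it is not code from the Lab or from the article; the bearer token is a placeholder.

```python
import hashlib
import json

import tweepy


class SampleCollector(tweepy.StreamingClient):
    """Consume the ~1% sample stream, keeping pseudonymised records only."""

    def on_tweet(self, tweet):
        record = {
            # One-way hash so the stored dataset is not trivially traceable
            # back to a named account.
            "author": hashlib.sha256(str(tweet.author_id).encode()).hexdigest(),
            "created_at": str(tweet.created_at),
            "text": tweet.text,  # full text still needs ethical review before publication
        }
        print(json.dumps(record))


collector = SampleCollector("YOUR_BEARER_TOKEN")
collector.sample(tweet_fields=["author_id", "created_at"])
```

Hashing pseudonymises rather than anonymises: the full tweet text can still identify its author via search, which is exactly the concern the survey responses below speak to.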

A recent Lab survey into users’ perceptions of the use of their social media posts found the following:

  • 94% were aware that social media companies had Terms of Service
  • 65% had read the Terms of Service in whole or in part
  • 76% knew that when accepting Terms of Service they were giving permission for some of their information to be accessed by third parties
  • 80% agreed that if their social media information is used in a publication they would expect to be asked for consent
  • 90% agreed that if their tweets were used without their consent they should be anonymized…(More)”.

Can Crowdsourcing and Collaboration Improve the Future of Human Health?


Ben Wiegand at Scientific American: “The process of medical research has been likened to searching for a needle in a haystack. With the continued acceleration of novel science and health care technologies in areas like artificial intelligence, digital therapeutics and the human microbiome we have tremendous opportunity to search the haystack in new and exciting ways. Applying these high-tech advances to today’s most pressing health issues increases our ability to address the root cause of disease, intervene earlier and change the trajectory of human health.

Global crowdsourcing forums, like the Johnson & Johnson Innovation QuickFire Challenges, can be incredibly valuable tools for searching the “haystack.” An initiative of JLABS—the no-strings-attached incubators of Johnson & Johnson Innovation—these contests spur scientific diversity through crowdsourcing, inspiring and attracting fresh thinking. They seek to stimulate the global innovation ecosystem through funding, mentorship and access to resources that can kick-start breakthrough ideas.

Our most recent challenge, the Next-Gen Baby Box QuickFire Challenge, focused on updating the 80-year-old “Finnish baby box,” a free, government-issued maternity supply kit for new parents containing such essentials as baby clothing, bath and sleep supplies packaged in a sleep-safe cardboard box. Since it first launched, the baby box has, together with increased use of maternal healthcare services early in pregnancy, helped to significantly reduce the Finnish infant mortality rate from 65 in every 1,000 live births in the 1930s to 2.5 per 1,000 today—one of the lowest rates in the world.

Partnering with Finnish innovation and government groups, we set out to see if updating this popular early parenting tool with the power of personalized health technology might one day impact Finland’s unparalleled high rate of type 1 diabetes. We issued the call globally to help create “the Baby Box of the future” as part of the Janssen and Johnson & Johnson Innovation vision to create a world without disease by accelerating science and delivering novel solutions to prevent, intercept and cure disease. The contest brought together entrepreneurs, researchers and innovators to focus on ideas with the potential to promote child health, detect childhood disease earlier and facilitate healthy parenting.

Incentive challenges like this award participants who have most effectively met a predefined objective or task. It’s a concept that emerged well before our time—as far back as the 18th century—from Napoleon’s Food Preservation Prize, meant to find a way to keep troops fed during battle, to the Longitude Prize for improved marine navigation.

Research shows that prize-based challenges that attract talent across a wide range of disciplines can generate greater risk-taking and yield more dramatic solutions….(More)”.