Gabrielle Berman and Kerry Albright at UNICEF: “In an era of increasing dependence on data science and big data, the voices of one set of major stakeholders – the world’s children and those who advocate on their behalf – have been largely absent. A recent paper estimates one in three global internet users is a child, yet there has been little rigorous debate or understanding of how to adapt traditional, offline ethical standards for research involving data collection from children, to a big data, online environment (Livingstone et al., 2015). This paper argues that due to the potential for severe, long-lasting and differential impacts on children, child rights need to be firmly integrated onto the agendas of global debates about ethics and data science. The authors outline their rationale for a greater focus on child rights and ethics in data science and suggest steps to move forward, focusing on the various actors within the data chain including data generators, collectors, analysts and end-users. It concludes by calling for a much stronger appreciation of the links between child rights, ethics and data science disciplines and for enhanced discourse between stakeholders in the data chain, and those responsible for upholding the rights of children, globally….(More)”.
Research data infrastructures in the UK
The Open Research Data Task Force: “This report is intended to inform the work of the Open Research Data Task Force, which has been established with the aim of building on the principles set out in the Open Research Data Concordat (published in July 2016) to co-ordinate the creation of a roadmap to develop the infrastructure for open research data across the UK. As an initial contribution to that work, the report provides an outline of the policy and service infrastructure in the UK as it stands in the first half of 2017, including some comparisons with other countries; and it points to some key areas and issues which require attention. It does not seek to identify possible courses of action, nor even to suggest priorities the Task Force might consider in creating its final report, to be published in 2018. That will be the focus of work for the Task Force over the next few months.
Why is this important?
The digital revolution continues to bring fundamental changes to all aspects of research: how it is conducted, the findings that are produced, and how they are interrogated and transmitted not only within the research community but more widely. We are as yet still in the early stages of a transformation in which progress is patchy across the research community, but which has already posed significant challenges for research funders and institutions, as well as for researchers themselves. Research data is at the heart of those challenges: not simply the datasets that provide the core of the evidence analysed in scholarly publications, but all the data created and collected throughout the research process. Such data represents a potentially valuable resource for people and organisations in the commercial, public and voluntary sectors, as well as for researchers. Access to such data, and more general moves towards open science, are also critically important in ensuring that research is reproducible, and thus in sustaining public confidence in the work of the research community. But effective use of research data depends on an infrastructure – of hardware, software and services, but also of policies, organisations and individuals operating at various levels – that is as yet far from fully formed. The exponential increases in volumes of data being generated by researchers create in themselves new demands for storage and computing power. But since the data is characterised more by heterogeneity than by uniformity, development of the infrastructure to manage it involves a complex set of requirements in preparing, collecting, selecting, analysing, processing, storing and preserving that data throughout its life cycle.
Over the past decade and more, there have been many initiatives on the part of research institutions, funders, and members of the research community at local, national and international levels to address some of these issues. Diversity is a key feature of the landscape, in terms of institutional types and locations, funding regimes, and the nature and scope of partnerships, as well as differences between disciplines and subject areas. Hence decision-makers at various levels have fostered, via their policies and strategies, many community-organised developments, as well as their own initiatives and services. Significant progress has been achieved as a result, through the enthusiasm and commitment of key organisations and individuals. The less positive features have been a relative lack of harmonisation or consolidation, and an increasing awareness of patchiness in provision, with gaps, overlaps and inconsistencies. This is not surprising, since policies, strategies and services relating to research data necessarily affect all aspects of support for the diverse processes of research itself. Developing new policies and infrastructure for research data implies significant re-thinking of structures and regimes for supporting, fostering and promoting research itself. That in turn implies taking full account of the widely varying characteristics and needs of research of different kinds, while also keeping in clear view the benefits to be gained from better management of research data, and from greater openness in making data accessible for others to re-use for a wide range of different purposes….(More)”.
Using Collaboration to Harness Big Data for Social Good
Jake Porway at SSIR: “These days, it’s hard to get away from the hype around “big data.” We read articles about how Silicon Valley is using data to drive everything from website traffic to autonomous cars. We hear speakers at social sector conferences talk about how nonprofits can maximize their impact by leveraging new sources of digital information like social media data, open data, and satellite imagery.
Braving this world can be challenging, we know. Creating a data-driven organization can require big changes in culture and process. Some nonprofits, like Crisis Text Line and Watsi, started off boldly by building their own data science teams. But for the many other organizations wondering how to best use data to advance their mission, we’ve found that one ingredient works better than all the software and tech that you can throw at a problem: collaboration.
As a nonprofit dedicated to applying data science for social good, DataKind has run more than 200 projects in collaboration with other nonprofits worldwide by connecting them to teams of volunteer data scientists. What do the most successful ones have in common? Strong collaborations on three levels: with data science experts, within the organization itself, and across the nonprofit sector as a whole.
1. Collaborate with data science experts to define your project. As we often say, finding problems can be harder than finding solutions. ….
2. Collaborate across your organization to “build with, not for.” Our projects follow the principles of human-centered design and the philosophy pioneered in the civic tech world of “design with, not for.” ….
3. Collaborate across your sector to move the needle. Many organizations think about building data science solutions for unique challenges they face, such as predicting the best location for their next field office. However, most of us are fighting common causes shared by many other groups….
By focusing on building strong collaborations on these three levels—with data experts, across your organization, and across your sector—you’ll go from merely talking about big data to making a big impact….(More)”.
Detecting riots with Twitter
Cardiff University News: “An analysis of data taken from the London riots in 2011 showed that computer systems could automatically scan through Twitter and detect serious incidents, such as shops being broken into and cars being set alight, before they were reported to the Metropolitan Police Service.
The computer system could also discern information about where the riots were rumoured to take place and where groups of youths were gathering. The new research, published in the peer-reviewed journal ACM Transactions on Internet Technology, showed that on average the computer systems could pick up on disruptive events several minutes before officials, and over an hour earlier in some cases.
“Antagonistic narratives and cyber hate”
The researchers believe that their work could enable police officers to better manage and prepare for both large and small scale disruptive events.
Co-author of the study Dr Pete Burnap, from Cardiff University’s School of Computer Science and Informatics, said: “We have previously used machine-learning and natural language processing on Twitter data to better understand online deviance, such as the spread of antagonistic narratives and cyber hate…”
“We will never replace traditional policing resource on the ground but we have demonstrated that this research could augment existing intelligence gathering and draw on new technologies to support more established policing methods.”
Scientists are continually looking to the swathes of data produced from Twitter, Facebook and YouTube to help them to detect events in real-time.
Estimates put social media membership at approximately 2.5 billion non-unique users, and the data produced by these users have been used to predict elections, movie revenues and even the epicentre of earthquakes.
In their study the research team analysed 1.6m tweets relating to the 2011 riots in England, which began as an isolated incident in Tottenham on August 6 but quickly spread across London and to other cities in England, giving rise to looting, destruction of property and levels of violence not seen in England for more than 30 years.
Machine-learning algorithms
The researchers used a series of machine-learning algorithms to analyse each of the tweets from the dataset, taking into account a number of key features such as the time they were posted, the location where they were posted and the content of the tweet itself.
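To make that approach concrete, here is a minimal, hypothetical sketch of feature-based tweet classification in Python with scikit-learn. It is not the Cardiff system: the column names, example tweets and choice of model are assumptions, and a real deployment would need a large labelled dataset and a streaming pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical labelled tweets: text, posting hour, coarse location, and a
# label marking whether the tweet relates to a disruptive event.
tweets = pd.DataFrame({
    "text": ["shops being smashed on the high street", "lovely sunny afternoon in the park"],
    "hour": [22, 14],
    "borough": ["Tottenham", "Camden"],
    "is_event": [1, 0],
})
X, y = tweets[["text", "hour", "borough"]], tweets["is_event"]

features = ColumnTransformer([
    ("content", TfidfVectorizer(ngram_range=(1, 2)), "text"),        # what was said
    ("place", OneHotEncoder(handle_unknown="ignore"), ["borough"]),  # where it was posted
], remainder="passthrough")                                          # keep the posting hour

model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X, y)

# Score an incoming tweet for how likely it is to refer to a disruptive event.
incoming = pd.DataFrame({"text": ["cars set alight near the station"],
                         "hour": [23], "borough": ["Tottenham"]})
print(model.predict_proba(incoming)[:, 1])
```

Combining the tweet content with when and where it was posted mirrors the key features described above; the classifier then outputs a probability that an incoming tweet refers to a disruptive event.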
Results showed that the machine-learning algorithms were quicker than police sources in all but two of the disruptive events reported…(More)”.
Examining the Mistrust of Science
Proceedings of a National Academies Workshop: “The Government-University-Industry Research Roundtable held a meeting on February 28 and March 1, 2017, to explore trends in public opinion of science, examine potential sources of mistrust, and consider ways that cross-sector collaboration between government, universities, and industry may improve public trust in science and scientific institutions in the future. The keynote address on February 28 was given by Shawn Otto, co-founder and producer of the U.S. Presidential Science Debates and author of The War on Science.
“There seems to be an erosion of the standing and understanding of science and engineering among the public,” Otto said. “People seem much more inclined to reject facts and evidence today than in the recent past. Why could that be?” Otto began exploring that question after the candidates in the 2008 presidential election declined an invitation to debate science-driven policy issues and instead chose to debate faith and values.
“Wherever the people are well-informed, they can be trusted with their own government,” wrote Thomas Jefferson. Now, some 240 years later, science is so complex that it is difficult even for scientists and engineers to understand the science outside of their particular fields. Otto argued,
“The question is, are people still well-enough informed to be trusted with their own government? Of the 535 members of Congress, only 11—less than 2 percent—have a professional background in science or engineering. By contrast, 218—41 percent—are lawyers. And lawyers approach a problem in a fundamentally different way than a scientist or engineer. An attorney will research both sides of a question, but only so that he or she can argue against the position that they do not support. A scientist will approach the question differently, not starting with a foregone conclusion and arguing towards it, but examining both sides of the evidence and trying to make a fair assessment.”
According to Otto, anti-science positions are now acceptable in public discourse, in Congress, state legislatures and city councils, in popular culture, and in presidential politics. Discounting factually incorrect statements does not necessarily reshape public opinion in the way some trust it to. What is driving this change? “Science is never partisan, but science is always political,” said Otto. “Science takes nothing on faith; it says, ‘show me the evidence and I’ll judge for myself.’ But the discoveries that science makes either confirm or challenge somebody’s cherished beliefs or vested economic or ideological interests. Science creates knowledge—knowledge is power, and that power is political.”…(More)”.
Rawification and the careful generation of open government data
Jérôme Denis and Samuel Goëta in Social Studies of Science: “Drawing on a two-year ethnographic study within several French administrations involved in open data programs, this article aims to investigate the conditions of the release of government data – the rawness of which open data policies require. This article describes two sets of phenomena. First, far from being taken for granted, open data emerge in administrations through a progressive process that entails uncertain collective inquiries and extraction work. Second, the opening process draws on a series of transformations, as data are modified to satisfy an important criterion of open data policies: the need for both human and technical intelligibility. There are organizational consequences of these two points, which can notably lead to the visibilization or the invisibilization of data labour. Finally, the article invites us to reconsider the apparent contradiction between the process of data release and the existence of raw data. Echoing the vocabulary of one of the interviewees, the multiple operations can be seen as a ‘rawification’ process by which open government data are carefully generated. Such a notion notably helps to build a relational model of what counts as data and what counts as work….(More)”.
Regulation of Big Data: Perspectives on Strategy, Policy, Law and Privacy
Paper by Pompeu Casanovas, Louis de Koker, Danuta Mendelson and David Watts: “…presents four complementary perspectives stemming from governance, law, ethics, and computer science. Big, Linked, and Open Data constitute complex phenomena whose economic and political dimensions require a plurality of instruments to enhance and protect citizens’ rights. Some conclusions are offered in the end to foster a more general discussion.
This article contends that the effective regulation of Big Data requires a combination of legal tools and other instruments of a semantic and algorithmic nature. It commences with a brief discussion of the concept of Big Data and of views expressed by Australian and UK participants in a study of Big Data use from a law enforcement and national security perspective. The second part of the article highlights the interest of the UN’s Special Rapporteur on the Right to Privacy in these themes and the focus of their new program on Big Data. UK law reforms regarding the authorisation of warrants for the exercise of bulk data powers are discussed in the third part. Reflecting on these developments, the paper closes with an exploration of the complex relationship between law and Big Data and the implications for the regulation and governance of Big Data….(More)”.
Teaching machines to understand – and summarize – text
The Conversation: “We humans are swamped with text. It’s not just news and other timely information: Regular people are drowning in legal documents. The problem is so bad we mostly ignore it. Every time a person uses a store’s loyalty rewards card or connects to an online service, his or her activities are governed by the equivalent of hundreds of pages of legalese. Most people pay no attention to these massive documents, often labeled “terms of service,” “user agreement” or “privacy policy.”
These are just part of a much wider societal problem of information overload. There is so much data stored – exabytes of it, as much as has ever been spoken by people in all of human history – that it’s humanly impossible to read and interpret everything. Often, we narrow down our pool of information by choosing particular topics or issues to pay attention to. But it’s important to actually know the meaning and contents of the legal documents that govern how our data is stored and who can see it.
As computer science researchers, we are working on ways artificial intelligence algorithms could digest these massive texts and extract their meaning, presenting it in terms regular people can understand….
Examining privacy policies
A modern internet-enabled life more or less requires trusting for-profit companies with private information (like physical and email addresses, credit card numbers and bank account details) and personal data (photos and videos, email messages and location information).
These companies’ cloud-based systems typically keep multiple copies of users’ data as part of backup plans to prevent service outages. That means there are more potential targets – each data center must be securely protected both physically and electronically. Of course, internet companies recognize customers’ concerns and employ security teams to protect users’ data. But the specific and detailed legal obligations they undertake to do that are found in their impenetrable privacy policies. No regular human – and perhaps even no single attorney – can truly understand them.
In our study, we ask computers to summarize the terms and conditions regular users say they agree to when they click “Accept” or “Agree” buttons for online services. We downloaded the publicly available privacy policies of various internet companies, including Amazon AWS, Facebook, Google, HP, Oracle, PayPal, Salesforce, Snapchat, Twitter and WhatsApp….
Our software examines the text and uses information extraction techniques to identify key information specifying the legal rights, obligations and prohibitions identified in the document. It also uses linguistic analysis to identify whether each rule applies to the service provider, the user or a third-party entity, such as advertisers and marketing companies. Then it presents that information in clear, direct, human-readable statements….(More)”
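As a rough illustration of the kind of information extraction described here (not the researchers’ actual software), the sketch below tags individual policy sentences with a deontic category — right, obligation or prohibition — and the party a rule applies to, using simple keyword cues. All cue lists and example sentences are assumptions for demonstration.

```python
import re

# Phrases that signal the type of rule and the party it applies to (illustrative only).
DEONTIC_CUES = {
    "prohibition": ["will not", "shall not", "must not", "may not"],
    "obligation":  ["must", "shall", "agree to", "are required to"],
    "right":       ["may", "can", "are entitled to"],
}
PARTY_CUES = {
    "service provider": ["we", "our", "the company"],
    "user":             ["you", "your", "the user"],
    "third party":      ["third party", "third parties", "advertisers", "partners"],
}

def _first_match(cue_map, sentence):
    """Return the first label whose cue appears as a whole phrase in the sentence."""
    for label, cues in cue_map.items():
        if any(re.search(rf"\b{re.escape(cue)}\b", sentence) for cue in cues):
            return label
    return None

def classify_sentence(sentence):
    """Tag a policy sentence with (deontic category, party the rule applies to)."""
    lowered = sentence.lower()
    return _first_match(DEONTIC_CUES, lowered), _first_match(PARTY_CUES, lowered)

policy = [
    "We will not sell your personal information to advertisers.",
    "You must notify us of any unauthorised use of your account.",
    "Third parties may collect usage data through embedded cookies.",
]
for sentence in policy:
    print(classify_sentence(sentence), "-", sentence)
```

A production system would replace these hand-written cues with trained linguistic models, but the output format is the same idea: plain statements of who may, must or must not do what.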
Artificial intelligence can predict which congressional bills will pass
Other algorithms have predicted whether a bill will survive a congressional committee, or whether the Senate or House of Representatives will vote to approve it—all with varying degrees of success. But John Nay, a computer scientist and co-founder of Skopos Labs, a Nashville-based AI company focused on studying policymaking, wanted to take things one step further. He wanted to predict whether an introduced bill would make it all the way through both chambers—and precisely what its chances were.
Nay started with data on the 103rd Congress (1993–1995) through the 113th Congress (2013–2015), downloaded from a legislation-tracking website called GovTrack. This included the full text of the bills, plus a set of variables, including the number of co-sponsors, the month the bill was introduced, and whether the sponsor was in the majority party of their chamber. Using data on Congresses 103 through 106, he trained machine-learning algorithms—programs that find patterns on their own—to associate bills’ text and contextual variables with their outcomes. He then predicted how each bill would do in the 107th Congress. Then, he trained his algorithms on Congresses 103 through 107 to predict the 108th Congress, and so on.
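That rolling training-and-testing regime is a walk-forward evaluation. The sketch below shows one way it might look in Python; the DataFrame layout, column names and choice of classifier are assumptions for illustration, not Nay’s code.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def walk_forward(bills: pd.DataFrame, feature_cols, first_test=107, last_test=113):
    """Train on all Congresses before each test Congress, then predict that Congress.

    Assumes `bills` has one row per bill, a `congress` column, numeric feature
    columns listed in `feature_cols`, and an `enacted` label (1 = became law).
    """
    scores = {}
    for test_congress in range(first_test, last_test + 1):
        train = bills[bills["congress"] < test_congress]   # e.g. 103-106 for the 107th
        test = bills[bills["congress"] == test_congress]
        model = GradientBoostingClassifier().fit(train[feature_cols], train["enacted"])
        # Probability that each bill in the held-out Congress becomes law.
        scores[test_congress] = model.predict_proba(test[feature_cols])[:, 1]
    return scores
```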
Nay’s most complex machine-learning algorithm combined several parts. The first part analyzed the language in the bill. It interpreted the meaning of words by how they were embedded in surrounding words. For example, it might see the phrase “obtain a loan for education” and assume “loan” has something to do with “obtain” and “education.” A word’s meaning was then represented as a string of numbers describing its relation to other words. The algorithm combined these numbers to assign each sentence a meaning. Then, it found links between the meanings of sentences and the success of bills that contained them. Three other algorithms found connections between contextual data and bill success. Finally, an umbrella algorithm used the results from those four algorithms to predict what would happen…. His program scored about 65% better than simply guessing that a bill wouldn’t pass, Nay reported last month in PLOS ONE…(More).
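The combination of a text model, several contextual models and an umbrella model is essentially a stacked ensemble. The following sketch shows one plausible way to wire that up with scikit-learn; it is an assumed architecture for illustration (Nay’s implementation used word embeddings, for which TF-IDF is only a stand-in here), and the column names are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Text model: stands in for the embedding-based representation of bill language.
text_model = Pipeline([
    ("embed", ColumnTransformer([("tfidf", TfidfVectorizer(), "bill_text")])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Context model: co-sponsor count, month introduced, whether the sponsor is in the majority.
context_model = Pipeline([
    ("select", ColumnTransformer([("ctx", "passthrough",
                                   ["n_cosponsors", "month", "sponsor_in_majority"])])),
    ("clf", RandomForestClassifier(n_estimators=200)),
])

# Umbrella (meta) model: combines the base predictions into one pass/fail probability.
ensemble = StackingClassifier(
    estimators=[("text", text_model), ("context", context_model)],
    final_estimator=LogisticRegression(),
)
# ensemble.fit(train_bills, train_bills["enacted"]) would then train the whole stack.
```

The umbrella classifier learns how much weight to give the text-based and context-based predictions when producing the final probability that a bill becomes law.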
LSE launches crowdsourcing project inspiring millennials to shape Brexit
LSE Press Release: “A crowdsourcing project inspiring millennials in Britain and the EU to help shape the upcoming Brexit negotiations is being launched by the London School of Economics and Political Science (LSE) this week.
The social media-based project, which hopes to engage 3000 millennials aged 35 and under, kicks off on 23 June, the first anniversary of the life-changing vote to take Britain out of the EU.
One of the Generation Brexit project leaders, Dr Jennifer Jackson-Preece from LSE’s European Institute, said the online platform would give a voice to British and European millennials on the future of Europe in the Brexit negotiations and beyond.
She said: “We’re going to invite millennials from across the UK and Europe to debate, decide and draft policy proposals that will be sent to parliaments in Westminster and Strasbourg during the negotiations.”
Another project leader, Dr Roch Dunin-Wąsowicz, said the pan-European project would seek views from a whole cross section of millennials, including Leavers, Remainers, left and right-wingers, European federalists and nationalists.
“We want to come up with millennial proposals for a mutually beneficial relationship, reflecting the diverse political, cultural, religious and economic backgrounds in the UK and EU.
“We are especially keen to engage the forgotten, the apolitical and the apathetic – for whom Brexit has become a moment of political awakening,” he said.
Generation Brexit follows on the heels of LSE’s Constitution UK crowdsourcing project in 2015, which broke new ground in galvanising people around the country to help shape Britain’s first constitution. The 10-week internet project signed up 1500 people from all corners of the UK to debate how the country should be governed.
Dr Manmit Bhambra, also working on the project, said the success of the Constitution UK platform had laid the foundation for Generation Brexit, with LSE hoping to double the numbers and sign up 3000 participants, split equally between Britain and Europe.
The project can be accessed at www.generationbrexit.org and all updates will be available on Twitter @genbrexit & @lsebrexitvote with the hashtag #GenBrexit, and on facebook.com/GenBrexit… (More)”.