Regulation of Big Data: Perspectives on Strategy, Policy, Law and Privacy


Paper by Pompeu Casanovas, Louis de Koker, Danuta Mendelson and David Watts: “…presents four complementary perspectives stemming from governance, law, ethics, and computer science. Big, Linked, and Open Data constitute complex phenomena whose economic and political dimensions require a plurality of instruments to enhance and protect citizens’ rights. Some conclusions are offered at the end to foster a more general discussion.

This article contends that the effective regulation of Big Data requires a combination of legal tools and other instruments of a semantic and algorithmic nature. It commences with a brief discussion of the concept of Big Data and of views expressed by Australian and UK participants in a study of Big Data use from a law enforcement and national security perspective. The second part of the article highlights the interest of the UN Special Rapporteur on the Right to Privacy in these themes and the focus of the Rapporteur’s new program on Big Data. UK law reforms regarding the authorisation of warrants for the exercise of bulk data powers are discussed in the third part. Reflecting on these developments, the paper closes with an exploration of the complex relationship between law and Big Data and the implications for regulation and governance of Big Data….(More)”.

Open Data’s Effect on Food Security


Jeremy de Beer, Jeremiah Baarbé, and Sarah Thuswaldner at Open AIR: “Agricultural data is a vital resource in the effort to address food insecurity. This data is used across the food-production chain. For example, farmers rely on agricultural data to decide when to plant crops, scientists use data to conduct research on pests and design disease-resistant plants, and governments make policy based on land use data. As the value of agricultural data becomes better understood, there is a growing call for governments and firms to open their agricultural data.

Open data is data that anyone can access, use, or share. Open agricultural data has the potential to address food insecurity by making it easier for farmers and other stakeholders to access and use the data they need. Open data also builds trust and fosters collaboration among stakeholders that can lead to new discoveries to address the problems of feeding a growing population.

 

A network of partnerships is growing around agricultural data research. The Open African Innovation Research (Open AIR) network is researching open agricultural data in partnership with the Plant Phenotyping and Imaging Research Centre (P2IRC) and the Global Institute for Food Security (GIFS). This research builds on a partnership with the Global Open Data for Agriculture and Nutrition (GODAN) initiative, and the network is exploring partnerships with Open Data for Development (OD4D) and other open data organizations.

…published two works on open agricultural data. Published in partnership with GODAN, “Ownership of Open Data” describes how intellectual property law defines ownership rights in data. Firms that collect data own the rights to that data, which is a major factor in the power dynamics of open data. In July, Jeremiah Baarbé and Jeremy de Beer will present “A Data Commons for Food Security” …The paper proposes a licensing model that allows farmers to benefit from the datasets to which they contribute. The license supports SME data collectors, who need sophisticated legal tools; contributors, who need engagement, privacy, control, and benefit sharing; and consumers, who need open access….(More)“.

Teaching machines to understand – and summarize – text


Article in The Conversation: “We humans are swamped with text. It’s not just news and other timely information: Regular people are drowning in legal documents. The problem is so bad we mostly ignore it. Every time a person uses a store’s loyalty rewards card or connects to an online service, his or her activities are governed by the equivalent of hundreds of pages of legalese. Most people pay no attention to these massive documents, often labeled “terms of service,” “user agreement” or “privacy policy.”

These are just part of a much wider societal problem of information overload. There is so much data stored – exabytes of it, as much as has ever been spoken by people in all of human history – that it’s humanly impossible to read and interpret everything. Often, we narrow down our pool of information by choosing particular topics or issues to pay attention to. But it’s important to actually know the meaning and contents of the legal documents that govern how our data is stored and who can see it.

As computer science researchers, we are working on ways artificial intelligence algorithms could digest these massive texts and extract their meaning, presenting it in terms regular people can understand….

Examining privacy policies

A modern internet-enabled life today more or less requires trusting for-profit companies with private information (like physical and email addresses, credit card numbers and bank account details) and personal data (photos and videos, email messages and location information).

These companies’ cloud-based systems typically keep multiple copies of users’ data as part of backup plans to prevent service outages. That means there are more potential targets – each data center must be securely protected both physically and electronically. Of course, internet companies recognize customers’ concerns and employ security teams to protect users’ data. But the specific and detailed legal obligations they undertake to do that are found in their impenetrable privacy policies. No regular human – and perhaps even no single attorney – can truly understand them.

In our study, we ask computers to summarize the terms and conditions regular users say they agree to when they click “Accept” or “Agree” buttons for online services. We downloaded the publicly available privacy policies of various internet companies, including Amazon AWS, Facebook, Google, HP, Oracle, PayPal, Salesforce, Snapchat, Twitter and WhatsApp….

Our software examines the text and uses information extraction techniques to identify key information specifying the legal rights, obligations and prohibitions identified in the document. It also uses linguistic analysis to identify whether each rule applies to the service provider, the user or a third-party entity, such as advertisers and marketing companies. Then it presents that information in clear, direct, human-readable statements….(More)”
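The extraction step described above can be sketched with a simple heuristic: map modal constructions (“must,” “may not,” and so on) to deontic categories, and pronouns to parties. The Python sketch below is a toy illustration only – the researchers’ actual system uses far richer linguistic analysis, and every pattern, label, and function name here is an assumption, not their implementation:

```python
import re

# Cue patterns loosely mapping modal constructions to deontic categories.
# Order matters: prohibitions ("must not") are checked before obligations ("must").
RULES = [
    (re.compile(r"\b(?:must not|may not|shall not)\b", re.I), "prohibition"),
    (re.compile(r"\b(?:must|shall|is required to|agree to)\b", re.I), "obligation"),
    (re.compile(r"\b(?:may|is permitted to)\b", re.I), "right"),
]

# Naive party detection from the words appearing in the sentence.
PARTIES = [
    (re.compile(r"\byou\b|\bthe user\b", re.I), "user"),
    (re.compile(r"\bwe\b|\bthe company\b", re.I), "provider"),
    (re.compile(r"\bthird part(?:y|ies)\b|\badvertisers\b", re.I), "third-party"),
]

def extract_clauses(policy_text):
    """Split a policy into sentences and tag each with a deontic
    category and the party it appears to govern."""
    clauses = []
    for sentence in re.split(r"(?<=[.!?])\s+", policy_text):
        category = next((c for rx, c in RULES if rx.search(sentence)), None)
        if category is None:
            continue  # sentence states no rule we recognize
        party = next((p for rx, p in PARTIES if rx.search(sentence)), "unknown")
        clauses.append({"party": party, "category": category,
                        "text": sentence.strip()})
    return clauses
```

Run against a sentence like “You must not resell the service,” the sketch tags a user-directed prohibition; real policies, of course, defeat such shallow patterns constantly, which is exactly why the authors turn to deeper linguistic analysis.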

Slave to the Algorithm? Why a ‘Right to Explanation’ is Probably Not the Remedy You are Looking for


Paper by Lilian Edwards and Michael Veale: “Algorithms, particularly of the machine learning (ML) variety, are increasingly consequential to individuals’ lives but have caused a range of concerns evolving mainly around unfairness, discrimination and opacity. Transparency in the form of a “right to an explanation” has emerged as a compellingly attractive remedy since it intuitively presents as a means to “open the black box”, hence allowing individual challenge and redress, as well as possibilities to foster accountability of ML systems. In the general furore over algorithmic bias and other issues laid out in section 2, any remedy in a storm has looked attractive.

However, we argue that a right to an explanation in the GDPR is unlikely to be a complete remedy to algorithmic harms, particularly in some of the core “algorithmic war stories” that have shaped recent attitudes in this domain. We present several reasons for this conclusion. First (section 3), the law is restrictive on when any explanation-related right can be triggered, and in many places is unclear, or even seems paradoxical. Second (section 4), even were some of these restrictions to be navigated, the way that explanations are conceived of legally — as “meaningful information about the logic of processing” — is unlikely to be provided by the kind of ML “explanations” computer scientists have been developing. ML explanations are restricted both by the type of explanation sought, the multi-dimensionality of the domain and the type of user seeking an explanation. However (section 5) “subject-centric” explanations (SCEs), which restrict explanations to particular regions of a model around a query, show promise for interactive exploration, as do pedagogical rather than decompositional explanations in dodging developers’ worries of IP or trade secrets disclosure.

As an interim conclusion then, while convinced that recent research in ML explanations shows promise, we fear that the search for a “right to an explanation” in the GDPR may be at best distracting, and at worst nurture a new kind of “transparency fallacy”. However, in our final section, we argue that other parts of the GDPR related (i) to other individual rights including the right to erasure (“right to be forgotten”) and the right to data portability and (ii) to privacy by design, Data Protection Impact Assessments and certification and privacy seals, may have the seeds of building a better, more respectful and more user-friendly algorithmic society….(More)”

Facebook Disaster Maps


Molly Jackman et al at Facebook: “After a natural disaster, humanitarian organizations need to know where affected people are located, what resources are needed, and who is safe. This information is extremely difficult and often impossible to capture through conventional data collection methods in a timely manner. As more people connect and share on Facebook, our data is able to provide insights in near-real time to help humanitarian organizations coordinate their work and fill crucial gaps in information during disasters. This morning we announced a Facebook disaster map initiative to help organizations address the critical gap in information they often face when responding to natural disasters.

Facebook disaster maps provide information about where populations are located, how they are moving, and where they are checking in safe during a natural disaster. All data is de-identified and aggregated to 360 square meter tiles or local administrative boundaries (e.g. census boundaries). [1]
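The de-identified tile aggregation described above can be sketched in a few lines of Python. Facebook has not published its exact method; the grid snapping, the latitude-only degree conversion, and the `min_count` suppression threshold below are all illustrative assumptions, not Facebook’s implementation:

```python
from collections import Counter

TILE_METERS = 360  # illustrative tile side length, per the post's description

def tile_of(lat, lon, meters=TILE_METERS):
    """Snap a coordinate to the corner of its grid tile.
    One degree of latitude is ~111,320 m; longitude scaling by latitude
    is ignored here for simplicity, so this is only a rough sketch."""
    deg = meters / 111_320.0
    return (round(lat // deg * deg, 6), round(lon // deg * deg, 6))

def aggregate(check_ins, min_count=10):
    """Count de-identified check-ins per tile and suppress tiles with
    fewer than `min_count` people -- a common small-count safeguard."""
    counts = Counter(tile_of(lat, lon) for lat, lon in check_ins)
    return {tile: n for tile, n in counts.items() if n >= min_count}
```

Suppressing low-count tiles is a standard defence against re-identifying individuals in sparsely populated areas, which is presumably part of what “de-identified and aggregated” entails.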

This blog describes the disaster maps datasets, how insights are calculated, and the steps taken to ensure that we’re preserving privacy….(More)”.

Citizenship office wants ‘Emma’ to help you


At FedScoop: “U.S. Citizenship and Immigration Services unveiled a new virtual assistant live-chat service, known as “Emma,” to assist customers and website visitors in finding information and answering questions in a timely and efficient fashion.

The agency told FedScoop that it built the chatbot with the help of Verizon and artificial intelligence interface company Next IT. The goal, the agency says, was “to address the growing need for customers to obtain information quicker and through multiple access points,” and to meet it, “USCIS broadened the traditional call center business model to include web-based self-help tools.”

USCIS, a component agency of the Department of Homeland Security, says it receives nearly 14 million calls relating to immigration every year. The virtual assistant and live-chat services are aimed at becoming the first line of help available to users of USCIS.gov who might have trouble finding answers by themselves.

The bot greets customers when they enter the website, answers basic questions via live chat and supplies additional information in both English and Spanish. As a result, the amount of time customers spend searching for information on the website is greatly reduced, according to USCIS. Because the virtual assistant is embedded within the website, it can rapidly provide relevant information that may have been difficult to access manually.

The nature of the bot lends itself to potential encounters with the personally identifiable information (PII) of the customers it interacts with. Because of this, USCIS recently conducted a privacy impact assessment (PIA).

Much of the assessment revolved around accuracy and the security of information that Emma could potentially encounter in a customer interaction. For the most part, the chatbot doesn’t require customers to submit personal information. Instead, it draws its responses from content already available on USCIS.gov, tailored to the amount of information that users choose to provide. Answers are, according to the PIA, verified by thorough and frequent examination of all content posted to the site.

According to USCIS, Emma will delete all chat logs — and therefore all PII — immediately after the customer ends the chat session. Should a customer reach a question that the bot can’t answer effectively and choose to continue the session with an agent in a live chat, the bot will ask for the preferred language (English or Spanish), the general topic of conversation, short comments on why the customer wishes to speak with a live agent, and the case on file and receipt number.

This information would then be transferred to the live agent. All other sensitive information entered, such as Social Security numbers or receipt numbers, would then be automatically masked in the subsequent transfer to the live agent…(More)”
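Masking of this kind is typically done with pattern matching before the transcript is handed off. The PIA does not disclose how USCIS implements it; the regular expressions and replacement tokens in this Python sketch are assumptions for illustration only (USCIS receipt numbers are three letters followed by ten digits):

```python
import re

# Illustrative patterns only -- not USCIS's actual masking rules.
MASKS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN REDACTED]"),       # 3-2-4 digit SSN
    (re.compile(r"\b[A-Z]{3}\d{10}\b"), "[RECEIPT REDACTED]"),      # e.g. MSC1234567890
]

def mask_pii(message):
    """Replace sensitive identifiers in a chat message before the
    transcript is transferred to a live agent."""
    for pattern, token in MASKS:
        message = pattern.sub(token, message)
    return message
```

A message such as “My SSN is 123-45-6789” would reach the agent with the number replaced by the redaction token, so the handoff itself never carries the raw identifier.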

Internet of Things: Status and implications of an increasingly connected world


GAO Technology Assessment: “The Internet of Things (IoT) refers to the technologies and devices that sense information and communicate it to the Internet or other networks and, in some cases, act on that information. These “smart” devices are increasingly being used to communicate and process quantities and types of information that have never been captured before and respond automatically to improve industrial processes, public services, and the well-being of individual consumers. For example, a “connected” fitness tracker can monitor a user’s vital statistics, and store the information on a smartphone. A “smart” tractor can use GPS-based driving guidance to maximize crop planting or harvesting. Electronic processors and sensors have become smaller and less costly, which makes it easier to equip devices with IoT capabilities. This is fueling the global proliferation of connected devices, allowing new technologies to be embedded in millions of everyday products. The IoT’s rapid emergence brings the promise of important new benefits, but also presents potential challenges such as the following:

  • Information security. The IoT brings the risks inherent in potentially unsecured information technology systems into homes, factories, and communities. IoT devices, networks, or the cloud servers where they store data can be compromised in a cyberattack. For example, in 2016, hundreds of thousands of weakly-secured IoT devices were accessed and hacked, disrupting traffic on the Internet.
  • Privacy. Smart devices that monitor public spaces may collect information about individuals without their knowledge or consent. For example, fitness trackers link the data they collect to online user accounts, which generally include personally identifiable information, such as names, email addresses, and dates of birth. Such information could be used in ways that the consumer did not anticipate. For example, that data could be sold to companies to target consumers with advertising or to determine insurance rates.
  • Safety. Researchers have demonstrated that IoT devices such as connected automobiles and medical devices can be hacked, potentially endangering the health and safety of their owners. For example, in 2015, hackers gained remote access to a car through its connected entertainment system and were able to cut the brakes and disable the transmission.
  • Standards. IoT devices and systems must be able to communicate easily. Technical standards to enable this communication will need to be developed and implemented effectively.
  • Economic issues. While impacts such as positive growth for industries that can use the IoT to reduce costs and provide better services to customers are likely, economic disruptions are also possible, such as reducing the need for certain types of businesses and jobs that rely on individual interventions, including assembly line work or commercial vehicle deliveries…(More)”

ALTwitter


“ALTwitter” – as in “alternate Twitter” – is a set of profiles of the Members of the European Parliament built from their Twitter metadata. In spite of the privacy risks and challenges posed by ineffectively regulated metadata, its beauty – which everyone should appreciate – lies in its brevity and flexibility.

When you navigate to the profiles of the members of the parliament listed below, you will notice that these profiles give the essence of their interaction with Twitter and the data that they generate there. Without going through all their tweets, one can learn the areas and topics they work on, the devices and mediums they use, the types of websites they refer to, their sleeping/activity patterns, etc. The amount of insight that can be derived from this metadata is indeed remarkable. We intend to present such artifacts in a separate blog post soon.
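The kind of metadata-only insight described above can be sketched in a few lines of Python. The field names (`source`, `created_at_hour`) are simplified assumptions – the real Twitter API exposes this metadata in a different shape – and the “quiet hours” heuristic is only a rough proxy for a sleeping pattern:

```python
from collections import Counter

def summarize_metadata(tweets):
    """Derive simple behavioural signals from tweet metadata alone --
    no tweet text is needed. Each tweet dict is assumed to carry a
    `source` (client app) and a `created_at_hour` (0-23) field."""
    sources = Counter(t["source"] for t in tweets)
    hours = Counter(t["created_at_hour"] for t in tweets)
    quiet = [h for h in range(24) if hours[h] == 0]  # rough sleep-window proxy
    return {
        "top_client": sources.most_common(1)[0][0],
        "most_active_hour": hours.most_common(1)[0][0],
        "quiet_hours": quiet,
    }
```

Even this crude summary reveals a device preference and a daily rhythm – which is precisely the point the project makes about how much metadata alone gives away.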

This open-source project, part of the #hakunametadata series (following an earlier module on browsing metadata), aims to educate about the immense amount of information contained in the metadata that we generate through our day-to-day internet activities. Every bit of data used for this project comes from publicly available information on Twitter. Furthermore, the project will be updated periodically and automatically to track changes.”…(More)”

Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century


Xinzhi Zhang et al in Ethnicity and Disease: “Addressing minority health and health disparities has been a missing piece of the puzzle in Big Data science. This article focuses on three priority opportunities that Big Data science may offer to the reduction of health and health care disparities. One opportunity is to incorporate standardized information on demographic and social determinants in electronic health records in order to target ways to improve quality of care for the most disadvantaged populations over time. A second opportunity is to enhance public health surveillance by linking geographical variables and social determinants of health for geographically defined populations to clinical data and health outcomes. Third and most importantly, Big Data science may lead to a better understanding of the etiology of health disparities and understanding of minority health in order to guide intervention development. However, the promise of Big Data needs to be considered in light of significant challenges that threaten to widen health disparities. Care must be taken to incorporate diverse populations to realize the potential benefits. Specific recommendations include investing in data collection on small sample populations, building a diverse workforce pipeline for data science, actively seeking to reduce digital divides, developing novel ways to assure digital data privacy for small populations, and promoting widespread data sharing to benefit under-resourced minority-serving institutions and minority researchers. With deliberate efforts, Big Data presents a dramatic opportunity for reducing health disparities but without active engagement, it risks further widening them….(More)”

How Your Digital Helper May Undermine Your Welfare, and Our Democracy


Essay by Maurice E. Stucke and Ariel Ezrachi: “All you need to do is say,” a recent article proclaimed, “’I want a beer’ and Alexa will oblige. The future is now.” Advances in technology have seemingly increased our choices and opened markets to competition. As we migrate from brick-and-mortar shops to online commerce, we seemingly are getting more of what we desire at better prices and quality. And yet, behind the competitive façade, a more complex reality exists. We explore in our book “Virtual Competition” several emerging threats, namely algorithmic collusion, behavioural discrimination and abuses by dominant super-platforms. But the harm is not just economic. The potential anticompetitive consequences go beyond our pocketbooks. The toll will likely be on our privacy, well-being and democracy.

To see why, this Essay examines the emerging frontier of digital personal assistants. These helpers are being developed by the leading online platforms: Google Assistant, Apple’s Siri, Facebook’s M, and Amazon’s Alexa-powered Echo. These super-platforms are heavily investing to improve their offerings. For those of us who grew up watching The Jetsons, the prospect of our own personal helper might seem marvelous. And yet, despite their promise, can personalized digital assistants actually reduce our welfare? Might their rise reduce the number of gateways to the digital world, increase the market power of a few firms, and limit competition? And if so, what are the potential social, political, and economic concerns?

Our Essay seeks to address these questions. We show how network effects, big data and big analytics will likely undermine attempts to curtail a digital assistant’s power, and will likely allow it to operate below the regulatory and antitrust radar screens. As a result, rather than advance our overall welfare, these digital assistants – if left to their own devices – can undermine our welfare….(More)”