Protecting One’s Own Privacy in a Big Data Economy


Anita L. Allen in the Harvard Law Review Forum: “Big Data is the vast quantities of information amenable to large-scale collection, storage, and analysis. Using such data, companies and researchers can deploy complex algorithms and artificial intelligence technologies to reveal otherwise unascertained patterns, links, behaviors, trends, identities, and practical knowledge. The information that comprises Big Data arises from government and business practices, consumer transactions, and the digital applications sometimes referred to as the “Internet of Things.” Individuals invisibly contribute to Big Data whenever they live digital lifestyles or otherwise participate in the digital economy, such as when they shop with a credit card, get treated at a hospital, apply for a job online, research a topic on Google, or post on Facebook.

Privacy advocates and civil libertarians say Big Data amounts to digital surveillance that potentially results in unwanted personal disclosures, identity theft, and discrimination in contexts such as employment, housing, and financial services. These advocates and activists say typical consumers and internet users do not understand the extent to which their activities generate data that is being collected, analyzed, and put to use for varied governmental and business purposes.

I have argued elsewhere that individuals have a moral obligation to respect not only other people’s privacy but also their own. Here, I wish to comment first on whether the notion that individuals have a moral obligation to protect their own information privacy is rendered utterly implausible by current and likely future Big Data practices; and on whether a conception of an ethical duty to self-help in the Big Data context may be more pragmatically framed as a duty to be part of collective actions encouraging business and government to adopt more robust privacy protections and data security measures….(More)”

The social data revolution will be crowdsourced


Nicholas B. Adams at SSRC Parameters: “It is now abundantly clear to librarians, archivists, computer scientists, and many social scientists that we are in a transformational age. If we can understand and measure meaning from all of these data describing so much of human activity, we will finally be able to test and revise our most intricate theories of how the world is socially constructed through our symbolic interactions….

We cannot write enough rules to teach a computer to read like us. And because the social world is not a game per se, we can’t design a reinforcement-learning scenario teaching a computer to “score points” and just ‘win.’ But AlphaGo’s example does show a path forward. Recall that much of AlphaGo’s training came in the form of supervised machine learning, where humans taught it to play like them by showing the machine how human experts played the game. Already, humans have used this same supervised learning approach to teach computers to classify images, identify parts of speech in text, or categorize inventories into various bins. Without writing any rules, simply by letting the computer guess, then giving it human-generated feedback about whether it guessed right or wrong, humans can teach computers to label data as we do. The problem is (or has been): humans label textual data slowly—very, very slowly. So, we have generated precious little data with which to teach computers to understand natural language as we do. But that is going to change….
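The guess-then-feedback loop described above can be illustrated with a minimal sketch. It assumes a toy perceptron over word counts; the example sentences, the labels (+1 for a passage about protest, -1 otherwise) and the function names are hypothetical and are not taken from Adams's project.

```python
from collections import defaultdict

# Hypothetical human-labeled examples: +1 = about a protest, -1 = not.
EXAMPLES = [
    ("police arrested protesters downtown", 1),
    ("protesters marched to city hall", 1),
    ("the council approved the budget", -1),
    ("the team won the game", -1),
]

def train_perceptron(examples, epochs=10):
    """Learn one weight per word from (text, label) pairs, with no hand-written rules."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            # The computer guesses first...
            score = sum(weights[w] for w in words)
            guess = 1 if score > 0 else -1
            # ...then the human label says whether the guess was right.
            if guess != label:
                for w in words:
                    weights[w] += label  # nudge weights toward the human answer
    return weights

def predict(weights, text):
    """Label unseen text with the learned weights."""
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return 1 if score > 0 else -1

weights = train_perceptron(EXAMPLES)
print(predict(weights, "protesters gathered outside"))
print(predict(weights, "the budget passed"))
```

With only four labeled sentences the model knows almost nothing, which is the bottleneck the passage points to: the learning loop is simple, but it depends on how much human-labeled text is available.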

The single greatest factor dilating the duration of such large-scale text-labeling projects has been workforce training and turnover. ….The key to organizing work for the crowd, I had learned from talking to computer scientists, was task decomposition. The work had to be broken down into simple pieces that any (moderately intelligent) person could do through a web interface without requiring face-to-face training. I knew from previous experiments with my team that I could not expect a crowd worker to read a whole article, or to know our whole conceptual scheme defining everything of potential interest in those articles. Requiring either or both would be asking too much. But when I realized that my conceptual scheme could actually be treated as multiple smaller conceptual schemes, the idea came to me: Why not have my RAs identify units of text that corresponded with the units of analysis of my conceptual scheme? Then, crowd workers reading those much smaller units of text could just label them according to a smaller sub-scheme. Moreover, I came to realize, we could ask them leading questions about the text to elicit information about the variables and attributes in the scheme, so they wouldn’t have to memorize the scheme either. By having them highlight the words justifying their answers, they would be labeling text according to our scheme without any face-to-face training. Bingo….
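The task decomposition described above might be sketched roughly as follows. This assumes that research assistants have already tagged each unit of text with a unit type; the sub-schemes, question wording and class names are hypothetical illustrations, not the project's actual scheme.

```python
from dataclasses import dataclass

@dataclass
class TextUnit:
    article_id: str
    unit_type: str  # assigned by a research assistant, e.g. "event"
    text: str

@dataclass
class CrowdTask:
    article_id: str
    text: str
    question: str

# Hypothetical sub-schemes: each unit type carries its own short list of questions,
# so a crowd worker never needs to learn the whole conceptual scheme.
SUB_SCHEMES = {
    "event": ["What happened in this passage?", "Where did it take place?"],
    "actor": ["Who is described in this passage?"],
}

def build_tasks(units, sub_schemes):
    """Turn RA-identified text units into small, self-contained crowd tasks."""
    tasks = []
    for unit in units:
        for question in sub_schemes.get(unit.unit_type, []):
            tasks.append(CrowdTask(unit.article_id, unit.text, question))
    return tasks

units = [
    TextUnit("a1", "event", "Hundreds gathered at the plaza on Friday."),
    TextUnit("a1", "actor", "The march was organized by a student group."),
]
tasks = build_tasks(units, SUB_SCHEMES)
for task in tasks:
    print(task.question, "|", task.text)
```

Each task pairs one short passage with one question, so a worker can answer it through a web interface without reading the full article or memorizing the scheme.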

This approach promises more, too. The databases generated by crowd workers, citizen scientists, and students can also be used to train machines to see in social data what we humans see comparatively easily. Just as AlphaGo learned from humans how to play a strategy game, our supervision can also help it learn to see the social world in textual or video data. The final products of social data analysis assembly lines, therefore, are not merely rich and massive databases allowing us to refine our most intricate, elaborate, and heretofore data-starved theories; they are also computer algorithms that will do most or all social data labeling in the future. In other words, whether we know it or not, we social scientists hold the key to developing artificial intelligences capable of understanding our social world….

At stake is a social science with the capacity to quantify and qualify so many of our human practices, from the quotidian to mythic, and to lead efforts to improve them. In decades to come, we may even be able to follow the path of other mature sciences (including physics, biology, and chemistry) and shift our focus toward engineering better forms of sociality. All the more so because it engages the public, a crowd-supported social science could enlist a new generation in the confident and competent re-construction of society….(More)”

Artificial Intelligence “Jolted by Success”


Steven Aftergood in SecrecyNews: “Since 2010, the field of artificial intelligence (AI) has been “jolted” by the “broad and unforeseen successes” of one of its component technologies, known as multi-layer neural networks, leading to rapid developments and new applications, according to a new study from the JASON scientific advisory panel.

The JASON panel reviewed the current state of AI research and its potential use by the Department of Defense. See Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD, JSR-16-Task-003, January 2017….

The JASON report distinguishes between artificial intelligence — referring to the ability of computers to perform particular tasks that humans do with their brains — and artificial general intelligence (AGI) — meaning a human-like ability to pursue long-term goals and exercise purposive behavior.

“Where AI is oriented around specific tasks, AGI seeks general cognitive abilities.” Recent progress in AI has not been matched by comparable advances in AGI. Sentient machines, let alone a revolt of robots against their creators, are still somewhere far over the horizon, and may be permanently in the realm of fiction.

While many existing DoD weapon systems “have some degree of ‘autonomy’ relying on the technologies of AI, they are in no sense a step–not even a small step–towards ‘autonomy’ in the sense of AGI, that is, the ability to set independent goals or intent,” the JASONs said.

“Indeed, the word ‘autonomy’ conflates two quite different meanings, one relating to ‘freedom of will or action’ (like humans, or as in AGI), and the other the much more prosaic ability to act in accordance with a possibly complex rule set based on possibly complex sensor input, as in the word ‘automatic’. In using a terminology like ‘autonomous weapons’, the DoD may, as an unintended consequence, enhance the public’s confusion on this point.”…

This week the Department of Defense announced the demonstration of swarms of “autonomous” micro-drones. “The micro-drones demonstrated advanced swarm behaviors such as collective decision-making, adaptive formation flying, and self-healing,” according to a January 9 news release.

A journalistic account of recent breakthroughs in the use of artificial intelligence for machine translation appeared in the New York Times Magazine last month. See “The Great A.I. Awakening” by Gideon Lewis-Kraus, December 14, 2016…(More)”

Crowdsourcing, Citizen Science, and Data-sharing


Sapien Labs: “The future of human neuroscience lies in crowdsourcing, citizen science and data sharing but it is not without its minefields.

A recent Scientific American article by Daniel Goodwin, “Why Neuroscience Needs Hackers,” makes the case that neuroscience, like many fields today, is drowning in data, begging for application of advances in computer science like machine learning. Neuroscientists are able to gather reams of neural data, but often without big data mechanisms and frameworks to synthesize them.

The SA article describes the work of Sebastian Seung, a Princeton neuroscientist, who recently mapped the neural connections of the human retina from an “overwhelming mass” of electron microscopy data using state-of-the-art A.I. and massive crowd-sourcing. Seung incorporated the A.I. into a game called “Eyewire” where thousands of volunteers scored points while improving the neural map. Although the article’s title emphasizes advanced A.I., Dr. Seung’s experiment points even more to crowdsourcing and open science, avenues for improving research that have suddenly become easy and powerful with today’s internet. Eyewire perhaps epitomizes successful crowdsourcing — using an application that gathers, represents, and analyzes data uniformly according to researchers’ needs.

Crowdsourcing is seductive in its potential but risky for those who aren’t sure how to control it to get what they want. For researchers who don’t want to become hackers themselves, trying to turn the diversity of data produced by a crowd into conclusive results might seem too much of a headache to make it worthwhile. This is probably why the SA article title says we need hackers. The crowd is there but using it depends on innovative software engineering. A lot of researchers could really use software designed to flexibly support a diversity of crowdsourcing, some AI to enable things like crowd validation and big data tools.

The Potential

The SA article also points to Open BCI (brain-computer interface), mentioned here in other posts, as an example of how traditional divisions between institutional and amateur (or “citizen”) science are now crumbling; Open BCI is a community of professional and citizen scientists doing principled research with cheap, portable EEG headsets producing professional research-quality data. In communities of “neuro-hackers,” like NeurotechX, professional researchers, entrepreneurs, and citizen scientists are coming together to develop all kinds of applications, such as “telepathic” machine control, prostheses, and art. Other companies, like Neurosky, sell EEG headsets and biosensors for bio-/neuro-feedback training and health monitoring at consumer-affordable pricing. (Read more in Citizen Science and EEG)

Tan Le, whose company, Emotiv Lifesciences, also produces portable EEG headsets, says, in an article in National Geographic, that neuroscience needs “as much data as possible on as many brains as possible” to advance diagnosis of conditions such as epilepsy and Alzheimer’s. Human neuroscience studies have typically consisted of 20 to 50 participants, an incredibly small sampling of a 7-billion-strong humanity. For a single lab to collect larger datasets is difficult, but with diverse populations across the planet, real understanding may require data not even from thousands of brains but millions. With cheap mobile EEG headsets, open-source software, and online collaboration, the potential for anyone to participate in such data collection is immense; the potential for crowdsourcing unprecedented. There are, however, significant hurdles to overcome….(More)”

Data capitalism is cashing in on our privacy . . . for now


John Thornhill in the Financial Times: “The buzz at last week’s Consumer Electronics Show in Las Vegas was all about connectivity and machine learning. …The primary effect of these consumer tech products seems limited — but we will need to pay increasing attention to the secondary consequences of these connected devices. They are just the most visible manifestation of a fundamental transformation that is likely to shape our societies far more than Brexit, Donald Trump or squabbles over the South China Sea. It concerns who collects, owns and uses data. The subject of data is so antiseptic that it seldom generates excitement. To make it sound sexy, some have described data as the “new oil”, fuelling our digital economies. In reality, it is likely to prove far more significant than that. Data are increasingly determining economic value, reshaping the practice of power and intruding into the innermost areas of our lives. Some commentators have suggested that this transformation is so profound that we are moving from an era of financial capitalism into one of data capitalism. The Israeli historian Yuval Noah Harari even argues that Dataism, as he calls it, can be compared with the birth of a religion, given the claims of its most fervent disciples to provide universal solutions. …

Sir Nigel Shadbolt, co-founder of the Open Data Institute, argues in a recent FT TechTonic podcast that it is too early to give up on privacy…The next impending revolution, he argues, will be about giving consumers control over their data. Considering the increasing processing power and memory capacity of smartphones, he believes new models of data collection and more localised use may soon gain traction. One example is the Blue Button service used by US veterans, which allows individuals to maintain and update their medical records. “That has turned out to be a really revolutionary step,” he says. “I think we are going to see a lot more of that kind of re-empowering.” According to this view, we can use data to create a far smarter world without sacrificing precious rights. If we truly believe in such a benign future, we had better hurry up and invent it….(More)”

Big Data and the Paradox of Diversity


Bernhard Rieder at Digital Culture & Society: “This paper develops a critique of Big Data and associated analytical techniques by focusing not on errors – skewed or imperfect datasets, false positives, underrepresentation, and so forth – but on data mining that works. After a quick framing of these practices as interested readings of reality, I address the question of how data analytics and, in particular, machine learning reveal and operate on the structured and unequal character of contemporary societies, installing “economic morality” (Allen 2012) as the central guiding principle. Rather than critiquing the methods behind Big Data, I inquire into the way these methods make the many differences in decentred, non-traditional societies knowable and, as a consequence, ready for profitable distinction and decision-making. The objective, in short, is to add to our understanding of the “profound ideological role at the intersection of sociality, research, and commerce” (van Dijck 2014: 201) the collection and analysis of large quantities of multifarious data have come to play. Such an understanding needs to embed Big Data in a larger, more fundamental critique of the societal context it operates in….(More)”.

Discrimination by algorithm: scientists devise test to detect AI bias


 at the Guardian: “There was the voice recognition software that struggled to understand women, the crime prediction algorithm that targeted black neighbourhoods and the online ad platform which was more likely to show men highly paid executive jobs.

Concerns have been growing about AI’s so-called “white guy problem” and now scientists have devised a way to test whether an algorithm is introducing gender or racial biases into decision-making.

Moritz Hardt, a senior research scientist at Google and a co-author of the paper, said: “Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives … Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking.”

The paper was one of several on detecting discrimination by algorithms to be presented at the Neural Information Processing Systems (NIPS) conference in Barcelona this month, indicating a growing recognition of the problem.

Nathan Srebro, a computer scientist at the Toyota Technological Institute at Chicago and co-author, said: “We are trying to enforce that you will not have inappropriate bias in the statistical prediction.”

The test is aimed at machine learning programs, which learn to make predictions about the future by crunching through vast quantities of existing data. Since the decision-making criteria are essentially learnt by the computer, rather than being pre-programmed by humans, the exact logic behind decisions is often opaque, even to the scientists who wrote the software….“Our criteria does not look at the innards of the learning algorithm,” said Srebro. “It just looks at the predictions it makes.”

Their approach, called Equality of Opportunity in Supervised Learning, works on the basic principle that when an algorithm makes a decision about an individual – be it to show them an online ad or award them parole – the decision should not reveal anything about the individual’s race or gender beyond what might be gleaned from the data itself.

For instance, if men were on average twice as likely to default on bank loans as women, and if you knew that a particular individual in a dataset had defaulted on a loan, you could reasonably conclude they were more likely (but not certain) to be male.

However, if an algorithm calculated that the most profitable strategy for a lender was to reject all loan applications from men and accept all female applications, the decision would precisely confirm a person’s gender.

“This can be interpreted as inappropriate discrimination,” said Srebro….(More)”.
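A check of this kind can be sketched using only a model's predictions, in the spirit of Srebro's remark that the criterion "just looks at the predictions." The sketch assumes one common reading of equality of opportunity: among people who would in fact repay, the acceptance rate should be the same in every group. The records and function names are hypothetical, and this is a simplified illustration rather than the authors' code.

```python
def true_positive_rates(records):
    """records: (group, would_repay, accepted) tuples.
    Returns, per group, the share of would-be repayers who were accepted."""
    qualified = {}
    accepted = {}
    for group, would_repay, was_accepted in records:
        if not would_repay:
            continue  # the criterion only compares people who deserved a "yes"
        qualified[group] = qualified.get(group, 0) + 1
        accepted[group] = accepted.get(group, 0) + (1 if was_accepted else 0)
    return {g: accepted[g] / qualified[g] for g in qualified}

def opportunity_gap(records):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(records)
    return max(rates.values()) - min(rates.values())

# Hypothetical lender that rejects every man and accepts every woman.
GENDER_RULE = [
    ("m", True, False), ("m", True, False), ("m", False, False),
    ("f", True, True), ("f", False, True),
]
# Hypothetical lender whose acceptance rate among repayers is equal across groups.
BALANCED = [
    ("m", True, True), ("m", True, False), ("m", False, False),
    ("f", True, True), ("f", True, False), ("f", False, True),
]

print(opportunity_gap(GENDER_RULE))
print(opportunity_gap(BALANCED))
```

A gap near zero means creditworthy applicants have the same chance of acceptance whatever their group; the all-men-rejected policy from the example above produces the largest possible gap.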

Science Can Restore America’s Faith in Democracy


Ariel Procaccia in Wired: “…Like most other countries, individual states in the US employ the antiquated plurality voting system, in which each voter casts a vote for a single candidate, and the person who amasses the largest number of votes is declared the winner. If there is one thing that voting experts unanimously agree on, it is that plurality voting is a bad idea, or at least a badly outdated one….. Maine recently became the first US state to adopt instant-runoff voting; the approach will be used for choosing the governor and members of Congress and the state legislature….
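The difference between plurality and instant-runoff voting can be made concrete with a short sketch. The ballots are hypothetical, the tie-breaking rule (eliminate the alphabetically first of the tied lowest candidates) is an assumption, and real election rules such as Maine's contain details not modeled here.

```python
from collections import Counter

def plurality_winner(ballots):
    """Each ballot counts only for its first choice; most votes wins."""
    counts = Counter(ballot[0] for ballot in ballots if ballot)
    return counts.most_common(1)[0][0]

def instant_runoff_winner(ballots):
    """Repeatedly eliminate the weakest candidate and transfer those ballots
    to each voter's next surviving choice until someone holds a majority.
    Assumes at least one non-empty ballot."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        counts = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    counts[choice] += 1
                    break
        total = sum(counts.values())
        leader, leader_votes = counts.most_common(1)[0]
        if leader_votes * 2 > total or len(remaining) == 1:
            return leader
        # Assumed tie-break: lowest count, then alphabetical order.
        loser = min(remaining, key=lambda c: (counts[c], c))
        remaining.remove(loser)

# Hypothetical ranked ballots, most preferred candidate first.
BALLOTS = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)

print(plurality_winner(BALLOTS))
print(instant_runoff_winner(BALLOTS))
```

On these ballots plurality elects A with four of nine first choices, while instant runoff eliminates C, transfers its two ballots to B, and elects B with a five-to-four majority.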

So why aren’t we already using cutting-edge voting systems in national elections? Perhaps because changing election systems usually itself requires an election, where short-term political considerations may trump long-term, scientifically grounded reasoning….Despite these difficulties, in the last few years state-of-the-art voting systems have made the transition from theory to practice, through not-for-profit online platforms that focus on facilitating elections in cities and organizations, or even just on helping a group of friends decide where to go to dinner. For example, the Stanford Crowdsourced Democracy Team has created an online tool whereby residents of a city can vote on how to allocate the city’s budget for public projects such as parks and roads. This tool has been used by New York City, Boston, Chicago, and Seattle to allocate millions of dollars. Building on this success, the Stanford team is experimenting with groundbreaking methods, inspired by computational thinking, to elicit and aggregate the preferences of residents.

The Princeton-based project All Our Ideas asks voters to compare pairs of ideas, and then aggregates these comparisons via statistical methods, ultimately providing a ranking of all the ideas. To date, roughly 14 million votes have been cast using this system, and it has been employed by major cities and organizations. Among its more whimsical use cases is the Washington Post’s 2010 holiday gift guide, where the question was “what gift would you like to receive this holiday season”; the disappointingly uncreative top idea, based on tens of thousands of votes, was “money”.
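Aggregating pairwise comparisons into a ranking can be illustrated with a deliberately simple stand-in. The excerpt does not say which statistical method All Our Ideas uses, so the sketch below assumes a plain win rate (wins divided by appearances); the votes and function name are hypothetical.

```python
from collections import defaultdict

def rank_by_win_rate(comparisons):
    """comparisons: (winner, loser) pairs, one per vote.
    Returns ideas sorted by share of comparisons won, ties broken by name."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return sorted(
        appearances,
        key=lambda idea: (-wins[idea] / appearances[idea], idea),
    )

# Hypothetical votes: each voter saw two gift ideas and picked one.
VOTES = [
    ("money", "socks"),
    ("money", "book"),
    ("book", "socks"),
    ("money", "book"),
]

ranking = rank_by_win_rate(VOTES)
print(ranking)
```

A raw win rate treats an idea compared twice the same as one compared two thousand times, which is one reason a production system would want a statistical model that accounts for uncertainty.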

Finally, the recently launched website RoboVote (which I created with collaborators at Carnegie Mellon and Harvard) offers AI-driven voting methods to help groups of people make smart collective decisions. Applications range from selecting a spot for a family vacation or a class president, to potentially high-stakes choices such as which product prototype to develop or which movie script to produce.

These examples show that centuries of research on voting can, at long last, make a societal impact in the internet age. They demonstrate what science can do for democracy, albeit on a relatively small scale, for now….(More)”

How Artificial Intelligence Will Usher in the Next Stage of E-Government


Daniel Castro at GovTech: “Since the earliest days of the Internet, most government agencies have eagerly explored how to use technology to better deliver services to citizens, businesses and other public-sector organizations. Early on, observers recognized that these efforts often varied widely in their implementation, and so researchers developed various frameworks to describe the different stages of growth and development of e-government. While each model is different, they all identify the same general progression from the informational, for example websites that make government facts available online, to the interactive, such as two-way communication between government officials and users, to the transactional, like applications that allow users to access government services completely online.

However, we will soon see a new stage of e-government: the perceptive.

The defining feature of the perceptive stage will be that the work involved in interacting with government will be significantly reduced and automated for all parties involved. This will come about principally from the integration of artificial intelligence (AI) — computer systems that can learn, reason and decide at levels similar to that of a human — into government services to make them more insightful and intelligent.

Consider the evolution of the Department of Motor Vehicles. The informational stage made it possible for users to find the hours for the local office; the interactive stage made it possible to ask the agency a question by email; and the transactional stage made it possible to renew a driver’s license online.

In the perceptive stage, the user will simply say, “Siri, I need a driver’s license,” and the individual’s virtual assistant will take over — collecting any additional information from the user, coordinating with the government’s system and scheduling any in-person meetings automatically. That’s right: AI might finally end your wait at the DMV.

In general, there are at least three ways that AI will impact government agencies. First, it will enable government workers to be more productive since the technology can be used to automate many tasks. …

Second, AI will create a faster, more responsive government. AI enables the creation of autonomous, intelligent agents — think online chatbots that answer citizens’ questions, real-time fraud detection systems that constantly monitor government expenditures and virtual legislative assistants that quickly synthesize feedback from citizens to lawmakers.

Third, AI will allow people to interact more naturally with digital government services…(More)”

Artificial Intelligence Could Help Colleges Better Plan What Courses They Should Offer


Jeffrey R. Young at EdSurge: “Big data could help community colleges better predict how industries are changing so they can tailor their IT courses and other programs. After all, if Amazon can forecast what consumers will buy and prestock items in their warehouses to meet the expected demand, why can’t colleges do the same thing when planning their curricula, using predictive analytics to make sure new degree or certificate programs are started just in time for expanding job opportunities?

That’s the argument made by Gordon Freedman, president of the nonprofit National Laboratory for Education Transformation. He’s part of a new center that will do just that, by building a data warehouse that brings together up-to-date information on what skills employers need and what colleges currently offer—and then applying artificial intelligence to attempt to predict when sectors or certain employment needs might be expanding.

He calls the approach “opportunity engineering,” and the center boasts some heavy-hitting players to assist in the efforts, including the University of Chicago, the San Diego Supercomputing Center and Argonne National Laboratory. It’s called the National Center for Opportunity Engineering & Analysis.

Ian Roark, vice president of workforce development at Pima Community College in Arizona, is among those eager for this kind of “opportunity engineering” to emerge.

He explains that when colleges want to start new programs, they face a long haul—it takes time to develop a new curriculum, put it through an internal review, and then send it through an accreditor….

Other players are already trying to translate the job market into a giant data set to spot trends. LinkedIn sits on one of the biggest troves of data, with hundreds of millions of job profiles, and ambitions to create what it calls the “economic graph” of the economy. But not everyone is on LinkedIn, which attracts mainly those in white-collar jobs. And companies such as Burning Glass Technologies have scanned hundreds of thousands of job listings and attempt to provide real-time intelligence on what employers say they’re looking for. Those still don’t paint the full picture, Freedman argues, such as what jobs are forming at companies.

“We need better information from the employer, better information from the job seeker and better information from the college, and that’s what we’re going after,” Freedman says…(More)”.