Algorithm Observatory: Where anyone can study any social computing algorithm.


About: “We know that social computing algorithms are used to categorize us, but the way they do so is not always transparent. To take just one example, ProPublica recently uncovered that Facebook allows housing advertisers to exclude users by race.

Even so, there are no simple and accessible resources for us, the public, to study algorithms empirically, and to engage critically with the technologies that are shaping our daily lives in such profound ways.

That is why we created Algorithm Observatory.

Part media literacy project and part citizen experiment, the goal of Algorithm Observatory is to provide a collaborative online lab for the study of social computing algorithms. The data collected through this site is analyzed to compare how a particular algorithm handles data differently depending on the characteristics of users.

Algorithm Observatory is a work in progress. This prototype only allows users to explore Facebook advertising algorithms, and the functionality is limited. We are currently looking for funding to realize the project’s full potential: to allow anyone to study any social computing algorithm….

Our future plans

This is a prototype, which only begins to showcase the things that Algorithm Observatory will be able to do in the future.

Eventually, the website will allow anyone to design an experiment involving a social computing algorithm. The platform will allow researchers to recruit volunteer participants, who will be able to contribute content to the site securely and anonymously. Researchers will then be able to conduct an analysis to compare how the algorithm handles users differently depending on individual characteristics. The results will be shared by publishing a report evaluating the social impact of the algorithm. All data and reports will become publicly available and open for comments and reviews. Researchers will be able to study any algorithm, because the site does not require direct access to the source code, but relies instead on empirical observation of the interaction between the algorithm and volunteer participants….(More)”.
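
The kind of comparison the excerpt describes can be sketched in a few lines. Below is a minimal example, assuming a hypothetical dataset of ad impressions collected from volunteer participants; the group labels, counts, and choice of test are illustrative, not Algorithm Observatory's actual pipeline:

```python
# Hypothetical analysis: did an ad-targeting algorithm show a housing ad
# at different rates to two groups of volunteer participants?
from scipy.stats import chi2_contingency

# Rows: group A, group B; columns: saw the ad, did not see the ad.
# All counts are invented for illustration.
observed = [
    [64, 136],   # group A: 64 of 200 participants saw the housing ad
    [112, 88],   # group B: 112 of 200 participants saw it
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
# A small p-value is evidence that the algorithm treats the two groups
# differently -- the kind of empirical observation the project relies on,
# since it requires no access to the algorithm's source code.
```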

Crowdsourcing as a Platform for Digital Labor Unions


Paper by Payal Arora and Linnea Holter Thompson in the International Journal of Communication: “Global complex supply chains have made it difficult to know the realities in factories. This structure obfuscates the networks, channels, and flows of communication between employers, workers, nongovernmental organizations and other vested intermediaries, creating a lack of transparency. Factories operate far from the brands themselves, often in developing countries where labor is cheap and regulations are weak. However, the emergence of social media and mobile technology has drawn the world closer together. Specifically, crowdsourcing is being used in an innovative way to gather feedback from outsourced laborers with access to digital platforms. This article examines how crowdsourcing platforms are used for both gathering and sharing information to foster accountability. We critically assess how these tools enable dialogue between brands and factory workers, making workers part of the greater conversation. We argue that although there are challenges in designing and implementing these new monitoring systems, these platforms can pave the path for new forms of unionization and corporate social responsibility beyond just rebranding…(More)”

Free Speech is a Triangle


Essay by Jack Balkin: “The vision of free expression that characterized much of the twentieth century is inadequate to protect free expression today.

The twentieth century featured a dyadic or dualist model of speech regulation with two basic kinds of players: territorial governments on the one hand, and speakers on the other. The twenty-first century model is pluralist, with multiple players. It is easiest to think of it as a triangle. On one corner are nation states and the European Union. On the second corner are privately-owned Internet infrastructure companies, including social media companies, search engines, broadband providers, and electronic payment systems. On the third corner are many different kinds of speakers: legacy media, civil society organizations, hackers, and trolls.

Territorial governments continue to regulate speakers and legacy media through traditional or “old-school” speech regulation. But nation states and the European Union also now employ “new-school” speech regulation that is aimed at Internet infrastructure owners and designed to get these private companies to surveil, censor, and regulate speakers for them. Finally, infrastructure companies like Facebook also regulate and govern speakers through techniques of private governance and surveillance.

The practical ability to speak in the digital world emerges from the struggle for power between these various forces, with old-school, new-school and private regulation directed at speakers, and both nation states and civil society organizations pressuring infrastructure owners to regulate speech.

If the characteristic feature of free speech regulation in our time is a triangle that combines new school speech regulation with private governance, then the best way to protect free speech values today is to combat and compensate for that triangle’s evolving logic of public and private regulation. The first goal is to prevent or ameliorate as much as possible collateral censorship and new forms of digital prior restraint. The second goal is to protect people from new methods of digital surveillance and manipulation—methods that emerged from the rise of large multinational companies that depend on data collection, surveillance, analysis, control, and distribution of personal data.

This essay describes how nation states should and should not regulate the digital infrastructure consistent with the values of freedom of speech and press; it emphasizes that different models of regulation are appropriate for different parts of the digital infrastructure. Some parts of the digital infrastructure are best regulated along the lines of common carriers or places of public accommodation. But governments should not impose First Amendment-style or common carriage obligations on social media and search engines. Rather, governments should require these companies to provide due process toward their end-users. Governments should also treat these companies as information fiduciaries who have duties of good faith and non-manipulation toward their end-users. Governments can implement all of these reforms—properly designed—consistent with constitutional guarantees of free speech and free press….(More)”.

Skills for a Lifetime


Nate Silver’s commencement address at Kenyon College: “….Power has shifted toward people and companies with a lot of proficiency in data science.

I obviously don’t think that’s entirely a bad thing. But it’s by no means entirely a good thing, either. You should still inherently harbor some suspicion of big, powerful institutions and their potentially self-serving and short-sighted motivations. Companies and governments that are capable of using data in powerful ways are also capable of abusing it.

What worries me the most, especially at companies like Facebook and at other Silicon Valley behemoths, is the idea that using data science allows one to remove human judgment from the equation. For instance, in announcing a recent change to Facebook’s News Feed algorithm, Mark Zuckerberg claimed that Facebook was not “comfortable” trying to come up with a way to determine which news organizations were most trustworthy; rather, the “most objective” solution was to have readers vote on trustworthiness instead. Maybe this is a good idea and maybe it isn’t — but what bothered me was the notion that Facebook could avoid responsibility for its algorithm by outsourcing the judgment to its readers.

I also worry about this attitude when I hear people use terms such as “artificial intelligence” and “machine learning” (instead of simpler terms like “computer program”). Phrases like “machine learning” appeal to people’s notion of a push-button solution — meaning, push a button, and the computer does all your thinking for you, no human judgment required.

But the reality is that working with data requires lots of judgment. First, it requires critical judgment — and experience — when drawing inferences from data. And second, it requires moral judgment in deciding what your goals are and in establishing boundaries for your work.

Let’s talk about that first type of judgment — critical judgment. The more experience you have in working with different data sets, the more you’ll realize that the correct interpretation of the data is rarely obvious, and that the obvious-seeming interpretation isn’t always correct. Sometimes changing a single assumption or a single line of code can radically change your conclusion. In the 2016 U.S. presidential election, for instance, there was a series of models that all used almost exactly the same inputs — but they ranged from giving Trump as high as roughly a one-in-three chance of winning the presidency (that was FiveThirtyEight’s model) to as low as one chance in 100, based on fairly subtle aspects of how each algorithm was designed….(More)”.
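
Silver's point about fragile assumptions lends itself to a worked example. The toy simulation below is not FiveThirtyEight's model; the state margins and error size are invented. It flips a single assumption, whether polling errors across states are independent or shared, and the headline win probability changes dramatically:

```python
# Toy election simulation: one changed assumption, very different answer.
import random

random.seed(0)

# Hypothetical polling margins (points) for a trailing candidate in five
# swing states; negative means behind.
poll_margins = [-1.0, -2.0, -1.5, -0.5, -2.5]

def win_probability(correlated_errors: bool, trials: int = 100_000) -> float:
    """Fraction of simulations where the trailing candidate carries 3+ states."""
    wins = 0
    for _ in range(trials):
        national_error = random.gauss(0, 3)  # one shared polling miss
        states_won = 0
        for margin in poll_margins:
            # The single assumption being flipped: shared vs. per-state error.
            error = national_error if correlated_errors else random.gauss(0, 3)
            if margin + error > 0:
                states_won += 1
        if states_won >= 3:
            wins += 1
    return wins / trials

print(f"independent state errors: {win_probability(False):.0%}")
print(f"correlated state errors:  {win_probability(True):.0%}")
```

Under independent errors, upsets rarely line up across states; under a shared error, one nationwide polling miss flips several states at once. Allowing for that correlation is, roughly, why some 2016 models gave Trump a far better chance than others.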

Data Activism


Special Issue of Krisis: Journal of Contemporary Philosophy: “Digital data increasingly plays a central role in contemporary politics and public life. Citizen voices are increasingly mediated by proprietary social media platforms and are shaped by algorithmic ranking and re-ordering, but data informs how states act, too. This special issue wants to shift the focus of the conversation. Non-governmental organizations, hackers, and activists of all kinds provide a myriad of ‘alternative’ interventions, interpretations, and imaginaries of what data stands for and what can be done with it.

Jonathan Gray starts off this special issue by suggesting how data can be involved in providing horizons of intelligibility and organising social and political life. Helen Kennedy’s contribution advocates for a focus on emotions and everyday lived experiences with data. Lina Dencik puts forward the notion of ‘surveillance realism’ to explore the pervasiveness of contemporary surveillance and the emergence of alternative imaginaries. Stefan Baack investigates how data are used to facilitate civic engagement. Miren Gutiérrez explores how activists can make use of data infrastructures such as databases, servers, and algorithms. Finally, Leah Horgan and Paul Dourish critically engage with the notion of data activism by looking at everyday data work in a local administration. Further, this issue features an interview with Boris Groys by Thijs Lijster, whose work Über das Neue celebrated its 25th anniversary last year. Lastly, three book reviews illuminate key aspects of datafication. Patricia de Vries reviews Metahaven’s Black Transparency; Niels van Doorn writes on Platform Capitalism by Nick Srnicek; and Jan Overwijk comments on The Entrepreneurial Self by Ulrich Bröckling….(More)”.

Tech Platforms and the Knowledge Problem


Frank Pasquale at American Affairs: “Friedrich von Hayek, the preeminent theorist of laissez-faire, called the “knowledge problem” an insuperable barrier to central planning. Knowledge about the price of supplies and labor, and consumers’ ability and willingness to pay, is so scattered and protean that even the wisest authorities cannot access all of it. No person knows everything about how goods and services in an economy should be priced. No central decision-maker can grasp the idiosyncratic preferences, values, and purchasing power of millions of individuals. That kind of knowledge, Hayek said, is distributed.

In an era of artificial intelligence and mass surveillance, however, the possibility of central planning has reemerged—this time in the form of massive firms. Having logged and analyzed billions of transactions, Amazon knows intimate details about all its customers and suppliers. It can carefully calibrate screen displays to herd buyers toward certain products or shopping practices, or to copy sellers with its own, cheaper, in-house offerings. Mark Zuckerberg aspires to omniscience of consumer desires, by profiling nearly everyone on Facebook, Instagram, and WhatsApp, and then leveraging that data trove to track users across the web and into the real world (via mobile usage and device fingerprinting). You don’t even have to use any of those apps to end up in Facebook/Instagram/WhatsApp files—profiles can be assigned to you. Google’s “database of intentions” is legendary, and antitrust authorities around the world have looked with increasing alarm at its ability to squeeze out rivals from search results once it gains an interest in their lines of business. Google knows not merely what consumers are searching for, but also what other businesses are searching, buying, emailing, planning—a truly unparalleled matching of data-processing capacity to raw communication flows.

Nor is this logic limited to the online context. Concentration is paying dividends for the largest banks (widely assumed to be too big to fail), and major health insurers (now squeezing and expanding the medical supply chain like an accordion). Like the digital giants, these finance and insurance firms not only act as middlemen, taking a cut of transactions, but also aspire to capitalize on the knowledge they have gained from monitoring customers and providers in order to supplant them and directly provide services and investment. If it succeeds, the CVS-Aetna merger betokens intense corporate consolidations that will see more vertical integration of insurers, providers, and a baroque series of middlemen (from pharmaceutical benefit managers to group purchasing organizations) into gargantuan health providers. A CVS doctor may eventually refer a patient to a CVS hospital for a CVS surgery, to be followed up by home health care workers employed by CVS who bring CVS pharmaceuticals—all covered by a CVS/Aetna insurance plan, which might penalize the patient for using any providers outside the CVS network. While such a panoptic firm may sound dystopian, it is a logical outgrowth of health services researchers’ enthusiasm for “integrated delivery systems,” which are supposed to provide “care coordination” and “wraparound services” more efficiently than America’s current, fragmented health care system.

The rise of powerful intermediaries like search engines and insurers may seem like the next logical step in the development of capitalism. But a growing chorus of critics questions the size and scope of leading firms in these fields. The Institute for Local Self-Reliance highlights Amazon’s manipulation of both law and contracts to accumulate unfair advantages. International antitrust authorities have taken Google down a peg, questioning the company’s aggressive use of its search engine and Android operating system to promote its own services (and demote rivals). They also question why Google and Facebook have for years been acquiring companies at a pace of more than two per month. Consumer advocates complain about manipulative advertising. Finance scholars lambaste megabanks for taking advantage of the implicit subsidies that too-big-to-fail status confers….(More)”.

How the Math Men Overthrew the Mad Men


Ken Auletta in the New Yorker: “Once, Mad Men ruled advertising. They’ve now been eclipsed by Math Men—the engineers and data scientists whose province is machines, algorithms, pureed data, and artificial intelligence. Yet Math Men are beleaguered, as Mark Zuckerberg demonstrated when he humbled himself before Congress, in April. Math Men’s adoration of data—coupled with their truculence and an arrogant conviction that their “science” is nearly flawless—has aroused government anger, much as Microsoft did two decades ago.

The power of Math Men is awesome. Google and Facebook each has a market value exceeding the combined value of the six largest advertising and marketing holding companies. Together, they claim six out of every ten dollars spent on digital advertising, and nine out of ten new digital ad dollars. They have become more dominant in what is estimated to be an up to two-trillion-dollar annual global advertising and marketing business. Facebook alone generates more ad dollars than all of America’s newspapers, and Google has twice the ad revenues of Facebook.

In the advertising world, Big Data is the Holy Grail, because it enables marketers to target messages to individuals rather than general groups, creating what’s called addressable advertising. And only the digital giants possess state-of-the-art Big Data. “The game is no longer about sending you a mail order catalogue or even about targeting online advertising,” Shoshana Zuboff, a professor of business administration at the Harvard Business School, wrote on faz.net, in 2016. “The game is selling access to the real-time flow of your daily life—your reality—in order to directly influence and modify your behavior for profit.” Success at this “game” flows to those with the “ability to predict the future—specifically the future of behavior,” Zuboff writes. She dubs this “surveillance capitalism.”

However, to thrash just Facebook and Google is to miss the larger truth: everyone in advertising strives to eliminate risk by perfecting targeting data. Protecting privacy is not foremost among the concerns of marketers; protecting and expanding their business is. The business model adopted by ad agencies and their clients parallels Facebook and Google’s. Each aims to massage data to better identify potential customers. Each aims to influence consumer behavior. To appreciate how alike their aims are, sit in an agency or client marketing meeting and you will hear wails about Facebook and Google’s “walled garden,” their unwillingness to share data on their users. When Facebook or Google counter that they must protect “the privacy” of their users, advertisers cry foul: You’re using the data to target ads we paid for—why won’t you share it, so that we can use it in other ad campaigns?…(More)”

Crowdbreaks: Tracking Health Trends Using Public Social Media Data and Crowdsourcing


Paper by Martin Mueller and Marcel Salathé: “In the past decade, tracking health trends using social media data has shown great promise, due to a powerful combination of massive adoption of social media around the world, and increasingly potent hardware and software that enables us to work with these new big data streams.

At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change, and how rapidly algorithms are updated, which means that there is limited reusability for algorithms trained on past data as their performance decreases over time. Second, much of the work is focusing on specific issues during a specific past period in time, even though public health institutions would need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community.

Here, we introduce Crowdbreaks, an open platform which allows tracking of health trends by making use of continuous crowdsourced labelling of public social media content. The system is built in a way that automates the typical workflow of data collection, filtering, labelling, and training of machine learning classifiers, and therefore can greatly accelerate the research process in the public health domain. This work introduces the technical aspects of the platform and explores its future use cases…(More)”.
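
A minimal sketch of the human-in-the-loop workflow the abstract describes, assuming scikit-learn and invented posts and labels; this is not Crowdbreaks' actual code:

```python
# Crowdsourced labels on public posts train a text classifier; new posts
# the model is unsure about would be routed back to the crowd for labelling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical crowdsourced labels: 1 = relevant to the health topic.
posts = [
    "got my flu shot today, arm is a bit sore",
    "flu season is hitting our whole office hard",
    "new phone arrived, battery life is great",
    "feeling feverish, staying home from work",
    "great match last night, what a finish",
]
labels = [1, 1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# Score an incoming post; a probability near 0.5 would trigger another
# round of crowd labelling, and the classifier is retrained over time.
print(model.predict_proba(["my whole family came down with the flu"]))
```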

How the Enlightenment Ends


Henry Kissinger in the Atlantic: “…Heretofore, the technological advance that most altered the course of modern history was the invention of the printing press in the 15th century, which allowed the search for empirical knowledge to supplant liturgical doctrine, and the Age of Reason to gradually supersede the Age of Religion. Individual insight and scientific knowledge replaced faith as the principal criterion of human consciousness. Information was stored and systematized in expanding libraries. The Age of Reason originated the thoughts and actions that shaped the contemporary world order.

But that order is now in upheaval amid a new, even more sweeping technological revolution whose consequences we have failed to fully reckon with, and whose culmination may be a world relying on machines powered by data and algorithms and ungoverned by ethical or philosophical norms.

The internet age in which we already live prefigures some of the questions and issues that AI will only make more acute. The Enlightenment sought to submit traditional verities to a liberated, analytic human reason. The internet’s purpose is to ratify knowledge through the accumulation and manipulation of ever expanding data. Human cognition loses its personal character. Individuals turn into data, and data become regnant.

Users of the internet emphasize retrieving and manipulating information over contextualizing or conceptualizing its meaning. They rarely interrogate history or philosophy; as a rule, they demand information relevant to their immediate practical needs. In the process, search-engine algorithms acquire the capacity to predict the preferences of individual clients, enabling the algorithms to personalize results and make them available to other parties for political or commercial purposes. Truth becomes relative. Information threatens to overwhelm wisdom.

Inundated via social media with the opinions of multitudes, users are diverted from introspection; in truth many technophiles use the internet to avoid the solitude they dread. All of these pressures weaken the fortitude required to develop and sustain convictions that can be implemented only by traveling a lonely road, which is the essence of creativity.

The impact of internet technology on politics is particularly pronounced. The ability to target micro-groups has broken up the previous consensus on priorities by permitting a focus on specialized purposes or grievances. Political leaders, overwhelmed by niche pressures, are deprived of time to think or reflect on context, contracting the space available for them to develop vision.
The digital world’s emphasis on speed inhibits reflection; its incentive empowers the radical over the thoughtful; its values are shaped by subgroup consensus, not by introspection. For all its achievements, it runs the risk of turning on itself as its impositions overwhelm its conveniences….

There are three areas of special concern:

First, that AI may achieve unintended results….

Second, that in achieving intended goals, AI may change human thought processes and human values….

Third, that AI may reach intended goals, but be unable to explain the rationale for its conclusions…..(More)”

Data Violence and How Bad Engineering Choices Can Damage Society


Blog by Anna Lauren Hoffmann: “…In 2015, a black developer in New York discovered that Google’s algorithmic photo recognition software had tagged pictures of him and his friends as gorillas.

The same year, Facebook auto-suspended Native Americans for using their real names, and in 2016, facial recognition was found to struggle to read black faces.

Software in airport body scanners has flagged transgender bodies as threats for years. In 2017, Google Translate took gender-neutral pronouns in Turkish and converted them to gendered pronouns in English — with startlingly biased results.

“Violence” might seem like a dramatic way to talk about these accidents of engineering and the processes of gathering data and using algorithms to interpret it. Yet just like physical violence in the real world, this kind of “data violence” (a term inspired by Dean Spade’s concept of administrative violence) occurs as the result of choices that implicitly and explicitly lead to harmful or even fatal outcomes.

Those choices are built on assumptions and prejudices about people, intimately weaving them into processes and results that reinforce biases and, worse, make them seem natural or given.

Take the experience of being a woman and having to constantly push back against rigid stereotypes and aggressive objectification.

Writer and novelist Kate Zambreno describes these biases as “ghosts,” a violent haunting of our true reality. “A return to these old roles that we play, that we didn’t even originate. All the ghosts of the past. Ghosts that aren’t even our ghosts.”

Structural bias is reinforced by the stereotypes fed to us in novels, films, and a pervasive cultural narrative that shapes the lives of real women every day, Zambreno describes. This extends to data and automated systems that now mediate our lives as well. Our viewing and shopping habits, our health and fitness tracking, our financial information all conspire to create a “data double” of ourselves, produced about us by third parties and standing in for us on data-driven systems and platforms.

These fabrications don’t emerge de novo, disconnected from history or social context. Rather, they often pick up and unwittingly spit out a tangled mess of historical conditions and current realities.

Search engines are a prime example of how data and algorithms can conspire to amplify racist and sexist biases. The academic Safiya Umoja Noble threw these messy entanglements into sharp relief in her book Algorithms of Oppression. Google Search, she explains, has a history of offering up pages of porn for women from particular racial or ethnic groups, and especially black women. Google has also served up ads for criminal background checks alongside search results for African American–sounding names, as former Federal Trade Commission CTO Latanya Sweeney discovered.

“These search engine results for women whose identities are already maligned in the media, such as Black women and girls, only further debase and erode efforts for social, political, and economic recognition and justice,” Noble says.

These kinds of cultural harms go well beyond search results. Sociologist Rena Bivens has shown how the gender categories employed by platforms like Facebook can inflict symbolic violences against transgender and nonbinary users in ways that may never be made obvious to users….(More)”.