Folksonomies: how to do things with words on social media


Oxford Dictionaries: “Folksonomy, a portmanteau word for ‘folk taxonomy’, is a term for collaborative tagging: the production of user-created ‘tags’ on social media that help readers to find and sort content. In other words, hashtags: #ThrowbackThursday, #DogLife, #MeToo. Because ordinary people create folksonomy tags, folksonomies include categories devised by small communities, subcultures, or even individuals, not merely those defined by accepted taxonomic systems like the Dewey Decimal System.

The term first arose in the wake of Web 2.0 – the Web’s transition, in the early 2000s, from a read-only platform to a read-write platform that allows users to comment on and collaboratively tag what they read. Rather unusually, we know the exact date it was coined: 24 July 2004. The information architect Thomas Vander Wal came up with it in response to a query about what to call this kind of informal social classification.

Perhaps the most visible folksonomies are those on social-media platforms like Facebook, Twitter, Tumblr, Flickr, and Instagram. Often, people create tags on these platforms in order to gather under a single tag content that many different users have created, making it easier to find posts related to that tag. (If I’m interested in dogs, I might look at content gathered under the tag #DogLife.) Because tags reflect the interests of people who create them, researchers have pursued ways to use tags to build more comprehensive profiles of users, with an eye to surveillance or to selling them relevant ads.

But people may also use tags as prompts for the creation of new content, not merely the curation of content they would have posted anyway. As I write this post, a trending tag on Twitter, #MakeAHorrorMovieMoreHorrific, is prompting thousands of people to write satirical takes on how classic horror movies might be made more ‘horrifying’ by adding unhappy features of our ordinary lives. (‘I Know What You Did Last Summer, and I Put It on Facebook’; ‘Rosemary’s Baby Is Teething’; ‘The Exercise’)

From a certain perspective, this is not so different from a library’s acknowledgment of a new category of text: if a new academic field, like ‘the history of the book’, catches on, then libraries rearrange their shelves and catalogues to accommodate the history of the book as a category; the new shelf space and catalogue space create a demand for new books in that category, which encourages authors and publishers to produce books to meet that demand.

But new folksonomy tags (with important exceptions, as in the realm of activism) are often short-lived and meant to be short-lived, obscure and meant to be obscure. What library cataloguer would think to accommodate the category #glitterhorse, which has a surprising number of posts on Twitter and Instagram? How can Vander Wal’s original definition of folksonomy as a tool for information retrieval accommodate tags that function, not as search terms, but as theatrical asides, like #sorrynotsorry? What about tags that are so narrowly specific that no search could ever turn up more than one usage?

Perhaps the best way to understand the weird things that people do with folksonomy tags is to appeal, not to information science, but to narratology, the study of narrative structures. …(More)”.

The Collaborative Era in Science: Governing the Network


Book by Caroline S. Wagner: “In recent years a global network of science has emerged as a result of thousands of individual scientists seeking to collaborate with colleagues around the world, creating a network which rises above national systems. The globalization of science is part of the underlying shift in knowledge creation generally: the collaborative era in science. Over the past decade, the growth in the amount of knowledge and the speed at which it is available have created a fundamental shift: where data, information, and knowledge were once scarce resources, they are now abundantly available.

Collaboration, openness, customer- or problem-focused research and development, altruism, and reciprocity are notable features of abundance, and they create challenges that economists have not yet studied. This book defines the collaborative era, describes how it came to be, reveals its internal dynamics, and demonstrates how real-world practitioners are changing to take advantage of it. Most importantly, the book lays out a guide for policymakers and entrepreneurs as they shift perspectives to take advantage of the collaborative era in order to create social and economic welfare….(More)”.

Creating and Capturing Value through Crowdsourcing


Book edited by Allan Afuah, Christopher L. Tucci, and Gianluigi Viscusi: “Examples of the value that can be created and captured through crowdsourcing go back to at least 1714, when the UK used crowdsourcing to solve the Longitude Problem, obtaining a solution that would enable the UK to become the dominant maritime force of its time. Today, Wikipedia uses crowds to provide entries for the world’s largest free encyclopedia. Partly fueled by the value that can be created and captured through crowdsourcing, interest in researching the phenomenon has been remarkable.

Despite this – or perhaps because of it – research into crowdsourcing has been conducted in different research silos, within the fields of management (from strategy to finance to operations to information systems), biology, communications, computer science, economics, and political science, among others. In these silos, crowdsourcing takes names such as broadcast search, innovation tournaments, crowdfunding, community innovation, distributed innovation, collective intelligence, open source, crowdpower, and even open innovation. This book aims to assemble chapters from many of these silos, since the ultimate potential of crowdsourcing research is likely to be attained only by bridging them. Chapters provide a systematic overview of the research on crowdsourcing from different fields based on a more encompassing definition of the concept, the difference it makes for innovation, and its value for both the private and public sectors….(More)”.

Algorithmic Government: Automating Public Services and Supporting Civil Servants in using Data Science Technologies


Zeynep Engin and Philip Treleaven in The Computer Journal: “The data science technologies of artificial intelligence (AI), Internet of Things (IoT), big data and behavioral/predictive analytics, and blockchain are poised to revolutionize government and create a new generation of GovTech start-ups. The impact of the ‘smartification’ of public services and the national infrastructure will be far greater than in any other sector, given government’s function and importance to every institution and individual.

Potential GovTech systems include chatbots and intelligent assistants for public engagement, robo-advisors to support civil servants, real-time management of the national infrastructure using IoT and blockchain, automated compliance/regulation, public records securely stored in blockchain distributed ledgers, online judicial and dispute resolution systems, and laws/statutes encoded as blockchain smart contracts. Government is potentially the major ‘client’ and also ‘public champion’ for these new data technologies. This review paper uses our simple taxonomy of government services to provide an overview of data science automation being deployed by governments worldwide. The goal of this review paper is to encourage the Computer Science community to engage with government to develop these new systems to transform public services and support the work of civil servants….(More)”.

How Big Tech Is Working With Nonprofits and Governments to Turn Data Into Solutions During Disasters


Kelsey Sutton at Adweek: “As Hurricane Michael approached the Florida Panhandle, the Florida Division of Emergency Management tapped a tech company for help.

Over the past year, Florida’s DEM has worked closely with GasBuddy, a Boston-based app that uses crowdsourced data to identify fuel prices and inform first responders and the public about fuel availability or power outages at gas stations during storms. Since Hurricane Irma in 2017, GasBuddy and DEM have worked together to survey affected areas, helping Florida first responders identify how best to respond to petroleum shortages. With help from the location intelligence company Cuebiq, GasBuddy also provides estimated wait times at gas stations during emergencies.

DEM first noticed GasBuddy’s potential in 2016, when the app was collecting and providing data about fuel availability following a pipeline leak.

“DEM staff recognized how useful such information would be to Florida during any potential future disasters, and reached out to GasBuddy staff to begin a relationship,” a spokesperson for the Florida State Emergency Operations Center explained….

Stefaan Verhulst, co-founder and chief research and development officer at the Governance Laboratory at New York University, advocates for private corporations to partner with public institutions and NGOs. Private data collected by corporations is richer, more granular and more up-to-date than data collected through traditional social science methods, making that data useful for noncorporate purposes like research, Verhulst said. “Those characteristics are extremely valuable if you are trying to understand how society works,” Verhulst said….(More)”.

This is how computers “predict the future”


Dan Kopf at Quartz: “The poetically named “random forest” is one of data science’s most-loved prediction algorithms. Developed primarily by statistician Leo Breiman in the 1990s, the random forest is cherished for its simplicity. Though it is not always the most accurate prediction method for a given problem, it holds a special place in machine learning because even those new to data science can implement and understand this powerful algorithm.

This was the algorithm used in an exciting 2017 study on suicide predictions, conducted by biomedical-informatics specialist Colin Walsh of Vanderbilt University and psychologists Jessica Ribeiro and Joseph Franklin of Florida State University. Their goal was to take what they knew about a set of 5,000 patients with a history of self-injury, and see if they could use those data to predict the likelihood that those patients would commit suicide. The study was done retrospectively. Sadly, almost 2,000 of these patients had killed themselves by the time the research was underway.

Altogether, the researchers had over 1,300 different characteristics they could use to make their predictions, including age, gender, and various aspects of the individuals’ medical histories. If the predictions from the algorithm proved to be accurate, the algorithm could theoretically be used in the future to identify people at high risk of suicide, and deliver targeted programs to them. That would be a very good thing.

Predictive algorithms are everywhere. In an age when data are plentiful and computing power is mighty and cheap, data scientists increasingly take information on people, companies, and markets—whether given willingly or harvested surreptitiously—and use it to guess the future. Algorithms predict what movie we might want to watch next, which stocks will increase in value, and which advertisement we’re most likely to respond to on social media. Artificial-intelligence tools, like those used for self-driving cars, often rely on predictive algorithms for decision making….(More)”.
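
The “even newcomers can implement it” point is easy to see in practice. Below is a minimal sketch, assuming scikit-learn and synthetic example data rather than the study’s actual records or pipeline, of training a random forest on tabular characteristics and scoring new cases:

```python
# A rough illustration of random-forest prediction (assumes scikit-learn is installed);
# the data here are synthetic, not the study's patient records.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical tabular data: 1,000 records, 20 numeric characteristics each.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each grown on a random sample of rows and
# features; the forest predicts by combining the votes of its trees.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print("estimated risk for one new record:", model.predict_proba(X_test[:1])[0, 1])
```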

The role of blockchain, cryptoeconomics, and collective intelligence in building the future of justice


Blog by Federico Ast at Thomson Reuters: “Human communities of every era have had to solve the problem of social order. For this, they developed governance and legal systems. They did it with the technologies and systems of belief of their time….

A better justice system may not come from further streamlining existing processes but from fundamentally rethinking them from a first principles perspective.

In the last decade, we have witnessed how collective intelligence could be leveraged to produce an encyclopaedia like Wikipedia, a transport system like Uber, a restaurant rating system like Yelp!, and a hotel system like Airbnb. These companies innovated by crowdsourcing value creation. Instead of relying on an in-house team of restaurant critics, as the Michelin Guide does, Yelp! crowdsourced ratings from its users.

Satoshi Nakamoto’s invention of Bitcoin (and the underlying blockchain technology) may be seen as the next step in the rise of the collaborative economy. The Bitcoin Network proved that, given the right incentives, anonymous users could cooperate in creating and updating a distributed ledger which could act as a monetary system. A nationless system, inherently global, and native to the Internet Age.

Cryptoeconomics is a new field of study that leverages cryptography, computer science and game theory to build secure distributed systems. It is the science that underlies the incentive system of open distributed ledgers. But its potential goes well beyond cryptocurrencies.

Kleros is a dispute resolution system that relies on cryptoeconomics. It uses a system of incentives based on “focal points”, a concept developed by game theorist Thomas Schelling, winner of the 2005 Nobel Prize in Economics. Using a clever mechanism design, it seeks to produce a set of incentives for randomly selected users to adjudicate different types of disputes in a fast, affordable and secure way. Users who adjudicate disputes honestly will make money. Users who try to abuse the system will lose money.

Kleros does not seek to compete with governments or traditional arbitration systems, but to provide a new method that will leverage the wisdom of the crowd to resolve many disputes of the global digital economy for which existing methods fall short: e-commerce, crowdfunding and many types of small claims are among the early adopters….(More)”.
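
To make the incentive structure concrete, here is a toy sketch of a Schelling-style payout rule. It is a simplification for illustration only, not Kleros’s actual protocol: jurors deposit a stake and vote on a ruling, and jurors whose vote matches the majority outcome split the stakes forfeited by those who voted against it.

```python
from collections import Counter

def settle_dispute(votes, stakes):
    """Toy Schelling-point payout rule (illustration only, not Kleros's real mechanism).

    votes:  dict juror -> ruling they voted for
    stakes: dict juror -> tokens that juror deposited
    returns dict juror -> net change in tokens
    """
    # The focal point is simply whatever ruling most jurors converged on.
    majority_ruling, _ = Counter(votes.values()).most_common(1)[0]

    coherent = [j for j, v in votes.items() if v == majority_ruling]
    incoherent = [j for j, v in votes.items() if v != majority_ruling]

    # Jurors who voted against the majority lose their stake...
    payouts = {j: -stakes[j] for j in incoherent}
    forfeited = sum(stakes[j] for j in incoherent)
    # ...and the forfeited tokens are shared by jurors who voted with it.
    for j in coherent:
        payouts[j] = forfeited / len(coherent)
    return payouts

votes = {"alice": "refund buyer", "bob": "refund buyer", "carol": "pay seller"}
stakes = {"alice": 10.0, "bob": 10.0, "carol": 10.0}
print(settle_dispute(votes, stakes))  # alice and bob gain 5.0 each; carol loses 10.0
```

Because each juror’s expected payoff is highest when she votes the way she expects other honest jurors to vote, honest adjudication becomes the focal point.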

The biggest pandemic risk? Viral misinformation


Heidi J. Larson at Nature: “A hundred years ago this month, the death rate from the 1918 influenza was at its peak. An estimated 500 million people were infected over the course of the pandemic; between 50 million and 100 million died, around 3% of the global population at the time.

A century on, advances in vaccines have made massive outbreaks of flu — and measles, rubella, diphtheria and polio — rare. But people still discount their risks of disease. Few realize that flu and its complications caused an estimated 80,000 deaths in the United States alone this past winter, mainly in the elderly and infirm. Of the 183 children whose deaths were confirmed as flu-related, 80% had not been vaccinated that season, according to the US Centers for Disease Control and Prevention.

I predict that the next major outbreak — whether of a highly fatal strain of influenza or something else — will not be due to a lack of preventive technologies. Instead, emotional contagion, digitally enabled, could erode trust in vaccines so much as to render them moot. The deluge of conflicting information, misinformation and manipulated information on social media should be recognized as a global public-health threat.

So, what is to be done? The Vaccine Confidence Project, which I direct, works to detect early signals of rumours and scares about vaccines, and so to address them before they snowball. The international team comprises experts in anthropology, epidemiology, statistics, political science and more. We monitor news and social media, and we survey attitudes. We have also developed a Vaccine Confidence Index, similar to a consumer-confidence index, to track attitudes.

Emotions around vaccines are volatile, making vigilance and monitoring crucial for effective public outreach. In 2016, our project identified Europe as the region with the highest scepticism around vaccine safety (H. J. Larson et al. EBioMedicine 12, 295–301; 2016). The European Union commissioned us to re-run the survey this summer; results will be released this month. In the Philippines, confidence in vaccine safety dropped from 82% in 2015 to 21% in 2018 (H. J. Larson et al. Hum. Vaccines Immunother. https://doi.org/10.1080/21645515.2018.1522468; 2018), after legitimate concerns arose about new dengue vaccines. Immunization rates for established vaccines for tetanus, polio and more also plummeted.

We have found that it is useful to categorize misinformation into several levels….(More)”.

Babbage among the insurers: big 19th-century data and the public interest.


D. C. S. Wilson in History of the Human Sciences: “This article examines life assurance and the politics of ‘big data’ in mid-19th-century Britain. The datasets generated by life assurance companies were vast archives of information about human longevity. Actuaries distilled these archives into mortality tables – immensely valuable tools for predicting mortality and so pricing risk. The status of the mortality table was ambiguous, being both a public and a private object: often computed from company records, they could also be extrapolated from quasi-public projects such as the Census or clerical records. Life assurance more generally straddled the line between private enterprise and collective endeavour, though its advocates stressed the public interest in its success. Reforming actuaries such as Thomas Rowe Edmonds wanted the data on which mortality tables were based to be made publicly available, but faced resistance. Such resistance undermined insurers’ claims to be scientific in spirit and hindered Edmonds’s personal quest for a law of mortality. Edmonds pushed instead for an open actuarial science alongside fellow-travellers at the Statistical Society of London, which was populated by statisticians such as William Farr (whose subsequent work, it is argued, was influenced by Edmonds) as well as by radical mathematicians such as Charles Babbage. The article explores Babbage’s little-known foray into the world of insurance, both as a budding actuary and as a fierce critic of the industry. These debates over the construction, ownership, and accessibility of insurance datasets show that concern about the politics of big data did not begin in the 21st century….(More)”.
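
As a rough worked example of what ‘pricing risk’ from a mortality table involves (the figures below are hypothetical, not drawn from Edmonds, Farr, or any historical table), the actuarially fair premium for a one-year policy is the probability of death at the insured’s age multiplied by the benefit, plus a loading for expenses:

```python
# Hypothetical one-year mortality rates q_x: the probability of dying within the
# year at age x. These numbers are invented for illustration only.
mortality_table = {30: 0.004, 40: 0.007, 50: 0.013, 60: 0.028}

def annual_premium(age, benefit, loading=0.15):
    """Expected payout (q_x * benefit) plus a loading for expenses and profit."""
    q_x = mortality_table[age]
    return q_x * benefit * (1 + loading)

# A 40-year-old insuring a benefit of 1,000: 0.007 * 1000 * 1.15 = 8.05 per year.
print(annual_premium(40, 1000))
```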

Privacy and Synthetic Datasets


Paper by Steven M. Bellovin, Preetam K. Dutta and Nathan Reitinger: “Sharing is a virtue, instilled in us from childhood. Unfortunately, when it comes to big data — i.e., databases possessing the potential to usher in a whole new world of scientific progress — the legal landscape prefers a hoggish motif. The historic approach to the resulting database–privacy problem has been anonymization, a subtractive technique incurring not only poor privacy results, but also lackluster utility. In anonymization’s stead, differential privacy arose; it provides better, near-perfect privacy, but is nonetheless subtractive in terms of utility.

Today, another solution is coming to the fore: synthetic data. Using the magic of machine learning, synthetic data offers a generative, additive approach — the creation of almost-but-not-quite replica data. In fact, as we recommend, synthetic data may be combined with differential privacy to achieve a best-of-both-worlds scenario. After unpacking the technical nuances of synthetic data, we analyze its legal implications, finding both over- and under-inclusive applications. Privacy statutes either overweigh or downplay the potential for synthetic data to leak secrets, inviting ambiguity. We conclude by finding that synthetic data is a valid, privacy-conscious alternative to raw data, but is not a cure-all for every situation. In the end, computer science progress must be met with proper policy in order to move the area of useful data dissemination forward….(More)”.
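
One minimal way to see how synthetic data and differential privacy can combine, sketched here as a simplification rather than the authors’ method, and assuming NumPy, is to perturb a dataset’s histogram with Laplace noise under a privacy budget epsilon and then sample brand-new records from the noisy distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_synthetic_sample(records, categories, epsilon, n_synthetic):
    """Sample synthetic categorical records from a differentially private histogram.

    Illustrative only: real systems use richer generative models and more careful
    privacy accounting, but the additive idea is the same.
    """
    # 1. Count how often each category appears in the real data.
    counts = np.array([sum(r == c for r in records) for c in categories], dtype=float)
    # 2. Laplace mechanism: adding or removing one record changes one count by at
    #    most 1, so noise with scale 1/epsilon gives epsilon-differential privacy.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(categories))
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    # 3. Sample brand-new records from the noisy distribution.
    return rng.choice(categories, size=n_synthetic, p=probs)

real = ["A", "A", "B", "C", "A", "B", "A", "C", "A", "B"]
print(dp_synthetic_sample(real, ["A", "B", "C"], epsilon=1.0, n_synthetic=20))
```

The released records are sampled from a noisy model rather than copied from the raw rows, which is the sense in which the approach is generative and additive rather than subtractive.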