Virtual memory: the race to save the information age


Review by Richard Ovenden in the Financial Times of:
You Could Look It Up: The Reference Shelf from Ancient Babylon to Wikipedia, by Jack Lynch, Bloomsbury, RRP£25/$30, 464 pages

When We Are No More: How Digital Memory Is Shaping Our Future, by Abby Smith Rumsey, Bloomsbury, RRP£18.99/$28, 240 pages

Ctrl + Z: The Right to Be Forgotten, by Meg Leta Jones, NYU Press, RRP£20.99/$29.95, 284 pages

“…For millions of people, technological devices have become essential tools in keeping memories alive — to the point where it can feel as though events without an impression in silicon have somehow not been fully experienced. In under three decades, the web has expanded to contain more than a billion sites. Every day about 300m digital photographs, more than 100 terabytes’ worth, are uploaded to Facebook. An estimated 204m emails are sent every minute and, with 5bn mobile devices in existence, the generation of new content looks set to continue its rapid growth.

Is the abundance of information in the age of Google and Facebook storing up problems for future generations? Richard Ovenden, who as Bodley’s Librarian is responsible for the research libraries of the University of Oxford, talks about the opportunities and concerns of the digitisation of memory with John Thornhill, the FT’s innovation editor.

We celebrate this growth, and rightly. Today knowledge is created and consumed at a rate that would have been inconceivable a generation ago; instant access to the fruits of millennia of civilisation now seems like a natural state of affairs. Yet we overlook — at our peril — just how unstable and transient much of this information is. Amid the proliferation there is also constant decay: phenomena such as “bit rot” (the degradation of software programs over time), “data rot” (the deterioration of digital storage media) and “link rot” (web links pointing to online resources that have become permanently unavailable) can render information inaccessible. This affects everything from holiday photos and email correspondence to official records: to give just one example, a Harvard study published in 2013 found that 50 per cent of links in the US Supreme Court opinions website were broken.
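
That 50 per cent figure comes from auditing citations at scale, and the basic mechanics are easy to illustrate. As a minimal sketch (assuming only a plain list of URLs to probe, not the Harvard study's corpus or tooling), a script can request each link and record which ones no longer resolve:

```python
import urllib.error
import urllib.request

# URLs to audit; hypothetical placeholders rather than real citations.
urls = [
    "https://example.com/",
    "https://example.com/this-page-probably-does-not-exist",
]

def is_rotten(url, timeout=10):
    """Return True if the URL no longer resolves to a reachable resource."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status >= 400
    except (urllib.error.URLError, ValueError):
        # HTTPError (404 and the like) subclasses URLError;
        # ValueError catches malformed URLs.
        return True

for url in urls:
    print(url, "->", "broken" if is_rotten(url) else "alive")
```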

Are we creating a problem that future generations will not be able to solve? Could the early decades of the 21st century even come to seem, in the words of the internet pioneer Vint Cerf, like a “digital Dark Age”? Whether or not such fears are realised, it is becoming increasingly clear that the migration of knowledge to formats permitting rapid and low-cost copying and dissemination, but in which the base information cannot survive without complex and expensive intervention, requires that we choose, more actively than ever before, what to remember and what to forget….(More)”

Post, Mine, Repeat: Social Media Data Mining Becomes Ordinary


In this book, Helen Kennedy argues that as social media data mining becomes more and more ordinary, as we post, mine and repeat, new data relations emerge. These new data relations are characterised by a widespread desire for numbers and the troubling consequences of this desire, and also by the possibility of doing good with data and resisting data power, by new and old concerns, and by instability and contradiction. Drawing on action research with public sector organisations, interviews with commercial social insights companies and their clients, focus groups with social media users and other research, Kennedy provides a fascinating and detailed account of living with social media data mining inside the organisations that make up the fabric of everyday life….(More)”

We know where you live


MIT News Office: “From location data alone, even low-tech snoopers can identify Twitter users’ homes, workplaces…. Researchers at MIT and Oxford University have shown that the location stamps on just a handful of Twitter posts — as few as eight over the course of a single day — can be enough to disclose the addresses of the poster’s home and workplace to a relatively low-tech snooper.

The tweets themselves might be otherwise innocuous — links to funny videos, say, or comments on the news. The location information comes from geographic coordinates automatically associated with the tweets.

Twitter’s location-reporting service is off by default, but many Twitter users choose to activate it. The new study is part of a more general project at MIT’s Internet Policy Research Initiative to help raise awareness about just how much privacy people may be giving up when they use social media.

The researchers describe their research in a paper presented last week at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems, where it received an honorable mention in the best-paper competition, a distinction reserved for only 4 percent of papers accepted to the conference.

“Many people have this idea that only machine-learning techniques can discover interesting patterns in location data,” says Ilaria Liccardi, a research scientist at MIT’s Internet Policy Research Initiative and first author on the paper. “And they feel secure that not everyone has the technical knowledge to do that. With this study, what we wanted to show is that when you send location data as a secondary piece of information, it is extremely simple for people with very little technical knowledge to find out where you work or live.”

Conclusions from clustering

In their study, Liccardi and her colleagues — Alfie Abdul-Rahman and Min Chen of Oxford’s e-Research Centre in the U.K. — used real tweets from Twitter users in the Boston area. The users consented to the use of their data, and they also confirmed their home and work addresses, their commuting routes, and the locations of various leisure destinations from which they had tweeted.

The time and location data associated with the tweets were then presented to a group of 45 study participants, who were asked to try to deduce whether the tweets had originated at the Twitter users’ homes, their workplaces, leisure destinations, or locations along their commutes. The participants were not recruited on the basis of any particular expertise in urban studies or the social sciences; they just drew what conclusions they could from location clustering….
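
The kind of inference the participants performed can be sketched in a few lines of code. The coordinates, grid size and time-of-day heuristics below are invented for illustration and are not the paper's method; the point is that repeated posts from one coarse location cell, split by hour of day, often suffice to label a spot as home or work:

```python
from collections import defaultdict

# Hypothetical geotagged posts: (latitude, longitude, hour of day).
posts = [
    (42.3601, -71.0942, 8),   # early morning
    (42.3601, -71.0942, 22),  # late evening
    (42.3601, -71.0942, 23),
    (42.3584, -71.0636, 10),  # business hours
    (42.3584, -71.0636, 14),
    (42.3584, -71.0636, 16),
    (42.3770, -71.1167, 19),  # a single leisure outing
]

def cell(lat, lon, precision=3):
    """Bucket coordinates into a coarse grid cell (~100 m at this latitude)."""
    return (round(lat, precision), round(lon, precision))

clusters = defaultdict(list)
for lat, lon, hour in posts:
    clusters[cell(lat, lon)].append(hour)

for location, hours in clusters.items():
    if len(hours) < 2:
        label = "one-off visit"
    elif all(9 <= h <= 17 for h in hours):
        label = "likely workplace"  # posts only during business hours
    elif any(h >= 21 or h <= 6 for h in hours):
        label = "likely home"  # posts late at night or early in the morning
    else:
        label = "unclear"
    print(location, label)
```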

Predictably, participants fared better with map-based representations, correctly identifying Twitter users’ homes roughly 65 percent of the time and their workplaces at closer to 70 percent. Even the tabular representation was informative, however, with accuracy rates of just under 50 percent for homes and a surprisingly high 70 percent for workplaces….(More; Full paper)”

Robot Regulators Could Eliminate Human Error


In the San Francisco Chronicle and RegBlog: “Long a fixture of science fiction, artificial intelligence is now part of our daily lives, even if we do not realize it. Through the use of sophisticated machine learning algorithms, for example, computers now work to filter out spam messages automatically from our email. Algorithms also identify us by our photos on Facebook, match us with new friends on online dating sites, and suggest movies to watch on Netflix.

These uses of artificial intelligence hardly seem very troublesome. But should we worry if government agencies start to use machine learning?

Complaints abound even today about the uncaring “bureaucratic machinery” of government. Yet, seeing how machine learning is starting to replace jobs in the private sector, we can easily fathom a literal machinery of government in which decisions now made by human public servants are increasingly made by machines.

Technologists warn of an impending “singularity,” when artificial intelligence surpasses human intelligence. Entrepreneur Elon Musk cautions that artificial intelligence poses one of our “biggest existential threats.” Renowned physicist Stephen Hawking eerily forecasts that artificial intelligence might even “spell the end of the human race.”

Are we ready for a world of regulation by robot? Such a world is closer than we think—and it could actually be worth welcoming.

Already government agencies rely on machine learning for a variety of routine functions. The Postal Service uses learning algorithms to sort mail, and cities such as Los Angeles use them to time their traffic lights. But while uses like these seem relatively benign, consider that machine learning could also be used to make more consequential decisions. Disability claims might one day be processed automatically with the aid of artificial intelligence. Licenses could be awarded to airplane pilots based on what kinds of safety risks complex algorithms predict each applicant poses.

Learning algorithms are already being explored by the Environmental Protection Agency to help make regulatory decisions about what toxic chemicals to control. Faced with tens of thousands of new chemicals that could potentially be harmful to human health, federal regulators have supported the development of a program to prioritize which of the many chemicals in production should undergo the more in-depth testing. By some estimates, machine learning could save the EPA up to $980,000 per toxic chemical positively identified.
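
To make the prioritization idea concrete, here is a generic sketch (not the EPA's program, data or model): train a classifier on chemicals that have already undergone in-depth testing, then rank the untested ones by predicted risk so that limited testing capacity goes to the likeliest hazards first:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training set: rows are chemicals already tested in depth,
# columns are cheap screening features (e.g., assay readouts), and the
# label marks whether in-depth testing found the chemical harmful.
X_train = rng.normal(size=(500, 10))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(size=500) > 1).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Untested chemicals: rank them by predicted probability of harm so the
# limited budget for in-depth testing targets the riskiest candidates.
X_untested = rng.normal(size=(10000, 10))
risk = model.predict_proba(X_untested)[:, 1]
priority_order = np.argsort(risk)[::-1]
print("Top 5 chemicals to test next:", priority_order[:5])
```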

It’s not hard then to imagine a day in which even more regulatory decisions are automated. Researchers have shown that machine learning can lead to better outcomes when determining whether parolees ought to be released or domestic violence orders should be imposed. Could the imposition of regulatory fines one day be determined by a computer instead of a human inspector or judge? Quite possibly so, and this would be a good thing if machine learning could improve accuracy, eliminate bias and prejudice, and reduce human error, all while saving money.

But can we trust a government that bungled the initial rollout of Healthcare.gov to deploy artificial intelligence responsibly? In some circumstances we should….(More)”

Big data’s ‘streetlight effect’: where and how we look affects what we see


At The Conversation: “Big data offers us a window on the world. But large and easily available datasets may not show us the world we live in. For instance, epidemiological models of the recent Ebola epidemic in West Africa using big data consistently overestimated the risk of the disease’s spread and underestimated the local initiatives that played a critical role in controlling the outbreak.

Researchers are rightly excited about the possibilities offered by the availability of enormous amounts of computerized data. But there’s reason to stand back for a minute to consider what exactly this treasure trove of information really offers. Ethnographers like me use a cross-cultural approach when we collect our data because family, marriage and household mean different things in different contexts. This approach informs how I think about big data.

We’ve all heard the joke about the drunk who is asked why he is searching for his lost wallet under the streetlight, rather than where he thinks he dropped it. “Because the light is better here,” he said.

This “streetlight effect” is the tendency of researchers to study what is easy to study. I use this story in my course on Research Design and Ethnographic Methods to explain why so much research on disparities in educational outcomes is done in classrooms and not in students’ homes. Children are much easier to study at school than in their homes, even though many studies show that knowing what happens outside the classroom is important. Nevertheless, schools will continue to be the focus of most research because they generate big data and homes don’t.

The streetlight effect is one factor that prevents big data studies from being useful in the real world – especially studies analyzing easily available user-generated data from the Internet. Researchers assume that this data offers a window into reality. It doesn’t necessarily.

Looking at WEIRDOs

Based on the number of tweets following Hurricane Sandy, for example, it might seem as if the storm hit Manhattan the hardest, not the New Jersey shore. Another example: the since-retired Google Flu Trends, which in 2013 tracked online searches relating to flu symptoms to predict doctor visits, but gave estimates twice as high as reports from the Centers for Disease Control and Prevention. Without checking facts on the ground, researchers may fool themselves into thinking that their big data models accurately represent the world they aim to study.

The problem is similar to the “WEIRD” issue in many research studies. Harvard professor Joseph Henrich and colleagues have shown that findings based on research conducted with undergraduates at American universities – whom they describe as “some of the most psychologically unusual people on Earth” – apply only to that population and cannot be used to make any claims about other human populations, including other Americans. Unlike the typical research subject in psychology studies, they argue, most people in the world are not from Western, Educated, Industrialized, Rich and Democratic societies, i.e., WEIRD.

Twitter users are also atypical compared with the rest of humanity, giving rise to what our postdoctoral researcher Sarah Laborde has dubbed the “WEIRDO” problem of data analytics: most people are not Western, Educated, Industrialized, Rich, Democratic and Online.

Context is critical

Understanding the differences between the vast majority of humanity and that small subset of people whose activities are captured in big data sets is critical to correct analysis of the data. Considering the context and meaning of data – not just the data itself – is a key feature of ethnographic research, argues Michael Agar, who has written extensively about how ethnographers come to understand the world….(More: https://theconversation.com/big-datas-streetlight-effect-where-and-how-we-look-affects-what-we-see-58122)”

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”

Using Tweets and Posts to Speed Up Organ Donation


David Bornstein in the New York Times: “…But there is a problem: Demand for organ transplants vastly outstrips supply, as my colleague Tina Rosenberg has reported. In 2015 in the United States, there were only about 9,000 deceased donors (each of whom can save up to eight lives) and 6,000 living donors (who most often donate a kidney or liver lobe). Today, more than 121,000 people are on waiting lists, roughly 100,000 for kidney transplants, 15,000 for livers, and 4,000 for hearts. And the lists keep getting longer — 3,000 people are added to the kidney list each month. Last year, more than 4,000 people died while waiting for a new kidney; 3,600 dropped off the waiting list because they became too sick to qualify for a transplant.

Although 95 percent of Americans support organ donation, fewer than half of American adults are registered as donors. Research suggests that the number who donate organs after death could be increased greatly. Moreover, surveys indicate untapped support for living donation, too; nearly one in four people have told pollsters they would be willing to donate a kidney to save the life of a friend, community member or stranger. “If one in 10,000 Americans decided to donate each year, there wouldn’t be a shortage,” said Josh Morrison, who donated a kidney to a stranger and founded WaitList Zero, an organization that works to increase living kidney donation.

What could be done to harness people’s generous impulses more effectively to save lives?

One group attacking the question is Organize, which was founded in 2014 by Rick Segal’s son Greg, and Jenna Arnold, a media producer and educator who has worked with MTV and the United Nations in engaging audiences in social issues. Organize uses technology, open data and insights from behavioral economics to simplify becoming an organ donor.

This approach is shaking up longstanding assumptions.

For example, in the last four decades, people have most often been asked to register as an organ donor as part of renewing or obtaining a driver’s license. This made sense in the 1970s, when the nation’s organ procurement system was being set up, says Blair Sadler, the former president and chief executive of Rady Children’s Hospital in San Diego. He helped draft the Uniform Anatomical Gift Act in 1967, which established a national legal framework for organ donation. “Health care leaders were asking, ‘How do we make this more routine?’” he recalled. “It’s hard to get people to put it in their wills. Oh, there’s a place where people have to go every five years” — their state Department of Motor Vehicles.

Today, governments allow individuals to initiate registrations online, but the process can be cumbersome. For example, New York State required me to fill out a digital form on my computer, then print it out and mail it to Albany. Donate Life America, by contrast, allows individuals to register online as an organ donor just by logging in with email or a Facebook or Google account — much easier.

In practice, legal registration may be overemphasized. It may be just as important to simply make your wishes known to your loved ones. When people tell relatives, “If something happens to me, I want to be an organ donor,” families almost always respect their wishes. This is particularly important for minors, who cannot legally register as donors.

Using that insight, Organize is making it easier to conduct social media campaigns to both prompt and collect sentiments about organ donation from Facebook, Twitter and Instagram.

If you post or tweet about organ donation, or include a hashtag like #iwanttobeanorgandonor, #organdonor, #donatemyparts, or any of a number of other relevant terms, Organize captures the information and logs it in a registry. In a year, it has gathered the names of nearly 600,000 people who declare support for organ donation. Now the big question is: Will it actually increase organ donation rates?
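
Mechanically, that capture step is simple pattern matching. In a minimal sketch (the tag list, post stream and registry below are hypothetical stand-ins, not Organize's actual pipeline), it might look like this:

```python
import re

# Hypothetical donor-related hashtags; the article names
# #iwanttobeanorgandonor, #organdonor and #donatemyparts among others.
DONOR_TAGS = {"#iwanttobeanorgandonor", "#organdonor", "#donatemyparts"}

HASHTAG_RE = re.compile(r"#\w+")

def donor_intent(post_text):
    """Return True if the post contains any donor-related hashtag."""
    tags = {t.lower() for t in HASHTAG_RE.findall(post_text)}
    return bool(tags & DONOR_TAGS)

registry = []  # stand-in for a registry of declared supporters

stream = [
    ("alice", "Just signed up! #IWantToBeAnOrganDonor"),
    ("bob", "Great game last night #sports"),
    ("carol", "Everyone should register #organdonor"),
]

for user, text in stream:
    if donor_intent(text):
        registry.append(user)

print(registry)  # ['alice', 'carol']
```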

We should begin getting an idea pretty soon. Organize has been working with the Nevada Donor Network to test its registry. And in the coming months, several other states will begin using it….(More)”

Regulatory Transformations: An Introduction


Chapter by Bettina Lange and Fiona Haines in the book Regulatory Transformations: “Regulation is no longer the prerogative of either states or markets. Increasingly, citizens in association with businesses catalyse regulation, which marks the rise of a social sphere in regulation. Around the world, in San Francisco, Melbourne, Munich and Mexico City, citizens have sought to transform how and to what end economic transactions are conducted. For instance, ‘carrot mob’ initiatives use positive economic incentives, provided not by a state legal system but by a collective of civil society actors, in order to change business behaviour. In contrast to ‘negative’ consumer boycotts, ‘carrot mob’ events use ‘buycotts’. They harness competition between businesses as the lever for changing how and for what purpose business transactions are conducted. Through new social media, ‘carrot mobs’ mobilize groups of citizens to purchase goods at a particular time in a specific shop. The business that promises to spend the greatest percentage of its takings on, for instance, environmental improvements, such as switching to a supplier of renewable energy, will be selected for an organized shopping spree and financially benefit from the extra income it receives from the ‘carrot mob’ event. ‘Carrot mob’ campaigns chime with other fundamental challenges to conventional economic activity, such as the shared use of consumer goods through citizens’ collective consumption, which questions traditional conceptions of private property….(More; Other Chapters)”


OSoMe: The IUNI observatory on social media


Clayton A. Davis et al. at PeerJ Preprints: “The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute….(More)”
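
As an illustration of the kind of derived analysis such an observatory supports (a hypothetical sketch, not the OSoMe API), daily hashtag volumes of the sort behind trend visualizations can be built by simple aggregation:

```python
from collections import Counter, defaultdict

# Hypothetical stream of (date, hashtag) pairs standing in for the kind of
# derived data an observatory exposes; this is not the OSoMe API.
records = [
    ("2016-05-01", "#election"),
    ("2016-05-01", "#scicomm"),
    ("2016-05-02", "#election"),
    ("2016-05-02", "#election"),
]

daily_counts = defaultdict(Counter)
for date, tag in records:
    daily_counts[date][tag] += 1

# Time series of one hashtag's daily volume, the basic unit behind trend charts.
series = {date: counts["#election"] for date, counts in sorted(daily_counts.items())}
print(series)  # {'2016-05-01': 1, '2016-05-02': 2}
```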

Citizens breaking out of filter bubbles: Urban screens as civic media


Conference paper by Christine Satchell et al.: “Social media platforms risk polarising public opinions by employing proprietary algorithms that produce filter bubbles and echo chambers. As a result, the ability of citizens and communities to engage in robust debate in the public sphere is diminished. In response, this paper highlights the capacity of urban interfaces, such as pervasive displays, to counteract this trend by exposing citizens to the socio-cultural diversity of the city. Engagement with different ideas, networks and communities is crucial to both innovation and the functioning of democracy. We discuss examples of urban interfaces designed to play a key role in fostering this engagement. Based on an analysis of works empirically grounded in field observations and design research, we call for a theoretical framework that positions pervasive displays and other urban interfaces as civic media. We argue that when designed for more than wayfinding, advertisement or television broadcasts, urban screens as civic media can rectify some of the pitfalls of social media by allowing the polarised user to break out of their filter bubble and embrace the cultural diversity and richness of the city….(More)”