Big Data Is Getting Bigger. So Are the Privacy and Ethical Questions.


Goldie Blumenstyk at The Chronicle of Higher Education: “…The next step in using “big data” for student success is upon us. It’s a little cool. And also kind of creepy.

This new approach goes beyond the tactics now used by hundreds of colleges, which depend on data collected from sources like classroom teaching platforms and student-information systems. It not only makes a technological leap; it also raises issues around ethics and privacy.

Here’s how it works: Whenever you log on to a wireless network with your cellphone or computer, you leave a digital footprint. Move from one building to another while staying on the same network, and that network knows how long you stayed and where you went. That data is collected continuously and automatically from the network’s various nodes.
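
To make the mechanism concrete, the sketch below shows how presence and dwell time might be inferred from Wi-Fi association logs. It is a minimal illustration, not Degree Analytics' system; the log format, device identifier, and building names are hypothetical.

```python
# A minimal sketch of inferring dwell time from Wi-Fi association logs.
# The log format, device identifier, and building names are hypothetical;
# real campus networks record richer telemetry per access point.
from datetime import datetime
from itertools import groupby

# Hypothetical association events: (timestamp, hashed device ID, building)
events = [
    ("2018-09-04 09:02", "ab12cd", "Library"),
    ("2018-09-04 09:48", "ab12cd", "Library"),
    ("2018-09-04 10:05", "ab12cd", "Student Center"),
    ("2018-09-04 11:30", "ab12cd", "Student Center"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

# Group consecutive events by building and report approximate time spent there.
for building, visits in groupby(events, key=lambda e: e[2]):
    visits = list(visits)
    minutes = (parse(visits[-1][0]) - parse(visits[0][0])).total_seconds() / 60
    print(f"{building}: roughly {minutes:.0f} minutes")
```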

Now, with the help of a company called Degree Analytics, a few colleges are beginning to use location data collected from students’ cellphones and laptops as they move around campus. Some colleges are using it to improve the kind of advice they might send to students, like a text-message reminder to go to class if they’ve been absent.

Others see it as a tool for making decisions on how to use their facilities. St. Edward’s University, in Austin, Tex., used the data to better understand how students were using its computer-equipped spaces. It found that a renovated lounge, with relatively few computers but with Wi-Fi access and several comfy couches, was one of the most popular such sites on campus. Now the university knows it may not need to buy as many computers as it once thought.

As Gary Garofalo, a co-founder and chief revenue officer of Degree Analytics, told me, “the network data has very intriguing advantages” over the forms of data that colleges now collect.

Some of those advantages are obvious: If you’ve got automatic information on every person walking around with a cellphone, your dataset is more complete than if you need to extract it from a learning-management system or from the swipe-card readers some colleges use to track students’ activities. Many colleges now collect such data to determine students’ engagement with their coursework and campus activities.

Of course, the 24-7 reporting of the data is also what makes this approach seem kind of creepy….

I’m not the first to ask questions like this. A couple of years ago, a group of educators organized by Martin Kurzweil of Ithaka S+R and Mitchell Stevens of Stanford University issued a series of guidelines for colleges and companies to consider as they began to embrace data analytics. Among other principles, the guidelines highlighted the importance of being transparent about how the information is used, and ensuring that institutions’ leaders really understand what companies are doing with the data they collect. Experts at New America weighed in too.

I asked Kurzweil what he makes of the use of Wi-Fi information. Location tracking tends toward the “dicey” side of the spectrum, he says, though perhaps not as far out as using students’ social-media habits, health information, or what they check out from the library. The fundamental question, he says, is “how are they managing it?”… So is this the future? Benz, at least, certainly hopes so. Inspired by the Wi-Fi-based StudentLife research project at Dartmouth College and the experiences Purdue University is having with students’ use of its Forecast app, he’s in talks now with a research university about a project that would generate other insights that might be gleaned from students’ Wi-Fi-usage patterns….(More)

Predicting Public Interest Issue Campaign Participation on Social Media


Jungyun Won, Linda Hon, Ah Ram Lee in the Journal of Public Interest Communication: “This study investigates what motivates people to participate in a social media campaign in the context of animal protection issues.

Structural equation modeling (SEM) tested a proposed research model with survey data from 326 respondents.

Situational awareness, participation benefits, and social ties influence were positive predictors of social media campaign participation intentions. Situational awareness also partially mediates the relationships between participation benefits and participation intentions and between strong ties influence and participation intentions.
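
For readers unfamiliar with SEM, the sketch below shows how a mediation model of this general shape can be specified and estimated with the open-source Python package semopy. The variable names and the simulated data are illustrative stand-ins, not the authors' model or survey.

```python
# Illustrative sketch of a mediation-style path model, assuming composite
# scores per construct; the data here are simulated, not the study's survey.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)
n = 326  # same sample size as the study, for flavor
benefits = rng.normal(size=n)
strong_ties = rng.normal(size=n)
awareness = 0.5 * benefits + 0.3 * strong_ties + rng.normal(scale=0.8, size=n)
intention = 0.4 * awareness + 0.3 * benefits + 0.2 * strong_ties + rng.normal(scale=0.8, size=n)
df = pd.DataFrame({"benefits": benefits, "strong_ties": strong_ties,
                   "awareness": awareness, "intention": intention})

# Awareness carries part of the effect of benefits and strong ties on intention.
model_desc = """
awareness ~ benefits + strong_ties
intention ~ awareness + benefits + strong_ties
"""

model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())  # path estimates, standard errors, p-values
```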

When designing social media campaigns, public interest communicators should raise situational awareness and emphasize participation benefits. Messages shared through social networks, especially via strong ties, also may be more effective than those posted only on official websites or social networking sites (SNSs)….(More)”.

Ethics as Methods: Doing Ethics in the Era of Big Data Research—Introduction


Introduction to the Special issue of Social Media + Society on “Ethics as Methods: Doing Ethics in the Era of Big Data Research”: Building on a variety of theoretical paradigms (i.e., critical theory, [new] materialism, feminist ethics, theory of cultural techniques) and frameworks (i.e., contextual integrity, deflationary perspective, ethics of care), the Special Issue contributes specific cases and fine-grained conceptual distinctions to ongoing discussions about the ethics in data-driven research.

In the second decade of the 21st century, a grand narrative is emerging that posits knowledge derived from data analytics as true, because of the objective qualities of data, their means of collection and analysis, and the sheer size of the data set. The by-product of this grand narrative is that the qualitative aspects of behavior and experience that form the data are diminished, and the human is removed from the process of analysis.

This situates data science as a process of analysis performed by the tool, which obscures human decisions in the process. The scholars involved in this Special Issue problematize the assumptions and trends in big data research and point out the crisis in accountability that emerges from using such data to make societal interventions.

Our collaborators offer a range of answers to the question of how to configure ethics through a methodological framework in the context of the prevalence of big data, neural networks, and automated, algorithmic governance of much of human socia(bi)lity…(More)”.

Defending Politically Vulnerable Organizations Online


Center for Long-Term Cybersecurity (CLTC): “A new report …details how media outlets, human rights groups, NGOs, and other politically vulnerable organizations face significant cybersecurity threats—often at the hands of powerful governments—but have limited resources to protect themselves. The paper, “Defending Politically Vulnerable Organizations Online,” by CLTC Research Fellow Sean Brooks, provides an overview of cybersecurity threats to civil society organizations targeted for political purposes, and explores the ecosystem of resources available to help these organizations improve their cybersecurity.

“From mass surveillance of political dissidents in Thailand to spyware attacks on journalists in Mexico, cyberattacks against civil society organizations have become a persistent problem in recent years,” says Steve Weber, Faculty Director of CLTC. “While journalists, activists, and others take steps to protect themselves, such as installing firewalls and anti-virus software, they often lack the technical ability or capital to establish protections better suited to the threats they face, including phishing. Too few organizations and resources are available to help them expand their cybersecurity capabilities.”

To compile their report, Brooks and his colleagues at CLTC undertook an extensive open-source review of more than 100 organizations supporting politically vulnerable organizations, and conducted more than 30 interviews with activists, threat researchers, and cybersecurity professionals. The report details the wide range of threats that politically vulnerable organizations face—from phishing emails, troll campaigns, and government-sanctioned censorship to sophisticated “zero-day” attacks—and it exposes the significant resource constraints that limit these organizations’ access to expertise and technology….(More)”.

The Data Transfer Project


About: “The Data Transfer Project was formed in 2017 to create an open-source, service-to-service data portability platform so that all individuals across the web could easily move their data between online service providers whenever they want.

The contributors to the Data Transfer Project believe portability and interoperability are central to innovation. Making it easier for individuals to choose among services facilitates competition, empowers individuals to try new services and enables them to choose the offering that best suits their needs.

Current contributors include Facebook, Google, Microsoft and Twitter.

Individuals have many reasons to transfer data, but we want to highlight a few examples that demonstrate the additional value of service-to-service portability.

  • A user discovers a new photo printing service offering beautiful and innovative photo book formats, but their photos are stored in their social media account. With the Data Transfer Project, they could visit a website or app offered by the photo printing service and initiate a transfer directly from their social media platform to the photo book service.
  • A user doesn’t agree with the privacy policy of their music service. They want to stop using it immediately, but don’t want to lose the playlists they have created. Using this open-source software, they could use the export functionality of the original Provider to save a copy of their playlists to the cloud. This enables them to import the lists to a new Provider, or multiple Providers, once they decide on a new service.
  • A large company is getting requests from customers who would like to import data from a legacy Provider that is going out of business. The legacy Provider has limited options for letting customers move their data. The large company writes an Adapter for the legacy Provider’s Application Program Interfaces (APIs) that permits users to transfer data to their service, also benefiting other Providers that handle the same data type.
  • A user in a low bandwidth area has been working with an architect on drawings and graphics for a new house. At the end of the project, they both want to transfer all the files from a shared storage system to the user’s cloud storage drive. They go to the cloud storage Data Transfer Project User Interface (UI) and move hundreds of large files directly, without straining their bandwidth.
  • An industry association for supermarkets wants to allow customers to transfer their loyalty card data from one member grocer to another, so they can get coupons based on buying habits between stores. The Association would do this by hosting an industry-specific Host Platform of DTP.

The innovation in each of these examples lies behind the scenes: Data Transfer Project makes it easy for Providers to allow their customers to interact with their data in ways their customers would expect. In most cases, the direct-data transfer experience will be branded and managed by the receiving Provider, and the customer wouldn’t need to see DTP branding or infrastructure at all….
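
One way to picture the adapter model from the examples above: each Provider exposes an exporter and an importer over a shared data type, and the transfer service pages data from one to the other. The Python sketch below illustrates that pattern only; the project's real interfaces live in its own (Java) codebase, and the names here are invented.

```python
# Illustrative sketch of the exporter/importer adapter pattern described above.
# Class and method names are invented; this is not the Data Transfer Project's API.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Photo:            # a shared data model both providers understand
    title: str
    url: str

class Exporter(ABC):
    @abstractmethod
    def export_page(self, page_token):
        """Return (items, next_page_token) for one page of the user's data."""

class Importer(ABC):
    @abstractmethod
    def import_items(self, items):
        """Write a batch of items into the destination service."""

class DemoPhotoExporter(Exporter):
    def __init__(self, photos):
        self._photos = photos
    def export_page(self, page_token):
        start = page_token or 0
        nxt = start + 2
        return self._photos[start:nxt], (nxt if nxt < len(self._photos) else None)

class DemoPhotoImporter(Importer):
    def import_items(self, items):
        for photo in items:
            print(f"imported '{photo.title}' from {photo.url}")

def run_transfer(exporter, importer):
    token = None
    while True:  # page through the source until it signals completion
        items, token = exporter.export_page(token)
        importer.import_items(items)
        if token is None:
            break

run_transfer(DemoPhotoExporter([Photo("beach", "u1"), Photo("dog", "u2"), Photo("lunch", "u3")]),
             DemoPhotoImporter())
```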

To get a more in-depth understanding of the project, its fundamentals and the details involved, please download “Data Transfer Project Overview and Fundamentals”….(More)”.

Let’s make private data into a public good


Article by Mariana Mazzucato: “The internet giants depend on our data. A new relationship between us and them could deliver real value to society….We should ask how the value of these companies has been created, how that value has been measured, and who benefits from it. If we go by national accounts, the contribution of internet platforms to national income (as measured, for example, by GDP) is represented by the advertisement-related services they sell. But does that make sense? It’s not clear that ads really contribute to the national product, let alone to social well-being—which should be the aim of economic activity. Measuring the value of a company like Google or Facebook by the number of ads it sells is consistent with standard neoclassical economics, which interprets any market-based transaction as signaling the production of some kind of output—in other words, no matter what the thing is, as long as a price is received, it must be valuable. But in the case of these internet companies, that’s misleading: if online giants contribute to social well-being, they do it through the services they provide to users, not through the accompanying advertisements.

This way we have of ascribing value to what the internet giants produce is completely confusing, and it’s generating a paradoxical result: their advertising activities are counted as a net contribution to national income, while the more valuable services they provide to users are not.

Let’s not forget that a large part of the technology and necessary data was created by all of us, and should thus belong to all of us. The underlying infrastructure that all these companies rely on was created collectively (via the tax dollars that built the internet), and it also feeds off network effects that are produced collectively. There is indeed no reason why the public’s data should not be owned by a public repository that sells the data to the tech giants, rather than vice versa. But the key issue here is not just sending a portion of the profits from data back to citizens but also allowing them to shape the digital economy in a way that satisfies public needs. Using big data and AI to improve the services provided by the welfare state—from health care to social housing—is just one example.

Only by thinking about digital platforms as collective creations can we construct a new model that offers something of real value, driven by public purpose. We’re never far from a media story that stirs up a debate about the need to regulate tech companies, which creates a sense that there’s a war between their interests and those of national governments. We need to move beyond this narrative. The digital economy must be subject to the needs of all sides; it’s a partnership of equals where regulators should have the confidence to be market shapers and value creators….(More)”.

Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates


Marshall Allen at ProPublica: “With little public scrutiny, the health insurance industry has joined forces with data brokers to vacuum up personal details about hundreds of millions of Americans, including, odds are, many readers of this story. The companies are tracking your race, education level, TV habits, marital status, net worth. They’re collecting what you post on social media, whether you’re behind on your bills, what you order online. Then they feed this information into complicated computer algorithms that spit out predictions about how much your health care could cost them.

Are you a woman who recently changed your name? You could be newly married and have a pricey pregnancy pending. Or maybe you’re stressed and anxious from a recent divorce. That, too, the computer models predict, may run up your medical bills.

Are you a woman who’s purchased plus-size clothing? You’re considered at risk of depression. Mental health care can be expensive.

Low-income and a minority? That means, the data brokers say, you are more likely to live in a dilapidated and dangerous neighborhood, increasing your health risks.
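
To make the mechanism concrete, here is a purely synthetic toy model of the kind of pipeline described above: consumer attributes in, an estimated annual cost out. The features, coefficients, and data are invented for illustration and do not reflect any insurer's or data broker's actual system.

```python
# Toy illustration only: synthetic consumer-data features feeding a cost model.
# Nothing here reflects a real insurer's features, weights, or data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.integers(0, 2, n),   # recent name change (0/1)
    rng.integers(0, 2, n),   # plus-size clothing purchase (0/1)
    rng.integers(0, 2, n),   # low-income flag (0/1)
    rng.uniform(0, 8, n),    # hours of TV per day
])
# Fabricated "annual cost" target so the example runs end to end.
y = 3000 + 1500 * X[:, 0] + 800 * X[:, 1] + 1200 * X[:, 2] + 50 * X[:, 3] + rng.normal(0, 500, n)

model = GradientBoostingRegressor().fit(X, y)
print(model.predict([[1, 0, 1, 4.0]]))  # predicted cost for one hypothetical profile
```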

“We sit on oceans of data,” said Eric McCulley, director of strategic solutions for LexisNexis Risk Solutions, during a conversation at the data firm’s booth. And he isn’t apologetic about using it. “The fact is, our data is in the public domain,” he said. “We didn’t put it out there.”

Insurers contend they use the information to spot health issues in their clients — and flag them so they get services they need. And companies like LexisNexis say the data shouldn’t be used to set prices. But as a research scientist from one company told me: “I can’t say it hasn’t happened.”

At a time when every week brings a new privacy scandal and worries abound about the misuse of personal information, patient advocates and privacy scholars say the insurance industry’s data gathering runs counter to its touted, and federally required, allegiance to patients’ medical privacy. The Health Insurance Portability and Accountability Act, or HIPAA, only protects medical information.

“We have a health privacy machine that’s in crisis,” said Frank Pasquale, a professor at the University of Maryland Carey School of Law who specializes in issues related to machine learning and algorithms. “We have a law that only covers one source of health information. They are rapidly developing another source.”…(More)”.

Activism in the Social Media Age


PewInternet: “This month marks the fifth anniversary of the #BlackLivesMatter hashtag, which was first coined following the acquittal of George Zimmerman in the shooting death of unarmed black teenager Trayvon Martin. In the course of those five years, #BlackLivesMatter has become an archetypal example of modern protests and political engagement on social media: A new Pew Research Center analysis of public tweets finds the hashtag has been used nearly 30 million times on Twitter – an average of 17,002 times per day – as of May 1, 2018.

[Chart: Use of the #BlackLivesMatter hashtag on Twitter periodically spikes in response to major news events]

The conversations surrounding this hashtag often center on issues related to race, violence and law enforcement, and its usage periodically surges surrounding real-world events – most prominently, during the police-related deaths of Alton Sterling and Philando Castile and the subsequent shooting of police officers in Dallas, Texas, and Baton Rouge, Louisiana, in July 2016.

The rise of the #BlackLivesMatter hashtag – along with others like #MeToo and #MAGA (Make America Great Again) – has sparked a broader discussion about the effectiveness and viability of using social media for political engagement and social activism. To that end, a new survey by the Center finds that majorities of Americans do believe these sites are very or somewhat important for accomplishing a range of political goals, such as getting politicians to pay attention to issues (69% of Americans feel these platforms are important for this purpose) or creating sustained movements for social change (67%).

Certain groups of social media users – most notably, those who are black or Hispanic – view these platforms as an especially important tool for their own political engagement. For example, roughly half of black social media users say these platforms are at least somewhat personally important to them as a venue for expressing their political views or for getting involved with issues that are important to them. Those shares fall to around a third among white social media users.

At the same time, the public as a whole expresses mixed views about the potential broader impact these sites might be having on political discourse and the nature of political activism. Some 64% of Americans feel that the statement “social media help give a voice to underrepresented groups” describes these sites very or somewhat well. But a larger share say social networking sites distract people from issues that are truly important (77% feel this way), and 71% agree with the assertion that “social media makes people believe they’re making a difference when they really aren’t.” Blacks and whites alike offer somewhat mixed assessments of the benefits and costs of activism on social media. But larger majorities of black Americans say these sites promote important issues or give voice to underrepresented groups, while smaller shares of blacks feel that political engagement on social media produces significant downsides in the form of a distracted public or “slacktivism.”…(More)”.

What if people were paid for their data?


The Economist: “Data Slavery. Jennifer Lyn Morone, an American artist, thinks this is the state in which most people now live. To get free online services, she laments, they hand over intimate information to technology firms. “Personal data are much more valuable than you think,” she says. To highlight this sorry state of affairs, Ms Morone has resorted to what she calls “extreme capitalism”: she registered herself as a company in Delaware in an effort to exploit her personal data for financial gain. She created dossiers containing different subsets of data, which she displayed in a London gallery in 2016 and offered for sale, starting at £100 ($135). The entire collection, including her health data and social-security number, can be had for £7,000.

Only a few buyers have taken her up on this offer and she finds “the whole thing really absurd”….Given the current state of digital affairs, in which the collection and exploitation of personal data is dominated by big tech firms, Ms Morone’s approach, in which individuals offer their data for sale, seems unlikely to catch on. But what if people really controlled their data—and the tech giants were required to pay for access? What would such a data economy look like?…

Labour, like data, is a resource that is hard to pin down. Workers were not properly compensated for labour for most of human history. Even once people were free to sell their labour, it took decades for wages to reach liveable levels on average. History won’t repeat itself, but chances are that it will rhyme, Mr Weyl predicts in “Radical Markets”, a provocative new book he has co-written with Eric Posner of the University of Chicago. He argues that in the age of artificial intelligence, it makes sense to treat data as a form of labour.

To understand why, it helps to keep in mind that “artificial intelligence” is something of a misnomer. Messrs Weyl and Posner call it “collective intelligence”: most AI algorithms need to be trained using reams of human-generated examples, in a process called machine learning. Unless they know what the right answers (provided by humans) are meant to be, algorithms cannot translate languages, understand speech or recognise objects in images. Data provided by humans can thus be seen as a form of labour which powers AI. As the data economy grows up, such data work will take many forms. Much of it will be passive, as people engage in all kinds of activities—liking social-media posts, listening to music, recommending restaurants—that generate the data needed to power new services. But some people’s data work will be more active, as they make decisions (such as labelling images or steering a car through a busy city) that can be used as the basis for training AI systems….
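
The point about human-provided answers is easy to see in code: the tiny classifier below has no knowledge beyond a handful of hypothetical human-supplied labels, which is exactly the data work being described.

```python
# Minimal supervised-learning sketch: the model's only "intelligence" comes
# from the human-supplied labels in the hypothetical examples below.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_labelled = [
    ("great food and friendly staff", "positive"),
    ("loved the quiet atmosphere", "positive"),
    ("cold meal and rude service", "negative"),
    ("long wait and a noisy room", "negative"),
]
texts, labels = zip(*human_labelled)

clf = make_pipeline(CountVectorizer(), LogisticRegression()).fit(texts, labels)
print(clf.predict(["friendly staff but a long wait"]))  # trained entirely on human judgments
```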

But much still needs to happen for personal data to be widely considered as labour, and paid for as such. For one thing, the right legal framework will be needed to encourage the emergence of a new data economy. The European Union’s new General Data Protection Regulation, which came into effect in May, already gives people extensive rights to check, download and even delete personal data held by companies. Second, the technology to keep track of data flows needs to become much more capable. Research to calculate the value of particular data to an AI service is in its infancy.

Third, and most important, people will have to develop a “class consciousness” as data workers. Most people say they want their personal information to be protected, but then trade it away for nearly nothing, something known as the “privacy paradox”. Yet things may be changing: more than 90% of Americans think being in control of who can get data on them is important, according to the Pew Research Centre, a think-tank….(More)”.

Sentiment Analysis of Big Data: Methods, Applications, and Open Challenges


Paper by Shahid Shayaa et al. at IEEE: “The development of IoT technologies and the widespread adoption of social media tools and applications have opened new doors of opportunity for using data analytics to gain meaningful insights from unstructured information. In the era of big data, opinion mining and sentiment analysis (OMSA) have proved a useful way to categorize opinions by sentiment and, more generally, to gauge the mood of the public. Different OMSA techniques have been developed over the years on different datasets and applied in various experimental settings. This study presents a comprehensive systematic literature review that discusses both the technical aspects of OMSA (techniques and types) and its non-technical aspects, namely its application areas. It also highlights the technical challenges in developing OMSA techniques and the non-technical challenges arising from their application, and presents these challenges as directions for future research….(More)”.
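
As a concrete illustration of one widely used OMSA technique, lexicon-based sentiment scoring, the short example below applies NLTK's VADER analyser to a couple of made-up posts; it is an editor's sketch, not code from the paper.

```python
# Lexicon-based sentiment scoring with NLTK's VADER analyser; the posts are made up.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

posts = [
    "Loving the new park downtown, great for the whole family!",
    "Traffic this morning was an absolute nightmare.",
]
for post in posts:
    scores = sia.polarity_scores(post)  # neg/neu/pos plus a compound score in [-1, 1]
    mood = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{mood:8s} {scores['compound']:+.2f}  {post}")
```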