Why Protecting Data Privacy Matters, and When


Anne Russell at Data Science Central: “It’s official. Public concerns over the privacy of data used in digital approaches have reached an apex. Worried about the safety of digital networks, consumers want to gain control over what they increasingly sense as a loss of power over how their data is used. It’s not hard to see why. Look at the extent of coverage of the U.S. Government data breach last month and the sheer growth in the number of attacks against government and others overall. Then there is the increasing coverage of the inherent security flaws built into the internet, through which most of our data flows. The costs of data breaches to individuals, industries, and government are adding up. And users are taking note…
If you’re not sure whether the data fueling your approach will raise privacy and security flags, consider the following. When it comes to data privacy and security, not all data is going to be of equal concern. Much depends on the level of detail in data content, data type, data structure, volume, and velocity, and indeed how the data itself will be used and released.

First there is the data where security and privacy have always mattered and for which there is already an existing and well-galvanized body of law in place. Foremost among these is classified or national security data, where data usage is highly regulated and enforced. Other data for which there exists a considerable body of international and national law regulating usage includes:

  • Proprietary Data – specifically the data that makes up the intellectual capital of individual businesses and gives them their competitive economic advantage over others, including data protected under copyright, patent, or trade secret laws and the sensitive, protected data that companies collect on behalf of their customers;
  • Infrastructure Data – data from the physical facilities and systems – such as roads, electrical systems, communications services, etc. – that enable local, regional, national, and international economic activity; and
  • Controlled Technical Data – technical, biological, chemical, and military-related data and research that could be considered of national interest and be under foreign export restrictions….

The second group of data that raises privacy and security concerns is personal data. Commonly referred to as Personally Identifiable Information (PII), it is any data that distinguishes individuals from each other. It is also the data that an increasing number of digital approaches rely on, and the data whose use tends to raise the most public ire. …

A third category of data needing privacy consideration is the data related to good people working in difficult or dangerous places. Activists, journalists, politicians, whistle-blowers, business owners, and others working in contentious areas and conflict zones need secure means to communicate and share data without fear of retribution and personal harm. That there are parts of the world where individuals can be in mortal danger for speaking out is one of the reasons that TOR (The Onion Router) has received substantial funding from multiple government and philanthropic groups, even at the high risk of enabling anonymized criminal behavior. Indeed, in the absence of alternate secure networks on which to pass data, many would be in grave danger, including the organizers of the Arab Spring in 2010 as well as dissidents in Syria and elsewhere….(More)”

 

The Data Revolution


Review of Rob Kitchin’s The Data Revolution: Big Data, Open Data, Data Infrastructures & their Consequences by David Moats in Theory, Culture and Society: “…As an industry, academia is not immune to cycles of hype and fashion. Terms like ‘postmodernism’, ‘globalisation’, and ‘new media’ have each had their turn filling the top line of funding proposals. Although they are each grounded in tangible shifts, these terms become stretched and fudged to the point of becoming almost meaningless. Yet, they elicit strong, polarised reactions. For at least the past few years, ‘big data’ seems to be the buzzword, which elicits funding, as well as the ire of many in the social sciences and humanities.

Rob Kitchin’s book The Data Revolution is one of the first systematic attempts to strip back the hype surrounding our current data deluge and take stock of what is really going on. This is crucial because this hype is underpinned by very real societal change, threats to personal privacy and shifts in store for research methods. The book acts as a helpful wayfinding device in an unfamiliar terrain, which is still being reshaped, and is admirably written in a language relevant to social scientists, comprehensible to policy makers and accessible even to the less tech savvy among us.

The Data Revolution seems to present itself as the definitive account of this phenomenon but in filling this role ends up adopting a somewhat diplomatic posture. Kitchin takes all the correct and reasonable stances on the matter and advocates all the right courses of action, but he is not able to, in the context of this book, pursue these propositions fully. This review will attempt to tease out some of these latent potentials and how they might be pushed in future work, in particular the implications of the ‘performative’ character of both big data narratives and data infrastructures for social science research.

Kitchin’s book starts with the observation that ‘data’ is a misnomer – etymologically, data should refer to phenomena in the world which can be abstracted, measured etc. as opposed to the representations and measurements themselves, which should by all rights be called ‘capta’. This is ironic because the worst offenders in what Kitchin calls “data boosterism” seem to conflate data with ‘reality’, unmooring data from its conditions of production and making the relationship between the two appear given or natural.

As Kitchin notes, following Bowker (2005), ‘raw data’ is an oxymoron: data are not so much mined as produced and are necessarily framed technically, ethically, temporally, spatially and philosophically. This is the central thesis of the book, that data and data infrastructures are not neutral and technical but also social and political phenomena. For those at the critical end of research with data, this is a starting assumption, but one which not enough practitioners heed. Most of the book is thus an attempt to flesh out these rapidly expanding data infrastructures and their politics….

Kitchin is at his best when revealing the gap between the narratives and the reality of data analysis, such as the fallacy of empiricism – the assertion that, given the granularity and completeness of big data sets and the availability of machine learning algorithms which identify patterns within data (with or without the supervision of human coders), data can “speak for themselves”. Kitchin reminds us that no data set is complete and even these out-of-the-box algorithms are underpinned by theories and assumptions in their creation, and require context-specific knowledge to unpack their findings. Kitchin also rightly raises concerns about the limits of big data: access to and interoperability of data are not given, and these gaps and silences are also patterned (Twitter is biased as a sample towards middle-class, white, tech-savvy people). Yet, this language of veracity and reliability seems to suggest that big data is being conceptualised in relation to traditional surveys, or that our population is still the nation state, when big data could helpfully force us to reimagine our analytic objects and truth conditions and, more pressingly, our ethics (Rieder, 2013).

However, performativity may again complicate things. As Kitchin observes, supermarket loyalty cards do not just create data about shopping, they encourage particular sorts of shopping; when research subjects change their behaviour to cater to the metrics and surveillance apparatuses built into platforms like Facebook (Bucher, 2012), then these are no longer just data points representing the social, but partially constitutive of new forms of sociality (this is also true of other types of data as discussed by Savage (2010), but in perhaps less obvious ways). This might have implications for how we interpret data, the distribution between quantitative and qualitative approaches (Latour et al., 2012) or even more radical experiments (Wilkie et al., 2014). Kitchin is relatively cautious about proposing these sorts of possibilities, which is not the remit of the book, though it clearly leaves the door open…(More)”

The science prize that’s making waves


Gillian Tett at the Financial Times: “The Ocean Health XPrize reveals a new fashion among philanthropists…There is another reason why the Ocean Health XPrize fascinates me: what it reveals about the new fashion among philanthropists for handing out big scientific prizes. The idea is not a new one: wealthy people and governments have been giving prizes for centuries. In 1714, for example, the British government passed the Longitude Act, establishing a board to offer reward money for innovation in navigation — the most money was won by John Harrison, a clockmaker who invented the marine chronometer.

But a fascinating shift has taken place in the prize-giving game. In previous decades, governments or philanthropists usually bestowed money to recognise past achievements, often in relation to the arts. In 2012, McKinsey, the management consultants, estimated that before 1991, 97 per cent of prize money was a “recognition” award — for example, the Nobel Prizes. Today, however, four-fifths of all prize money is “incentive” or “inducement” awards. This is because many philanthropists and government agencies have started staging competitions to spur innovation in different fields, particularly science.

The best known of these is the XPrize Foundation, initiated two decades ago by Peter Diamandis, the entrepreneur. The original award, the Ansari XPrize, offered $10m to the first privately financed team to put a vehicle into space. Since then, the XPrize has spread its wings into numerous different fields, including education and life sciences. Indeed, having given $30m in prize money so far, it has another $70m of competitions running, including the Google Lunar XPrize, which is offering $30m to land a privately funded robot on the moon.

McKinsey estimates that if you look across the field of prize-giving around the world, “total funds available from large prizes have more than tripled over the last decade to reach $350m”, while the “total prize sector could already be worth as much as $1bn to $2bn”. The Ocean Health XPrize, in other words, is barely a drop in the prize-giving ocean.

Is this a good thing? Not always, it might seem. As the prizes proliferate, they can sometimes overlap. The money being awarded tends — inevitably — to reflect the pet obsessions of philanthropists, rather than what scientists themselves would like to explore. And even the people running the prizes admit that these only work when there is a clear problem to be solved….(More)”

Science to the people!


John Magan at Digital Agenda for Europe: “…I attended the 2nd Barcelona Citizen Science Day organised as part of the city’s Science Festival. The programme was full and varied and in itself a great example of the wonderful world of do-it-yourself, hands-on, accessible, practical science. A huge variety of projects (see below) was delivered with enthusiasm, passion, and energy!

The day was rounded off with a presentation by Public Lab who showed how a bit of technical ingenuity like cheap cameras on kites and balloons can be used to keep governments and large businesses more honest and accountable – for example, data they collected is being used in court cases against BP for the Deepwater Horizon oil spill in the Gulf of Mexico.

But what was most striking is the empowerment that these Citizen Science projects give individuals to do things for themselves – to take measures to monitor, protect or improve their urban or rural environment; to indulge their curiosity or passions; to improve their finances; to work with others; to do good while having serious fun….If you want to have a deeper look, here are some of the many projects presented on a great variety of themes:

Water

Wildlife

Climate

Arts

Public health

Human

A nice booklet capturing them is available and there’s also a summary in Catalan only.

Read more about citizen science in the European Commission….(More)”

Researcher uncovers inherent biases of big data collected from social media sites


Phys.org: “With every click, Facebook, Twitter and other social media users leave behind digital traces of themselves, information that can be used by businesses, government agencies and other groups that rely on “big data.”

But while the information derived from social network sites can shed light on social behavioral traits, some analyses based on this type of data collection are prone to bias from the get-go, according to new research by Northwestern University professor Eszter Hargittai, who heads the Web Use Project.

Since people don’t randomly join Facebook, Twitter or LinkedIn—they deliberately choose to engage—the data are potentially biased in terms of demographics, socioeconomic background or Internet skills, according to the research. This has implications for businesses, municipalities and other groups who use such data, because it excludes certain segments of the population and could lead to unwarranted or faulty conclusions, Hargittai said.

The study, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” was published last month in the journal The Annals of the American Academy of Political and Social Science and is part of a larger, ongoing study.

The buzzword “big data” refers to automatically generated information about people’s behavior. It’s called “big” because it can easily include millions of observations if not more. In contrast to surveys, which require explicit responses to questions, big data is created when people do things using a service or system.

“The problem is that the only people whose behaviors and opinions are represented are those who decided to join the site in the first place,” said Hargittai, the April McClain-Delaney and John Delaney Professor in the School of Communication. “If people are analyzing big data to answer certain questions, they may be leaving out entire groups of people and their voices.”

For example, a city could use Twitter to collect local opinion regarding how to make the community more “age-friendly” or whether more bike lanes are needed. In those cases, “it’s really important to know that people aren’t on Twitter randomly, and you would only get a certain type of person’s response to the question,” said Hargittai.

“You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products,” she said. “It really has implications for every kind of group.”…
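
Hargittai’s point about non-random platform membership can be made concrete with a small simulation. The sketch below is purely illustrative: the adoption rates, the age effect, and the bike-lane question are invented assumptions for the example, not data from her study. It simply shows how an estimate computed only from a platform’s users can diverge from the true population value whenever joining the platform correlates with the opinion being measured.

```python
# Illustrative simulation of selection bias in platform-derived "big data".
# All numbers below are made-up assumptions for demonstration purposes.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Hypothetical population: age drives both platform adoption and opinion.
age = rng.integers(18, 90, size=N)

# Assumed adoption curve: younger residents are far more likely to be on the platform.
p_on_platform = np.clip(0.9 - 0.01 * (age - 18), 0.05, 0.9)
on_platform = rng.random(N) < p_on_platform

# Assumed opinion: support for new bike lanes declines with age.
p_support = np.clip(0.85 - 0.008 * (age - 18), 0.1, 0.85)
supports_bike_lanes = rng.random(N) < p_support

print(f"True population support:         {supports_bike_lanes.mean():.1%}")
print(f"Support among platform users:    {supports_bike_lanes[on_platform].mean():.1%}")
print(f"Share of population on platform: {on_platform.mean():.1%}")
```

Because the simulated platform over-represents younger residents, the platform-only estimate overstates support for bike lanes; collecting more data from the same platform would not fix this, since the people missing from the sample stay missing.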

More information: “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites” The Annals of the American Academy of Political and Social Science May 2015 659: 63-76, DOI: 10.1177/0002716215570866

Open Innovation, Open Science, Open to the World


Speech by Carlos Moedas, EU Commissioner for Research, Science and Innovation: “On 25 April this year, an earthquake of magnitude 7.3 hit Nepal. To get real-time geographical information, the response teams used an online mapping tool called Open Street Map. Open Street Map has created an entire online map of the world using local knowledge, GPS tracks and donated sources, all provided on a voluntary basis. It is open license for any use.

Open Street Map, created by a 24-year-old computer science student at University College London in 2004, today has 2 million users and has been used for many digital humanitarian and commercial purposes: from the earthquakes in Haiti and Nepal to the Ebola outbreak in West Africa.
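
As an aside for readers who want to see how this crowdsourced, openly licensed map data can be used programmatically, here is a minimal sketch that queries OpenStreetMap through the community-run Overpass API. The Kathmandu-area coordinates and the choice of hospitals as a feature are illustrative assumptions rather than anything taken from the speech, and the public endpoint may rate-limit or change.

```python
# Minimal sketch: pull crowdsourced OpenStreetMap data via the public Overpass API.
# The coordinates (central Kathmandu) and the "hospital" tag are illustrative choices.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

query = """
[out:json][timeout:25];
node["amenity"="hospital"](around:5000,27.7172,85.3240);
out body;
"""

response = requests.post(OVERPASS_URL, data={"data": query}, timeout=60)
response.raise_for_status()

# Print the name and location of each matching map feature.
for element in response.json().get("elements", []):
    name = element.get("tags", {}).get("name", "(unnamed)")
    print(f'{name}: {element["lat"]:.4f}, {element["lon"]:.4f}')
```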

This story is one of many that demonstrate that we are moving into a world of open innovation and user innovation. A world where the digital and physical are coming together. A world where new knowledge is created through global collaborations involving thousands of people from across the world and from all walks of life.

Ladies and gentlemen, over the next two days I would like us to chart a new path for European research and innovation policy. A new strategy that is fit for purpose for a world that is open, digital and global. And I would like to set out at the start of this important conference my own ambitions for the coming years….

Open innovation is about involving far more actors in the innovation process, from researchers, to entrepreneurs, to users, to governments and civil society. We need open innovation to capitalise on the results of European research and innovation. This means creating the right ecosystems, increasing investment, and bringing more companies and regions into the knowledge economy. I would like to go further and faster towards open innovation….

I am convinced that excellent science is the foundation of future prosperity, and that openness is the key to excellence. We are often told that it takes many decades for scientific breakthroughs to find commercial application.

Let me tell you a story which shows the opposite. Graphene was first isolated in the laboratory by Profs. Geim and Novoselov at the University of Manchester in 2003 (Nobel Prizes 2010). The development of graphene has since benefitted from major EU support, including ERC grants for Profs. Geim and Novoselov. So I am proud to show you one of the new graphene products that will soon be available on the market.

This light bulb uses the unique thermal dissipation properties of graphene to achieve greater energy efficiency and a longer lifetime than LED bulbs. It was developed by a spin-out company from the University of Manchester, called Graphene Lighting, and is expected to go on sale by the end of the year.

But we must not be complacent. If we look at indicators of the most excellent science, we find that Europe is not top of the rankings in certain areas. Our ultimate goal should always be to promote excellence not only through ERC and Marie Skłodowska-Curie but throughout the entire H2020.

For such an objective we have to move forward on two fronts:

First, we are preparing a call for a European Science Cloud Project in order to identify the possibility of creating a cloud for our scientists. We need more open access to research results and the underlying data. Open access publication is already a requirement under Horizon 2020, but we now need to look seriously at open data…

When innovators like LEGO start fusing real bricks with digital magic, when citizens conduct their own R&D through online community projects, when doctors start printing live tissues for patients … Policymakers must follow suit…(More)”

Improving Crowdsourcing and Citizen Science as a Policy Mechanism for NASA


Paper by Balcom Brittany: “This article examines citizen science projects, defined as “a form of open collaboration where members of the public participate in the scientific process, including identifying research questions, collecting and analyzing the data, interpreting the results, and problem solving,” as an effective and innovative tool for National Aeronautics and Space Administration (NASA) science in line with the Obama Administration’s Open Government Directive. Citizen science projects allow volunteers with no technical training to participate in analysis of large sets of data that would otherwise constitute prohibitively tedious and lengthy work for research scientists. Zooniverse.com hosts a multitude of popular space-focused citizen science projects, many of which have been extraordinarily successful and have enabled new research publications and major discoveries. This article takes a multifaceted look at such projects by examining the benefits of citizen science, effective game design, and current desktop computer and mobile device usage trends. It offers suggestions of potential research topics to be studied with emerging technologies, policy considerations, and opportunities for outreach. This analysis includes an overview of other crowdsourced research methods such as distributed computing and contests. New research and data analysis of mobile phone usage, scientific curiosity, and political engagement among Zooniverse.com project participants has been conducted for this study…(More)”

When America Says Yes to Government


Cass Sunstein in the New York Times: “In recent years, the federal government has adopted a large number of soft interventions that are meant to change behavior without mandates and bans. Among them: disclosure of information, such as calorie labels at chain restaurants; graphic warnings against, for example, distracted driving; and automatic enrollment in programs designed to benefit employees, like pension plans.

Informed by behavioral science, such reforms can have large effects while preserving freedom of choice. But skeptics deride these soft interventions as unjustified paternalism, an insult to dignity and a contemporary version of the nanny state. Some people fear that uses of behavioral science will turn out to be manipulative. They don’t want to be nudged.

But what do Americans actually think about soft interventions? I recently conducted a nationally representative survey of 563 people. Small though that number may seem, it gives a reasonable picture of what Americans think, with a margin of error of plus or minus 4.1 percentage points.
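
The quoted margin of error is consistent with the standard formula for a simple random sample, MOE = z * sqrt(p(1 - p) / n), using the worst case p = 0.5 and z ≈ 1.96 for 95 per cent confidence. The quick check below is ours, not Sunstein’s, and it assumes simple random sampling with no design effects.

```python
# Quick check of the quoted +/- 4.1 point margin of error for a sample of 563.
# Assumes a simple random sample, worst-case proportion p = 0.5, 95% confidence.
import math

n = 563
z = 1.96   # z-score for a 95% confidence level
p = 0.5    # worst-case proportion (maximises the margin of error)

moe = z * math.sqrt(p * (1 - p) / n)
print(f"Margin of error: +/- {moe:.1%}")  # prints roughly +/- 4.1%
```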

The remarkable finding is that most Americans approve of these reforms and want a lot more of them — and their approval generally cuts across partisan lines….(More)”

Forging Trust Communities: How Technology Changes Politics


Book by Irene S. Wu: “Bloggers in India used social media and wikis to broadcast news and bring humanitarian aid to tsunami victims in South Asia. Terrorist groups like ISIS pour out messages and recruit new members on websites. The Internet is the new public square, bringing to politics a platform on which to create community at both the grassroots and bureaucratic level. Drawing on historical and contemporary case studies from more than ten countries, Irene S. Wu’s Forging Trust Communities argues that the Internet, and the technologies that predate it, catalyze political change by creating new opportunities for cooperation. The Internet does not simply enable faster and easier communication, but makes it possible for people around the world to interact closely, reciprocate favors, and build trust. The information and ideas exchanged by members of these cooperative communities become key sources of political power akin to military might and economic strength.

Wu illustrates the rich world history of citizens and leaders exercising political power through communications technology. People in nineteenth-century China, for example, used the telegraph and newspapers to mobilize against the emperor. In 1970, Taiwanese cable television gave voice to a political opposition demanding democracy. Both Qatar (in the 1990s) and Great Britain (in the 1930s) relied on public broadcasters to enhance their influence abroad. Additional case studies from Brazil, Egypt, the United States, Russia, India, the Philippines, and Tunisia reveal how various technologies function to create new political energy, enabling activists to challenge institutions while allowing governments to increase their power at home and abroad.

Forging Trust Communities demonstrates that the way people receive and share information through network communities reveals as much about their political identity as their socioeconomic class, ethnicity, or religion. Scholars and students in political science, public administration, international studies, sociology, and the history of science and technology will find this to be an insightful and indispensable work…(More)”

A computational algorithm for fact-checking


Kurzweil News: “Computers can now do fact-checking for any body of knowledge, according to Indiana University network scientists, writing in an open-access paper published June 17 in PLoS ONE.

Using factual information from summary infoboxes from Wikipedia as a source, they built a “knowledge graph” with 3 million concepts and 23 million links between them. A link between two concepts in the graph can be read as a simple factual statement, such as “Socrates is a person” or “Paris is the capital of France.”
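
To make the “knowledge graph” idea concrete, the sketch below stores infobox-style facts as subject-predicate-object triples and loads them into a graph, so that any two concepts are connected whenever some chain of facts links them. The triples and the use of the networkx library are illustrative assumptions; this is a toy version of the general approach, not the IU team’s actual pipeline.

```python
# Toy sketch of a knowledge graph built from infobox-style triples.
# The triples below are illustrative examples, not the actual Wikipedia data.
import networkx as nx

triples = [
    ("Socrates", "is_a", "Person"),
    ("Paris", "capital_of", "France"),
    ("France", "is_a", "Country"),
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Barack Obama", "president_of", "United States"),
    ("United States", "is_a", "Country"),
]

G = nx.Graph()
for subject, predicate, obj in triples:
    # Each factual statement becomes an edge between two concept nodes,
    # labelled with the predicate that relates them.
    G.add_edge(subject, obj, predicate=predicate)

print(G.number_of_nodes(), "concepts,", G.number_of_edges(), "links")
```

At Wikipedia scale the same structure simply has millions of nodes and edges; a statement to be checked is then a candidate edge between two existing concepts.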

In the first use of this method, IU scientists created a simple computational fact-checker that assigns “truth scores” to statements concerning history, geography and entertainment, as well as random statements drawn from the text of Wikipedia. In multiple experiments, the automated system consistently matched the assessment of human fact-checkers in terms of the humans’ certitude about the accuracy of these statements.

Dealing with misinformation and disinformation

In what the IU scientists describe as an “automatic game of trivia,” the team applied their algorithm to answer simple questions related to geography, history, and entertainment, including statements that matched states or nations with their capitals, presidents with their spouses, and Oscar-winning film directors with the movie for which they won the Best Picture award. The majority of tests returned highly accurate truth scores.

Lastly, the scientists used the algorithm to fact-check excerpts from the main text of Wikipedia, which were previously labeled by human fact-checkers as true or false, and found a positive correlation between the truth scores produced by the algorithm and the answers provided by the fact-checkers.

Significantly, the IU team found their computational method could even assess the truthfulness of statements about information not directly contained in the infoboxes. For example, it could confirm that Steve Tesich — the Serbian-American screenwriter of the classic Hoosier film “Breaking Away” — graduated from IU, even though that information is not specifically addressed in the infobox about him.

Using multiple sources to improve accuracy and richness of data

“The measurement of the truthfulness of statements appears to rely strongly on indirect connections, or ‘paths,’ between concepts,” said Giovanni Luca Ciampaglia, a postdoctoral fellow at the Center for Complex Networks and Systems Research in the IU Bloomington School of Informatics and Computing, who led the study….

“These results are encouraging and exciting. We live in an age of information overload, including abundant misinformation, unsubstantiated rumors and conspiracy theories whose volume threatens to overwhelm journalists and the public. Our experiments point to methods to abstract the vital and complex human task of fact-checking into a network analysis problem, which is easy to solve computationally.”
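
The “indirect connections, or ‘paths’” that Ciampaglia describes can be sketched as a graph search: score a candidate statement by the best path between its two concepts, discounting paths that run through very general, high-degree hub nodes (a plausible proxy for the semantic proximity the team measures). The code below follows that spirit with a log-degree penalty on intermediate nodes, but the exact weighting is a simplification rather than a reproduction of the published formula, and the networkx-based implementation is an assumption of this sketch.

```python
# Minimal sketch of a path-based truth score over a knowledge graph (e.g. the
# toy graph G built above). The log-degree penalty on intermediate nodes is a
# simplified stand-in for the paper's weighting, used here for illustration.
import math
import networkx as nx

def truth_score(G: nx.Graph, subject: str, obj: str) -> float:
    """Score a candidate statement linking `subject` and `obj`, in [0, 1]."""
    if subject not in G or obj not in G:
        return 0.0
    if G.has_edge(subject, obj):
        return 1.0  # the statement is asserted directly in the graph

    # Edge weight = log degree of the node being entered, so the search
    # prefers paths that avoid generic hub concepts ("Person", "Country", ...).
    def weight(u, v, data):
        return math.log(G.degree(v) + 1)

    try:
        path = nx.shortest_path(G, subject, obj, weight=weight)
    except nx.NetworkXNoPath:
        return 0.0

    # Penalise the best path by the generality of its intermediate concepts.
    penalty = sum(math.log(G.degree(v) + 1) for v in path[1:-1])
    return 1.0 / (1.0 + penalty)

# With the toy triples above, truth_score(G, "Paris", "Country") scores the
# indirect path Paris -> France -> Country, while a directly stated fact such
# as ("Paris", "France") returns 1.0.
```

The key design choice in this kind of scoring is that generic concepts carry little evidential weight: a path through “Person” connects almost everything and so is heavily penalised, while a path through specific intermediate concepts is treated as meaningful support.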

Expanding the knowledge base

Although the experiments were conducted using Wikipedia, the IU team’s method does not assume any particular source of knowledge. The scientists aim to conduct additional experiments using knowledge graphs built from other sources of human knowledge, such as Freebase, the open-knowledge base built by Google, and note that multiple information sources could be used together to account for different belief systems….(More)”