Time for sharing data to become routine: the seven excuses for not doing so are all invalid


Paper by Richard Smith and Ian Roberts: “Data are more valuable than scientific papers but researchers are incentivised to publish papers not share data. Patients are the main beneficiaries of data sharing but researchers have several incentives not to share: others might use their data to get ahead in the academic rat race; they might be scooped; their results might not be replicable; competitors may reach different conclusions; their data management might be exposed as poor; patient confidentiality might be breached; and technical difficulties make sharing impossible. All of these barriers can be overcome and researchers should be rewarded for sharing data. Data sharing must become routine….(More)”

Case Studies of Government Use of Big Data in Latin America: Brazil and Mexico


Chapter by Roberto da Mota Ueti, Daniela Fernandez Espinosa, Laura Rafferty, Patrick C. K. Hung in Big Data Applications and Use Cases: “Big Data is changing our world, with masses of information stored in huge servers spread across the planet. This new technology is changing not only companies but governments as well. Mexico and Brazil, two of the most influential countries in Latin America, are entering a new era and, as a result, facing challenges in all aspects of public policy. Using Big Data, the Brazilian Government is trying to decrease spending and make better use of public money by linking public information with the data on citizens stored in public services. With new reforms in education, finances and telecommunications, the Mexican Government is taking on a bigger role in efforts to channel the country’s economic policy into improving its inhabitants’ quality of life. Technology is an important lever for developing countries trying to make a difference in areas such as reducing inequality or regulating the sound use of economic resources. The good use of Big Data, a technology for managing very large quantities of information, can be crucial for the Mexican Government to reach the goals set under Peña Nieto’s administration. This article focuses on how the Brazilian and Mexican Governments are managing the emerging technologies of Big Data and how they include them in social and industrial projects to enhance the growth of their economies. The article also discusses the benefits of these uses of Big Data and the possible problems related to security and privacy of information….(More)”

Big data: big power shifts?


Special issue of Internet Policy Review: “Facing general conceptions of the power effects of big data, this thematic edition is interested in studies that scrutinise big data and power in concrete fields of application. It brings together scholars from different disciplines who analyse the fields of agriculture, education, border control and consumer policy. As will be made explicit in the following, each of the articles tells us, firstly, something about what big data is and how it relates to power. Secondly, they shed light on how we should shape “the big data society” and what research questions need to be answered to be able to do so….

The ethics of big data in big agriculture
Isabelle M. Carbonell, University of California, Santa Cruz

Regulating “big data education” in Europe: lessons learned from the US
Yoni Har Carmel, University of Haifa

The borders, they are a-changin’! The emergence of socio-digital borders in the EU
Magdalena König, Maastricht University

Beyond consent: improving data protection through consumer protection law
Michiel Rhoen, Leiden University…

(More)”

Reining in the Big Promise of Big Data: Transparency, Inequality, and New Regulatory Frontiers


Paper by Philipp Hacker and Bilyana Petkova: “The growing differentiation of services based on Big Data harbors the potential for both greater societal inequality and for greater equality. Anti-discrimination law and transparency alone, however, cannot do the job of curbing Big Data’s negative externalities while fostering its positive effects.

To rein in Big Data’s potential, we adapt regulatory strategies from behavioral economics, contracts and criminal law theory. Four instruments stand out: First, active choice may be mandated between data-collecting services (paid by data) and data-free services (paid by money). Our suggestion provides concrete estimates for the price range of a data-free option, sheds new light on the monetization of data-collecting services, and proposes an “inverse predatory pricing” instrument to limit excessive pricing of the data-free option. Second, we propose using the doctrine of unconscionability to prevent contracts that unreasonably favor data-collecting companies. Third, we suggest democratizing data collection by regular user surveys and data compliance officers partially elected by users. Finally, we trace back new Big Data personalization techniques to the old Hartian precept of treating like cases alike and different cases – differently. If it is true that a speeding ticket over $50 is less of a disutility for a millionaire than for a welfare recipient, the income- and wealth-responsive fines powered by Big Data that we suggest offer a glimpse into the future of the mitigation of economic and legal inequality by personalized law. Throughout these different strategies, we show how salience of data collection can be coupled with attempts to prevent discrimination against and exploitation of users. Finally, we discuss all four proposals in the context of different test cases: social media, student education software and credit and cell phone markets.
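Income-responsive fines of the kind the authors describe already exist in some jurisdictions as “day fines” (Finland’s speeding tickets are a well-known case). A minimal sketch of the arithmetic, with all rates and thresholds invented purely for illustration, might look like:

```python
# Hypothetical day-fine calculation in the spirit of income-responsive
# penalties: the fine scales with daily income, so the *disutility* of the
# same offence is roughly comparable across income levels.
# All figures below are illustrative assumptions, not real statutory rates.

def day_fine(daily_income: float, day_units: int, floor: float = 10.0) -> float:
    """Fine = 'day units' assigned to the offence x the offender's daily
    income, with a minimum floor so the fine never rounds to zero."""
    return max(day_units * daily_income, floor)

# The same offence (say, 4 day units) hits both drivers proportionally:
welfare_recipient = day_fine(daily_income=30.0, day_units=4)   # 120.0
millionaire = day_fine(daily_income=3000.0, day_units=4)       # 12000.0
print(welfare_recipient, millionaire)
```

The flat $50 ticket in the authors’ example corresponds to setting `day_units * daily_income` to a constant regardless of income, which is exactly what the proposal moves away from.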

Many more examples could and should be discussed. In the face of increasing unease about the asymmetry of power between Big Data collectors and dispersed users, about differential legal treatment, and about the unprecedented dimensions of economic inequality, this paper proposes a new regulatory framework and research agenda to put the powerful engine of Big Data to the benefit of both the individual and societies adhering to basic notions of equality and non-discrimination….(More)”

Scientists Are Just as Confused About the Ethics of Big-Data Research as You


Sarah Zhang at Wired: “When a rogue researcher last week released 70,000 OkCupid profiles, complete with usernames and sexual preferences, people were pissed. When Facebook researchers manipulated stories appearing in Newsfeeds for a mood contagion study in 2014, people were really pissed. OkCupid filed a copyright claim to take down the dataset; the journal that published Facebook’s study issued an “expression of concern.” Outrage has a way of shaping ethical boundaries. We learn from mistakes.

Shockingly, though, the researchers behind both of those big data blowups never anticipated public outrage. (The OkCupid research does not seem to have gone through any kind of ethical review process, and a Cornell ethics review board approved the Facebook experiment.) And that shows just how untested the ethics of this new field of research is. Unlike medical research, which has been shaped by decades of clinical trials, the risks—and rewards—of analyzing big, semi-public databases are just beginning to become clear.

And the patchwork of review boards responsible for overseeing those risks is only slowly inching into the 21st century. Under the Common Rule in the US, federally funded research has to go through ethical review. Rather than one unified system, though, every single university has its own institutional review board, or IRB. Most IRB members are researchers at the university, most often in the biomedical sciences. Few are professional ethicists.

Even fewer have computer science or security expertise, which may be necessary to protect participants in this new kind of research. “The IRB may make very different decisions based on who is on the board, what university it is, and what they’re feeling that day,” says Kelsey Finch, policy counsel at the Future of Privacy Forum. There are hundreds of these IRBs in the US—and they’re grappling with research ethics in the digital age largely on their own….

Or maybe other institutions, like the open science repositories asking researchers to share data, should be picking up the slack on ethical issues. “Someone needs to provide oversight, but the optimal body is unlikely to be an IRB, which usually lacks subject matter expertise in de-identification and re-identification techniques,” Michelle Meyer, a bioethicist at Mount Sinai, writes in an email.

Even among Internet researchers familiar with the power of big data, attitudes vary. When Katie Shilton, an information technology researcher at the University of Maryland, interviewed 20 online data researchers, she found “significant disagreement” over issues like the ethics of ignoring Terms of Service and obtaining informed consent. Surprisingly, the researchers also said that ethical review boards had never challenged the ethics of their work—but peer reviewers and colleagues had. Various groups like the Association of Internet Researchers and the Center for Applied Internet Data Analysis have issued guidelines, but the people who actually have power—those on institutional review boards—are only just catching up.

Outside of academia, companies like Microsoft have started to institute their own ethical review processes. In December, Finch at the Future of Privacy Forum organized a workshop called Beyond IRBs to consider processes for ethical review outside of federally funded research. After all, modern tech companies like Facebook, OkCupid, Snapchat and Netflix sit atop a trove of data 20th-century social scientists could only have dreamed of.

Of course, companies experiment on us all the time, whether it’s websites A/B testing headlines or grocery stores changing the configuration of their checkout line. But as these companies hire more data scientists out of PhD programs, academics are seeing an opportunity to bridge the divide and use that data to contribute to public knowledge. Maybe updated ethical guidelines can be forged out of those collaborations. Or it just might be a mess for a while….(More)”

BeMyEye: Crowdsourcing is making it easier to gather data fast


Jack Torrance at Management Today: “The era of big data is upon us. Dozens of well-funded start-ups have sprung up of late claiming to be able to turn raw data into ‘actionable insights’ that would have been unimaginable a few years ago. But the process of actually collecting data is still not always straightforward….

London-based start-up BeMyEye (not to be confused with Be My Eyes, an iPhone app that claims to ‘help the blind see’) has built an army of casual data gatherers that report back via their phones. ‘For companies that sell their product to high street retailers or supermarkets, being able to verify the presence of their product, the adequacy of the promotions, the positioning in relation to competitors, this is all invaluable intelligence,’ CEO Luca Pagano tells MT. ‘Our crowd is able to observe and feed this information back to these brands very, very quickly.’…

They can do more than check prices in shops. Some of its clients (which include Heineken, Illy and Three) have used the service to check that billboards they are paying for have actually been put up correctly. ‘We realised the level of [billboard] compliance is actually below 90%,’ says Pagano. It can also be used to generate sales leads….

BeMyEye isn’t the only company that’s exploring this business model. San Francisco company Premise is using a similar network of data gatherers to monitor food prices and other metrics in developing countries for NGOs and governments as well as commercial organisations. It’s not hard to see why they would be an attractive proposition for clients, but the challenge for both of these businesses will be ensuring they can find enough reliable and effective data gatherers to keep the information flowing in at a high enough quality….(More)”

Twelve principles for open innovation 2.0


Martin Curley in Nature: “A new mode of innovation is emerging that blurs the lines between universities, industry, governments and communities. It exploits disruptive technologies — such as cloud computing, the Internet of Things and big data — to solve societal challenges sustainably and profitably, and more quickly and ably than before. It is called open innovation 2.0 (ref. 1).

Such innovations are being tested in ‘living labs’ in hundreds of cities. In Dublin, for example, the city council has partnered with my company, the technology firm Intel (of which I am a vice-president), to install a pilot network of sensors to improve flood management by measuring local rainfall and river levels, and detecting blocked drains. Eindhoven in the Netherlands is working with electronics firm Philips and others to develop intelligent street lighting. Communications-technology firm Ericsson, the KTH Royal Institute of Technology, IBM and others are collaborating to test self-driving buses in Kista, Sweden.

Yet many institutions and companies remain unaware of this radical shift. They often confuse invention and innovation. Invention is the creation of a technology or method. Innovation concerns the use of that technology or method to create value. The agile approaches needed for open innovation 2.0 conflict with the ‘command and control’ organizations of the industrial age (see ‘How innovation modes have evolved’). Institutional or societal cultures can inhibit user and citizen involvement. Intellectual-property (IP) models may inhibit collaboration. Government funders can stifle the emergence of ideas by requiring that detailed descriptions of proposed work are specified before research can begin. Measures of success, such as citations, discount innovation and impact. Policymaking lags behind the market place….

Keys to collaborative innovation

  1. Purpose. Efforts and intellects aligned through commitment rather than compliance deliver an impact greater than the sum of their parts. A great example is former US President John F. Kennedy’s vision of putting a man on the Moon. Articulating a shared value that can be created is important. A win–win scenario is more sustainable than a win–lose outcome.
  2. Partner. The ‘quadruple helix’ of government, industry, academia and citizens joining forces aligns goals, amplifies resources, attenuates risk and accelerates progress. A collaboration between Intel, University College London, Imperial College London and Innovate UK’s Future Cities Catapult is working in the Intel Collaborative Research Institute to improve people’s well-being in cities, for example to enable reduction of air pollution.
  3. Platform. An environment for collaboration is a basic requirement. Platforms should be integrated and modular, allowing a plug-and-play approach. They must be open to ensure low barriers to use, catalysing the evolution of a community. Challenges in security, standards, trust and privacy need to be addressed. For example, the Open Connectivity Foundation is securing interoperability for the Internet of Things.
  4. Possibilities. Returns may not come from a product but from the business model that enabled it, a better process or a new user experience. Strategic tools are available, such as industrial designer Larry Keeley’s breakdown of innovations into ten types in four categories: finance, process, offerings and delivery.
  5. Plan. Adoption and scale should be the focus of innovation efforts, not product creation. Around 20% of value is created when an innovation is established; more than 80% comes when it is widely adopted (ref. 7). Focus on the ‘four Us’: utility (value to the user); usability; user experience; and ubiquity (designing in network effects).
  6. Pyramid. Enable users to drive innovation. They inspired two-thirds of innovations in semiconductors and printed circuit boards, for example. Lego Ideas encourages children and others to submit product proposals — submitters must get 10,000 supporters for their idea to be reviewed. Successful inventors get 1% of royalties.
  7. Problem. Most innovations come from a stated need. Ethnographic research with users, customers or the environment can identify problems and support brainstorming of solutions. Create a road map to ensure the shortest path to a solution.
  8. Prototype. Solutions need to be tested and improved through rapid experimentation with users and citizens. Prototyping shows how applicable a solution is, reduces the risks of failures and can reveal pain points. ‘Hackathons’, where developers come together to rapidly try things, are increasingly common.
  9. Pilot. Projects need to be implemented in the real world on small scales first. The Intel Collaborative Research Institute runs research projects in London’s parks, neighbourhoods and schools. Barcelona’s Laboratori — which involves the quadruple helix — is pioneering open ‘living lab’ methods in the city to boost culture, knowledge, creativity and innovation.
  10. Product. Prototypes need to be converted into viable commercial products or services through scaling up and new infrastructure globally. Cloud computing allows even small start-ups to scale with volume, velocity and resilience.
  11. Product service systems. Organizations need to move from just delivering products to also delivering related services that improve sustainability as well as profitability. Rolls-Royce sells ‘power by the hour’ — hours of flight time rather than jet engines — enabled by advanced telemetry. The ultimate goal of open innovation 2.0 is a circular or performance economy, focused on services and reuse rather than consumption and waste.
  12. Process. Innovation is a team sport. Organizations, ecosystems and communities should measure, manage and improve their innovation processes to deliver results that are predictable, probable and profitable. Agile methods supported by automation shorten the time from idea to implementation….(More)”

Big Data for public policy: the quadruple helix


Julia Lane in the Journal of Policy Analysis and Management: “Data from the federal statistical system, particularly the Census Bureau, have long been a key resource for public policy. Although most of those data have been collected through purposive surveys, there have been enormous strides in the use of administrative records on business (Jarmin & Miranda, 2002), jobs (Abowd, Haltiwanger, & Lane, 2004), and individuals (Wagner & Layne, 2014). Those strides are now becoming institutionalized. The President has allocated $10 million to an Administrative Records Clearing House in his FY2016 budget. Congress is considering a bill to use administrative records, entitled the Evidence-Based Policymaking Commission Act, sponsored by Patty Murray and Paul Ryan. In addition, the Census Bureau has established a Center for “Big Data.” In my view, these steps represent important strides for public policy, but they are only part of the story. Public policy researchers must look beyond the federal statistical system and make use of the vast resources now available for research and evaluation.

All politics is local; “Big Data” now mean that policy analysis can increasingly be local. Modern empirical policy should be grounded in data provided by a network of city/university data centers. Public policy schools should partner with scholars in the emerging field of data science to train the next generation of policy researchers in the thoughtful use of the new types of data; the apparent secular decline in the applications to public policy schools is coincident with the emergence of data science as a field of study in its own right. The role of national statistical agencies should be fundamentally rethought—and reformulated to one of four necessary strands in the data infrastructure: that of providing benchmarks, confidentiality protections, and national statistics….(More)”

Big data’s ‘streetlight effect’: where and how we look affects what we see


From the Conversation: “Big data offers us a window on the world. But large and easily available datasets may not show us the world we live in. For instance, epidemiological models of the recent Ebola epidemic in West Africa using big data consistently overestimated the risk of the disease’s spread and underestimated the local initiatives that played a critical role in controlling the outbreak.

Researchers are rightly excited about the possibilities offered by the availability of enormous amounts of computerized data. But there’s reason to stand back for a minute to consider what exactly this treasure trove of information really offers. Ethnographers like me use a cross-cultural approach when we collect our data because family, marriage and household mean different things in different contexts. This approach informs how I think about big data.

We’ve all heard the joke about the drunk who is asked why he is searching for his lost wallet under the streetlight, rather than where he thinks he dropped it. “Because the light is better here,” he said.

This “streetlight effect” is the tendency of researchers to study what is easy to study. I use this story in my course on Research Design and Ethnographic Methods to explain why so much research on disparities in educational outcomes is done in classrooms and not in students’ homes. Children are much easier to study at school than in their homes, even though many studies show that knowing what happens outside the classroom is important. Nevertheless, schools will continue to be the focus of most research because they generate big data and homes don’t.

The streetlight effect is one factor that prevents big data studies from being useful in the real world – especially studies analyzing easily available user-generated data from the Internet. Researchers assume that this data offers a window into reality. It doesn’t necessarily.

Looking at WEIRDOs

Based on the number of tweets following Hurricane Sandy, for example, it might seem as if the storm hit Manhattan the hardest, not the New Jersey shore. Another example: the since-retired Google Flu Trends, which in 2013 tracked online searches relating to flu symptoms to predict doctor visits, but gave estimates twice as high as reports from the Centers for Disease Control and Prevention. Without checking facts on the ground, researchers may fool themselves into thinking that their big data models accurately represent the world they aim to study.

The problem is similar to the “WEIRD” issue in many research studies. Harvard professor Joseph Henrich and colleagues have shown that findings based on research conducted with undergraduates at American universities – whom they describe as “some of the most psychologically unusual people on Earth” – apply only to that population and cannot be used to make any claims about other human populations, including other Americans. Unlike the typical research subject in psychology studies, they argue, most people in the world are not from Western, Educated, Industrialized, Rich and Democratic societies, i.e., WEIRD.

Twitter users are also atypical compared with the rest of humanity, giving rise to what our postdoctoral researcher Sarah Laborde has dubbed the “WEIRDO” problem of data analytics: most people are not Western, Educated, Industrialized, Rich, Democratic and Online.
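The sampling problem behind Laborde’s “WEIRDO” label can be seen in a few lines of simulation (all numbers here are invented for illustration): if the people who are online differ systematically on the quantity being measured, an estimate built only from online data stays biased no matter how large the dataset grows.

```python
import random

random.seed(0)

# Hypothetical population of 10,000 people, of whom only 30% are "online".
# We assume online users differ systematically on the measured quantity
# (an assumed gap of +2.0); both figures are illustrative, not empirical.
population = []
for _ in range(10_000):
    online = random.random() < 0.30
    value = random.gauss(5.0, 1.0) + (2.0 if online else 0.0)
    population.append((online, value))

true_mean = sum(v for _, v in population) / len(population)

# The "streetlight" estimate: only the easily observed (online) subset.
online_values = [v for online, v in population if online]
online_mean = sum(online_values) / len(online_values)

print(f"true population mean: {true_mean:.2f}")
print(f"online-only estimate: {online_mean:.2f}")  # biased upward
```

Collecting ten times more online observations would shrink the estimate’s variance but leave the gap between the two means intact, which is the core of the streetlight critique.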

Context is critical

Understanding the differences between the vast majority of humanity and that small subset of people whose activities are captured in big data sets is critical to correct analysis of the data. Considering the context and meaning of data – not just the data itself – is a key feature of ethnographic research, argues Michael Agar, who has written extensively about how ethnographers come to understand the world….(More)”

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”