Curating Research Data: Practical Strategies for Your Digital Repository


Two books edited by Lisa R. Johnston: “Data are becoming the proverbial coin of the digital realm: a research commodity that might purchase reputation credit in a disciplinary culture of data sharing, or buy transparency when faced with funding agency mandates or publisher scrutiny. Unlike most monetary systems, however, digital data can flow in all too great an abundance. Not only does this currency actually “grow” on trees, but it comes from animals, books, thoughts, and each of us! And that is what makes data curation so essential. The abundance of digital research data challenges library and information science professionals to harness this flow of information streaming from research discovery and scholarly pursuit and preserve the unique evidence for future use.

In two volumes—Practical Strategies for Your Digital Repository and A Handbook of Current Practice—Curating Research Data offers those tasked with long-term stewardship of digital research data a blueprint for how to curate those data for eventual reuse. Volume One explores the concepts of research data, the types of digital data repositories, and the drivers for establishing them. Volume Two guides you across the data lifecycle through practical strategies and techniques for curating research data in a digital repository setting. Data curators, archivists, research data management specialists, subject librarians, institutional repository managers, and digital library staff will benefit from these current and practical approaches to data curation.

Digital data is ubiquitous and rapidly reshaping how scholarship progresses now and into the future. The information expertise of librarians can help ensure the resiliency of digital data, and the information it represents, by addressing how the meaning, integrity, and provenance of digital data generated by researchers today will be captured and conveyed to future researchers….(More)”

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others, such as childhood vaccines, facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”

How a Political Scientist Knows What Our Enemies Will Do (Often Before They Do)


Political scientists have now added rigorous mathematical techniques to their social-science toolbox, creating methods to explain—and even predict—the actions of adversaries, thus making society safer as well as smarter. Such techniques allowed the U.S. government to predict the fall of President Ferdinand Marcos of the Philippines in 1986, helping hatch a strategy to ease him out of office and avoid political chaos in that nation. And at Los Angeles International Airport a computer system predicts the tactical calculations of criminals and terrorists, making sure that patrols and checkpoints are placed in ways that adversaries can’t exploit.

The advances in solving the puzzle of human behavior represent a dramatic turnaround for the field of political science, notes Bruce Bueno de Mesquita, a professor of politics at New York University. “In the mid-1960s, I took a statistics course,” he recalls, “and my undergraduate advisor was appalled. He told me that I was wasting my time.” It took researchers many years of patient work, putting piece after piece of the puzzle of human behavior together, to arrive at today’s new knowledge. The result has been dramatic progress in the nation’s ability to protect its interests at home and abroad.

Social scientists have not abandoned the proven tools that Bueno de Mesquita and generations of other scholars acquired as they mastered their discipline. Rather, adding the rigor of mathematical analysis has allowed them to solve more of the puzzle. Mathematical models of human behavior let social scientists assemble a picture of the previously unnoticed forces that drive behavior—forces common to all situations, operating below the emotions, drama, and history that make each conflict unique….(More)”

Crowdsourced Science: Sociotechnical Epistemology in the e-Research Paradigm


Paper by David Watson and Luciano Floridi: “Recent years have seen a surge in online collaboration between experts and amateurs on scientific research. In this article, we analyse the epistemological implications of these crowdsourced projects, with a focus on Zooniverse, the world’s largest citizen science web portal. We use quantitative methods to evaluate the platform’s success in producing large volumes of observation statements and high impact scientific discoveries relative to more conventional means of data processing. Through empirical evidence, Bayesian reasoning, and conceptual analysis, we show how information and communication technologies enhance the reliability, scalability, and connectivity of crowdsourced e-research, giving online citizen science projects powerful epistemic advantages over more traditional modes of scientific investigation. These results highlight the essential role played by technologically mediated social interaction in contemporary knowledge production. We conclude by calling for an explicitly sociotechnical turn in the philosophy of science that combines insights from statistics and logic to analyse the latest developments in scientific research….(More)”
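One of the epistemic advantages the paper attributes to crowdsourced classification is redundancy: many independent volunteers labeling the same image can together be far more reliable than any single labeler. A standard Condorcet-style calculation (an illustration of the general principle, not a method taken from the paper itself) makes this concrete: if each volunteer is independently correct with probability above one half, majority-vote accuracy climbs rapidly with crowd size.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, gives the right answer (n odd)."""
    k = n // 2 + 1  # smallest winning majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Accuracy of one volunteer vs. growing crowds of volunteers
for n in (1, 5, 15, 45):
    print(n, round(majority_accuracy(0.7, n), 4))
```

With individually modest labelers (70% accurate), a crowd of 45 reaches well over 99% majority accuracy, which is the kind of reliability gain that lets platforms like Zooniverse turn amateur classifications into publishable observation statements. The assumption of independence between volunteers is doing real work here and is exactly the sort of condition the paper's Bayesian analysis has to scrutinize.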

Big data may be reinforcing racial bias in the criminal justice system


Laurel Eckhouse at the Washington Post: “Big data has expanded to the criminal justice system. In Los Angeles, police use computerized “predictive policing” to anticipate crimes and allocate officers. In Fort Lauderdale, Fla., machine-learning algorithms are used to set bond amounts. In states across the country, data-driven estimates of the risk of recidivism are being used to set jail sentences.

Advocates say these data-driven tools remove human bias from the system, making it more fair as well as more effective. But even as they have become widespread, we have little information about exactly how they work. Few of the organizations producing them have released the data and algorithms they use to determine risk.

We need to know more, because it’s clear that such systems face a fundamental problem: The data they rely on are collected by a criminal justice system in which race makes a big difference in the probability of arrest — even for people who behave identically. Inputs derived from biased policing will inevitably make black and Latino defendants look riskier than white defendants to a computer. As a result, data-driven decision-making risks exacerbating, rather than eliminating, racial bias in criminal justice.

Consider a judge tasked with making a decision about bail for two defendants, one black and one white. Our two defendants have behaved in exactly the same way prior to their arrest: They used drugs in the same amounts, committed the same traffic offenses, owned similar homes and took their two children to the same school every morning. But criminal justice algorithms do not rely on all of a defendant’s prior actions to reach a bail assessment — just those actions for which he or she has been previously arrested and convicted. Because of racial biases in arrest and conviction rates, the black defendant is more likely to have a prior conviction than the white one, despite identical conduct. A risk assessment relying on racially compromised criminal-history data will unfairly rate the black defendant as riskier than the white defendant.

To make matters worse, risk-assessment tools typically evaluate their success in predicting a defendant’s dangerousness on rearrests — not on defendants’ overall behavior after release. If our two defendants return to the same neighborhood and continue their identical lives, the black defendant is more likely to be arrested. Thus, the tool will falsely appear to predict dangerousness effectively, because the entire process is circular: Racial disparities in arrests bias both the predictions and the justification for those predictions.
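The circularity Eckhouse describes can be shown with a minimal simulation (a hypothetical sketch with made-up rates, not any deployed tool): two groups offend at identical rates, but one faces twice the arrest probability. A risk score built on prior records then rates that group as roughly twice as risky, and a rearrest-based evaluation appears to validate the score, because rearrests are generated by the same biased process.

```python
import random

random.seed(0)

# Identical behavior, unequal arrest probability (illustrative numbers)
P_ARREST = {"A": 0.10, "B": 0.20}

def simulate_group(group, n=100_000, p_offend=0.30):
    """Everyone offends with the same probability; only arrests differ by group."""
    priors, rearrests = 0, 0
    for _ in range(n):
        # Prior record exists only if the person offended AND was arrested
        if random.random() < p_offend and random.random() < P_ARREST[group]:
            priors += 1
        # After release: same behavior, filtered through the same biased arrests
        if random.random() < p_offend and random.random() < P_ARREST[group]:
            rearrests += 1
    return priors / n, rearrests / n

for g in ("A", "B"):
    prior_rate, rearrest_rate = simulate_group(g)
    print(g, round(prior_rate, 3), round(rearrest_rate, 3))
```

Group B shows about twice the prior-record rate of group A despite identical offending, and its rearrest rate shows the same gap, so a tool trained on priors and evaluated on rearrests looks "accurate" while encoding nothing but the disparity in arrests.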

We know that a black person and a white person are not equally likely to be stopped by police: Evidence on New York’s stop-and-frisk policy, investigatory stops, vehicle searches and drug arrests shows that black and Latino civilians are more likely to be stopped, searched and arrested than whites. In 2012, a white attorney spent days trying to get himself arrested in Brooklyn for carrying graffiti stencils and spray paint, a Class B misdemeanor. Even when police saw him tagging the City Hall gateposts, they sped past him, ignoring a crime for which 3,598 people were arrested by the New York Police Department the following year.

Before adopting risk-assessment tools in the judicial decision-making process, jurisdictions should demand that any tool being implemented undergo a thorough and independent peer-review process. We need more transparency and better data to learn whether these risk assessments have disparate impacts on defendants of different races. Foundations and organizations developing risk-assessment tools should be willing to release the data used to build these tools to researchers to evaluate their techniques for internal racial bias and problems of statistical interpretation. Even better, with multiple sources of data, researchers could identify biases in data generated by the criminal justice system before the data is used to make decisions about liberty. Unfortunately, producers of risk-assessment tools — even nonprofit organizations — have not voluntarily released anonymized data and computational details to other researchers, as is now standard in quantitative social science research….(More)”.

How to Do Social Science Without Data


Neil Gross in the New York Times: “With the death last month of the sociologist Zygmunt Bauman at age 91, the intellectual world lost a thinker of rare insight and range. Because his style of work was radically different from that of most social scientists in the United States today, his passing is an occasion to consider what might be gained if more members of our profession were to follow his example….

Weber saw bureaucracies as powerful, but dispiritingly impersonal. Mr. Bauman amended this: Bureaucracy can be inhuman. Bureaucratic structures had deadened the moral sense of ordinary German soldiers, he contended, which made the Holocaust possible. They could tell themselves they were just doing their job and following orders.

Later, Mr. Bauman turned his scholarly attention to the postwar and late-20th-century worlds, where the nature and role of all-encompassing institutions were again his focal point. Craving stability after the war, he argued, people had set up such institutions to direct their lives — more benign versions of Weber’s bureaucracy. You could go to work for a company at a young age and know that it would be a sheltering umbrella for you until you retired. Governments kept the peace and helped those who couldn’t help themselves. Marriages were formed through community ties and were expected to last.

But by the end of the century, under pressure from various sources, those institutions were withering. Economically, global trade had expanded, while in Europe and North America manufacturing went into decline; job security vanished. Politically, too, changes were afoot: The Cold War drew to an end, Europe integrated and politicians trimmed back the welfare state. Culturally, consumerism seemed to pervade everything. Mr. Bauman noted major shifts in love and intimacy as well, including a growing belief in the contingency of marriage and — eventually — the popularity of online dating.

In Mr. Bauman’s view, it all connected. He argued we were witnessing a transition from the “solid modernity” of the mid-20th century to the “liquid modernity” of today. Life had become freer, more fluid and a lot more risky. In principle, contemporary workers could change jobs whenever they got bored. They could relocate abroad or reinvent themselves through shopping. They could find new sexual partners with the push of a button. But there was little continuity.

Mr. Bauman considered the implications. Some thrived in this new atmosphere; the institutions and norms previously in place could be stultifying, oppressive. But could a transient work force come together to fight for a more equitable distribution of resources? Could shopping-obsessed consumers return to the task of being responsible, engaged citizens? Could intimate partners motivated by short-term desire ever learn the value of commitment?…(More)”

Beyond prediction: Using big data for policy problems


Susan Athey at Science: “Machine-learning prediction methods have been extremely productive in applications ranging from medicine to allocating fire and health inspectors in cities. However, there are a number of gaps between making a prediction and making a decision, and underlying assumptions need to be understood in order to optimize data-driven decision-making…(More)”
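Athey’s gap between prediction and decision can be illustrated with hypothetical numbers (the sites and figures below are invented for this sketch, not drawn from her article): the unit with the highest predicted risk is not necessarily the one where an intervention does the most good, so ranking by prediction and ranking by expected treatment effect can point at different targets.

```python
# Hypothetical inspection sites: predicted risk if left alone,
# vs. the estimated reduction in risk an inspection would cause.
units = [
    {"name": "site1", "risk": 0.9, "effect": 0.05},  # high risk, barely helped
    {"name": "site2", "risk": 0.5, "effect": 0.40},  # moderate risk, very responsive
    {"name": "site3", "risk": 0.3, "effect": 0.25},
]

# A pure prediction policy targets the riskiest site;
# a decision-oriented policy targets where the intervention changes the most.
by_prediction = max(units, key=lambda u: u["risk"])["name"]
by_decision = max(units, key=lambda u: u["effect"])["name"]
print(by_prediction, by_decision)
```

Here the two rules disagree: prediction sends the inspector to site1, while maximizing impact sends her to site2. Estimating the `effect` column requires causal assumptions that a prediction model alone does not supply, which is precisely the gap the excerpt points to.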

Crowdsourcing to Be the Future for Medical Research


PCORI: “Crowdsourcing isn’t just a quick way to get things done on the Internet. When used right, it can accelerate medical research and improve global cardiovascular health, according to a new best-practices “playbook” released by the American Heart Association (AHA) and the Patient-Centered Outcomes Research Institute (PCORI).

“The benefits of crowdsourcing are substantial,” said Rose Marie Robertson, MD, Chief Science Officer of the AHA, who took part in writing the guide. “You can get information from new perspectives and highly innovative ideas that might well not have occurred to you.”

Crowdsourcing Medical Research Priorities: A Guide for Funding Agencies is the work of Precision Medicine Advances using Nationally Crowdsourced Comparative Effectiveness Research (PRANCCER), a joint initiative launched in 2015 by the AHA and PCORI.

“Acknowledging the power of open, multidisciplinary research to drive medical progress, AHA and PCORI turned to the rapidly evolving methodology of crowdsourcing to find out what patients, clinicians, and researchers consider the most urgent priorities in cardiovascular medicine and to shape the direction and design of research targeting those priorities,” according to the guide.

“Engaging patients and other healthcare decision makers in identifying research needs and guiding studies is a hallmark of our patient-centered approach to research, and crowdsourcing offers great potential to catalyze such engagement,” said PCORI Executive Director Joe V. Selby, MD. “We hope the input we’ve received will help us develop new research funding opportunities that will lead to improved care for people with cardiovascular conditions.”

The playbook offers more than a dozen recommendations on the ins and outs of medical crowdsourcing. It stresses the need to have crystal clear objectives and questions, whether you’re dealing with patients, researchers, or clinicians. … (More)”

Mass Observation: The amazing 80-year experiment to record our daily lives


William Cook at BBC Arts: “Eighty years ago, on 30th January 1937, the New Statesman published a letter which launched the largest (and strangest) writers’ group in British literary history.

An anthropologist called Tom Harrisson, a journalist called Charles Madge and a filmmaker called Humphrey Jennings wrote to the magazine asking for volunteers to take part in a new project called Mass Observation. Over a thousand readers responded, offering their services. Remarkably, this ‘scientific study of human social behaviour’ is still going strong today.

Mass Observation was the product of a growing interest in the social sciences, and a growing belief that the mass media wasn’t accurately reflecting the lives of so-called ordinary people. Instead of entrusting news gathering to jobbing journalists, who were under pressure to provide the stories their editors and proprietors wanted, Mass Observation recruited a secret army of amateur reporters, to track the habits and opinions of ‘the man in the street.’

Ironically, the three founders of this egalitarian movement were all extremely well-to-do. They’d all been to public schools and Oxbridge, but this was the ‘Age of Anxiety’, when capitalism was in chaos and dangerous demagogues were on the rise (plus ça change…).

For these idealistic public schoolboys, socialism was the answer, and Mass Observation was the future. By finding out what ‘ordinary’ folk were really doing, and really thinking, they would forge a new society, more attuned to the needs of the common man.

Mass Observation selected 500 citizen journalists, and gave them regular ‘directives’ to report back on virtually every aspect of their daily lives. They were guaranteed anonymity, which gave them enormous freedom. People opened up about themselves (and their peers) to an unprecedented degree.

Even though they were all unpaid, correspondents devoted a great deal of time to this endeavour – writing at great length, in great detail, over many years. As well as its academic value, Mass Observation proved that autobiography is not the sole preserve of the professional writer. For all of us, the urge to record and reflect upon our lives is a basic human need.

The Second World War was the perfect forum for this vast collective enterprise. Mass Observation became a national diary of life on the home front. For historians, the value of such uncensored revelations is enormous. These intimate accounts of air raids and rationing are far more revealing and evocative than the jolly state-sanctioned reportage of the war years.

After the war, Mass Observation became more commercial, supplying data for market research, and during the 1960s this extraordinary experiment gradually wound down. It was rescued from extinction by the historian Asa Briggs….

The founders of Mass Observation were horrified by what they called “the revival of racial superstition.” Hitler, Franco and Mussolini were in the forefront of their minds. “We are all in danger of extinction from such outbursts of atavism,” they wrote, in 1937. “We look to science to help us, only to find that science is too busy forging new weapons of mass destruction.”

For its founders, Mass Observation was a new science which would build a better future. For its countless correspondents, however, it became something more than that – not merely a social science, but a communal work of art….(More)”.