Crowdcrafting


Crowdcrafting is a web-based service that invites volunteers to contribute to scientific projects developed by citizens, professionals or institutions that need help to solve problems, analyze data or complete challenging tasks that can’t be done by machines alone, but require human intelligence. The platform is 100% open source – that is, its software is developed and distributed freely – and 100% open-science, making scientific research accessible to everyone.

Crowdcrafting uses PyBossa software, our open source framework for crowdsourcing projects. Institutions such as the British Museum, CERN and the United Nations (UNITAR) are also PyBossa users.

What is citizen science?

Citizen science is the active contribution of people who are not professional scientists to science. It provides volunteers with the opportunity to contribute intellectually to the research of others, to share resources or tools at their disposal, or even to start their own research projects. Volunteers provide real value to ongoing research while they themselves acquire a better understanding of the scientific method.

Citizen science opens the doors of laboratories and makes science accessible to all. It facilitates a direct conversation between scientists and enthusiasts who wish to contribute to scientific endeavour.

Who can collaborate, and how?

Anyone can create a new project or contribute to an existing project in Crowdcrafting.

All projects start with a simple tutorial explaining how they work and providing all the information required to participate. There is thus no specific knowledge or experience required to complete proposed tasks. All volunteers need is a keen attitude to learn and share science with everyone….(More)”

Why our peer review system is a toothless watchdog


Ivan Oransky and Adam Marcus at StatNews: “While some — namely, journal editors and publishers — would like us to consider it the opposable thumb of scientific publishing, the key to differentiating rigor from rubbish, some of those very same people seem to think it’s good for nothing. Here is a partial list of the things that editors, publishers, and others have told the world peer review is not designed to do:

1. Detect irresponsible practices

Don’t expect peer reviewers to figure out if authors are “using public data as if it were the author’s own, submitting papers with the same content to different journals, or submitting an article that has already been published in another language without reference to the original,” said the InterAcademy Partnership, a consortium of national scientific academies.

2. Detect fraud

“Journal editors will tell you that peer review is not designed to detect fraud — clever misinformation will sail right through no matter how scrupulous the reviews,” Dan Engber wrote in Slate in 2005.

3. Pick up plagiarism

Peer review “is not designed to pick up fraud or plagiarism, so unless those are really egregious it usually doesn’t,” according to the Rett Syndrome Research Trust.

4. Spot ethics issues

“It is not the role of the reviewer to spot ethics issues in papers,” said Jaap van Harten, executive publisher of Elsevier (the world’s largest academic imprint) in a recent interview. “It is the responsibility of the author to abide by the publishing ethics rules. Let’s look at it in a different way: If a person steals a pair of shoes from a shop, is this the fault of the shop for not protecting their goods or the shoplifter for stealing them? Of course the fault lies with the shoplifter who carried out the crime in the first place.”

5. Spot statistical flaccidity

“Peer reviewers do not check all the datasets, rerun calculations of p-values, and so forth, except in the cases where statistical reviewers are involved — and even in these cases, statistical reviewers often check the methodologies used, sample some data, and move on.” So wrote Kent Anderson, who has served as a publishing exec at several top journals, including Science and the New England Journal of Medicine, in a recent blog post.

6. Prevent really bad research from seeing the light of day

Again, Kent Anderson: “Even the most rigorous peer review at a journal cannot stop a study from being published somewhere. Peer reviewers can’t stop an author from self-promoting a published work later.”

But …

Even when you lower expectations for peer review, it appears to come up short. Richard Smith, former editor of the BMJ, reviewed research showing that the system may be worse than no review at all, at least in biomedicine. “Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading,” Smith wrote. “In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant.”

So … what’s left? And are whatever scraps that remain worth the veneration peer review receives? Don’t write about anything that isn’t peer-reviewed, editors frequently admonish us journalists, even creating rules that make researchers afraid to talk to reporters before they’ve published. There’s a good chance it will turn out to be wrong. Oh? Greater than 50 percent? Because that’s the risk of preclinical research in biomedicine being wrong after it’s been peer-reviewed.

With friends like these, who needs peer review? In fact, we do need it, but not only in the black box that happens before publication. We need continual scrutiny of findings, at sites such as PubMed Commons and PubPeer, in what is known as post-publication peer review. That’s where the action is, and where the scientific record actually gets corrected….(More)”

citizenscience.gov


citizenscience.gov is an official government website designed to accelerate the use of crowdsourcing and citizen science across the U.S. government. The site provides a portal to three key components for federal practitioners: a searchable catalog of federally supported citizen science projects, a toolkit to assist with designing and maintaining projects, and a gateway to a community of practice to share best practices.

The sharing economy comes to scientific research


 at the Conversation: “…to perform top-quality and cost-effective research, scientists need these technologies and the technical knowledge of experts to run them. When money is tight, where can scientists turn for the tools they need to complete their projects?

Sharing resources

An early solution to this problem was to create what the academic world calls “resource labs” that specialize in one or more specific types of science experiments (e.g., genomics, cell culture, proteomics). Researchers can then order and pay for that type of experiment from the resource lab instead of doing it on their own.

By focusing on one area of science, resource labs become the experts in that area and do the experiments better, faster and cheaper than most scientists could do in their own labs. Scientists no longer stumble through failed experiments trying to learn a new technique when a resource lab can do it correctly from the start.

The pooled funds from many research projects allow resource labs to buy better and faster equipment than any individual scientist could afford. This provides more researchers access to better technology at lower costs – which also saves taxpayers money, since many grants are government-backed….

Connecting people on a scientific Craigslist

This is a common paradox, with several efforts under way to address it. For example, MIT has created several “remote online laboratories” running experiments that can be controlled via the internet, to help enrich teaching in places that can’t afford advanced equipment. Harvard’s eagle-i system is a directory where researchers can list information, data and equipment they are willing to share with others – including cell lines, research mice, and equipment. Different services work for different institutions.

In 2011, Dr. Elizabeth Iorns, a breast cancer researcher, developed a mouse model to study how breast cancer spreads, but her institution didn’t have the equipment to finish one part of her study. My resource lab could complete the project, but despite significant searching, Dr. Iorns did not have an effective way to find labs like mine.

Actively connecting scientists with resource labs, and helping resource labs keep their equipment optimally busy, is a model Iorns and cofounder Dan Knox have developed into a business, called Science Exchange. (I am on its Lab Advisory Board, but have no financial interest in the company.) A little bit Craigslist and Travelocity for science rolled into one, Science Exchange provides scientists and expert resource labs a way to find each other to keep research progressing.

Unlike Starbucks, resource labs are not found on every corner and can be difficult for scientists to find. Now a simple search provides scientists a list of multiple resource labs that could do the experiments, including estimated costs and speed – and even previous users’ reviews of the choices.

I signed onto Science Exchange soon after it went live and Iorns immediately sent her project to my lab. We completed the project quickly, resulting in the first peer-reviewed publication made possible through Science Exchange….(More).

First, design for data sharing


John Wilbanks & Stephen H Friend in Nature Biotechnology: “To upend current barriers to sharing clinical data and insights, we need a framework that not only accounts for choices made by trial participants but also qualifies researchers wishing to access and analyze the data.

This March, Sage Bionetworks (Seattle) began sharing curated data collected from >9,000 participants of mPower, a smartphone-enabled health research study for Parkinson’s disease. The mPower study is notable as one of the first observational assessments of human health to rapidly achieve scale as a result of its design and execution purely through a smartphone interface. To support this unique study design, we developed a novel electronic informed consent process that includes participant-determined data-sharing preferences. It is through these preferences that the new data—including self-reported outcomes and quantitative sensor data—are shared broadly for secondary analysis. Our hope is that by sharing these data immediately, prior even to our own complete analysis, we will shorten the time to harnessing any utility that this study’s data may hold to improve the condition of patients who suffer from this disease.

Turbulent times for data sharing

Our release of mPower comes at a turbulent time in data sharing. The power of data for secondary research is top of mind for many these days. Vice President Joe Biden, in heading President Barack Obama’s ambitious cancer ‘moonshot’, describes data sharing as second only to funding to the success of the effort. However, this powerful support for data sharing stands in opposition to the opinions of many within the research establishment. To wit, the august New England Journal of Medicine (NEJM)’s recent editorial suggesting that those who wish to reuse clinical trial data without the direct participation and approval of the original study team are “research parasites”. In the wake of colliding perspectives on data sharing, we must not lose sight of the scientific and societal ends served by such efforts.

It is important to acknowledge that meaningful data sharing is a nontrivial process that can require substantial investment to ensure that data are shared with sufficient context to guide data users. When data analysis is narrowly targeted to answer a specific and straightforward question—as with many clinical trials—this added effort might not result in improved insights. However, many areas of science, such as genomics, astronomy and high-energy physics, have moved to data collection methods in which large amounts of raw data are potentially of relevance to a wide variety of research questions, but the methodology of moving from raw data to interpretation is itself a subject of active research….(More)”

Website Seeks to Make Government Data Easier to Sift Through


Steve Lohr at the New York Times: “For years, the federal government, states and some cities have enthusiastically made vast troves of data open to the public. Acres of paper records on demographics, public health, traffic patterns, energy consumption, family incomes and many other topics have been digitized and posted on the web.

This abundance of data can be a gold mine for discovery and insights, but finding the nuggets can be arduous, requiring special skills.

A project coming out of the M.I.T. Media Lab on Monday seeks to ease that challenge and to make the value of government data available to a wider audience. The project, called Data USA, bills itself as “the most comprehensive visualization of U.S. public data.” It is free, and its software code is open source, meaning that developers can build custom applications by adding other data.

Cesar A. Hidalgo, an assistant professor of media arts and sciences at the M.I.T. Media Lab who led the development of Data USA, said the website was devised to “transform data into stories.” Those stories are typically presented as graphics, charts and written summaries…. Type “New York” into the Data USA search box, and a drop-down menu presents choices — the city, the metropolitan area, the state and other options. Select the city, and the page displays an aerial shot of Manhattan with three basic statistics: population (8.49 million), median household income ($52,996) and median age (35.8).

Lower on the page are six icons for related subject categories, including economy, demographics and education. If you click on demographics, one of the so-called data stories appears, based largely on data from the American Community Survey of the United States Census Bureau.

Using colorful graphics and short sentences, it shows the median age of foreign-born residents of New York (44.7) and of residents born in the United States (28.6); the most common countries of origin for immigrants (the Dominican Republic, China and Mexico); and the percentage of residents who are American citizens (82.8 percent, compared with a national average of 93 percent).

Data USA features a selection of data results on its home page. They include the gender wage gap in Connecticut; the racial breakdown of poverty in Flint, Mich.; the wages of physicians and surgeons across the United States; and the institutions that award the most computer science degrees….(More)

Selected Readings on Data and Humanitarian Response


By Prianka Srinivasan and Stefaan G. Verhulst *

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data and humanitarian response was originally published in 2016.

Data, when used well and in a trusted manner, allows humanitarian organizations to innovate in how they respond to emergency events, including better coordination of post-disaster relief efforts, the ability to harness local knowledge to create more targeted relief strategies, and tools to predict and monitor disasters in real time. Consequently, in recent years both multinational groups and community-based advocates have begun to integrate data collection and evaluation strategies into their humanitarian operations, to respond better and more quickly to emergencies. However, this movement poses a number of challenges. Compared to the private sector, humanitarian organizations are often less equipped to successfully analyze and manage big data, which poses a number of risks related to the security of victims’ data. Furthermore, complex power dynamics which exist within humanitarian spaces may be further exacerbated through the introduction of new technologies and big data collection mechanisms. Below we share:

  • Selected Reading List (summaries and hyperlinks)
  • Annotated Selected Reading List
  • Additional Readings

Selected Reading List  (summaries in alphabetical order)

Data and Humanitarian Response

Risks of Using Big Data in Humanitarian Context

Annotated Selected Reading List (in alphabetical order)

Karlsrud, John. “Peacekeeping 4.0: Harnessing the Potential of Big Data, Social Media, and Cyber Technologies.” Cyberspace and International Relations, 2013. http://bit.ly/235Qb3e

  • This chapter from the book “Cyberspace and International Relations” suggests that advances in big data give humanitarian organizations unprecedented opportunities to prevent and mitigate natural disasters and humanitarian crises. However, the sheer amount of unstructured data necessitates effective “data mining” strategies for multinational organizations to make the most use of this data.
  • By profiling some civil-society organizations who use big data in their peacekeeping efforts, Karlsrud suggests that these community-focused initiatives are leading the movement toward analyzing and using big data in countries vulnerable to crisis.
  • The chapter concludes by offering ten recommendations to UN peacekeeping forces to best realize the potential of big data and new technology in supporting their operations.

Mancini, Francesco. “New Technology and the Prevention of Violence and Conflict.” International Peace Institute, 2013. http://bit.ly/1ltLfNV

  • This report from the International Peace Institute looks at five case studies to assess how information and communications technologies (ICTs) can help prevent humanitarian conflicts and violence. Its findings suggest that context has a significant impact on the ability of these ICTs to support conflict prevention, and any strategies must take into account the specific contingencies of the region to be successful.
  • The report suggests seven lessons gleaned from the five case studies:
    • New technologies are just one in a variety of tools to combat violence. Consequently, organizations must investigate a variety of complementary strategies to prevent conflicts, and not simply rely on ICTs.
    • Not every community or social group will have the same relationship to technology, and their ability to adopt new technologies is similarly influenced by their context. Therefore, a detailed needs assessment must take place before any new technologies are implemented.
    • New technologies may be co-opted by violent groups seeking to maintain conflict in the region. Consequently, humanitarian groups must be sensitive to existing political actors and be aware of possible negative consequences these new technologies may spark.
    • Local input is integral to support conflict prevention measures, and there exists need for collaboration and awareness-raising with communities to ensure new technologies are sustainable and effective.
    • Information shared among civil-society groups has more potential to develop early-warning systems. This horizontal distribution of information can also allow communities to maintain the accountability of local leaders.

Meier, Patrick. “Digital humanitarians: how big data is changing the face of humanitarian response.” CRC Press, 2015. http://amzn.to/1RQ4ozc

  • This book traces the emergence of “Digital Humanitarians”—people who harness new digital tools and technologies to support humanitarian action. Meier suggests that this has created a “nervous system” to connect people from disparate parts of the world, revolutionizing the way we respond to humanitarian crises.
  • Meier argues that such technology is reconfiguring the structure of the humanitarian space, where victims are not simply passive recipients of aid but can contribute with other global citizens. This in turn makes us more humane and engaged people.

Robertson, Andrew and Olson, Steve. “Using Data Sharing to Improve Coordination in Peacebuilding.” United States Institute of Peace, 2012. http://bit.ly/235QuLm

  • This report functions as an overview of a roundtable workshop on Technology, Science and Peace Building held at the United States Institute of Peace. The workshop aimed to investigate how data-sharing techniques can be developed for use in peace building or conflict management.
  • Four main themes emerged from discussions during the workshop:
    • “Data sharing requires working across a technology-culture divide”—Data sharing needs the foundation of a strong relationship, which can depend on sociocultural, rather than technological, factors.
    • “Information sharing requires building and maintaining trust”—These relationships are often built on trust, which can include both technological and social perspectives.
    • “Information sharing requires linking civilian-military policy discussions to technology”—Even when sophisticated data-sharing technologies exist, continuous engagement between different stakeholders is necessary. Therefore, procedures used to maintain civil-military engagement should be broadened to include technology.
    • “Collaboration software needs to be aligned with user needs”—Technology providers need to keep in mind the needs of their users, in this case peacebuilders, in order to ensure sustainability.

United Nations Independent Expert Advisory Group on a Data Revolution for Sustainable Development. “A World That Counts, Mobilizing the Data Revolution.” 2014. https://bit.ly/2Cb3lXq

  • This report focuses on the potential benefits and risks data holds for sustainable development. Included in this is a strategic framework for using and managing data for humanitarian purposes. It describes a need for a multinational consensus to be developed to ensure data is shared effectively and efficiently.
  • It suggests that “people who are counted”—i.e., those who are included in data collection processes—have better development outcomes and a better chance for humanitarian response in emergency or conflict situations.

Whipkey, Katie and Verity, Andrej. “Guidance for Incorporating Big Data into Humanitarian Operations.” Digital Humanitarian Network, 2015. http://bit.ly/1Y2BMkQ

  • This report produced by the Digital Humanitarian Network provides an overview of big data, and how humanitarian organizations can integrate this technology into their humanitarian response. It primarily functions as a guide for organizations, and provides concise, brief outlines of what big data is, and how it can benefit humanitarian groups.
  • The report puts forward four main benefits acquired through the use of big data by humanitarian organizations: 1) the ability to leverage real-time information; 2) the ability to make more informed decisions; 3) the ability to learn new insights; 4) the ability for organizations to be more prepared.
  • It goes on to assess seven challenges big data poses for humanitarian organizations: 1) geography, and the unequal access to technology across regions; 2) the potential for user error when processing data; 3) limited technology; 4) questionable validity of data; 5) underdeveloped policies and ethics relating to data management; 6) limitations relating to staff knowledge.

Risks of Using Big Data in Humanitarian Context

Crawford, Kate, and Megan Finn. “The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters.” GeoJournal 80.4, 2015. http://bit.ly/1X0F7AI

  • Crawford & Finn present a critical analysis of the use of big data in disaster management, taking a more skeptical tone to the data revolution facing humanitarian response.
  • They argue that though social and mobile data analysis can yield important insights and tools in crisis events, it also presents a number of limitations which can lead to oversights being made by researchers or humanitarian response teams.
  • Crawford & Finn explore the ethical concerns the use of big data in disaster events introduces, including issues of power, privacy, and consent.
  • The paper concludes by recommending that critical data studies, such as those presented in the paper, be integrated into crisis event research in order to analyze some of the assumptions which underlie mobile and social data.

Jacobsen, Katja Lindskov. “Making design safe for citizens: A hidden history of humanitarian experimentation.” Citizenship Studies 14.1: 89-103, 2010. http://bit.ly/1YaRTwG

  • This paper explores the phenomenon of “humanitarian experimentation,” where victims of disaster or conflict are the subjects of experiments to test the application of technologies before they are administered in greater civilian populations.
  • By analyzing the use of iris recognition technology during the repatriation of Afghan refugees from Pakistan between 2002 and 2007, Jacobsen suggests that this “humanitarian experimentation” compromises the security of already vulnerable refugees in order to better deliver biometric products to the rest of the world.

Responsible Data Forum. “Responsible Data Reflection Stories: An Overview.” http://bit.ly/1Rszrz1

  • This piece from the Responsible Data Forum is primarily a compilation of “war stories” which follow some of the challenges in using big data for social good. By drawing on these crowdsourced cases, the Forum also presents an overview which makes key recommendations to overcome some of the challenges associated with big data in humanitarian organizations.
  • It finds that most of these challenges occur when organizations are ill-equipped to manage data and new technologies, or are unaware of how different groups interact in digital spaces in different ways.

Sandvik, Kristin Bergtora. “The humanitarian cyberspace: shrinking space or an expanding frontier?” Third World Quarterly 37:1, 17-32, 2016. http://bit.ly/1PIiACK

  • This paper analyzes the shift toward more technology-driven humanitarian work, where humanitarian work increasingly takes place online in cyberspace, reshaping the definition and application of aid. This has occurred along with what many suggest is a shrinking of the humanitarian space.
  • Sandvik provides three interpretations of this phenomenon:
    • First, traditional threats remain in the humanitarian space, which are both modified and reinforced by technology.
    • Second, new threats are introduced by the increasing use of technology in humanitarianism, and consequently the humanitarian space may be broadening, not shrinking.
    • Finally, if the shrinking humanitarian space theory holds, cyberspace offers one example of this, where the increasing use of digital technology to manage disasters leads to a contraction of space through the proliferation of remote services.

Additional Readings on Data and Humanitarian Response

* Thanks to: Kristen B. Sandvik; Zara Rahman; Jennifer Schulte; Sean McDonald; Paul Currion; Dinorah Cantú-Pedraza and the Responsible Data Listserve for valuable input.

Elements of a New Ethical Framework for Big Data Research


The Berkman Center is pleased to announce the publication of a new paper from the Privacy Tools for Sharing Research Data project team. In this paper, Effy Vayena, Urs Gasser, Alexandra Wood, and David O’Brien from the Berkman Center, with Micah Altman from MIT Libraries, outline elements of a new ethical framework for big data research.

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

The authors argue that a framework with the following elements would support big data utilization and help harness the value of big data in a sustainable and trust-building manner:

  • Oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle.

  • New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects.

  • Researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available.

  • Oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time.

The full paper is available for download through the Washington and Lee Law Review Online as part of a collection of papers featured at the Future of Privacy Forum workshop Beyond IRBs: Designing Ethical Review Processes for Big Data Research held on December 10, 2015, in Washington, DC….(More)”

When open data is a Trojan Horse: The weaponization of transparency in science and governance


Karen E.C. Levy and David Merritt Johns in Big Data and Society: “Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis….(More)”

The creative citizen unbound


Book by Ian Hargreaves and John Hartley on “How social media and DIY culture contribute to democracy, communities and the creative economy”: “The creative citizen unbound introduces the concept of ‘creative citizenship’ to explore the potential of civic-minded creative individuals in the era of social media and in the context of an expanding creative economy. Drawing on the findings of a 30-month study of communities supported by the UK research funding councils, multidisciplinary contributors examine the value and nature of creative citizenship, not only in terms of its contribution to civic life and social capital but also to more contested notions of value, both economic and cultural. This original book will be beneficial to researchers and students across a range of disciplines including media and communication, political science, economics, planning and economic geography, and the creative and performing arts….(More)”