Time for sharing data to become routine: the seven excuses for not doing so are all invalid


Paper by Richard Smith and Ian Roberts: “Data are more valuable than scientific papers but researchers are incentivised to publish papers not share data. Patients are the main beneficiaries of data sharing but researchers have several incentives not to share: others might use their data to get ahead in the academic rat race; they might be scooped; their results might not be replicable; competitors may reach different conclusions; their data management might be exposed as poor; patient confidentiality might be breached; and technical difficulties make sharing impossible. All of these barriers can be overcome and researchers should be rewarded for sharing data. Data sharing must become routine….(More)”

Improving patient care by bridging the divide between doctors and data scientists


At The Conversation: “While wonderful new medical discoveries and innovations are in the news every day, doctors struggle daily with using information and techniques available right now while carefully adopting new concepts and treatments. As a practicing doctor, I deal with uncertainties and unanswered clinical questions all the time….At the moment, a report from the National Academy of Medicine tells us, most doctors base most of their everyday decisions on guidelines from (sometimes biased) expert opinions or small clinical trials. It would be better if they were from multicenter, large, randomized controlled studies, with tightly controlled conditions ensuring the results are as reliable as possible. However, those are expensive and difficult to perform, and even then often exclude a number of important patient groups on the basis of age, disease and sociological factors.

Part of the problem is that health records are traditionally kept on paper, making them hard to analyze en masse. As a result, most of what medical professionals might have learned from experiences was lost – or at least was inaccessible to another doctor meeting with a similar patient.

A digital system would collect and store as much clinical data as possible from as many patients as possible. It could then use information from the past – such as blood pressure, blood sugar levels, heart rate and other measurements of patients’ body functions – to guide future doctors to the best diagnosis and treatment of similar patients.

Industrial giants such as Google, IBM, SAP and Hewlett-Packard have also recognized the potential for this kind of approach, and are now working on how to leverage population data for the precise medical care of individuals.

Collaborating on data and medicine

At the Laboratory of Computational Physiology at the Massachusetts Institute of Technology, we have begun to collect large amounts of detailed patient data in the Medical Information Mart in Intensive Care (MIMIC). It is a database containing information from 60,000 patient admissions to the intensive care units of the Beth Israel Deaconess Medical Center, a Boston teaching hospital affiliated with Harvard Medical School. The data in MIMIC has been meticulously scoured so individual patients cannot be recognized, and is freely shared online with the research community.

But the database itself is not enough. We bring together front-line clinicians (such as nurses, pharmacists and doctors) to identify questions they want to investigate, and data scientists to conduct the appropriate analyses of the MIMIC records. This gives caregivers and patients the best individualized treatment options in the absence of a randomized controlled trial.
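
Because each clinical question is answered by querying past admissions rather than by running a new trial, the analysis itself can be quite plain. Below is a minimal sketch of such a retrospective query, assuming a hypothetical de-identified extract with illustrative column names — not the actual MIMIC schema:

```python
# A minimal sketch of the kind of retrospective question a clinician and a
# data scientist might answer together from an ICU database. The file name
# and column names are hypothetical, not the actual MIMIC schema.
import pandas as pd

# One row per ICU admission: demographics, vitals on arrival, treatment, outcome.
admissions = pd.read_csv("icu_admissions.csv")  # hypothetical de-identified extract

# Restrict to patients resembling the one in front of us, e.g. elderly
# patients admitted with low blood pressure.
similar = admissions[
    (admissions["age"] >= 65) & (admissions["mean_arterial_pressure"] < 65)
]

# Compare in-hospital mortality between the two candidate treatments.
summary = (
    similar.groupby("vasopressor_given")["died_in_hospital"]
    .agg(patients="count", mortality_rate="mean")
)
print(summary)
```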

Bringing data analysis to the world

At the same time we are working to bring these data-enabled medical decision-support systems to countries with limited health care resources, where research is considered an expensive luxury. Often these countries have few or no medical records – even on paper – to analyze. We can help them collect health data digitally, creating the potential to significantly improve medical care for their populations.

This task is the focus of Sana, a collection of technical, medical and community experts from across the globe that is also based in our group at MIT. Sana has designed a digital health information system specifically for use by health providers and patients in rural and underserved areas.

At its core is an open-source system that uses cellphones – common even in poor and rural nations – to collect, transmit and store all sorts of medical data. It can handle not only basic patient data such as height and weight, but also photos and X-rays, ultrasound videos, and electrical signals from a patient’s brain (EEG) and heart (ECG).
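
The precise data format is Sana’s to define; purely as an illustration of what a phone-collected record could look like, here is a sketch that bundles basic measurements and an optional image into a single JSON payload. The field names and structure are assumptions, not Sana’s actual schema:

```python
# A minimal sketch of the kind of record a phone-based system could collect
# and transmit. Field names and structure are illustrative assumptions,
# not Sana's actual data format.
import base64
import json
from datetime import datetime, timezone

def build_encounter(patient_id, height_cm, weight_kg, notes, image_path=None):
    """Bundle basic measurements plus an optional image into one JSON payload."""
    record = {
        "patient_id": patient_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "vitals": {"height_cm": height_cm, "weight_kg": weight_kg},
        "notes": notes,
        "attachments": [],
    }
    if image_path:
        with open(image_path, "rb") as f:
            record["attachments"].append({
                "kind": "photo",
                "data_base64": base64.b64encode(f.read()).decode("ascii"),
            })
    return json.dumps(record)

# Example: a community health worker records a basic visit, no image attached.
payload = build_encounter("P-0042", height_cm=162, weight_kg=54, notes="Follow-up visit")
```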

Partnering with universities and health organizations, Sana organizes training sessions (which we call “bootcamps”) and collaborative workshops (called “hackathons”) to connect nurses, doctors and community health workers at the front lines of care with technology experts in or near their communities. In 2015, we held bootcamps and hackathons in Colombia, Uganda, Greece and Mexico. The bootcamps teach students in technical fields like computer science and engineering how to design and develop health apps that can run on cellphones. Immediately following the bootcamp, the medical providers join the group and the hackathon begins…At the end of the day, though, the purpose is not the apps….(More)

Smart crowds in smart cities: real life, city scale deployments of a smartphone based participatory crowd management platform


Tobias Franke, Paul Lukowicz and Ulf Blanke at the Journal of Internet Services and Applications: “Pedestrian crowds are an integral part of cities. Planning for crowds, monitoring crowds and managing crowds are fundamental tasks in city management. As a consequence, crowd management is a sprawling R&D area (see related work) that includes theoretical models, simulation tools, as well as various support systems. There has also been significant interest in using computer vision techniques to monitor crowds. However, overall, the topic of crowd management has been given only little attention within the smart city domain. In this paper we report on a platform for smart, city-wide crowd management based on a participatory mobile phone sensing platform. Originally, the apps based on this platform were conceived as a technology validation tool for crowd-based sensing within a basic research project. However, the initial deployments at the Notte Bianca Festival in Malta and at the Lord Mayor’s Show in London generated so much interest within the civil protection community that it has gradually evolved into a full-blown participatory crowd management system and is now in the process of being commercialized through a startup company. To date it has been deployed at 14 events in three European countries (UK, Netherlands, Switzerland) and used by well over 100,000 people….
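
The excerpt above does not spell out the platform’s algorithms, but the core participatory-sensing idea can be illustrated with a very small sketch: bin anonymous, opted-in location reports into grid cells and flag unusually dense ones. The grid size and threshold below are arbitrary assumptions, not values from the paper:

```python
# A rough illustration of participatory crowd sensing: bin anonymous location
# reports from attendees' phones into a grid and flag unusually dense cells.
# Grid size and threshold are arbitrary assumptions, not values from the paper.
from collections import Counter

CELL_DEG = 0.0005        # ~50 m grid cells at mid latitudes
ALERT_THRESHOLD = 200    # reports per cell that trigger operator attention

def cell_of(lat, lon):
    """Snap a coordinate to its grid cell."""
    return (round(lat / CELL_DEG), round(lon / CELL_DEG))

def dense_cells(reports):
    """reports: iterable of (lat, lon) tuples from opted-in app users."""
    counts = Counter(cell_of(lat, lon) for lat, lon in reports)
    return {cell: n for cell, n in counts.items() if n >= ALERT_THRESHOLD}

# Example: feed the last few minutes of reports and inspect hotspots.
hotspots = dense_cells([(51.5079, -0.0877), (51.5080, -0.0876)])
```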

Obtaining knowledge about the current size and density of a crowd is one of the central aspects of crowd monitoring. For decades, automatic crowd monitoring in urban areas has mainly been performed by means of image processing. One use case for such video-based applications is a CCTV camera-based system that automatically alerts the staff of subway stations when the waiting platform is congested. However, one of the downsides of video-based crowd monitoring is that video cameras tend to be considered privacy-invading. Privacy-preserving approaches to video-based crowd monitoring have therefore been proposed, in which crowd sizes are estimated without people models or object tracking.

With respect to the mitigation of catastrophes induced by panicking crowds (e.g. during an evacuation), city planners and architects increasingly rely on tools simulating crowd behaviors in order to optimize infrastructures. Murakami et al. present an agent-based simulation for evacuation scenarios. Shendarkar et al. present work that is also based on BDI (belief–desire–intention) agents; those agents, however, are trained in a virtual reality environment, giving greater flexibility to the modeling. Kluepfel et al., on the other hand, use a cellular automaton model for the simulation of crowd movement and egress behavior.

With smartphones becoming everyday items, the concept of crowdsourcing information from users of mobile applications has gained significant traction. Roitman et al. present a smart city system where the crowd can send eyewitness reports, thereby creating deeper insights for city officials. Szabo et al. take this approach one step further and employ the sensors built into smartphones to gather data for city services such as live transit information. Ghose et al. utilize the same principle to gather information on road conditions. Pan et al. use a combination of crowdsourcing and social media analysis to identify traffic anomalies….(More)”.

Teenage scientists enlisted to fight Zika


ShareAmerica: “A mosquito’s a mosquito, right? Not when it comes to Zika and other mosquito-borne diseases.

Only two of the estimated 3,000 species of mosquitoes are capable of carrying the Zika virus in the United States, but estimates of their precise range remain hazy, according to the U.S. Centers for Disease Control and Prevention.

Scientists could start getting better information about these pesky, but important, insects with the help of plastic cups, brown paper towels and teenage biology students.

As part of the Invasive Mosquito Project from the U.S. Department of Agriculture, secondary-school students nationwide are learning about mosquito populations and helping fill the knowledge gaps.

Simple experiment, complex problem

The experiment works like this: First, students line the cups with paper, then fill two-thirds of the cups with water. Students place the plastic cups outside, and after a week, the paper is dotted with what looks like specks of dirt. These dirt particles are actually mosquito eggs, which the students can identify and classify.

Students then upload their findings to a national crowdsourced database. Crowdsourcing uses the collective intelligence of online communities to “distribute” problem solving across a massive network.

Entomologist Lee Cohnstaedt of the U.S. Department of Agriculture coordinates the program, and he’s already thinking about expansion. He said he hopes to have one-fifth of U.S. schools participate in the mosquito species census. He also plans to adapt lesson plans for middle schools, Scouting troops and gardening clubs.

Already, crowdsourcing has “collected better data than we could have working alone,” he told the Associated Press….

In addition to mosquito tracking, crowdsourcing has been used to develop innovative responses to a number of complex challenges, from climate change to archaeology to protein modeling….(More)”

Reining in the Big Promise of Big Data: Transparency, Inequality, and New Regulatory Frontiers


Paper by Philipp Hacker and Bilyana Petkova: “The growing differentiation of services based on Big Data harbors the potential for both greater societal inequality and for greater equality. Anti-discrimination law and transparency alone, however, cannot do the job of curbing Big Data’s negative externalities while fostering its positive effects.

To rein in Big Data’s potential, we adapt regulatory strategies from behavioral economics, contracts and criminal law theory. Four instruments stand out: First, active choice may be mandated between data collecting services (paid by data) and data free services (paid by money). Our suggestion provides concrete estimates for the price range of a data free option, sheds new light on the monetization of data collecting services, and proposes an “inverse predatory pricing” instrument to limit excessive pricing of the data free option. Second, we propose using the doctrine of unconscionability to prevent contracts that unreasonably favor data collecting companies. Third, we suggest democratizing data collection by regular user surveys and data compliance officers partially elected by users. Finally, we trace back new Big Data personalization techniques to the old Hartian precept of treating like cases alike and different cases – differently. If it is true that a speeding ticket over $50 is less of a disutility for a millionaire than for a welfare recipient, the income and wealth-responsive fines powered by Big Data that we suggest offer a glimpse into the future of the mitigation of economic and legal inequality by personalized law. Throughout these different strategies, we show how salience of data collection can be coupled with attempts to prevent discrimination against and exploitation of users. Finally, we discuss all four proposals in the context of different test cases: social media, student education software and credit and cell phone markets.
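
The abstract gestures at income- and wealth-responsive fines without giving a formula; the following is a purely illustrative sketch of how a “day-fine”-style scheme might scale a penalty to income. The function and the numbers are assumptions for illustration, not the authors’ proposal:

```python
# Purely illustrative sketch of an income-responsive ("day-fine") scheme of the
# kind the paper gestures at; the formula and numbers are assumptions.
def income_responsive_fine(daily_disposable_income, severity_days, floor=50.0):
    """Charge 'severity_days' days of disposable income, never below a flat floor."""
    return max(floor, severity_days * daily_disposable_income)

# A speeding offence rated at 2 "days" of income:
welfare_recipient = income_responsive_fine(daily_disposable_income=25, severity_days=2)    # 50.0
millionaire = income_responsive_fine(daily_disposable_income=1500, severity_days=2)        # 3000.0
```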

Many more examples could and should be discussed. In the face of increasing unease about the asymmetry of power between Big Data collectors and dispersed users, about differential legal treatment, and about the unprecedented dimensions of economic inequality, this paper proposes a new regulatory framework and research agenda to put the powerful engine of Big Data to the benefit of both the individual and societies adhering to basic notions of equality and non-discrimination….(More)”

Building Data Responsibility into Humanitarian Action


Stefaan Verhulst at The GovLab: “Next Monday, May 23rd, governments, non-profit organizations and citizen groups will gather in Istanbul at the first World Humanitarian Summit. A range of important issues will be on the agenda, not least of which is the refugee crisis confronting the Middle East and Europe. Also on the agenda will be an issue of growing importance and relevance, even if it does not generate front-page headlines: the increasing potential (and use) of data in the humanitarian context.

To explore this topic, a new paper, “Building Data Responsibility into Humanitarian Action,” is being released today, and will be presented tomorrow at the Understanding Risk Forum. This paper is the result of a collaboration between the United Nations Office for the Coordination of Humanitarian Affairs (OCHA), The GovLab (NYU Tandon School of Engineering), the Harvard Humanitarian Initiative, and Leiden University’s Centre for Innovation. It seeks to identify the potential benefits and risks of using data in the humanitarian context, and begins to outline an initial framework for the responsible use of data in humanitarian settings.

Both anecdotal and more rigorously researched evidence points to the growing use of data to address a variety of humanitarian crises. The paper discusses a number of data risk case studies, including the use of call data to fight malaria in Africa; satellite imagery to identify security threats on the border between Sudan and South Sudan; and transaction data to increase the efficiency of food delivery in Lebanon. These early examples (along with a few others discussed in the paper) have begun to show the opportunities offered by data and information. More importantly, they also help us better understand the risks, including and especially those posed to privacy and security.

One of the broader goals of the paper is to integrate the specific and the theoretical, in the process building a bridge between the deep, contextual knowledge offered by initiatives like those discussed above and the broader needs of the humanitarian community. To that end, the paper builds on its discussion of case studies to begin establishing a framework for the responsible use of data in humanitarian contexts. It identifies four “Minimum Humanitarian standards for the Responsible use of Data” and four “Characteristics of Humanitarian Organizations that use Data Responsibly.” Together, these eight attributes can serve as a roadmap or blueprint for humanitarian groups seeking to use data. In addition, the paper also provides a four-step practical guide for a data responsibility framework (see also earlier blog)….(More)” Full Paper: Building Data Responsibility into Humanitarian Action

Outstanding Challenges in Recent Open Government Data Initiatives


Paper by Usamah A. Algemili: “In recent years, we have witnessed increasing interest in government data. Many governments around the world have sensed the value of their passive data sets. These governments have launched Open Data policies, yet many countries are still in the process of converting raw data into useful representations. This paper surveys previous Open Data initiatives. It discusses the various challenges that open data projects may encounter during the transformation from passive data sets towards an Open Data culture, and it reaches out to project teams to obtain their practical assessments. To that end, an online questionnaire, developed in alignment with the previous literature on data-integration challenges, was distributed among project teams; 138 eligible professionals participated, and their responses were analyzed by the researcher. The results section identifies the most critical challenges from the project teams’ point of view: four obstacles stand out as the critical challenges facing project teams. The paper examines these challenges and attempts to identify the gap between current guidelines and practical experience. Accordingly, it presents the current infrastructure of the Open Data framework, followed by additional recommendations that may lead to successful implementation of Open Data development….(More)”

We know where you live


MIT News Office: “From location data alone, even low-tech snoopers can identify Twitter users’ homes, workplaces….Researchers at MIT and Oxford University have shown that the location stamps on just a handful of Twitter posts — as few as eight over the course of a single day — can be enough to disclose the addresses of the poster’s home and workplace to a relatively low-tech snooper.

The tweets themselves might be otherwise innocuous — links to funny videos, say, or comments on the news. The location information comes from geographic coordinates automatically associated with the tweets.

Twitter’s location-reporting service is off by default, but many Twitter users choose to activate it. The new study is part of a more general project at MIT’s Internet Policy Research Initiative to help raise awareness about just how much privacy people may be giving up when they use social media.

The researchers describe their work in a paper presented last week at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems, where it received an honorable mention in the best-paper competition, a distinction reserved for only 4 percent of papers accepted to the conference.

“Many people have this idea that only machine-learning techniques can discover interesting patterns in location data,” says Ilaria Liccardi, a research scientist at MIT’s Internet Policy Research Initiative and first author on the paper. “And they feel secure that not everyone has the technical knowledge to do that. With this study, what we wanted to show is that when you send location data as a secondary piece of information, it is extremely simple for people with very little technical knowledge to find out where you work or live.”

Conclusions from clustering

In their study, Liccardi and her colleagues — Alfie Abdul-Rahman and Min Chen of Oxford’s e-Research Centre in the U.K. — used real tweets from Twitter users in the Boston area. The users consented to the use of their data, and they also confirmed their home and work addresses, their commuting routes, and the locations of various leisure destinations from which they had tweeted.

The time and location data associated with the tweets were then presented to a group of 45 study participants, who were asked to try to deduce whether the tweets had originated at the Twitter users’ homes, their workplaces, leisure destinations, or locations along their commutes. The participants were not recruited on the basis of any particular expertise in urban studies or the social sciences; they just drew what conclusions they could from location clustering….

Predictably, participants fared better with map-based representations, correctly identifying Twitter users’ homes roughly 65 percent of the time and their workplaces at closer to 70 percent. Even the tabular representation was informative, however, with accuracy rates of just under 50 percent for homes and a surprisingly high 70 percent for workplaces….(More; Full paper)”
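
The study deliberately gave the inference task to human participants rather than algorithms, underscoring how little machinery it requires. As an equally low-tech automated sketch, one can simply label a user’s geotagged posts by time of day; the hour ranges and coordinate rounding below are assumptions for illustration, not the paper’s method:

```python
# A low-tech sketch of inferring home and work from a handful of geotagged
# posts, labeled by time of day. The hour ranges and the ~100 m coordinate
# rounding are rough assumptions, not the study's method.
from collections import Counter

def guess_home_and_work(posts):
    """posts: list of (hour_of_day, lat, lon) tuples from one user's public feed."""
    def snap(lat, lon):
        return (round(lat, 3), round(lon, 3))  # ~100 m grid, merges nearby points
    night = Counter(snap(lat, lon) for h, lat, lon in posts if h < 7 or h >= 21)
    day = Counter(snap(lat, lon) for h, lat, lon in posts if 9 <= h < 17)
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work

# A handful of posts over a single day is often enough to separate the two.
home, work = guess_home_and_work([
    (23, 42.3601, -71.0942), (6, 42.3602, -71.0941),   # late evening / early morning
    (10, 42.3736, -71.1097), (14, 42.3737, -71.1096),  # working hours
])
```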

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”

Insights On Collective Problem-Solving: Complexity, Categorization And Lessons From Academia


Part 3 of an interview series by Henry Farrell for the MacArthur Research Network on Opening Governance: “…Complexity theorists have devoted enormous energy and attention to thinking about how complex problems, in which different factors interact in ways that are hard to predict, can best be solved. One key challenge is categorizing problems, so as to understand which approaches are best suited to addressing them.

Scott Page is the Leonid Hurwicz Collegiate Professor of Complex Systems at the University of Michigan, Ann Arbor, and one of the world’s foremost experts on diversity and problem-solving. I asked him a series of questions about how we might use insights from academic research to think better about how problem solving works.

Henry: One of the key issues of collective problem-solving is what you call the ‘problem of problems’ – the question of identifying which problems we need to solve. This is often politically controversial – e.g., it may be hard to get agreement that global warming, or inequality, or long prison sentences are a problem. How do we best go about identifying problems, given that people may disagree?

Scott: In a recent big-think piece in Scientific American on the potential of diversity for collective problem solving, Katherine Phillips writes that group members must feel validated, that they must share a commitment to the group, and that they must have a common goal if they are going to contribute. This implies that you won’t succeed in getting people to collaborate by setting an agenda from on high and then seeking to attract diverse people to further that agenda.

One way of starting to tackle the problem of problems is to steal a rule of thumb from Getting to Yes, by getting people to think about their broad interests rather than the position that they’re starting from. People often agree on their fundamental desires but disagree on how they can be achieved. For example, nearly everyone wants less crime, but they may disagree over whether they think the solution to crime involves tackling poverty or imposing longer prison sentences. If you can get them to focus on their common interest in solving crime rather than their disagreements, you’re more likely to get them to collaborate usefully.

Segregation amplifies the problem of problems. We live in towns and neighborhoods segregated by race, income, ideology, and human capital. Democrats live near Democrats and Republicans near Republicans. Consensus requires integration. We must work across ideologies. Relatedly, opportunity requires more than access. Many people grow up not knowing any engineers, dentists, doctors, lawyers, and statisticians. This isolation narrows the set of careers they consider and it reduces the diversity of many professions. We cannot imagine lives we do not know.

Henry: Once you get past the problem of problems, you still need to identify which kind of problem you are dealing with. You identify three standard types of problems: solution problems, selection problems and optimization problems. What – very briefly – are the key differences between these kinds of problems?

Scott: I’m constantly pondering the potential set of categories in which collective intelligence can emerge. I’m teaching a course on collective intelligence this semester and the undergraduates and I developed an acronym SCARCE PIGS to describe the different types of domains. Here’s the brief summary:

  • Predict: when individuals combine information, models, or measurements to estimate a future event, guess an answer, or classify an event. Examples might involve betting markets, or combined efforts to guess a quantity, such as Francis Galton’s example of people at a fair trying to guess the weight of a steer (a toy simulation of this appears after the list).
  • Identify: when individuals have local, partial, or possibly erroneous knowledge and collectively can find an object. Here, an example is DARPA’s Red Balloon project.
  • Solve: when individuals apply and possibly combine higher order cognitive processes and analytic tools for the purpose of finding or improving a solution to a task. Innocentive and similar organizations provide examples of this.
  • Generate: when individuals apply diverse representations, heuristics, and knowledge to produce something new. An everyday example is creating a new building.
  • Coordinate: when individuals adopt similar actions, behaviors, beliefs, or mental frameworks by learning through local interactions. Ordinary social conventions such as people greeting each other are good examples.
  • Cooperate: when individuals take actions, not necessarily in their self interest, that collectively produce a desirable outcome. Here, think of managing common pool resources (e.g. fishing boats not overfishing an area that they collectively control).
  • Arrange: when individuals manipulate items in a physical or virtual environment for their own purposes resulting in an organization of that environment. As an example, imagine a student co-op which keeps twenty types of hot sauce in its pantry. If each student puts whichever hot sauce she uses in the front of the pantry, then on average, the hot sauces will be arranged according to popularity, with the most favored hot sauces in the front and the least favored lost in the back.
  • Respond: when individuals react to external or internal stimuli, creating collective responses that maintain system-level functioning. For example, when yellow jackets attack a predator to maintain the colony, they are displaying this kind of problem solving.
  • Emerge: when individual parts create a whole that has categorically distinct and new functionalities. The most obvious example of this is the human brain….(More)”
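
As a toy illustration of the “Predict” category above, the following simulation shows why combining many independent, noisy guesses tends to beat a typical individual guess, in the spirit of Galton’s steer-weighing anecdote. All numbers are arbitrary stand-ins, not data from the interview:

```python
# A toy illustration of the "Predict" domain: many noisy, independent guesses
# averaged together (the wisdom-of-crowds effect). All numbers are arbitrary.
import random

random.seed(0)
TRUE_WEIGHT = 1198  # pounds; the quantity the crowd is trying to guess
guesses = [random.gauss(TRUE_WEIGHT, 75) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
avg_individual_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(f"crowd error:              {abs(crowd_estimate - TRUE_WEIGHT):.1f}")
print(f"average individual error: {avg_individual_error:.1f}")
```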