Opinion Mining in Social Big Data


New Paper by Wlodarczak, Peter and Ally, Mustafa and Soar, Jeffrey: “Opinion mining has rapidly gained importance due to the unprecedented amount of opinionated data on the Internet. People share their opinions on products and services, and they rate movies, restaurants or vacation destinations. Social media platforms such as Facebook and Twitter have made it easier than ever for users to share their views and make them accessible to anybody on the Web. The economic potential has been recognized by companies that want to improve their products and services, detect new trends and business opportunities, or find out how effective their online marketing efforts are. However, opinion mining using social media faces many challenges due to the amount and the heterogeneity of the available data. Also, spam and fake opinions have become a serious issue. There are also language-related challenges, such as the use of slang and jargon on social media or special characters like smileys, which are widely adopted on social media sites.
These challenges create many interesting research problems, such as determining the influence of social media on people’s actions, understanding opinion dissemination or determining the online reputation of a company. Not surprisingly, opinion mining using social media has become a very active area of research, and a lot of progress has been made over the last few years. This article describes the current state of research and the technologies that have been used in recent studies….(More)”
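
To make the language challenges concrete, below is a minimal, purely illustrative lexicon-based sentiment scorer that also handles smileys and a couple of slang spellings; the lexicon, emoticon table and example posts are assumptions made for this sketch, not material from the paper.

```python
# A minimal, illustrative sentiment scorer for social media text.
# The lexicon, emoticon table, slang map and example posts are hypothetical;
# a real system would rely on trained models and far larger resources.
import re

LEXICON = {"love": 1, "great": 1, "awesome": 1, "meh": -1, "awful": -1, "hate": -1}
EMOTICONS = {":)": 1, ":-)": 1, ":d": 1, ":(": -1, ":-(": -1}
SLANG = {"gr8": "great", "luv": "love"}  # normalise common slang spellings

def score(text: str) -> int:
    """Return a crude polarity score: >0 positive, <0 negative, 0 neutral."""
    total = 0
    for token in re.findall(r"[:;][-']?[)(dp]|\w+", text.lower()):
        token = SLANG.get(token, token)          # map slang to dictionary words
        total += EMOTICONS.get(token, 0) + LEXICON.get(token, 0)
    return total

if __name__ == "__main__":
    for post in ["Luv this phone :)", "Service was awful :("]:
        print(post, "->", score(post))
```

The point of the sketch is only that slang normalisation and emoticon handling have to be explicit steps in the pipeline; production systems would use trained classifiers and curated resources instead.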
 

The Internet’s hidden science factory


Jenny Marder at PBS Newshour: “….Marshall is a worker for Amazon’s Mechanical Turk, an online job forum where “requesters” post jobs, and an army of crowdsourced workers complete them, earning fantastically small fees for each task. The work has been called microlabor, and the jobs, known as Human Intelligence Tasks, or HITs, range wildly. Some are tedious: transcribing interviews or cropping photos. Some are funny: prank calling someone’s buddy (that’s worth $1) or writing the title to a pornographic movie based on a collection of dirty screen grabs (6 cents). And others are downright bizarre. One task, for example, asked workers to strap live fish to their chests and upload the photos. That paid $5 — a lot by Mechanical Turk standards….
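
The posting mechanics Marder describes can be scripted against Amazon’s public requester API. The snippet below is a hedged sketch using the boto3 MTurk client against the sandbox endpoint; the title, reward, durations and question HTML are hypothetical stand-ins, not details from the article.

```python
# Hedged sketch: creating a HIT with boto3's MTurk client against the
# requester *sandbox* endpoint, so no real payment is made. The title,
# reward, durations and question HTML are hypothetical stand-ins.
import boto3

client = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
        <p>Transcribe the 30-second audio clip linked in the instructions.</p>
        <textarea name="transcript"></textarea>
        <!-- a real HIT must also echo back the assignmentId parameter -->
        <input type="submit" value="Submit">
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>"""

hit = client.create_hit(
    Title="Transcribe a 30-second audio clip (example)",
    Description="Illustrative microtask for a hypothetical study",
    Keywords="transcription, audio, example",
    Reward="0.06",                       # fees are typically a few cents
    MaxAssignments=3,                    # number of workers per task
    LifetimeInSeconds=24 * 60 * 60,      # how long the HIT stays listed
    AssignmentDurationInSeconds=600,     # time a worker has to finish
    Question=question_xml,
)
print("HIT created:", hit["HIT"]["HITId"])
```
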
These aren’t obscure studies that Turkers are feeding. They span dozens of fields of research, including social, cognitive and clinical psychology, economics, political science and medicine. They teach us about human behavior. They deal in subjects like energy conservation, adolescent alcohol use, managing money and developing effective teaching methods.


….In 2010, the researcher Joseph Henrich and his team published a paper showing that an American undergraduate was about 4,000 times more likely than an average American to be the subject of a research study.
But that output pales in comparison to Mechanical Turk workers. The typical “Turker” completes more studies in a week than the typical undergraduate completes in a lifetime. That’s according to research by Rand, who surveyed both groups. Among those he surveyed, he found that the median traditional lab subject had completed 15 total academic studies — an average of one per week. The median Turker, on the other hand, had completed 300 total academic studies — an average of 20 per week….(More)”

Scenario Planning Case Studies Using Open Government Data


New Paper by Robert Power, Bella Robinson, Lachlan Rudd, and Andrew Reeson: “The opportunity for improved decision making has been enhanced in recent years through the public availability of a wide variety of information. In Australia, government data is routinely made available and maintained in the http://data.gov.au repository. This is a single point of reference for data that can be reused for purposes beyond those originally considered by the data custodians. Similarly, a wealth of citizen information is available from the Australian Bureau of Statistics. Combining this data allows informed decisions to be made through planning scenarios.

We present two case studies that demonstrate the utility of data integration and web mapping. As a simple proof of concept, the user can explore different scenarios in each case study by indicating the relative weightings to be used for the decision-making process. Both case studies are presented as publicly available, interactive map-based websites….(More)”
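
The weighting idea is straightforward to sketch in code. The toy example below assumes a handful of made-up indicators per region (the field names and figures are illustrative, not drawn from the paper or from data.gov.au) and ranks regions by a user-supplied weighted sum, which is essentially the interaction the map-based websites expose.

```python
# Illustrative sketch of weighted scenario scoring: each region gets a score
# from user-chosen weights over indicators. All names and numbers are
# hypothetical examples, not figures from the paper.
regions = [
    {"name": "Region A", "population_density": 0.8, "service_gap": 0.3, "travel_time": 0.5},
    {"name": "Region B", "population_density": 0.4, "service_gap": 0.9, "travel_time": 0.2},
]

def scenario_scores(regions, weights):
    """Rank regions by a weighted sum of normalised indicators."""
    scored = [(sum(weights[k] * r[k] for k in weights), r["name"]) for r in regions]
    return sorted(scored, reverse=True)

# A user exploring a different scenario simply shifts the weights around:
print(scenario_scores(regions, {"population_density": 0.2, "service_gap": 0.6, "travel_time": 0.2}))
```
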

Small Pieces Loosely Joined: How smarter use of technology and data can deliver real reform of local government


Policy Exchange (UK): “Local authorities could save up to £10 billion by 2020 through smarter and more collaborative use of technology and data.
Small Pieces Loosely Joined highlights how every year councils lose more than £1 billion by failing to identify where fraud has taken place. The paper also sheds light on how a lack of data sharing and collaboration between many local authorities, as well as the use of bespoke IT systems, keeps the cost of providing public services unsustainably high.
The report sets out three ways in which local authorities could not only save billions of pounds, but also provide better, more coordinated public services:

  1. Using data to predict and prevent fraud. Each year councils lose in excess of £1.3 billion through Council Tax fraud, benefit fraud and housing tenancy fraud (such as illegal subletting). By collecting and analysing data from numerous different sources, it is possible to predict where future violations are most likely to occur and direct investigative teams to respond to them first (a sketch of this approach follows the list).
  2. Sharing data between neighbouring councils. Sharing data would reveal where it might be beneficial for two or more neighbouring LAs to merge one or more services. For example, if one council spends £5m each year on combating a particular issue, such as investigating food safety violations, fly-tipping or pest control, it may be more cost-effective to hire the services of a neighbouring council that has a far greater incidence of that same issue.
  3. Phasing out costly bespoke IT systems. Rather than each LA independently designing or commissioning its own apps and online services (such as paying for council tax or reporting noisy neighbours), an ‘app store’ should be created where individuals, businesses or other organisations can bid to provide them. The services created could then be used by dozens – or even hundreds – of LAs, creating economies of scale that bring down prices for all.
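
To make point 1 concrete, here is a hedged sketch of one plausible shape such a model could take: merge records from several sources on a common identifier, train a simple classifier on past investigation outcomes, and rank open cases by predicted risk. The file names, fields and the choice of logistic regression are assumptions for illustration, not details from the report.

```python
# Hypothetical sketch of a fraud-risk model over merged council data.
# File names, fields and the logistic-regression choice are illustrative
# assumptions, not taken from the Policy Exchange report.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Join records from several (hypothetical) sources on a property identifier.
council_tax = pd.read_csv("council_tax_discounts.csv")   # e.g. single-person discounts
electoral_roll = pd.read_csv("electoral_roll.csv")       # registered occupants
tenancy = pd.read_csv("housing_tenancy.csv")             # tenancy records and past outcomes

cases = (
    council_tax
    .merge(electoral_roll, on="property_id")
    .merge(tenancy, on="property_id")
)

features = cases[["claims_single_occupancy", "registered_occupants", "years_at_address"]]
labels = cases["confirmed_fraud"]        # outcomes of past investigations

model = LogisticRegression().fit(features, labels)

# Rank open cases so investigators visit the highest-risk properties first.
cases["risk"] = model.predict_proba(features)[:, 1]
print(cases.sort_values("risk", ascending=False)[["property_id", "risk"]].head())
```
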

Since 2008, councils have shouldered the largest spending cuts of any part of the public sector – despite providing 80% of local public services – and face a funding shortfall of £12.4 billion by 2020. Some are doing admirably well under this extreme financial pressure, developing innovative schemes using data to ensure that they scale back spending but continue to provide vital public services. For example, Leeds, Yorkshire and Humber are developing a shared platform for digital services needed by all three councils. Similarly, a collaboration of public sector organisations in and around Hampshire and the Isle of Wight is developing ways of sharing data and helping neighbouring councils to share content and data through the Hampshire Hub.
Full Report

The story of the sixth myth of open data and open government


Paper by Ann-Sofie Hellberg and Karin Hedström: “The aim of this paper is to describe a local government effort to realise an open government agenda. This is done using a storytelling approach….The empirical data is based on a case study. We participated in, as well as followed, the process of realising an open government agenda on a local level, where citizens were invited to use open public data as the basis for developing apps and external web solutions. Based on an interpretative tradition, we chose storytelling as a way to scrutinize the competition process. In this paper, we present a story about the competition process using the story elements put forward by Kendall and Kendall (2012).

….Our research builds on existing research by proposing a sixth myth: that the “public” wants to make use of open data. We provide empirical insights into the challenge of gaining benefits from open public data. In particular, we illustrate the difficulties in getting citizens interested in using open public data. Our case shows that people seem to like the idea of open public data, but do not necessarily participate actively in the data re-use process…..This study illustrates the difficulties of promoting the re-use of open public data. Public organisations that want to pursue an open government agenda can use our findings as empirical insights… (More)”

 

Innovation Labs: Leveraging Openness for Radical Innovation?


Paper by Gryszkiewicz, Lidia and Lykourentzou, Ioanna and Toivonen, Tuukka: “A growing range of public, private and civic organisations, from Unicef through Nesta to Tesco, now run units known as ‘innovation labs’. The hopeful assumption they share is that labs, by building on openness among other features, can generate promising solutions to grand challenges of the future. Despite their seeming proliferation and popularisation, the underlying innovation paradigm embodied by labs has so far received scant academic attention. This is a missed opportunity, because innovation labs are potentially fruitful vehicles for leveraging openness for radical innovation. Indeed, they not only strive to span organisational, sectoral and geographical boundaries by bringing a variety of uncommon actors together to embrace radical ideas and out-of-the-box thinking, but they also aim to apply the concept of openness throughout the innovation process, including the experimentation and development phases. While the phenomenon of labs clearly forms part of a broader trend towards openness, it seems to transcend traditional conceptualisations of open innovation (Chesbrough, 2006), open strategy (Whittington et al., 2011), open science (David, 1998) or open government (Janssen et al., 2012). What are innovation labs about, how do they differ from other innovation efforts and how do they embrace openness to create breakthrough innovations? This short exploratory paper is an introduction to a larger empirical study aiming to answer these questions….(More).”

Surveying the citizen science landscape


Paper by Andrea Wiggins and Kevin Crowston in First Monday: “Citizen science has seen enormous growth in recent years, in part due to the influence of the Internet, and a corresponding growth in interest. However, the few stand-out examples that have received attention from media and researchers are not representative of the diversity of the field as a whole, and therefore may not be the best models for those seeking to study or start a citizen science project. In this work, we present the results of a survey of citizen science project leaders, identifying sub-groups of project types according to a variety of features related to project design and management, including funding sources, goals, participant activities, data quality processes, and social interaction. These combined features highlight the diversity of citizen science, providing an overview of the breadth of the phenomenon and laying a foundation for comparisons among citizen science projects and with other online communities….(More).”

Access to Scientific Data in the 21st Century: Rationale and Illustrative Usage Rights Review


Paper by James Campbell in Data Science Journal: “Making scientific data openly accessible and available for re-use is desirable to encourage validation of research results and/or economic development. Understanding what users may, or may not, do with data in online data repositories is key to maximizing the benefits of scientific data re-use. Many online repositories that allow access to scientific data indicate that data is “open,” yet specific usage conditions reviewed on 40 “open” sites suggest that there is no agreed-upon understanding of what “open” means with respect to data. This inconsistency can be an impediment to data re-use by researchers and the public. (More)”

Open Government: Origin, Development, and Conceptual Perspectives


Paper by Bernd W. Wirtz & Steven Birkmeyer in the International Journal of Public Administration: “The term “open government” is frequently used in practice and science. Since President Obama’s Memorandum for the Heads of Executive Departments and Agencies in March 2009, open government has attracted an enormous amount of public attention. It is applied by authors from diverse areas, leading to a very heterogeneous comprehension of the concept. Against this background, this article screens the current open government literature to deduce an integrative definition of open government. Furthermore, this article analyzes the empirical and conceptual literature of open government to deduce an open government framework. In general, this article provides a clear understanding of the open government concept. (More)”

The new scientific revolution: Reproducibility at last


In the Washington Post: “…Reproducibility is a core scientific principle. A result that can’t be reproduced is not necessarily erroneous: Perhaps there were simply variables in the experiment that no one detected or accounted for. Still, science sets high standards for itself, and if experimental results can’t be reproduced, it’s hard to know what to make of them.
“The whole point of science, the way we know something, is not that I trust Isaac Newton because I think he was a great guy. The whole point is that I can do it myself,” said Brian Nosek, the founder of a start-up in Charlottesville, Va., called the Center for Open Science. “Show me the data, show me the process, show me the method, and then if I want to, I can reproduce it.”
The reproducibility issue is closely associated with a Greek researcher, John Ioannidis, who published a paper in 2005 with the startling title “Why Most Published Research Findings Are False.”
Ioannidis, now at Stanford, has started a program to help researchers improve the reliability of their experiments. He said the surge of interest in reproducibility was in part a reflection of the explosive growth of science around the world. The Internet is a factor, too: It’s easier for researchers to see what everyone else is doing….
Errors can potentially emerge from a practice called “data dredging”: When an initial hypothesis doesn’t pan out, the researcher will scan the data for something that looks like a story. The researcher will see a bump in the data and think it’s significant, but the next researcher to come along won’t see it — because the bump was a statistical fluke….
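
A few lines of simulation show why such bumps appear. The sketch below (with arbitrary, made-up sample sizes) correlates a single outcome against 40 variables of pure noise; at the conventional 0.05 threshold, at least one of them will usually look “significant” by chance alone.

```python
# A small simulation of "data dredging": test one outcome against many
# unrelated variables and, by chance alone, something will usually look
# "significant". Sample sizes and counts below are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_variables = 50, 40

outcome = rng.normal(size=n_subjects)                 # the measured result
noise = rng.normal(size=(n_variables, n_subjects))    # 40 variables of pure noise

p_values = [stats.pearsonr(v, outcome)[1] for v in noise]
print(f"smallest p-value over {n_variables} unrelated variables: {min(p_values):.3f}")
# With 40 tries at the 0.05 threshold, the chance of at least one spurious
# "hit" is about 1 - 0.95**40, roughly 87%: the bump the next study won't see.
```
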
So far about 7,000 people are using that service, and the center has received commitments for $14 million in grants, with partners that include the National Science Foundation and the National Institutes of Health, Nosek said.
Another COS initiative will help researchers register their experiments in advance, telling the world exactly what they plan to do, what questions they will ask. This would avoid the data-dredging maneuver in which researchers who are disappointed go on a deep dive for something publishable.
Nosek and other reformers talk about “publication bias.” Positive results get reported, negative results ignored. Someone reading a journal article may never know about all the similar experiments that came to naught….(More).”