Private Data and the Public Good


Gideon Mann’s remarks on the occasion of the Robert Khan distinguished lecture at The City College of New York on 5/22/16 address the risks and opportunities around a specific aspect of this relationship: the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.

Ultimately, I believe that these relationships, between computer science and the real world, between data science and real problems, hold the promise to vastly increase our public welfare. And today, we, the people in this room, have a unique opportunity to debate and define a more moral data economy….

The hybrid research model proposes something different: it embeds researchers as practitioners. The thought was always that you would be going about your regular run of business, would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation people do in the normal course of business is incremental.

This model separates research from scientific publication, and shortens the time window of research to what can be realized in a few years. For me, this always felt like a tremendous loss, with respect to the older so-called “ivory tower” research model. It didn’t seem at all clear how this kind of model would produce the sea change of thought engendered by Shannon’s work, nor did it seem that Claude Shannon would ever want to work there. This kind of environment would never support a freestanding wonder like the robot mouse that Shannon worked on. Moreover, I always believed that publication and participation in the scientific community are crucial to research. Without this engagement, it feels like something different — innovation perhaps.

It is clear that the monopolistic environment that enabled AT&T to support this ivory tower research doesn’t exist anymore.

Now, the hybrid research model was one model of research at Google, but there is another model as well, the moonshot model as exemplified by Google X. Google X brought together focused research teams to drive research and development around a particular project — Google Glass and the self-driving car being two notable examples. Here the focus isn’t research, but building a new product, with research as potentially a crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs-like — a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative — working, for example, on a best-in-the-world Go-playing algorithm, with publications happening sparingly.

Unfortunately, both of these approaches, the hybrid research model and the moonshot model, stack the deck towards a particular kind of research — research that leads to relatively short-term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is long-term, and that is undertaken even without a clear near-term financial impact. In some sense this is a “tragedy of the commons”, where a shared public good (the commons) is not supported because everyone can benefit from it without giving back. Academic research is a non-rival, non-excludable good, and thus reasonably will be underfunded. In certain cases, this takes on an ethical dimension — particularly in health care, where the choice of what diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decision makes a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….

Private Data means research is out of reach

The larger point that I want to make is that in the absence of places where long-term research can be done in industry, academia has a tremendous opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are only found in industry: in particular data.

Of course, academia also lacks machine resources, but this is a simpler problem to fix — it’s a matter of money. Resources from the government could go to enabling research groups to build their own data centers or to acquire computational resources from the market, e.g. Amazon. This is aided by the compute philanthropy that Google and Microsoft practice, granting compute cycles to academic organizations.

But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but it is impossible for academics to access. The lack of access to private data from companies actually has effects far more significant than inhibiting research. In particular, the consumer-level data collected by social networks and internet companies could do much more than ad targeting.

Just for public health — suicide prevention, addiction counseling, mental health monitoring — there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set up to enable this work, while companies are not.

To give one example, anorexia and eating disorders are vicious killers. 20 million women and 10 million men suffer from a clinically significant eating disorder at some time in their life, and sufferers of eating disorders have the highest mortality rate of any mental health disorder — with a jaw-dropping estimated mortality rate of 10%, both directly from injuries sustained by the disorder and from suicide resulting from the disorder.

Eating disorders are unusual in that sufferers often seek out confirmatory information: blogs, images and pictures that glorify and validate what sufferers see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, Pinterest and Instagram are places where people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate for self-harm and by adding public service announcements to searches for queries related to anorexia. But clearly — this is not the be-all and end-all of work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help us understand the nature of those disorders themselves….
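As a rough illustration of the query-level intervention described above, a minimal sketch of attaching a PSA to risk-related searches might look like the following; the term list, matching rule and names are illustrative assumptions, not Tumblr's actual implementation.

```python
# Hypothetical sketch: attach a public service announcement (PSA) with help
# resources to search queries related to eating disorders. The term list and
# the substring-matching rule are illustrative only, not Tumblr's implementation.

RISK_TERMS = {"anorexia", "thinspo", "thinspiration", "pro-ana", "how to starve"}

def needs_psa(query: str) -> bool:
    """Return True if a search query matches any risk-related term."""
    q = query.lower()
    return any(term in q for term in RISK_TERMS)

if __name__ == "__main__":
    for query in ["cute cat gifs", "thinspo blogs", "how to starve yourself"]:
        print(f"{query!r} -> {'show PSA' if needs_psa(query) else 'no PSA'}")
```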

There is probably a role for a data ombudsman within private organizations — someone to protect the interests of the public’s data inside an organization. Like a ‘public editor’ at a newspaper, depending on how you set it up, the ombudsman would be there to protect and articulate the interests of the public, which probably means both sides: making sure a company’s data is used for public good where appropriate, and making sure the public’s ‘right’ to privacy is appropriately safeguarded (and probably making sure the public is informed when their data is compromised).

Next, we need a platform to enable collaboration around social good between companies, and between companies and academics. This platform would enable trusted users to have access to a wide variety of data, and would speed the process of research.

Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(more)”

Value and Vulnerability: The Internet of Things in a Connected State Government


Press release: “The National Association of State Chief Information Officers (NASCIO) today released a policy brief on the Internet of Things (IoT) in state government. The paper focuses on the different ways state governments are using IoT now and in the future and the policy considerations involved.

“In NASCIO’s 2015 State CIO Survey, we asked state CIOs to what extent IoT was on their agenda. Just over half said they were in informal discussions; however, only one in five had moved to the formal discussion phase. We believe IoT needs to be a formal part of each state’s policy considerations,” explained NASCIO Executive Director Doug Robinson.

The paper encourages state CIOs to make IoT part of the enterprise architecture discussions on asset management and risk assessment and to develop an IoT roadmap.

“Cities and municipalities have been working toward the designation of ‘smart city’ for a while now,” said Darryl Ackley, cabinet secretary for the New Mexico Department of Information Technology and NASCIO president. “While states provide different services than cities, we are seeing a lot of activity around IoT to improve citizen services and we see great potential for growth. The more organized and methodical states can be about implementing IoT, the more successful and useful the outcomes.”

Read the policy brief at www.NASCIO.org/ValueAndVulnerability 

Do Open Comment Processes Increase Regulatory Compliance? Evidence from a Public Goods Experiment


Stephen N. Morgan, Nicole M. Mason and Robert S. Shupp at EconPapers: “Agri-environmental programs often incorporate stakeholder participation elements in an effort to increase community ownership of policies designed to protect environmental resources (Hajer 1995; Fischer 2000). Participation – acting through increased levels of ownership – is then expected to increase individual rates of compliance with regulatory policies. Utilizing a novel lab experiment, this research leverages a public goods contribution game to test the effects of a specific type of stakeholder participation scheme on individual compliance outcomes. We find significant evidence that the implemented type of non-voting participation mechanism reduces the probability that an individual will engage in noncompliant behavior and reduces the level of noncompliance. At the same time, exposure to the open comment treatment also increases individual contributions to a public good. Additionally, we find evidence that exposure to participation schemes results in a faster decay in individual compliance over time suggesting that the impacts of this type of participation mechanism may be transitory….(More)”

WeatherUSI: User-Based Weather Crowdsourcing on Public Displays


Evangelos Niforatos, Ivan Elhart and Marc Langheinrich in Web Engineering: “Contemporary public display systems hold significant potential to contribute to in situ crowdsourcing. Recently, public display systems have surpassed their traditional role as static content projection hotspots by supporting interactivity and hosting applications that increase overall perceived user utility. As such, we developed WeatherUSI, a web-based interactive public display application that enables passers-by to input subjective information about current and future weather conditions. In this demo paper, we present the functionality of the app, describe the underlying system infrastructure and present how we combine input streams originating from the WeatherUSI app on a public display together with its mobile app counterparts for facilitating user-based weather crowdsourcing….(more)”
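A rough sketch of how subjective reports from the public display and from mobile clients might be merged into one crowd estimate; the record format and the majority-vote aggregation are assumptions, not the authors' implementation:

```python
# Rough sketch: merge subjective weather votes collected on a public display
# with votes from mobile clients into one crowd estimate per location.
# The record format and the simple majority vote are assumptions, not the
# authors' actual aggregation method.
from collections import Counter

reports = [
    {"source": "display", "location": "Piazza Riforma", "condition": "sunny"},
    {"source": "mobile",  "location": "Piazza Riforma", "condition": "sunny"},
    {"source": "mobile",  "location": "Piazza Riforma", "condition": "cloudy"},
]

def crowd_estimate(reports, location):
    """Majority vote over all reports for one location, regardless of source."""
    votes = Counter(r["condition"] for r in reports if r["location"] == location)
    condition, count = votes.most_common(1)[0]
    return condition, count / sum(votes.values())

print(crowd_estimate(reports, "Piazza Riforma"))  # e.g. ('sunny', 0.66...)
```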

Big data: Issues for an international political sociology of the datafication of worlds


Paper by Madsen, A.K.; Flyverbom, M.; Hilbert, M. and Ruppert, Evelyn: “The claim that big data can revolutionize strategy and governance in the context of international relations is increasingly hard to ignore. Scholars of international political sociology have mainly discussed this development through the themes of security and surveillance. The aim of this paper is to outline a research agenda that can be used to raise a broader set of sociological and practice-oriented questions about the increasing datafication of international relations and politics. First, it proposes a way of conceptualizing big data that is broad enough to open fruitful investigations into the emerging use of big data in these contexts. This conceptualization includes the identification of three moments contained in any big data practice. Secondly, it suggests a research agenda built around a set of sub-themes that each deserve dedicated scrutiny when studying the interplay between big data and international relations along these moments. Through a combination of these moments and sub-themes, the paper suggests a roadmap for an international political sociology of the datafication of worlds….(more)”

Time for sharing data to become routine: the seven excuses for not doing so are all invalid


Paper by Richard Smith and Ian Roberts: “Data are more valuable than scientific papers but researchers are incentivised to publish papers not share data. Patients are the main beneficiaries of data sharing but researchers have several incentives not to share: others might use their data to get ahead in the academic rat race; they might be scooped; their results might not be replicable; competitors may reach different conclusions; their data management might be exposed as poor; patient confidentiality might be breached; and technical difficulties make sharing impossible. All of these barriers can be overcome and researchers should be rewarded for sharing data. Data sharing must become routine….(More)”

Improving patient care by bridging the divide between doctors and data scientists


At The Conversation: “While wonderful new medical discoveries and innovations are in the news every day, doctors struggle daily with using information and techniques available right now while carefully adopting new concepts and treatments. As a practicing doctor, I deal with uncertainties and unanswered clinical questions all the time…. At the moment, a report from the National Academy of Medicine tells us, most doctors base most of their everyday decisions on guidelines from (sometimes biased) expert opinions or small clinical trials. It would be better if they were from multicenter, large, randomized controlled studies, with tightly controlled conditions ensuring the results are as reliable as possible. However, those are expensive and difficult to perform, and even then often exclude a number of important patient groups on the basis of age, disease and sociological factors.

Part of the problem is that health records are traditionally kept on paper, making them hard to analyze en masse. As a result, most of what medical professionals might have learned from experiences was lost – or at least was inaccessible to another doctor meeting with a similar patient.

A digital system would collect and store as much clinical data as possible from as many patients as possible. It could then use information from the past – such as blood pressure, blood sugar levels, heart rate and other measurements of patients’ body functions – to guide future doctors to the best diagnosis and treatment of similar patients.
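The core idea, using measurements from past patients to inform the care of a similar new patient, can be sketched as a nearest-neighbour lookup over vital signs; the features, scaling and distance metric below are illustrative assumptions, not the group's actual method.

```python
# Toy sketch of "guide doctors using similar past patients": represent each
# past patient by a few vital signs and retrieve the most similar historical
# cases for a new patient. Features, scaling and the Euclidean metric are
# illustrative assumptions, not a validated clinical method.
import numpy as np

# columns: systolic blood pressure (mmHg), blood glucose (mg/dL), heart rate (bpm)
past_patients = np.array([
    [120.0,  90.0,  72.0],
    [165.0, 210.0,  95.0],
    [110.0,  85.0,  60.0],
    [158.0, 190.0, 102.0],
])
outcomes = ["stable", "deteriorated", "stable", "deteriorated"]

def most_similar(new_patient, k=2):
    """Indices of the k past patients closest to the new one in standardized space."""
    mean, std = past_patients.mean(axis=0), past_patients.std(axis=0)
    X = (past_patients - mean) / std
    x = (np.asarray(new_patient) - mean) / std
    return np.argsort(np.linalg.norm(X - x, axis=1))[:k]

new_patient = [160.0, 200.0, 98.0]
for idx in most_similar(new_patient):
    print(f"similar case {idx}: vitals={past_patients[idx]}, outcome={outcomes[idx]}")
```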

Industrial giants such as Google, IBM, SAP and Hewlett-Packard have also recognized the potential for this kind of approach, and are now working on how to leverage population data for the precise medical care of individuals.

Collaborating on data and medicine

At the Laboratory of Computational Physiology at the Massachusetts Institute of Technology, we have begun to collect large amounts of detailed patient data in the Medical Information Mart in Intensive Care (MIMIC). It is a database containing information from 60,000 patient admissions to the intensive care units of the Beth Israel Deaconess Medical Center, a Boston teaching hospital affiliated with Harvard Medical School. The data in MIMIC has been meticulously scoured so individual patients cannot be recognized, and is freely shared online with the research community.

But the database itself is not enough. We bring together front-line clinicians (such as nurses, pharmacists and doctors) to identify questions they want to investigate, and data scientists to conduct the appropriate analyses of the MIMIC records. This gives caregivers and patients the best individualized treatment options in the absence of a randomized controlled trial.

Bringing data analysis to the world

At the same time we are working to bring these data-enabled systems to assist with medical decisions to countries with limited health care resources, where research is considered an expensive luxury. Often these countries have few or no medical records – even on paper – to analyze. We can help them collect health data digitally, creating the potential to significantly improve medical care for their populations.

This task is the focus of Sana, a collection of technical, medical and community experts from across the globe that is also based in our group at MIT. Sana has designed a digital health information system specifically for use by health providers and patients in rural and underserved areas.

At its core is an open-source system that uses cellphones – common even in poor and rural nations – to collect, transmit and store all sorts of medical data. It can handle not only basic patient data such as height and weight, but also photos and X-rays, ultrasound videos, and electrical signals from a patient’s brain (EEG) and heart (ECG).
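A minimal sketch of what a cellphone client might queue for upload during one encounter; the field names and JSON encoding are assumptions for illustration, not Sana's actual schema.

```python
# Illustrative sketch of a single encounter record a cellphone client might
# queue for upload: basic measurements plus references to larger media files
# (X-ray image, ECG signal) transmitted separately. Field names are assumed
# for illustration, not Sana's actual schema.
import json

encounter = {
    "patient_id": "anon-0042",                    # de-identified reference
    "collected_by": "community-health-worker-17",
    "timestamp": "2016-05-22T10:15:00Z",
    "vitals": {"height_cm": 168, "weight_kg": 61, "heart_rate_bpm": 78},
    "attachments": [
        {"type": "xray_image", "uri": "local://media/xray_0042.jpg"},
        {"type": "ecg_signal", "uri": "local://media/ecg_0042.bin"},
    ],
}

payload = json.dumps(encounter)  # serialized for transmission over a cell network
print(payload)
```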

Partnering with universities and health organizations, Sana organizes training sessions (which we call “bootcamps”) and collaborative workshops (called “hackathons”) to connect nurses, doctors and community health workers at the front lines of care with technology experts in or near their communities. In 2015, we held bootcamps and hackathons in Colombia, Uganda, Greece and Mexico. The bootcamps teach students in technical fields like computer science and engineering how to design and develop health apps that can run on cellphones. Immediately following the bootcamp, the medical providers join the group and the hackathon begins… At the end of the day, though, the purpose is not the apps….(More)

Smart crowds in smart cities: real life, city scale deployments of a smartphone based participatory crowd management platform


Tobias Franke, Paul Lukowicz and Ulf Blanke at the Journal of Internet Services and Applications: “Pedestrian crowds are an integral part of cities. Planning for crowds, monitoring crowds and managing crowds are fundamental tasks in city management. As a consequence, crowd management is a sprawling R&D area (see related work) that includes theoretical models, simulation tools, as well as various support systems. There has also been significant interest in using computer vision techniques to monitor crowds. However, overall, the topic of crowd management has been given only little attention within the smart city domain. In this paper we report on a platform for smart, city-wide crowd management based on a participatory mobile phone sensing platform. Originally, the apps based on this platform had been conceived as a technology validation tool for crowd-based sensing within a basic research project. However, the initial deployments at the Notte Bianca Festival in Malta and at the Lord Mayor’s Show in London generated so much interest within the civil protection community that it has gradually evolved into a full-blown participatory crowd management system and is now in the process of being commercialized through a startup company. Until today it has been deployed at 14 events in three European countries (UK, Netherlands, Switzerland) and used by well over 100,000 people….

Obtaining knowledge about the current size and density of a crowd is one of the central aspects of crowd monitoring. For the last decades, automatic crowd monitoring in urban areas has mainly been performed by means of image processing. One use case for such video-based applications is a CCTV camera-based system that automatically alerts the staff of subway stations when the waiting platform is congested. However, one of the downsides of video-based crowd monitoring is the fact that video cameras tend to be considered as privacy invading. A privacy-preserving approach to video-based crowd monitoring has therefore been proposed, in which crowd sizes are estimated without people models or object tracking.

With respect to the mitigation of catastrophes induced by panicking crowds (e.g. during an evacuation), city planners and architects increasingly rely on tools simulating crowd behaviors in order to optimize infrastructures. Murakami et al. present an agent-based simulation for evacuation scenarios. Shendarkar et al. present a work that is also based on BDI (belief, desire, intention) agents – those agents, however, are trained in a virtual reality environment, thereby giving greater flexibility to the modeling. Kluepfel et al., on the other hand, use a cellular automaton model for the simulation of crowd movement and egress behavior.

With smartphones becoming everyday items, the concept of crowdsourcing information from users of mobile applications has significantly gained traction. Roitman et al. present a smart city system where the crowd can send eyewitness reports, thereby creating deeper insights for city officials. Szabo et al. take this approach one step further and employ the sensors built into smartphones for gathering data for city services such as live transit information. Ghose et al. utilize the same principle for gathering information on road conditions. Pan et al. use a combination of crowdsourcing and social media analysis for identifying traffic anomalies….(More)”.
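One simple way a participatory sensing platform can turn anonymous location reports into a live crowd picture is to bin reports into a spatial grid and count reports per cell; the report format and cell size below are assumptions, not the deployed system's algorithm.

```python
# Illustrative sketch of participatory crowd monitoring: bin anonymous location
# reports from opted-in app users into a coarse grid and use the number of
# reports per cell as a proxy for crowd density. Cell size and report format
# are assumptions, not the deployed platform's actual algorithm.
from collections import Counter

reports = [  # (latitude, longitude) pairs from opted-in participants
    (51.5081, -0.0759), (51.5083, -0.0761), (51.5080, -0.0755),
    (51.5122, -0.0900),
]

CELL = 0.001  # roughly 100 m per grid cell at this latitude

def density_grid(reports, cell=CELL):
    """Count reports per grid cell; more reports suggest a denser crowd."""
    return Counter((round(lat / cell), round(lon / cell)) for lat, lon in reports)

for cell_id, count in density_grid(reports).most_common():
    print(f"cell {cell_id}: {count} report(s)")
```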

Teenage scientists enlisted to fight Zika


ShareAmerica: “A mosquito’s a mosquito, right? Not when it comes to Zika and other mosquito-borne diseases.

Only two of the estimated 3,000 species of mosquitoes are capable of carrying the Zika virus in the United States, but estimates of their precise range remain hazy, according to the U.S. Centers for Disease Control and Prevention.

Scientists could start getting better information about these pesky, but important, insects with the help of plastic cups, brown paper towels and teenage biology students.

As part of the Invasive Mosquito Project from the U.S. Department of Agriculture, secondary-school students nationwide are learning about mosquito populations and helping fill the knowledge gaps.

Simple experiment, complex problem

The experiment works like this: First, students line the cups with paper, then fill two-thirds of the cups with water. Students place the plastic cups outside, and after a week, the paper is dotted with what looks like specks of dirt. These dirt particles are actually mosquito eggs, which the students can identify and classify.

Students then upload their findings to a national crowdsourced database. Crowdsourcing uses the collective intelligence of online communities to “distribute” problem solving across a massive network.
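A sketch of what the crowdsourced roll-up might look like; the submission fields and species labels are placeholders, not the project's actual database schema.

```python
# Illustrative sketch of the crowdsourced roll-up: each classroom uploads a
# record of the eggs it identified, and the shared database aggregates counts
# by state and species. Field names and values are placeholders, not the
# Invasive Mosquito Project's actual schema.
from collections import defaultdict

submissions = [
    {"school": "Lincoln HS", "state": "KS", "species": "Aedes aegypti", "egg_count": 14},
    {"school": "Lincoln HS", "state": "KS", "species": "Culex pipiens", "egg_count": 3},
    {"school": "Bayside HS", "state": "FL", "species": "Aedes aegypti", "egg_count": 27},
]

def aggregate(submissions):
    """Total egg counts per (state, species) across all classroom uploads."""
    totals = defaultdict(int)
    for s in submissions:
        totals[(s["state"], s["species"])] += s["egg_count"]
    return totals

for (state, species), count in aggregate(submissions).items():
    print(f"{state} / {species}: {count} eggs")
```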

Entomologist Lee Cohnstaedt of the U.S. Department of Agriculture coordinates the program, and he’s already thinking about expansion. He said he hopes to have one-fifth of U.S. schools participate in the mosquito species census. He also plans to adapt lesson plans for middle schools, Scouting troops and gardening clubs.

Already, crowdsourcing has “collected better data than we could have working alone,” he told the Associated Press….

In addition to mosquito tracking, crowdsourcing has been used to develop innovative responses to a number of complex challenges, from climate change to archaeology to protein modeling….(More)”

Reining in the Big Promise of Big Data: Transparency, Inequality, and New Regulatory Frontiers


Paper by Philipp Hacker and Bilyana Petkova: “The growing differentiation of services based on Big Data harbors the potential for both greater societal inequality and for greater equality. Anti-discrimination law and transparency alone, however, cannot do the job of curbing Big Data’s negative externalities while fostering its positive effects.

To rein in Big Data’s potential, we adapt regulatory strategies from behavioral economics, contracts and criminal law theory. Four instruments stand out: First, active choice may be mandated between data-collecting services (paid by data) and data-free services (paid by money). Our suggestion provides concrete estimates for the price range of a data-free option, sheds new light on the monetization of data-collecting services, and proposes an “inverse predatory pricing” instrument to limit excessive pricing of the data-free option. Second, we propose using the doctrine of unconscionability to prevent contracts that unreasonably favor data-collecting companies. Third, we suggest democratizing data collection by regular user surveys and data compliance officers partially elected by users. Finally, we trace back new Big Data personalization techniques to the old Hartian precept of treating like cases alike and different cases – differently. If it is true that a speeding ticket over $50 is less of a disutility for a millionaire than for a welfare recipient, the income- and wealth-responsive fines powered by Big Data that we suggest offer a glimpse into the future of the mitigation of economic and legal inequality by personalized law. Throughout these different strategies, we show how salience of data collection can be coupled with attempts to prevent discrimination against and exploitation of users. Finally, we discuss all four proposals in the context of different test cases: social media, student education software and credit and cell phone markets.
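The income- and wealth-responsive fines the authors gesture at can be illustrated with a day-fine-style calculation; the formula, floor and figures below are illustrative assumptions, not the paper's proposal.

```python
# Illustrative day-fine-style calculation for the "income-responsive fine" idea:
# the penalty scales with the offender's daily income, so the same offence
# produces a comparable disutility for rich and poor. The formula, floor and
# figures are illustrative assumptions, not the paper's proposal.

def income_responsive_fine(annual_income: float, severity_days: float, floor: float = 50.0) -> float:
    """Fine = daily income * number of 'day-fine' units, never below a fixed floor."""
    daily_income = annual_income / 365.0
    return max(floor, round(daily_income * severity_days, 2))

for income in (20_000, 80_000, 1_000_000):
    fine = income_responsive_fine(income, severity_days=2)
    print(f"annual income ${income:>9,}: speeding fine ${fine:,.2f}")
```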

Many more examples could and should be discussed. In the face of increasing unease about the asymmetry of power between Big Data collectors and dispersed users, about differential legal treatment, and about the unprecedented dimensions of economic inequality, this paper proposes a new regulatory framework and research agenda to put the powerful engine of Big Data to the benefit of both the individual and societies adhering to basic notions of equality and non-discrimination….(More)”