Towards Timely Public Health Decisions to Tackle Seasonal Diseases With Open Government Data


Paper by Vandana Srivastava and Biplav Srivastava for the Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence: “Improving public health is a major responsibility of any government, and is of major interest to citizens and scientific communities around the world. Here, one sees two extremes. On one hand, tremendous progress has been made in recent years in the understanding of causes, spread and remedies of common and regularly occurring diseases like Dengue, Malaria and Japanese Encephalitis (JE). On the other hand, public agencies treat these diseases in an ad hoc manner without learning from the experiences of previous years. Specifically, they would get alerted once reported cases have already risen substantially in the known disease season, reactively initiate a few actions and then document the disease impact (cases, deaths) for that period, only to forget this learning in the next season. However, they miss the opportunity to reduce preventable deaths and sickness, and their corresponding economic impact, which scientific progress could have enabled. The gap is universal but very prominent in developing countries like India.
In this paper, we show that if public agencies provide historical disease impact information openly, it can be analyzed with statistical and machine learning techniques, correlated with best emerging practices in disease control, and simulated in a setting to optimize social benefits to provide timely guidance for new disease seasons and regions. We illustrate using open data for mosquito-borne communicable diseases and published results in public health on the efficacy of Dengue control methods, and apply them to a simulated typical city for maximal benefit with available resources. The exercise further helps us suggest strategies for new regions anywhere in the world, how data could be better recorded by city agencies, and which prevention methods the medical community should focus on for wider impact.”
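One piece of the pipeline the abstract describes, learning a seasonal baseline from past years' open case data so that a new season can trigger an alert before the peak, can be sketched minimally. The weekly counts and the two-sigma threshold below are invented for illustration; this is not the paper's actual method:

```python
# Minimal sketch of a seasonal alert rule: fit a per-week baseline from
# historical open case data, then flag weeks in the current season that
# exceed it. All numbers are invented for illustration.
from statistics import mean, stdev

# Hypothetical weekly dengue case counts for three past seasons (6 weeks each).
history = [
    [12, 18, 40, 95, 130, 80],
    [10, 22, 55, 110, 150, 90],
    [15, 25, 48, 100, 140, 85],
]

# Per-week threshold: historical mean plus two sample standard deviations.
thresholds = [mean(week) + 2 * stdev(week) for week in zip(*history)]

def alert_weeks(current_season):
    """Return the 1-indexed weeks whose counts exceed the seasonal threshold."""
    return [i + 1 for i, (c, t) in enumerate(zip(current_season, thresholds))
            if c > t]

current = [14, 30, 70, 105, 135, 88]
print(alert_weeks(current))  # weeks 2 and 3 rise faster than past seasons
```

A real system would of course use longer histories, population adjustment, and spatial data, but even this simple baseline illustrates why openly published historical counts are a precondition for timely alerts.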
Full Text: PDF

'Big Data' Will Change How You Play, See the Doctor, Even Eat


We’re entering an age of personal big data, and its impact on our lives will surpass that of the Internet. Data will answer questions we could never before answer with certainty—everyday questions like whether that dress actually makes you look fat, or profound questions about precisely how long you will live.

Every 20 years or so, a powerful technology moves from the realm of backroom expertise and into the hands of the masses. In the late-1970s, computing made that transition—from mainframes in glass-enclosed rooms to personal computers on desks. In the late 1990s, the first web browsers made networks, which had been for science labs and the military, accessible to any of us, giving birth to the modern Internet.

Each transition touched off an explosion of innovation and reshaped work and leisure. In 1975, 50,000 PCs were in use worldwide. Twenty years later: 225 million. The number of Internet users in 1995 hit 16 million. Today it’s more than 3 billion. In much of the world, it’s hard to imagine life without constant access to both computing and networks.

The 2010s will be the coming-out party for data. Gathering, accessing and gleaning insights from vast and deep data has been a capability locked inside enterprises long enough. Cloud computing and mobile devices now make it possible to stand in a bathroom line at a baseball game while tapping into massive computing power and databases. On the other end, connected devices such as the Nest thermostat or Fitbit health monitor and apps on smartphones increasingly collect new kinds of information about everyday personal actions and habits, turning it into data about ourselves.

More than 80 percent of data today is unstructured: tangles of YouTube videos, news stories, academic papers, social network comments. Unstructured data has been almost impossible to search for, analyze and mix with other data. A new generation of computers—cognitive computing systems that learn from data—will read tweets or e-books or watch video, and comprehend its content. Somewhat like brains, these systems can link diverse bits of data to come up with real answers, not just search results.

Such systems can work in natural language. The progenitor is the IBM Watson computer that won on Jeopardy in 2011. Next-generation Watsons will work like a super-powered Google. (Google today is a data-searching wimp compared with what’s coming.)

Sports offers a glimpse into the data age. Last season the NBA installed in every arena technology that can “watch” a game and record, in 48 minutes of action, more than 4 million data points about every movement and shot. That alone could yield new insights for NBA coaches, such as which group of five players most efficiently passes the ball around….

Think again about life before personal computing and the Internet. Even if someone told you that you’d eventually carry a computer in your pocket that was always connected to global networks, you would’ve had a hard time imagining what that meant—imagining WhatsApp, Siri, Pandora, Uber, Evernote, Tinder.

As data about everything becomes ubiquitous and democratized, layered on top of computing and networks, it will touch off the most spectacular technology explosion yet. We can see the early stages now. “Big data” doesn’t even begin to describe the enormity of what’s coming next.

Chief Executive of Nesta on the Future of Government Innovation


Interview between Rahim Kanani and Geoff Mulgan, CEO of Nesta and member of the MacArthur Research Network on Opening Governance: “Our aspiration is to become a global center of expertise on all kinds of innovation, from how to back creative business start-ups and how to shape innovation tools such as challenge prizes, to helping governments act as catalysts for new solutions,” explained Geoff Mulgan, chief executive of Nesta, the UK’s innovation foundation. In an interview with Mulgan, we discussed their new report, published in partnership with Bloomberg Philanthropies, which highlights 20 of the world’s top innovation teams in government. Mulgan and I also discussed the founding and evolution of Nesta over the past few years, and leadership lessons from his time inside and outside government.
Rahim Kanani: When we talk about ‘innovations in government’, isn’t that an oxymoron?
Geoff Mulgan: Governments have always innovated. The Internet and World Wide Web both originated in public organizations, and governments are constantly developing new ideas, from public health systems to carbon trading schemes, online tax filing to high speed rail networks.  But they’re much less systematic at innovation than the best in business and science.  There are very few job roles, especially at senior levels, few budgets, and few teams or units.  So although there are plenty of creative individuals in the public sector, they succeed despite, not because of the systems around them. Risk-taking is punished not rewarded.   Over the last century, by contrast, the best businesses have learned how to run R&D departments, product development teams, open innovation processes and reasonably sophisticated ways of tracking investments and returns.
Kanani: This new report, published in partnership with Bloomberg Philanthropies, highlights 20 of the world’s most effective innovation teams in government working to address a range of issues, from reducing murder rates to promoting economic growth. Before I get to the results, how did this project come about, and why is it so important?
Mulgan: If you fail to generate new ideas, test them and scale the ones that work, it’s inevitable that productivity will stagnate and governments will fail to keep up with public expectations, particularly when waves of new technology—from smart phones and the cloud to big data—are opening up dramatic new possibilities.  Mayor Bloomberg has been a leading advocate for innovation in the public sector, and in New York he showed the virtues of energetic experiment, combined with rigorous measurement of results.  In the UK, organizations like Nesta have approached innovation in a very similar way, so it seemed timely to collaborate on a study of the state of the field, particularly since we were regularly being approached by governments wanting to set up new teams and asking for guidance.
Kanani: Where are some of the most effective innovation teams working on these issues, and how did you find them?
Mulgan: In our own work at Nesta, we’ve regularly sought out the best innovation teams that we could learn from, and this study made it possible to do that more systematically, focusing in particular on the teams within national and city governments.  They vary greatly, but all the best ones are achieving impact with relatively slim resources.  Some are based in central governments, like Mindlab in Denmark, which has pioneered the use of design methods to reshape government services, from small business licensing to welfare.  SITRA in Finland has been going for decades as a public technology agency, and more recently has switched its attention to innovation in public services, for example providing mobile tools to help patients manage their own healthcare.  In the city of Seoul, the Mayor set up an innovation team to accelerate the adoption of ‘sharing’ tools, so that people could share things like cars, freeing money for other things.  In South Australia the government set up an innovation agency that has been pioneering radical ways of helping troubled families, mobilizing families to help other families.
Kanani: What surprised you the most about the outcomes of this research?
Mulgan: Perhaps the biggest surprise has been the speed with which this idea is spreading.  Since we started the research, we’ve come across new teams being created in dozens of countries, from Canada and New Zealand to Cambodia and Chile.  China has set up a mobile technology lab for city governments.  Mexico City and many others have set up labs focused on creative uses of open data.  A batch of cities across the US supported by Bloomberg Philanthropies—from Memphis and New Orleans to Boston and Philadelphia—are now showing impressive results and persuading others to copy them.
 

Business Models That Take Advantage of Open Data Opportunities


Mark Boyd at ProgrammableWeb: “At last week’s OKFestival in Berlin, Kat Borlongan and Chloé Bonnet from Parisian open data startup Five By Five moderated an interactive speed-geek session to examine how startups are building viability using open data and open data APIs. The picture that emerged revealed a variety of composite approaches being used, with all those presenting having just one thing in common: a commitment to fostering ecosystems that will allow other startups to build alongside them.
The OKFestival—hosted by the Open Knowledge Foundation—brought together more than 1,000 participants from around the globe working on various aspects of the open data agenda: the use of corporate data, open science research, government open data and crowdsourced data projects.
In a session held on the first day of the event, Borlongan facilitated an interactive workshop to help would-be entrepreneurs understand how startups are building business models that take advantage of open data opportunities to create sustainable, employment-generating businesses.
Citing research from the McKinsey Global Institute that calculates the value of open data to be worth $3 trillion globally, Borlongan said: “So the understanding of the open data process is usually: We throw open data over the wall, then we hold a hackathon, and then people will start making products off it, and then we make the $3 trillion.”
Borlongan argued that it is actually a “blurry identity to be an open data startup” and encouraged participants to unpack, with each of the startups presenting, exactly how income can be generated and a viable business built in this space.
Jeni Tennison, from the U.K.’s Open Data Institute (which supports 15 businesses in its Startup Programme) categorizes two types of business models:

  1. Businesses that publish (but do not sell) open data.
  2. Businesses built on top of using open data.

Businesses That Publish but Do Not Sell Open Data

At the Open Data Institute, Tennison is investigating the possibility of an open address database that would provide street address data for every property in the U.K. She describes three types of business models that could be created by projects that generated and published such data:
Freemium: In this model, the bulk data of open addresses could be made available freely, “but if you want an API service, then you would pay for it.” Tennison also pointed to lots of opportunities to degrade the freemium-level data—for example, having it available in bulk but not at a particularly granular level (unless you pay for it), or by provisioning reuse on a share-alike basis, where you would pay if you wanted the data for corporate use cases (similar to how OpenCorporates sells access to its data).
Cross-subsidy: In this approach, the data would be available, and the opportunities to generate income would come from providing extra services, like consultancy or white labeling data services alongside publishing the open data.
Network: In this business model, value is created by generating a network effect around the core business interest, which may not be the open data itself. As an example, Tennison suggested that if a post office or delivery company were to create the open address database, it might be interested in encouraging private citizens to collaboratively maintain or crowdsource the quality of the data. The revenue generated by this open data would then come from reductions in the cost of delivery services as the data improved accuracy.
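The freemium model Tennison describes reduces to a small gating rule: bulk data is free, while the API service and granular records sit behind a paid plan. A minimal sketch, with invented tier names and rules (not any actual ODI or OpenCorporates scheme):

```python
# Minimal sketch of freemium gating for an open data service: bulk
# downloads are free for everyone, but API access and granular records
# require a paid plan. Tier names and rules are invented for illustration.
TIERS = {
    "free": {"api_access": False, "granular": False},
    "paid": {"api_access": True,  "granular": True},
}

def can_serve(tier, via_api, wants_granular):
    """Decide whether a request is allowed under the subscriber's tier."""
    rules = TIERS[tier]
    if via_api and not rules["api_access"]:
        return False
    if wants_granular and not rules["granular"]:
        return False
    return True

# A free user may download bulk data, but not call the API
# or fetch granular records.
print(can_serve("free", via_api=False, wants_granular=False))  # True
print(can_serve("free", via_api=True, wants_granular=False))   # False
print(can_serve("paid", via_api=True, wants_granular=True))    # True
```

The design point is that the open data itself stays open; what is metered is the convenience layer (API, granularity, commercial terms) built on top of it.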

Businesses Built on Top of Open Data

Six startups working in unique ways to make use of available open data also presented their business models to OKFestival attendees: Development Seed, Mapbox, OpenDataSoft, Enigma.io, Open Bank Project, and Snips.

Startup: Development Seed
What it does: Builds solutions for development, public health and citizen democracy challenges by creating open source tools and utilizing open data.
Open data API focus: Regularly uses open data APIs in its projects. For example, it worked with the World Bank to create a data visualization website built on top of the World Bank API.
Type of business model: Consultancy, but it has also created new businesses out of the products developed as part of its work, most notably Mapbox (see below).

Startup: Enigma.io
What it does: Open data platform with advanced discovery and search functions.
Open data API focus: Provides the Enigma API to allow programmatic access to all data sets and some analytics from the Enigma platform.
Type of business model: SaaS including a freemium plan with no degradation of data and with access to API calls; some venture funding; some contracting services to particular enterprises; creating new products in Enigma Labs for potential later sale.

Startup: Mapbox
What it does: Enables users to design and publish maps based on crowdsourced OpenStreetMap data.
Open data API focus: Uses OpenStreetMap APIs to draw data into its map-creation interface; provides the Mapbox API to allow programmatic creation of maps using Mapbox web services.
Type of business model: SaaS including freemium plan; some tailored contracts for big map users such as Foursquare and Evernote.

Startup: Open Bank Project
What it does: Creates an open source API for use by banks.
Open data API focus: Its core product is to build an API so that banks can use a standard, open source API tool when creating applications and web services for their clients.
Type of business model: Contract license with tiered SLAs depending on the number of applications built using the API; IT consultancy projects.

Startup: OpenDataSoft
What it does: Provides an open data publishing platform so that cities, governments, utilities and companies can publish their own data portal for internal and public use.
Open data API focus: It’s able to route data sources into the portal from a publisher’s APIs; provides automatic API-creation tools so that any data set uploaded to the portal is then available as an API.
Type of business model: SaaS model with freemium plan, pricing by number of data sets published and number of API calls made against the data, with free access for academic and civic initiatives.

Startup: Snips
What it does: Predictive modeling for smart cities.
Open data API focus: Channels some open and client proprietary data into its modeling algorithm calculations via API; provides a predictive modeling API for clients’ use to programmatically generate solutions based on their data.
Type of business model: Creating one B2C app product for sale as a revenue-generation product; individual contracts with cities and companies to solve particular pain points, such as using predictive modeling to help a post office company better manage staff rosters (matched to sales needs) and a consultancy project to create a visualization mapping tool that can predict the risk of car accidents for a city….”

Big Money, Uncertain Return


Mary K. Pratt in an MIT Technology Review Special Report on Data-Driven Health Care: “Hospitals are spending billions collecting and analyzing medical data. The one data point no one is tracking: the payoff…. Ten years ago, Kaiser Permanente began building a $4 billion electronic-health-record system that includes a comprehensive collection of health-care data ranging from patients’ treatment records to research-based clinical advice. Now Kaiser has added advanced analytics tools and data from more sources, including a pilot program that integrates information from patients’ medical devices.

Faced with new government regulations and insurer pressure to control costs, other health-care organizations are following Kaiser’s example and increasing their use of analytics. The belief: that mining their vast quantities of patient data will yield insights into the best treatments at the lowest cost.

But just how big will the financial payoff be? Terhilda Garrido, vice president of health IT transformation and analytics at Kaiser, admits she doesn’t know. Nor do other health-care leaders. The return on investment for health-care analytics programs remains elusive and nearly impossible for most to calculate…

Opportunities to identify the most effective treatments could slip away if CIOs and their teams aren’t able to quantify the return on their analytics investments. Health-care providers are under increasing pressure to cut costs in an era of capped billing, and executives at medical organizations won’t okay spending their increasingly limited dollars on data warehouses, analytics software, and data scientists if they can’t be sure they’ll see real benefit.

A new initiative at Cleveland Clinic shows the opportunities and challenges. By analyzing patients’ records on their overall health and medical conditions, the medical center determines which patients coming in for hip and knee replacements can get postoperative services in their own homes (the most cost-effective option), which ones will need a short stay in a skilled nursing facility, and which ones will have longer stints in a skilled nursing facility (the most costly option). The classifications control costs while still ensuring the best possible medical outcomes, says CIO C. Martin Harris.

That does translate into real—and significant—financial benefits, but Harris wonders how to calculate the payoff from his data investment. Should the costs of every system from which patient data is pulled be part of the equation in addition to the costs of the data warehouse and analytics tools? Calculating how much money is saved by implementing better protocols is not straightforward either. Harris hesitates to attribute better, more cost-effective patient outcomes solely to analytics when many other factors are also likely contributors…”
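The calculation Harris is wrestling with can at least be framed as arithmetic: savings credited to analytics, minus the full stack of system costs, over those costs. A minimal sketch with invented figures; the article's point is precisely that the real inputs, especially the attribution factor, are hard to pin down:

```python
# The ROI question Harris raises, framed as simple arithmetic.
# Every figure below is invented for illustration.
def analytics_roi(savings, warehouse_cost, tool_cost, staff_cost,
                  attribution=1.0):
    """Return ROI as a fraction of the investment, discounting savings by
    an attribution factor (how much of the outcome is due to analytics)."""
    investment = warehouse_cost + tool_cost + staff_cost
    return (savings * attribution - investment) / investment

# With full attribution the program looks strongly positive...
print(analytics_roi(5_000_000, 1_000_000, 500_000, 500_000))       # 1.5
# ...but if only half the savings are credited to analytics, ROI collapses.
print(analytics_roi(5_000_000, 1_000_000, 500_000, 500_000, 0.5))  # 0.25
```

Harris's open questions map directly onto the inputs: which source systems count toward the cost terms, and what attribution factor is defensible when better protocols, staff, and incentives all contribute to the same outcome.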

European Commission encourages re-use of public sector data


Press Release: “Today, the European Commission is publishing guidelines to help Member States benefit from the revised Directive on the re-use of public sector information (PSI Directive). These guidelines explain for example how to give access to weather data, traffic data, property asset data and maps. Open data can be used as the basis for innovative value-added services and products, such as mobile apps, which encourage investment in data-driven sectors. The guidelines published today are based on a detailed consultation and cover issues such as:

  1. Licensing: guidelines on when public bodies can allow the re-use of documents without conditions or licences, and the conditions under which the re-use of personal data is possible. For example:

  • Public sector bodies should not impose licences when a simple notice is sufficient;

  • Open licences available on the web, such as several “Creative Commons” licences, can facilitate the re-use of public sector data without the need to develop custom-made licences;

  • An attribution requirement is sufficient in most cases of PSI re-use.

  2. Datasets: presents five thematic dataset categories that businesses and other potential re-users are most interested in and that could thus be given priority for being made available for re-use. For example:

  • Postcodes, national and local maps;

  • Weather, land and water quality, energy consumption, emission levels and other environmental and earth data;

  • Transport data: public transport timetables, road works, traffic information;

  • Statistics: GDP, age, health, unemployment, income, education etc.;

  • Company and business registers.

  3. Cost: gives an overview of how public sector bodies, including libraries, museums and archives, should calculate the amount they should charge re-users for data. For example:

  • Where digital documents are downloaded electronically, a no‑cost policy is recommended;

  • For cost-recovery charging, any income generated in the process of collecting or producing documents, e.g. from registration fees or taxes, should be subtracted from the total costs incurred so as to establish the ‘net cost’ of collection, production, reproduction and dissemination.
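The ‘net cost’ rule in that last point is simple arithmetic: sum the costs of collection, production, reproduction and dissemination, then subtract income already earned in the process (e.g. registration fees) before setting cost-recovery charges. A minimal sketch with invented figures:

```python
# The guidelines' 'net cost' rule as arithmetic: income already earned
# while collecting or producing the documents (fees, taxes) is subtracted
# from total costs before setting cost-recovery charges.
# All figures are invented for illustration.
def net_cost(collection, production, reproduction, dissemination,
             income_from_fees):
    """Net cost a public body may recover from re-users, floored at zero."""
    total = collection + production + reproduction + dissemination
    return max(0, total - income_from_fees)

# A registry spending 120,000 in total but already collecting 50,000 in
# registration fees may only recover the remaining 70,000 from re-users.
print(net_cost(60_000, 30_000, 10_000, 20_000, 50_000))  # 70000
```

If fee income already covers the full cost of producing the data, the net cost falls to zero, which is consistent with the guidelines' preference for no-cost electronic downloads.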

European Commission Vice President @NeelieKroesEU said: “This guidance will help all of us benefit from the wealth of information public bodies hold. Opening and re-using this data will lead to many new businesses and convenient services.”

An independent report carried out by the consultants McKinsey in 2013 claimed that open data re-use could boost the global economy hugely; and a 2013 Spanish study found that commercial re-users in Spain could employ around 10,000 people and reach a business volume of €900 million….”

See also Speech by Neelie Kroes: Embracing the open opportunity

Neuroeconomics, Judgment, and Decision Making


New edited book by Evan A. Wilhelms, and Valerie F. Reyna: “This volume explores how and why people make judgments and decisions that have economic consequences, and what the implications are for human well-being. It provides an integrated review of the latest research from many different disciplines, including social, cognitive, and developmental psychology; neuroscience and neurobiology; and economics and business.

The book has six areas of focus: historical foundations; cognitive consistency and inconsistency; heuristics and biases; neuroeconomics and neurobiology; developmental and individual differences; and improving decisions. Throughout, the contributors draw out implications from traditional behavioral research as well as evidence from neuroscience. In recent years, neuroscientific methods have matured, beyond being simply correlational and descriptive, into theoretical prediction and explanation, and this has opened up many new areas of discovery about economic behavior that are reviewed in the book. In the final part, there are applications of the research to cognitive development, individual differences, and the improving of decisions.
The book takes a broad perspective and is written in an accessible way so as to reach a wide audience of advanced students and researchers interested in behavioral economics and related areas. This includes neuroscientists, neuropsychologists, clinicians, psychologists (developmental, social, and cognitive), economists and other social scientists; legal scholars and criminologists; professionals in public health and medicine; educators; evidence-based practitioners; and policy-makers.”

Social Network Sites as a Mode to Collect Health Data: A Systematic Review


New paper by Fahdah Alshaikh et al. in the Journal of Medical Internet Research: “Background: To date, health research literature has focused on social network sites (SNS) either as tools to deliver health care, to study the effect of these networks on behavior, or to analyze Web health content. Less is known about the effectiveness of these sites as a method for collecting data for health research and the means to use such powerful tools in health research.
Objective: The objective of this study was to systematically review the available literature and explore the use of SNS as a mode of collecting data for health research. The review aims to answer four questions: Does health research employ SNS as method for collecting data? Is data quality affected by the mode of data collection? What types of participants were reached by SNS? What are the strengths and limitations of SNS?
Methods: The literature was reviewed systematically in March 2013 by searching the databases MEDLINE, Embase, and PsycINFO, using the Ovid and PubMed interface from 1996 to the third week of March 2013. The search results were examined by 2 reviewers, and exclusion, inclusion, and quality assessment were carried out based on a pre-set protocol.
Results: The inclusion criteria were met by 10 studies and results were analyzed descriptively to answer the review questions. There were four main results. (1) SNS have been used as a data collection tool by health researchers; all but 1 of the included studies were cross-sectional and quantitative. (2) Data quality indicators that were reported include response rate, cost, timeliness, missing data/completion rate, and validity. However, comparison was carried out only for response rate and cost as it was unclear how other reported indicators were measured. (3) The most targeted population were females and younger people. (4) All studies stated that SNS is an effective recruitment method but that it may introduce a sampling bias.
Conclusions: SNS has a role in health research, but we need to ascertain how to use it effectively without affecting the quality of research. The field of SNS is growing rapidly, and it is necessary to take advantage of the strengths of this tool and to avoid its limitations by effective research design. This review provides an important insight for scholars who plan to conduct research using SNS.”

How Citizen Scientists Are Using The Web to Track the Natural World


Yale Environment 360: “By making the recording and sharing of environmental data easier than ever, web-based technology has fostered the rapid growth of so-called citizen scientists — volunteers who collaborate with scientists to collect and interpret data. Numerous Internet-based projects now make use of citizen scientists to monitor environmental health and to track sensitive plant and wildlife populations. From counting butterflies, frogs, and bats across the globe, to piloting personal drones capable of high-definition infrared imaging, citizen scientists are playing a crucial role in collecting data that will help researchers understand the environment. Here is a sampling of some of these projects.”
View the gallery.

To improve quality, value of patient data, get them involved, study says


Joseph Conn at Vital Signs: “The key to the future use of patient-generated data is focusing on data that patients want to produce, own and use, and making it easy for them to produce it.
At least, that’s the take of four co-authors from Duke University in an article in this month’s healthcare policy journal Health Affairs. The July issue is chockablock with articles on the many forms and uses of Big Data.
“We observe that the key to high-quality, patient-generated data is to have immediate and actionable data so that patients experience the importance of the data for their own care as well as research purposes,” the authors said in “Assessing the Value of Patient-Generated Data to Comparative Effectiveness Research.”
Patient-generated data, which the authors describe as patient-reported outcomes, or PROs, will be “critical to developing the evidence base that informs decisions made by patients, providers and policymakers in pursuit of high-value medical care,” they predict.
“The easier it is for patients and clinicians to navigate (personal data), the more relevant that information will be to patient care, the more invested patients and clinics will be in contributing high-quality data, and the better the data in the big-data ecosystem will be,” they write.
“Analyses show that data quality improves over time and that the amount of missing data declines as patients experience the attention to their symptoms and actions that result from the information they provide,” the authors say…”