Navigating the Health Data Ecosystem


New O’Reilly Media report, “The Six C’s: Understanding the Health Data Terrain in the Era of Precision Medicine”: “Data-driven technologies are now being adopted, developed, funded, and deployed throughout the health care market at an unprecedented scale. But, as this O’Reilly report reveals, health care innovation contains more hurdles and requires more finesse than many tech startups expect. By paying attention to the lessons from the report’s findings, innovation teams can better anticipate what they’ll face, and plan accordingly.

Simply put, teams looking to apply collective intelligence and “big data” platforms to health and health care problems often don’t appreciate the messy details of using and making sense of data in the heavily regulated hospital IT environment. Download this report today and learn how it helps startups prepare in six areas:

  1. Complexity: An enormous domain with noisy data not designed for machine consumption
  2. Computing: Lack of standard, interoperable schema for documenting human health in a digital format
  3. Context: Lack of critical contextual metadata for interpreting health data
  4. Culture: Startup difficulties in hospital ecosystems: why innovation can be a two-edged sword
  5. Contracts: Navigating the IRB, HIPAA, and EULA frameworks
  6. Commerce: The problem of how digital health startups get paid

This report represents the initial findings of a study funded by a grant from the Robert Wood Johnson Foundation. Subsequent reports will explore the results of three deep-dive projects the team pursued during the study. (More)”

Delhi trials participatory budget initiative


Medha Basu in FutureGov: “The Delhi government is running a participatory budget exercise to involve citizens in deciding priorities for the 2015 Budget.

The city government, which came into office in February, has set aside INR 5 million (US$78,598) for each neighbourhood and residents will decide what this money gets spent on.

The initiative, Janta Ka Budget (meaning ‘People’s Budget’), will be tested in 400 communities across the city, with the first one launched by Chief Minister Arvind Kejriwal last month.

….

Officials met with residents of the neighbourhood to hear what they would like to see improved in their area. Residents then voted in public meetings to select the most popular proposals.

Officials are expected to come up with cost estimates for the shortlisted projects within a week of the meeting and allocate money from the fund.

In the first session, the shortlisted projects were a library, dispensary, road repairs and CCTV cameras….(More)”

Collective Intelligence or Group Think?


Paper analyzing “Engaging Participation Patterns in World without Oil” by Nassim JafariNaimi and Eric M. Meyers: “This article presents an analysis of participation patterns in an Alternate Reality Game, World Without Oil. This game aims to bring people together in an online environment to reflect on how an oil crisis might affect their lives and communities as a way to both counter such a crisis and to build collective intelligence about responding to it. We present a series of participation profiles based on a quantitative analysis of 1554 contributions to the game narrative made by 322 players. We further qualitatively analyze a sample of these contributions. We outline the dominant themes, the majority of which engage the global oil crisis for its effects on commute options and present micro-sustainability solutions in response. We further draw on the quantitative and qualitative analysis of this space to discuss how the design of the game, specifically its framing of the problem, feedback mechanism, and absence of subject-matter expertise, counter its aim of generating collective intelligence, making it conducive to groupthink….(More)”

Do Experts or Collective Intelligence Write with More Bias? Evidence from Encyclopædia Britannica and Wikipedia.


Working Paper by Shane Greenstein and Feng Zhu: “Which source of information contains greater bias and slant—text written by an expert or that constructed via collective intelligence? Do the costs of acquiring, storing, displaying and revising information shape those differences? We evaluate these questions empirically by examining slanted and biased phrases in content on US political issues from two sources — Encyclopædia Britannica and Wikipedia. Our overall slant measure is less (more) than zero when an article leans towards Democrat (Republican) viewpoints, while bias is the absolute value of the slant. Using a matched sample of pairs of articles from Britannica and Wikipedia, we show that, overall, Wikipedia articles are more slanted towards Democrat viewpoints than Britannica articles, as well as more biased. Slanted Wikipedia articles tend to become less biased than Britannica articles on the same topic as they become substantially revised, and the bias on a per word basis hardly differs between the sources. These results have implications for the segregation of readers in online sources and the allocation of editorial resources in online sources using collective intelligence…” Key concepts include:

  • The costs of producing, storing, and distributing knowledge shape different biases and slants in the collective intelligence (Wikipedia) and the expert-based model (Britannica).
  • Many of the differences between Wikipedia and Britannica arise because Wikipedia faces insignificant storage, production, and distribution costs. This leads to longer articles with greater coverage of more points of view. The larger number of revisions of Wikipedia articles results in a more neutral point of view; in the best cases, it reduces slant and bias to a negligible difference with an expert-based model.
  • As the world moves from reliance on expert-based production of knowledge to collectively-produced intelligence, it is unwise to blindly trust the properties of knowledge produced by the crowd. Their slants and biases are not widely appreciated, nor are the properties of the production model as yet fully understood….(More)
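The paper’s slant and bias measures can be sketched in a few lines of Python. The phrase list and weights below are invented for illustration only; the authors derive their politically coded phrases from a much larger, data-driven list:

```python
# Toy sketch of a phrase-based slant/bias measure (illustrative;
# the phrases and weights are NOT the paper's actual data).

# Hypothetical "slanted phrases": positive weights lean Republican,
# negative weights lean Democrat, matching the paper's sign convention.
PHRASE_WEIGHTS = {
    "death tax": +1.0,             # phrase more common in Republican speech
    "tax relief": +1.0,
    "undocumented workers": -1.0,  # phrase more common in Democrat speech
    "workers rights": -1.0,
}

def slant(text: str) -> float:
    """Signed slant: < 0 leans Democrat, > 0 leans Republican."""
    text = text.lower()
    return sum(w * text.count(p) for p, w in PHRASE_WEIGHTS.items())

def bias(text: str) -> float:
    """Bias is the absolute value of the slant."""
    return abs(slant(text))
```

Under this sketch, an article using “death tax” twice scores a slant of +2.0, while one pairing “tax relief” with “undocumented workers” nets out to 0.0 slant but could still contain offsetting slanted language.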

Turns Out the Internet Is Bad at Guessing How Many Coins Are in a Jar


Eric B. Steiner at Wired: “A few weeks ago, I asked the internet to guess how many coins were in a huge jar…The mathematical theory behind this kind of estimation game is apparently sound. That is, the mean of all the estimates will be uncannily close to the actual value, every time. James Surowiecki’s best-selling book, The Wisdom of Crowds, banks on this principle, and details several striking anecdotes of crowd accuracy. The most famous is a 1906 competition in Plymouth, England, to guess the weight of an ox. As reported by Sir Francis Galton in a letter to Nature, no one guessed the actual weight of the ox, but the average of all 787 submitted guesses was exactly the beast’s actual weight….
So what happened to the collective intelligence supposedly buried in our disparate ignorance?
Most successful crowdsourcing projects are essentially the sum of many small parts: efficiently harvested resources (information, effort, money) courtesy of a large group of contributors. Think Wikipedia, Google search results, Amazon’s Mechanical Turk, and KickStarter.
But a sum of parts does not wisdom make. When we try to produce collective intelligence, things get messy. Whether we are predicting the outcome of an election, betting on sporting contests, or estimating the value of coins in a jar, the crowd’s take is vulnerable to at least three major factors: skill, diversity, and independence.
A certain amount of skill or knowledge in the crowd is obviously required, while crowd diversity expands the number of possible solutions or strategies. Participant independence is important because it preserves the value of individual contributors, which is another way of saying that if everyone copies their neighbor’s guess, the data are doomed.
Failure to meet any one of these conditions can lead to wildly inaccurate answers, information echo, or herd-like behavior. (There is more than a little irony with the herding hazard: The internet makes it possible to measure crowd wisdom and maybe put it to use. Yet because people tend to base their opinions on the opinions of others, the internet ends up amplifying the social conformity effect, thereby preventing an accurate picture of what the crowd actually thinks.)
What’s more, even when these conditions—skill, diversity, independence—are reasonably satisfied, as they were in the coin jar experiment, humans exhibit a whole host of other cognitive biases and irrational thinking that can impede crowd wisdom. True, some bias can be positive; all that Gladwellian snap-judgment stuff. But most biases aren’t so helpful, and can too easily lead us to ignore evidence, overestimate probabilities, and see patterns where there are none. These biases are not vanquished simply by expanding sample size. On the contrary, they get magnified.
Given the last 60 years of research in cognitive psychology, I submit that Galton’s results with the ox weight data were outrageously lucky, and that the same is true of other instances of seemingly perfect “bean jar”-styled experiments….”
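The averaging principle, and how shared influence breaks it, can be illustrated with a toy simulation (the numbers and noise model here are invented for illustration; this is not the article’s experiment):

```python
# Toy simulation: independent errors cancel under averaging, but a
# shared bias (e.g. everyone copying a visible early guess) does not.
import random

random.seed(0)
TRUE_VALUE = 1198  # Galton's ox, in pounds

def crowd_mean(n, shared_bias=0.0):
    """Mean of n guesses = truth + shared bias + independent noise."""
    guesses = [TRUE_VALUE + shared_bias + random.gauss(0, 150)
               for _ in range(n)]
    return sum(guesses) / len(guesses)

independent = crowd_mean(787)                   # noise averages out
herded = crowd_mean(787, shared_bias=200)       # the whole crowd shifts
```

With 787 guesses, the independent crowd’s mean lands within a few pounds of 1198, while the herded crowd’s mean stays roughly 200 pounds high: averaging shrinks independent noise but leaves any shared bias untouched.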

The New Thing in Google Flu Trends Is Traditional Data


In the New York Times: “Google is giving its Flu Trends service an overhaul — “a brand new engine,” as it announced in a blog post on Friday.

The new thing is actually traditional data from the Centers for Disease Control and Prevention that is being integrated into the Google flu-tracking model. The goal is greater accuracy after the Google service had been criticized for consistently overestimating flu outbreaks in recent years.

The main critique came in an analysis done by four quantitative social scientists, published earlier this year in an article in Science magazine, “The Parable of Google Flu: Traps in Big Data Analysis.” The researchers found that the most accurate flu predictor was a data mash-up that combined Google Flu Trends, which monitored flu-related search terms, with the official C.D.C. reports from doctors on influenza-like illness.

The Google Flu Trends team is heeding that advice. In the blog post, Christian Stefansen, a Google senior software engineer, wrote, “We’re launching a new Flu Trends model in the United States that — like many of the best performing methods in the literature — takes official CDC flu data into account as the flu season progresses.”

Google’s flu-tracking service has had its ups and downs. Its triumph came in 2009, when it gave an advance signal of the severity of the H1N1 outbreak, two weeks or so ahead of official statistics. In a 2009 article in Nature explaining how Google Flu Trends worked, the company’s researchers did, as the Friday post notes, say that the Google service was not intended to replace official flu surveillance methods and that it was susceptible to “false alerts” — anything that might prompt a surge in flu-related search queries.

Yet those caveats came a couple of pages into the Nature article. And Google Flu Trends became a symbol of the superiority of the new, big data approach — computer algorithms mining data trails for collective intelligence in real time. To enthusiasts, it seemed so superior to the antiquated method of collecting health data that involved doctors talking to patients, inspecting them and filing reports.

But Google’s flu service greatly overestimated the number of cases in the United States in the 2012-13 flu season — a well-known miss — and, according to the research published this year, has persistently overstated flu cases over the years. In the Science article, the social scientists called it “big data hubris.”
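The mash-up the researchers recommend can be sketched as a least-squares blend of the two signals. This is a toy illustration with invented numbers and names, not the Science paper’s method or Google’s actual model:

```python
# Toy sketch: blend a search-based flu signal with lagged official CDC
# data (illustrative only; not the actual Google Flu Trends model).

def fit_blend(search, cdc_lag, truth):
    """Least-squares weight w for: estimate = w*search + (1-w)*cdc_lag,
    fitted on historical weeks where the true flu level is known."""
    num = sum((s - c) * (y - c) for s, c, y in zip(search, cdc_lag, truth))
    den = sum((s - c) ** 2 for s, c in zip(search, cdc_lag))
    return num / den

def predict(w, search_now, cdc_last_week):
    """Combine this week's search signal with last week's CDC report."""
    return w * search_now + (1 - w) * cdc_last_week

# Invented historical data: search index, last week's CDC figure, truth.
search = [10.0, 20.0, 30.0]
cdc_lag = [8.0, 18.0, 28.0]
truth = [8.6, 18.6, 28.6]

w = fit_blend(search, cdc_lag, truth)
estimate = predict(w, search_now=40.0, cdc_last_week=38.0)
```

The point of the blend is that the slower official data anchors the level while the search signal adds timeliness, so a systematic overshoot in the search signal gets damped rather than reported at face value.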

Social Collective Intelligence


New book edited by Daniele Miorandi, Vincenzo Maltese, Michael Rovatsos, Anton Nijholt, and James Stewart: “The book focuses on Social Collective Intelligence, a term used to denote a class of socio-technical systems that combine, in a coordinated way, the strengths of humans, machines and collectives in terms of competences, knowledge and problem solving capabilities with the communication, computing and storage capabilities of advanced ICT.
Social Collective Intelligence opens a number of challenges for researchers in both computer science and social sciences; at the same time it provides an innovative approach to solve challenges in diverse application domains, ranging from health to education and organization of work.
The book will provide a cohesive and holistic treatment of Social Collective Intelligence, including challenges emerging in various disciplines (computer science, sociology, ethics) and opportunities for innovating in various application areas.
By going through the book the reader will gain insight and knowledge into the challenges and opportunities provided by this new, exciting field of investigation. Benefits for scientists will be in terms of accessing a comprehensive treatment of the open research challenges in a multidisciplinary perspective. Benefits for practitioners and applied researchers will be in terms of access to novel approaches to tackle relevant problems in their field. Benefits for policy-makers and public bodies representatives will be in terms of understanding how technological advances can support them in advancing the progress of society and the economy…”

Crowdteaching: Supporting Teaching as Designing in Collective Intelligence Communities


Paper by Mimi Recker, Min Yuan, and Lei Ye in the International Review of Research in Open and Distance Learning: “The widespread availability of high-quality Web-based content offers new potential for supporting teachers as designers of curricula and classroom activities. When coupled with a participatory Web culture and infrastructure, teachers can share their creations as well as leverage from the best that their peers have to offer to support a collective intelligence or crowdsourcing community, which we dub crowdteaching. We applied a collective intelligence framework to characterize crowdteaching in the context of a Web-based tool for teachers called the Instructional Architect (IA). The IA enables teachers to find, create, and share instructional activities (called IA projects) for their students using online learning resources. These IA projects can further be viewed, copied, or adapted by other IA users. This study examines the usage activities of two samples of teachers, and also analyzes the characteristics of a subset of their IA projects. Analyses of teacher activities suggest that they are engaging in crowdteaching processes. Teachers, on average, chose to share over half of their IA projects, and copied some directly from other IA projects. Thus, these teachers can be seen as both contributors to and consumers of crowdteaching processes. In addition, IA users preferred to view IA projects rather than to completely copy them. Finally, correlational results based on an analysis of the characteristics of IA projects suggest that several easily computed metrics (number of views, number of copies, and number of words in IA projects) can act as an indirect proxy of instructionally relevant indicators of the content of IA projects.”

Forget The Wisdom of Crowds; Neurobiologists Reveal The Wisdom Of The Confident


Emerging Technology From the arXiv: “Way back in 1906, the English polymath Francis Galton visited a country fair in which 800 people took part in a contest to guess the weight of a slaughtered ox. After the fair, he collected the guesses and calculated their average which turned out to be 1208 pounds. To Galton’s surprise, this was within 1 per cent of the true weight of 1198 pounds.
This is one of the earliest examples of a phenomenon that has come to be known as the wisdom of the crowd. The idea is that the collective opinion of a group of individuals can be better than a single expert opinion.
This phenomenon is commonplace today on websites such as Reddit in which users vote on the importance of particular stories and the most popular are given greater prominence.
However, anyone familiar with Reddit will know that the collective opinion isn’t always wise. In recent years, researchers have spent a significant amount of time and effort teasing apart the factors that make crowds stupid. One important factor turns out to be the way members of a crowd influence each other.
It turns out that if a crowd offers a wide range of independent estimates, then it is more likely to be wise. But if members of the crowd are influenced in the same way, for example by each other or by some external factor, then they tend to converge on a biased estimate. In this case, the crowd is likely to be stupid.
Today, Gabriel Madirolas and Gonzalo De Polavieja at the Cajal Institute in Madrid, Spain, say they found a way to analyse the answers from a crowd which allows them to remove this kind of bias and so settle on a wiser answer.
The theory behind their work is straightforward. Their idea is that some people are more strongly influenced by additional information than others who are confident in their own opinion. So identifying these more strongly influenced people and separating them from the independent thinkers creates two different groups. The group of independent thinkers is then more likely to give a wise estimate. Or put another way, ignore the wisdom of the crowd in favour of the wisdom of the confident.
So how to identify confident thinkers? Madirolas and De Polavieja began by studying the data from an earlier set of experiments in which groups of people were given tasks such as estimating the length of the border between Switzerland and Italy, the correct answer being 734 kilometres.
After one task, some groups were shown the combined estimates of other groups before beginning their second task. These experiments clearly showed how this information biased the answers from these groups in their second tasks.
Madirolas and De Polavieja then set about creating a mathematical model of how individuals incorporate this extra information. They assume that each person comes to a final estimate based on two pieces of information: first, their own independent estimate of the length of the border and second, the earlier combined estimate revealed to the group. Each individual decides on a final estimate depending on the weighting they give to each piece of information.
Those people who are heavily biased give a strong weighting to the additional information whereas people who are confident in their own estimate give a small or zero weighting to the additional information.
Madirolas and De Polavieja then take each person’s behaviour and fit it to this model to reveal how independent their thinking has been.
That allows them to divide the groups into independent thinkers and biased thinkers. Taking the collective opinion of the independent thinkers then gives a much more accurate estimate of the length of the border.
“Our results show that, while a simple operation like the mean, median or geometric mean of a group may not allow groups to make good estimations, a more complex operation taking into account individuality in the social dynamics can lead to a better collective intelligence,” they say.

Ref: arxiv.org/abs/1406.7578: Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds”
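The weighting model the article describes can be sketched in a few lines of Python. The records, social estimate, and threshold below are invented for illustration; they are not the authors’ data, and this is a simplified stand-in for their actual fitting procedure:

```python
# Toy sketch of "wisdom of the confident" (illustrative data and
# threshold; a simplified version of the paper's weighting model).

def social_weight(own, social_info, final):
    """Infer how strongly a person weighted the social information,
    assuming: final = (1 - w) * own + w * social_info, solved for w."""
    if social_info == own:
        return 0.0
    return (final - own) / (social_info - own)

def wisdom_of_the_confident(records, social_info, threshold=0.2):
    """Average only the initial estimates of weakly influenced
    ('confident') participants. records: (own, final) estimate pairs."""
    confident = [own for own, final in records
                 if abs(social_weight(own, social_info, final)) < threshold]
    return sum(confident) / len(confident)

# Border-length task (truth: 734 km); the group earlier saw a biased
# combined estimate of 1000 km. Pairs are (initial, final) guesses.
records = [(700, 700), (760, 772), (740, 870), (720, 860)]
result = wisdom_of_the_confident(records, social_info=1000)  # -> 730.0
```

Here the two participants who shifted halfway toward the biased 1000 km figure are excluded, and the average of the two confident participants’ initial guesses (730 km) lands much closer to the true 734 km than the whole group’s mean would.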