Stefaan Verhulst
Richard Harris at NPR: “More than a million Americans have donated genetic information and medical data for research projects. But how that information gets used varies a lot, depending on the philosophy of the organizations that have gathered the data.
Some hold the data close, while others are working to make the data as widely available to as many researchers as possible — figuring science will progress faster that way. But scientific openness can be constrained by both practical and commercial considerations.
Three major projects in the United States illustrate these differing philosophies.
VA scientists spearhead research on veterans database
The first project involves three-quarters of a million veterans, mostly men over age 60. Every day, 400 to 500 blood samples show up in a modern lab in the basement of the Veterans Affairs hospital in Boston. Luis Selva, the center’s associate director, explains that robots extract DNA from the samples and then the genetic material is sent out for analysis….
Intermountain Healthcare teams with deCODE genetics
Our second example involves what is largely an extended family: descendants of settlers in Utah, primarily from the Church of Jesus Christ of Latter-day Saints. This year, Intermountain Healthcare in Utah announced that it was going to sequence the complete DNA of half a million of its patients, resulting in what the health system says will be the world’s largest collection of complete genomes….
NIH’s All of Us aims to diversify and democratize research
Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health, behavior and genetics. Its philosophy sharply contrasts with that of Intermountain Healthcare.
“We do have a very strong goal around diversity, in making sure that the participants in the All of Us research program reflect the vast diversity of the United States,” says Stephanie Devaney, the program’s deputy director….(More)”.
Stefaan G. Verhulst in apolitical: “If I had only one hour to save the world, I would spend fifty-five minutes defining the questions, and only five minutes finding the answers,” is a famous aphorism attributed to Albert Einstein.
Behind this quote is an important insight about human nature: Too often, we leap to answers without first pausing to examine our questions. We tout solutions without considering whether we are addressing real or relevant challenges or priorities. We advocate fixes for problems, or for aspects of society, that may not be broken at all.
This misordering of priorities is especially acute — and represents a missed opportunity — in our era of big data. Today’s data has enormous potential to solve important public challenges.
However, policymakers often fail to invest in defining the questions that matter, focusing mainly on the supply side of the data equation (“What data do we have or must have access to?”) rather than the demand side (“What is the core question and what data do we really need to answer it?” or “What data can or should we actually use to solve those problems that matter?”).
As a result, data initiatives often deliver only marginal insights while generating unnecessary privacy risks, because they access and explore data that may not in fact be needed at all to address the root of our most important societal problems.
A new science of questions
So what are the truly vexing questions that deserve attention and investment today? Toward what end should we strategically seek to leverage data and AI?
The truth is that policymakers and other stakeholders currently don’t have a good way of defining questions or identifying priorities, nor a clear framework to help us leverage the potential of data and data science toward the public good.
This is a situation we seek to remedy at The GovLab, an action research center based at New York University.
Our most recent project, the 100 Questions Initiative, seeks to begin developing a new science and practice of questions — one that identifies the most urgent questions in a participatory manner. Launched last month, the goal of this project is to develop a process that takes advantage of distributed and diverse expertise on a range of given topics or domains so as to identify and prioritize those questions that are high impact, novel and feasible.
Because we live in an age of data and much of our work focuses on the promises and perils of data, we seek to identify the 100 most pressing problems confronting the world that could be addressed by greater use of existing, often inaccessible, datasets through data collaboratives – new forms of cross-disciplinary collaboration beyond public-private partnerships focused on leveraging data for good….(More)”.
Data Across Sectors for Health: “Data sharing between organizations addressing social risk factors has the potential to amplify impact by increasing direct service capacity and efficiency. Unfortunately, the risks of and restrictions on sharing personal data often limit this potential, and adherence to regulations such as HIPAA and FERPA can make data sharing a significant challenge.
DASH CIC-START awardee Restore Hope Ministries worked with Asemio to utilize technology that allows for the analysis of personally identifiable information while preserving clients’ privacy. The collaboration shared their findings in a new white paper that describes the process of using multi-party computation technology to answer questions that can aid service providers in exploring the barriers that underserved populations may be facing. The first question they asked: what is the overlap of populations served by two distinct organizations? The results of the overlap analysis confirmed that a significant opportunity exists to increase access to services for a subset of individuals through better outreach…(More)”
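The white paper does not publish its protocol, but the overlap question itself is easy to illustrate. The toy Python sketch below shows a much-simplified stand-in for privacy-preserving overlap analysis (keyed hashing of identifiers, not the full multi-party computation technology Asemio actually used): each organization pseudonymizes its client roster locally, so only hashes are compared and raw personal data never leaves either organization. All names, the rosters, and the shared key are hypothetical.

```python
import hmac
import hashlib

# Hypothetical client rosters held separately by two service providers.
# Real deployments would use normalized identifiers (e.g., name + date
# of birth), not bare first names.
ORG_A = {"alice", "bob", "carol", "dan"}
ORG_B = {"carol", "dan", "erin"}

# Secret agreed between the parties out of band; without it, the hashes
# cannot be recomputed or dictionary-attacked by an outsider.
SHARED_KEY = b"agreed-out-of-band"

def pseudonymize(records, key):
    """Replace each identifier with a keyed hash so raw PII is never shared."""
    return {hmac.new(key, r.encode(), hashlib.sha256).hexdigest() for r in records}

# Each organization hashes locally; only the hashed sets are intersected.
overlap = pseudonymize(ORG_A, SHARED_KEY) & pseudonymize(ORG_B, SHARED_KEY)
print(len(overlap))  # number of individuals served by both organizations
```

Unlike true multi-party computation, this sketch requires the two parties to trust each other with a shared key and reveals which hashed records overlap, not just the count; it is meant only to make the "overlap of populations" question concrete.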
Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.
Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.
Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.
David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career over the number that would be expected of an average doctor’s—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….
In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.
Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”
To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”
Karin Wulf at the Washington Post: “We are at a distinctive point in the relationship between information and democracy: As the volume of information dissemination has grown, so too have attempts by individuals and groups to weaponize disinformation for commercial and political purposes. This has contributed to fragmentation, political polarization, cynicism, and distrust in institutions and expertise, as a recent Pew Research Center report found. So what is the solution?
Footnotes.
Outside of academics and lawyers, few people may think about footnotes once they leave school. Indeed, there is a hackneyed caricature about footnotes as pedantry, the purview of tweedy scholars blinking as we emerge from fluorescent-lit libraries into the sun — not the concern of regular folks. A recent essay in the Economist even laid some of Britain’s recent woes at the feet of historians who spend too much time “fiddling with footnotes.”
But nothing could be further from the truth. More than ever, we need what this tool provides: accountability and transparency. “Fiddling with footnotes” is the kind of hygienic practice that our era of information pollution needs — and needs to be shared as widely as possible. Footnotes are for everyone.
Though they began as an elite practice, footnotes became aligned historically with modern democracy itself. Citation is rooted in the 17th-century emergence of Enlightenment science, which asked for evidence rather than faith as the key to supporting a conclusion. In an era when scientific empiricism threatened the authority of government and religious institutions, newly developing institutional science publications, such as the Philosophical Transactions of the Royal Society, began to use citations for evidence and reference. In one of Isaac Newton’s contributions to the journal in 1673, a reply to queries about his work on light and the color spectrum, he used citations to his initial publication on the subject (“see no. 80. Page 3075”).
By the 18th century, and with more agile printing, the majority of scientific publications included citations, and the bottom of the page was emerging as the preferred placement. Where scientific scholarship traveled, humanists were not far behind. The disdain of French philosopher and mathematician René Descartes for any discipline without rigorous methods was part of the prompt for historians to embrace citations….(More)”.
Mary Hui at Quartz: “The “Be Water” nature of Hong Kong’s protests means that crowds move quickly and spread across the city. They might stage a protest in the central business district one weekend, then industrial neighborhoods and far-flung suburban towns the next. And a lot is happening at any one time at each protest. One of the key difficulties for protesters is to figure out what’s happening in the crowded, fast-changing, and often chaotic circumstances.
Citizen-led efforts to map protests in real-time are an attempt to address those challenges and answer some pressing questions for protesters and bystanders alike: Where should they go? Where have tear gas and water cannons been deployed? Where are police advancing, and are there armed thugs attacking civilians?
One of the most widely used real-time maps of the protests is HKMap.live, a volunteer-run and crowdsourced effort that officially launched in early August. It’s a dynamic map of Hong Kong that users can zoom in and out of, much like Google Maps. But in addition to detailed street and building names, this one features various emoji to communicate information at a glance: a dog for police, a worker in a yellow hardhat for protesters, a dinosaur for the police’s black-clad special tactical squad, a white speech-bubble for tear gas, two exclamation marks for danger.

Founded by a finance professional in his 20s who wished to be identified only as Kuma, HKMap is an attempt to level the playing field between protesters and officers, he said in an interview over the chat app Telegram. While earlier in the protest movement people relied on text-based, on-the-ground live updates through public Telegram channels, Kuma found these too scattered to be effective, and hard to visualize unless someone knew the particular neighborhood inside out.
“The huge asymmetric information between protesters and officers led to multiple occasions of surround and capture,” said Kuma. Passersby and non-frontline protesters could also make use of the map, he said, to avoid tense conflict zones. After some of his friends were arrested in late July, he decided to build HKMap….(More)”.
Paper by Emily S. Rempel, Julie Barnett and Hannah Durrant: “This study examines the hidden assumptions around running public-engagement exercises in government. We study an example of public engagement on the ethics of combining and analysing data in national government – often called data science ethics. We study hidden assumptions, drawing on hidden curriculum theories in education research, as it allows us to identify conscious and unconscious underlying processes related to conducting public engagement that may impact results. Through participation in the 2016 Public Dialogue for Data Science Ethics in the UK, four key themes were identified that exposed underlying public engagement norms. First, that organizers had constructed a strong imagined public as neither overly critical nor supportive, which they used to find and engage participants. Second, that official aims of the engagement, such as including publics in developing ethical data regulations, were overshadowed by underlying meta-objectives, such as counteracting public fears. Third, that advisory group members, organizers and publics understood the term ‘engagement’ in varying ways, from creating interest to public inclusion. And finally, that stakeholder interests, particularly government hopes for a positive report, influenced what was written in the final report. Reflection on these underlying mechanisms, such as the development of meta-objectives that seek to benefit government and technical stakeholders rather than publics, suggests that the practice of public engagement can, in fact, shut down opportunities for meaningful public dialogue….(More)”.
Book by Gary Smith and Jay Cordes: “Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exists. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.
The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all too common mistakes of data scientists – who can be plagued by lazy thinking, whims, hunches, and prejudices – and indicates how they have been at the root of many disasters, including the Great Recession.
Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of approaches to interpreting data, both grand successes and epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science….(More)”.
Claire Wardle at Scientific American: “…Online misinformation has been around since the mid-1990s. But in 2016 several events made it broadly clear that darker forces had emerged: automation, microtargeting and coordination were fueling information campaigns designed to manipulate public opinion at scale. Journalists in the Philippines started raising flags as Rodrigo Duterte rose to power, buoyed by intensive Facebook activity. This was followed by unexpected results in the Brexit referendum in June and then the U.S. presidential election in November—all of which sparked researchers to systematically investigate the ways in which information was being used as a weapon.
During the past three years the discussion around the causes of our polluted information ecosystem has focused almost entirely on actions taken (or not taken) by the technology companies. But this fixation is too simplistic. A complex web of societal shifts is making people more susceptible to misinformation and conspiracy. Trust in institutions is falling because of political and economic upheaval, most notably through ever widening income inequality. The effects of climate change are becoming more pronounced. Global migration trends spark concern that communities will change irrevocably. The rise of automation makes people fear for their jobs and their privacy.
Bad actors who want to deepen existing tensions understand these societal trends, designing content that they hope will so anger or excite targeted users that the audience will become the messenger. The goal is for users to spend their own social capital reinforcing and lending credibility to the original message.
Most of this content is designed not to persuade people in any particular direction but to cause confusion, to overwhelm and to undermine trust in democratic institutions from the electoral system to journalism. And although much is being made about preparing the U.S. electorate for the 2020 election, misleading and conspiratorial content did not begin with the 2016 presidential race, and it will not end after this one. As tools designed to manipulate and amplify content become cheaper and more accessible, it will be even easier to weaponize users as unwitting agents of disinformation….(More)”.

Book by Margaret Doyle and Nick O’Brien: “This book reconnects everyday justice with social rights. It rediscovers human rights in the ‘small places’ of housing, education, health and social care, where administrative justice touches the citizen every day, and in doing so it re-imagines administrative justice and expands its democratic reach. The institutions of everyday justice – ombuds, tribunals and mediation – rarely herald their role in human rights frameworks, and never very loudly. For the most part, human rights and administrative justice are ships that pass in the night. Drawing on design theory, the book proposes to remedy this alienation by replacing current orthodoxies, not least that of ‘user focus’, with more promising design principles of community, network and openness. Thus re-imagined, the future of both administrative justice and social rights is demosprudential, firmly rooted in making response to citizen grievance more democratic and embedding legal change in the broader culture….(More)”.