Real-time flu tracking. By monitoring social media, scientists can track outbreaks as they happen.


Charles Schmidt at Nature: “Conventional influenza surveillance describes outbreaks of flu that have already happened. It is based on reports from doctors, and produces data that take weeks to process — often leaving the health authorities to chase the virus around, rather than get on top of it.

But every day, thousands of unwell people pour details of their symptoms and, perhaps unknowingly, locations into search engines and social media, creating a trove of real-time flu data. If such data could be used to monitor flu outbreaks as they happen and to make accurate predictions about its spread, that could transform public-health surveillance.

Powerful computational tools such as machine learning and a growing diversity of data streams — not just search queries and social media, but also cloud-based electronic health records and human mobility patterns inferred from census information — are making it increasingly possible to monitor the spread of flu through the population by following its digital signal. Now, models that track flu in real time and forecast flu trends are making inroads into public-health practice.
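
None of the models mentioned in the article are published as code here; purely as an illustrative sketch, the snippet below shows the general shape of a digital-signal “nowcast”: fit a simple regression of the current influenza-like-illness (ILI) rate on search, social-media and lagged surveillance signals, then apply the fitted weights to this week’s signals. All variable names and numbers are invented assumptions, not any agency’s model.

```python
import numpy as np

# Hypothetical weekly signals (all values invented for illustration):
#   search  - normalized volume of flu-related search queries
#   social  - rate of flu-symptom mentions on social media
#   ili_lag - official influenza-like-illness (ILI) rate reported two weeks ago
search  = np.array([0.8, 1.1, 1.6, 2.3, 3.0, 3.4])
social  = np.array([0.5, 0.9, 1.4, 2.0, 2.8, 3.1])
ili_lag = np.array([1.0, 1.2, 1.5, 2.1, 2.9, 3.3])

# Target: the current (not-yet-reported) ILI rate for the same weeks
ili_now = np.array([1.2, 1.5, 2.1, 2.9, 3.3, 3.6])

# Fit a simple linear "nowcast" by ordinary least squares
X = np.column_stack([search, social, ili_lag, np.ones_like(search)])
coef, *_ = np.linalg.lstsq(X, ili_now, rcond=None)

# Estimate this week's ILI rate from the latest digital signals
latest = np.array([3.7, 3.4, 3.6, 1.0])  # search, social, lagged ILI, intercept term
print("Estimated current ILI rate: %.2f%%" % (latest @ coef))
```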

“We’re becoming much more comfortable with how these models perform,” says Matthew Biggerstaff, an epidemiologist who works on flu preparedness at the US Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia.

In 2013–14, the CDC launched the FluSight Network, a website informed by digital modelling that predicts the timing, peak and short-term intensity of the flu season in ten regions of the United States and across the whole country. According to Biggerstaff, flu forecasting helps responders to plan ahead, so they can be ready with vaccinations and communication strategies to limit the effects of the virus. Encouraged by progress in the field, the CDC announced in January 2019 that it will spend US$17.5 million to create a network of influenza-forecasting centres of excellence, each tasked with improving the accuracy and communication of real-time forecasts.

The CDC is leading the way on digital flu surveillance, but health agencies elsewhere are following suit. “We’ve been working to develop and apply these models with collaborators using a range of data sources,” says Richard Pebody, a consultant epidemiologist at Public Health England in London. The capacity to predict flu trajectories two to three weeks in advance, Pebody says, “will be very valuable for health-service planning.”…(More)”.

The Internet Relies on People Working for Free


Owen Williams at OneZero: “When you buy a product like Philips Hue’s smart lights or an iPhone, you probably assume the people who wrote their code are being paid. While that’s true for those who directly author a product’s software, virtually every tech company also relies on thousands of bits of free code, made available through “open-source” projects on sites like GitHub and GitLab.

Often these developers are happy to work for free. Writing open-source software allows them to sharpen their skills, gain perspectives from the community, or simply help the industry by making innovations available at no cost. According to Google, which maintains hundreds of open-source projects, open source “enables and encourages collaboration and the development of technology, solving real-world problems.”

But when software used by millions of people is maintained by a community of people, or a single person, all on a volunteer basis, sometimes things can go horribly wrong. The catastrophic Heartbleed bug of 2014, which compromised the security of hundreds of millions of sites, was caused by a problem in an open-source library called OpenSSL, which relied on a single full-time developer not making a mistake as they updated and changed that code, used by millions. Other times, developers grow bored and abandon their projects, which can be breached while they aren’t paying attention.

It’s hard to demand that programmers who are working for free troubleshoot problems or continue to maintain software that they’ve lost interest in for whatever reason — though some companies certainly try. Not adequately maintaining these projects, on the other hand, makes the entire tech ecosystem weaker. So some open-source programmers are asking companies to pay, not for their code, but for their support services….(More)”.

The promise and peril of a digital ecosystem for the planet


Blog post by Jillian Campbell and David E Jensen: “A range of frontier and digital technologies have dramatically boosted the ways in which we can monitor the health of our planet and sustain our future on it (Figure 1).

Figure 1. A range of frontier and digital technologies can be combined to monitor our planet and the sustainable use of natural resources (1)

If we can leverage this technology effectively, we will be able to assess and predict risks, increase transparency and accountability in the management of natural resources and inform markets as well as consumer choice. These actions are all required if we are to stand a better chance of achieving the Sustainable Development Goals (SDGs).

However, for this vision to become a reality, public and private sector actors must take deliberate action and collaborate to build a global digital ecosystem for the planet — one consisting of data, infrastructure, rapid analytics, and real-time insights. We are now at a pivotal moment in the history of our stewardship of this planet. A “tipping point” of sorts. And in order to guide the political action which is required to counter the speed, scope and severity of the environmental and climate crises, we must acquire and deploy these data sets and frontier technologies. Doing so can fundamentally change our economic trajectory and underpin a sustainable future.

This article shows how such a global digital ecosystem for the planet can be achieved — as well as what we risk if we do not take decisive action within the next 12 months….(More)”.

How big data can affect your bank account – and life


Alena Buyx, Barbara Prainsack and Aisling McMahon at The Conversation: “Mustafa loves good coffee. In his free time, he often browses high-end coffee machines that he cannot currently afford but is saving for. One day, travelling to a friend’s wedding abroad, he gets to sit next to another friend on the plane. When Mustafa complains about how much he paid for his ticket, it turns out that his friend paid less than half of what he paid, even though they booked around the same time.

He looks into possible reasons for this and concludes that it must be related to his browsing of expensive coffee machines and equipment. He is very angry about this and complains to the airline, who send him a lukewarm apology that refers to personalised pricing models. Mustafa feels that this is unfair but does not challenge it. Pursuing it any further would cost him time and money.

This story – which is hypothetical, but can and does occur – demonstrates the potential for people to be harmed by data use in the current “big data” era. Big data analytics involves using large amounts of data from many sources which are linked and analysed to find patterns that help to predict human behaviour. Such analysis, even when perfectly legal, can harm people.

Mustafa, for example, has likely been affected by personalised pricing practices whereby his search for high-end coffee machines has been used to make certain assumptions about his willingness to pay or buying power. This in turn may have led to his higher priced airfare. While this has not resulted in serious harm in Mustafa’s case, instances of serious emotional and financial harm are, unfortunately, not rare, including the denial of mortgages for individuals and risks to a person’s general creditworthiness based on associations with other individuals. This might happen if an individual shares similar characteristics with other individuals who have poor repayment histories….(More)”.
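
To make the mechanism concrete, here is a deliberately simplified, entirely hypothetical sketch of the kind of rule a personalised-pricing system might apply. The signals, weights and markup are assumptions for illustration only, not any airline’s actual model.

```python
# Hypothetical sketch of personalised pricing: a base fare is adjusted by a
# "willingness-to-pay" score inferred from browsing signals. All signals,
# weights, and numbers are invented for illustration only.

def willingness_to_pay_score(browsing_profile: dict) -> float:
    """Crude score in [0, 1] based on made-up behavioural signals."""
    score = 0.0
    if browsing_profile.get("viewed_luxury_goods"):      # e.g. high-end coffee machines
        score += 0.4
    if browsing_profile.get("device") == "new_flagship_phone":
        score += 0.2
    score += min(browsing_profile.get("price_comparison_visits", 0), 5) * -0.05
    return max(0.0, min(1.0, score))

def personalised_fare(base_fare: float, browsing_profile: dict) -> float:
    """Mark the base fare up by as much as 60% for high-scoring profiles."""
    return round(base_fare * (1 + 0.6 * willingness_to_pay_score(browsing_profile)), 2)

print(personalised_fare(200.0, {"viewed_luxury_goods": True, "device": "new_flagship_phone"}))  # 272.0
print(personalised_fare(200.0, {"price_comparison_visits": 4}))                                 # 200.0
```

Two users asking for the same ticket at the same time can thus be quoted different prices, which is essentially what Mustafa experienced.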

JPMorgan Creates ‘Volfefe’ Index to Track Trump Tweet Impact


Tracy Alloway at Bloomberg: “Two of the largest Wall Street banks are trying to measure the market impact of Donald Trump’s tweets.

Analysts at JPMorgan Chase & Co. have created an index to quantify what they say are the growing effects on U.S. bond yields. Citigroup Inc.’s foreign exchange team, meanwhile, report that these micro-blogging missives are also becoming “increasingly relevant” to foreign-exchange moves.

JPMorgan’s “Volfefe Index,” named after Trump’s mysterious covfefe tweet from May 2017, suggests that the president’s electronic musings are having a statistically significant impact on Treasury yields. The number of market-moving Trump tweets has ballooned in the past month, with those including words such as “China,” “billion,” “products,” “Democrats” and “great” most likely to affect prices, the analysts found….

JPMorgan’s analysis looked at Treasury yields in the five minutes after a Trump tweet, and the index shows the rolling one-month probability that each missive is market-moving.

They found that the Volfefe Index can account for a “measurable fraction” of moves in implied volatility, seen in interest rate derivatives known as swaptions. That’s particularly apparent at the shorter end of the curve, with two- and five-year rates more impacted than 10-year securities.
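
JPMorgan has not released the Volfefe methodology, but the description above (yield moves in the five minutes after a tweet, summarised as a rolling one-month measure) suggests a calculation roughly along these lines. The threshold, window and synthetic data below are assumptions for illustration, not the bank’s parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per tweet, with the absolute change in the
# 2-year Treasury yield (in basis points) over the following five minutes.
rng = np.random.default_rng(0)
tweets = pd.DataFrame({
    "time": pd.date_range("2019-08-01", periods=300, freq="3h"),
    "abs_yield_move_bp": rng.exponential(scale=0.4, size=300),
})

# A tweet counts as "market-moving" if the follow-on move exceeds a threshold
# (0.5bp here is an arbitrary illustrative choice).
tweets["market_moving"] = tweets["abs_yield_move_bp"] > 0.5

# Rolling one-month share of tweets that moved the market: a toy "Volfefe"-style index.
index = (
    tweets.set_index("time")["market_moving"]
    .astype(float)
    .rolling("30D")
    .mean()
)
print(index.tail())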

Meanwhile, Citi’s work shows that the president’s tweets are generally followed by a stretch of higher volatility across global currency markets. And there’s little sign traders are growing numb to these messages….(More)”

How Should Scientists’ Access To Health Databanks Be Managed?


Richard Harris at NPR: “More than a million Americans have donated genetic information and medical data for research projects. But how that information gets used varies a lot, depending on the philosophy of the organizations that have gathered the data.

Some hold the data close, while others are working to make the data as widely available to as many researchers as possible — figuring science will progress faster that way. But scientific openness can be constrained by both practical and commercial considerations.

Three major projects in the United States illustrate these differing philosophies.

VA scientists spearhead research on veterans database

The first project involves three-quarters of a million veterans, mostly men over age 60. Every day, 400 to 500 blood samples show up in a modern lab in the basement of the Veterans Affairs hospital in Boston. Luis Selva, the center’s associate director, explains that robots extract DNA from the samples and then the genetic material is sent out for analysis….

Intermountain Healthcare teams with deCODE genetics

Our second example involves what is largely an extended family: descendants of settlers in Utah, primarily from the Church of Jesus Christ of Latter-day Saints. This year, Intermountain Healthcare in Utah announced that it was going to sequence the complete DNA of half a million of its patients, resulting in what the health system says will be the world’s largest collection of complete genomes….

NIH’s All of Us aims to diversify and democratize research

Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health, behavior and genetics. Its philosophy sharply contrasts with that of Intermountain Healthcare.

“We do have a very strong goal around diversity, in making sure that the participants in the All of Us research program reflect the vast diversity of the United States,” says Stephanie Devaney, the program’s deputy director….(More)”.

Raw data won’t solve our problems — asking the right questions will


Stefaan G. Verhulst in apolitical: “If I had only one hour to save the world, I would spend fifty-five minutes defining the questions, and only five minutes finding the answers,” is a famous aphorism attributed to Albert Einstein.

Behind this quote is an important insight about human nature: Too often, we leap to answers without first pausing to examine our questions. We tout solutions without considering whether we are addressing real or relevant challenges or priorities. We advocate fixes for problems, or for aspects of society, that may not be broken at all.

This misordering of priorities is especially acute — and represents a missed opportunity — in our era of big data. Today’s data has enormous potential to solve important public challenges.

However, policymakers often fail to invest in defining the questions that matter, focusing mainly on the supply side of the data equation (“What data do we have or must have access to?”) rather than the demand side (“What is the core question and what data do we really need to answer it?” or “What data can or should we actually use to solve those problems that matter?”).

As such, data initiatives often provide marginal insights while at the same time generating unnecessary privacy risks by accessing and exploring data that may not in fact be needed at all in order to address the root of our most important societal problems.

A new science of questions

So what are the truly vexing questions that deserve attention and investment today? Toward what end should we strategically seek to leverage data and AI?

The truth is that policymakers and other stakeholders currently don’t have a good way of defining questions or identifying priorities, nor a clear framework to help us leverage the potential of data and data science toward the public good.

This is a situation we seek to remedy at The GovLab, an action research center based at New York University.

Our most recent project, the 100 Questions Initiative, seeks to begin developing a new science and practice of questions — one that identifies the most urgent questions in a participatory manner. Launched last month, the goal of this project is to develop a process that takes advantage of distributed and diverse expertise on a range of given topics or domains so as to identify and prioritize those questions that are high impact, novel and feasible.

Because we live in an age of data and much of our work focuses on the promises and perils of data, we seek to identify the 100 most pressing problems confronting the world that could be addressed by greater use of existing, often inaccessible, datasets through data collaboratives – new forms of cross-disciplinary collaboration beyond public-private partnerships focused on leveraging data for good….(More)”.

Real-time maps warn Hong Kong protesters of water cannons and riot police


Mary Hui at Quartz: “The “Be Water” nature of Hong Kong’s protests means that crowds move quickly and spread across the city. They might stage a protest in the central business district one weekend, then in industrial neighborhoods and far-flung suburban towns the next. And a lot is happening at any one time at each protest. One of the key difficulties for protesters is figuring out what’s happening in the crowded, fast-changing, and often chaotic circumstances.

Citizen-led efforts to map protests in real-time are an attempt to address those challenges and answer some pressing questions for protesters and bystanders alike: Where should they go? Where have tear gas and water cannons been deployed? Where are police advancing, and are there armed thugs attacking civilians?

One of the most widely used real-time maps of the protests is HKMap.live, a volunteer-run and crowdsourced effort that officially launched in early August. It’s a dynamic map of Hong Kong that users can zoom in and out of, much like Google Maps. But in addition to detailed street and building names, this one features various emoji to communicate information at a glance: a dog for police, a worker in a yellow hardhat for protesters, a dinosaur for the police’s black-clad special tactical squad, a white speech-bubble for tear gas, two exclamation marks for danger.
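
The app’s internal data model is not public; as a rough sketch, a crowdsourced report carrying the emoji legend described above might be represented something like this (all field names and the mapping itself are hypothetical).

```python
from dataclasses import dataclass
from datetime import datetime

# Emoji legend as described in the article (illustrative mapping only).
MARKERS = {
    "police": "🐶",            # dog
    "protester": "👷",         # worker in a yellow hardhat
    "special_tactical": "🦖",  # dinosaur for the black-clad squad
    "tear_gas": "💬",          # white speech bubble
    "danger": "‼️",            # double exclamation marks
}

@dataclass
class Report:
    """A single crowdsourced sighting placed on the map (hypothetical schema)."""
    kind: str        # one of the MARKERS keys
    lat: float
    lon: float
    time: datetime
    note: str = ""

    def emoji(self) -> str:
        return MARKERS.get(self.kind, "❓")

r = Report(kind="tear_gas", lat=22.2783, lon=114.1747, time=datetime(2019, 8, 31, 17, 30))
print(r.emoji(), r.lat, r.lon)
```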

HKMap during a protest on August 31, 2019.

Founded by a finance professional in his 20s who wished to be identified only as Kuma, HKMap is an attempt to level the playing field between protesters and officers, he said in an interview over chat app Telegram. While earlier in the protest movement people relied on text-based, on-the-ground live updates through public Telegram channels, Kuma found these too scattered to be effective, and hard to visualize unless someone knew the particular neighborhood inside out.

“The huge asymmetric information between protesters and officers led to multiple occasions of surround and capture,” said Kuma. Passersby and non-frontline protesters could also make use of the map, he said, to avoid tense conflict zones. After some of his friends were arrested in late July, he decided to build HKMap….(More)”.

Study finds Big Data eliminates confidentiality in court judgements


Swissinfo: “Swiss researchers have found that algorithms that mine large swaths of data can eliminate anonymity in federal court rulings. This could have major ramifications for transparency and privacy protection.

This is the result of a study by the University of Zurich’s Institute of Law, published in the legal journal “Jusletter” and shared by Swiss public television SRF on Monday.

The study relied on a “web scraping technique” or mining of large swaths of data. The researchers created a database of all decisions of the Supreme Court available online from 2000 to 2018 – a total of 122,218 decisions. Additional decisions from the Federal Administrative Court and the Federal Office of Public Health were also added.

Using an algorithm and manual searches for connections between data, the researchers were able to de-anonymise (in other words, reveal identities in) 84% of the judgments in less than an hour.

In this specific study, the researchers were able to identify the pharma companies and medicines hidden in the documents of the complaints filed in court.  
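
The paper’s actual pipeline is more sophisticated than this, but the core linkage idea can be sketched roughly as follows: match entities mentioned in the scraped, anonymised rulings (here, medicine names) against a public registry that maps medicines to their manufacturers. The data and names below are invented stand-ins, not material from the study.

```python
import re

# Toy stand-ins for scraped rulings and a public drug registry (both invented).
rulings = [
    {"id": "C-123/2017", "text": "The complaint concerns the reimbursement of Wondermab ..."},
    {"id": "C-456/2018", "text": "The appellant, A._____, challenges the pricing of Curezol ..."},
]
drug_registry = {  # medicine name -> marketing-authorisation holder
    "wondermab": "ExamplePharma AG",
    "curezol": "Sample Biotech SA",
}

# Linkage step: find registry entries mentioned in each anonymised ruling.
for ruling in rulings:
    text = ruling["text"].lower()
    matches = {drug: company for drug, company in drug_registry.items()
               if re.search(r"\b" + re.escape(drug) + r"\b", text)}
    if matches:
        print(ruling["id"], "->", matches)  # anonymised party re-identified via the product
```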

Study authors say that this could have far-reaching consequences for transparency and privacy. One of the study’s co-authors, Kerstin Noëlle Vokinger, professor of law at the University of Zurich, explains that “with today’s technological possibilities, anonymisation is no longer guaranteed in certain areas”. The researchers say the technique could be applied to any publicly available database.

Vokinger added there is a need to balance necessary transparency while safeguarding the personal rights of individuals.

Adrian Lobsiger, the Swiss Federal Data Protection Commissioner, told SRF that this confirms his view that facts may need to be treated as personal data in the age of technology….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it is created by companies, why this data isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much is dark. Respondents were from IT and business roles across various industries, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.