New flu tracker uses Google search data better than Google


At Ars Technica: “With big data comes big noise. Google learned this lesson the hard way with its now kaput Google Flu Trends. The online tracker, which used Internet search data to predict real-life flu outbreaks, emerged amid fanfare in 2008. Then it met a quiet death this August after repeatedly coughing up bad estimates.

But big Internet data isn’t out of the disease tracking scene yet.

With hubris firmly in check, a team of Harvard researchers has come up with a way to tame the unruly data, combine it with other data sets, and continually calibrate it to track flu outbreaks with less error. Their new model, published Monday in the Proceedings of the National Academy of Sciences, outperforms Google Flu Trends and other models with at least double the accuracy. If the model holds up in coming flu seasons, it could reinstate some optimism in using big data to monitor disease and herald a wave of more accurate second-generation models.

Big data has a lot of potential, Samuel Kou, a statistics professor at Harvard University and coauthor on the new study, told Ars. It’s just a question of using the right analytics, he said.

Kou and his colleagues built on Google’s flu tracking model for their new version, called ARGO (AutoRegression with GOogle search data). Google Flu Trends basically relied on trends in Internet search terms, such as headache and chills, to estimate the number of flu cases. Those search terms were correlated with flu outbreak data collected by the Centers for Disease Control and Prevention. The CDC’s data relies on clinical reports from around the country. But compiling and analyzing that data can be slow, leading to a lag time of one to three weeks. The Google data, on the other hand, offered near real-time tracking for health experts to manage and prepare for outbreaks.

At first, Google’s tracker appeared to be pretty good, matching the CDC’s late-breaking data somewhat closely. But two notable stumbles led to its ultimate downfall: an underestimate of the 2009 H1N1 swine flu outbreak and an alarming overestimate (almost double the real numbers) of the 2012-2013 flu season’s cases…. For ARGO, he and colleagues took the trend data and then designed a model that could self-correct for changes in how people search. The model has a two-year sliding window in which it re-calibrates current search term trends against the CDC’s historical flu data (the gold standard for flu data). They also made sure to exclude winter search terms, such as “March Madness” and “the Oscars,” so they didn’t get accidentally correlated with seasonal flu trends. Lastly, they incorporated data on the historical seasonality of flu.
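The paper’s exact formulation isn’t reproduced here, but the ingredients described above — an autoregressive term on recent flu levels plus current search-term activity, refit on a two-year sliding window — can be sketched roughly as follows. All data below are synthetic stand-ins, and `argo_style_predict` is an illustrative name, not the authors’ code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: weekly flu incidence (CDC-style) and three
# search-term series that noisily track it.
weeks = 200
flu = 5 + 2 * np.sin(2 * np.pi * np.arange(weeks) / 52) + rng.normal(0, 0.3, weeks)
searches = np.column_stack([flu + rng.normal(0, 0.5, weeks) for _ in range(3)])

def argo_style_predict(flu, searches, t, window=104, lags=3):
    """Estimate flu at week t from its own recent history (autoregression)
    plus current search-term levels, refitting on only the most recent
    `window` weeks (104 = the two-year sliding window described above)."""
    lo = max(lags, t - window)
    rows, targets = [], []
    for s in range(lo, t):
        rows.append(np.concatenate([flu[s - lags:s], searches[s]]))
        targets.append(flu[s])
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)
    x_now = np.concatenate([[1.0], flu[t - lags:t], searches[t]])
    return float(x_now @ coef)

pred = argo_style_predict(flu, searches, t=150)
print(round(pred, 2))
```

Refitting inside the window each week is what lets a model of this shape self-correct when search behaviour drifts, which is the property the article highlights.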

The result was a model that significantly outperformed the Google Flu Trends estimates for the period from March 29, 2009, to July 11, 2015. ARGO also beat out other models, including one based on current and historical CDC data….(More)”

See also Proceedings of the National Academy of Sciences, 2015. DOI: 10.1073/pnas.1515373112

How smartphones are solving one of China’s biggest mysteries


Ana Swanson at the Washington Post: “For decades, China has been engaged in a building boom of a scale that is hard to wrap your mind around. In the last three decades, 260 million people have moved from the countryside to Chinese cities — equivalent to around 80 percent of the population of the U.S. To make room for all of those people, the size of China’s built-up urban areas nearly quintupled between 1984 and 2010.

Much of that development has benefited people’s lives, but some has not. In a breathless rush to boost growth and development, some urban areas have built vast, unused real estate projects — China’s infamous “ghost cities.” These eerie, shining developments are complete except for one thing: people to live in them.

China’s ghost cities have sparked a lot of debate over the last few years. Some argue that the developments are evidence of the waste in top-down planning, or the result of too much cheap funding for businesses. Some blame the lack of other good places for average people to invest their money, or the desire of local officials to make a quick buck — land sales generate a lot of revenue for China’s local governments.

Others say the idea of ghost cities has been overblown. They espouse a “build it and they will come” philosophy, pointing out that, with time, some ghost cities fill up and turn into vibrant communities.

It’s been hard to evaluate these claims, since most of the research on ghost cities has been anecdotal. Even the most rigorous research methods leave a lot to be desired — for example, investment research firms sending poor junior employees out to remote locations to count how many lights are turned on in buildings at night.

Now new research from Baidu, one of China’s biggest technology companies, provides one of the first systematic looks at Chinese ghost cities. Researchers from Baidu’s Big Data Lab and Peking University in Beijing used the kind of location data gathered by mobile phones and GPS receivers to track how people moved in and out of suspected ghost cities, in real time and on a national scale, over a period of six months. You can see the interactive project here.

Google has been blocked in China for years, and Baidu dominates the market in terms of search, mobile maps and other offerings. That gave the researchers a huge database to work with — 770 million users, a hefty chunk of China’s 1.36 billion people.

To identify potential ghost cities, the researchers created an algorithm that identifies urban areas with a relatively sparse population. They define a ghost city as an urban region with a population of fewer than 5,000 people per square kilometer – about half the density recommended by the Chinese Ministry of Housing and Urban-Rural Development….(More)”
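The density rule itself is simple enough to sketch. The fragment below flags grid cells whose observed density falls under the article’s 5,000-people-per-square-kilometre threshold; the cell size and counts are made up for illustration, not taken from the Baidu study:

```python
# Assumed cell size: a 500 m x 500 m positioning grid (illustrative only).
CELL_AREA_KM2 = 0.25

def ghost_cells(person_counts, threshold_per_km2=5000, cell_area_km2=CELL_AREA_KM2):
    """Return indices of built-up cells whose resident density falls below
    the threshold (5,000 per sq km, about half the ministry-recommended
    density, per the article)."""
    return [i for i, n in enumerate(person_counts)
            if n / cell_area_km2 < threshold_per_km2]

# Toy counts of distinct users observed per cell over the study window.
counts = [2000, 800, 150, 3000, 40]
print(ghost_cells(counts))  # densities per sq km: 8000, 3200, 600, 12000, 160
```

The harder part of the real study — distinguishing a genuinely empty development from one that is merely new, seasonal, or under-sampled — is exactly why the researchers tracked movement over six months rather than taking one snapshot.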

Mobile data: Made to measure


Neil Savage in Nature: “For decades, doctors around the world have been using a simple test to measure the cardiovascular health of patients. They ask them to walk on a hard, flat surface and see how much distance they cover in six minutes. This test has been used to predict the survival rates of lung transplant candidates, to measure the progression of muscular dystrophy, and to assess overall cardiovascular fitness.

The walk test has been studied in many trials, but even the biggest rarely top a thousand participants. Yet when Euan Ashley launched a cardiovascular study in March 2015, he collected test results from 6,000 people in the first two weeks. “That’s a remarkable number,” says Ashley, a geneticist who heads Stanford University’s Center for Inherited Cardiovascular Disease. “We’re used to dealing with a few hundred patients, if we’re lucky.”

Numbers on that scale, he hopes, will tell him a lot more about the relationship between physical activity and heart health. The reason they can be achieved is that millions of people now have smartphones and fitness trackers with sensors that can record all sorts of physical activity. Health researchers are studying such devices to figure out what sort of data they can collect, how reliable those data are, and what they might learn when they analyse measurements of all sorts of day-to-day activities from many tens of thousands of people and apply big-data algorithms to the readings.

By July, more than 40,000 people in the United States had signed up to participate in Ashley’s study, which uses an iPhone application called MyHeart Counts. He expects the numbers to surge as the app becomes more widely available around the world. The study — designed by scientists, approved by institutional review boards, and requiring informed consent — asks participants to answer questions about their health and risk factors, and to use their phone’s motion sensors to collect data about their activities for seven days. They also do a six-minute walk test, and the phone measures the distance they cover. If their own doctors have ordered blood tests, users can enter information such as cholesterol or glucose measurements. Every three months, the app checks back to update their data.

Physicians know that physical activity is a strong predictor of long-term heart health, Ashley says. But it is less clear what kind of activity is best, or whether different groups of people do better with different types of exercise. MyHeart Counts may open a window on such questions. “We can start to look at subgroups and find differences,” he says.

“You can take pretty noisy data, but if you have enough of it, you can find a signal.”

It is the volume of the data that makes such studies possible. In traditional studies, there may not be enough data to find statistically significant results for such subgroups. And rare events may not occur in the smaller samples, or may produce a signal so weak that it is lost in statistical noise. Big data can overcome those problems, and if the data set is big enough, small errors can be smoothed out. “You can take pretty noisy data, but if you have enough of it, you can find a signal,” Ashley says….(More)”.
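Ashley’s point about volume swamping noise can be seen in a toy simulation (illustrative only, unrelated to the MyHeart Counts data): repeated noisy readings of a fixed quantity average out, with the error shrinking roughly as one over the square root of the sample size.

```python
import random
import statistics

random.seed(42)
true_signal = 3.0

def estimate(n, noise_sd=5.0):
    """Average n noisy readings of the same underlying quantity."""
    return statistics.fmean(true_signal + random.gauss(0, noise_sd) for _ in range(n))

small = estimate(10)       # with 10 readings, the error can rival the signal
large = estimate(100_000)  # with 100,000, the estimate hugs the true value
print(round(abs(small - true_signal), 3), round(abs(large - true_signal), 3))
```

Here the per-reading noise (standard deviation 5) dwarfs the signal (3), yet the large-sample average recovers it — the same logic that makes tens of thousands of imperfect phone-sensor walk tests informative.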

How big data and The Sims are helping us to build the cities of the future


The Next Web: “By 2050, the United Nations predicts that around 66 percent of the world’s population will be living in urban areas. It is expected that the greatest expansion will take place in developing regions such as Africa and Asia. Cities in these parts will be challenged to meet the needs of their residents, and provide sufficient housing, energy, waste disposal, healthcare, transportation, education and employment.

So, understanding how cities will grow – and how we can make them smarter and more sustainable along the way – is a high priority among researchers and governments the world over. We need to get to grips with the inner mechanisms of cities, if we’re to engineer them for the future. Fortunately, there are tools to help us do this. And even better, using them is a bit like playing SimCity….

Cities are complex systems. Increasingly, scientists studying cities have gone from thinking about “cities as machines” to approaching “cities as organisms”. Viewing cities as complex, adaptive organisms – similar to natural systems like termite mounds or slime mould colonies – allows us to gain unique insights into their inner workings. …So, if cities are like organisms, it follows that we should examine them from the bottom-up, and seek to understand how unexpected large-scale phenomena emerge from individual-level interactions. Specifically, we can simulate how the behaviour of individual “agents” – whether they are people, households, or organisations – affect the urban environment, using a set of techniques known as “agent-based modelling”….These days, increases in computing power and the proliferation of big data give agent-based modelling unprecedented power and scope. One of the most exciting developments is the potential to incorporate people’s thoughts and behaviours. In doing so, we can begin to model the impacts of people’s choices on present circumstances, and the future.
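The bottom-up idea can be illustrated with a deliberately tiny agent-based sketch (not any particular research model): each agent follows one simple rule — when relocating, it prefers places that are already busy — and uneven concentrations emerge at the macro level without anyone planning them.

```python
import random

random.seed(7)

SIZE, AGENTS, STEPS = 20, 100, 2000

# Start with agents scattered uniformly at random across SIZE locations.
counts = [0] * SIZE
for _ in range(AGENTS):
    counts[random.randrange(SIZE)] += 1

for _ in range(STEPS):
    # Pick one agent at random (weighted by where agents currently are)...
    src = random.choices(range(SIZE), weights=counts)[0]
    counts[src] -= 1
    # ...who moves with a preference for busy places: weight = occupancy + 1,
    # the +1 baseline keeping empty locations reachable.
    dst = random.choices(range(SIZE), weights=[c + 1 for c in counts])[0]
    counts[dst] += 1

# No agent is created or destroyed; typically a few locations end up far
# above the uniform average of 5 agents per cell.
print(max(counts), sum(counts))
```

This is the general shape of the technique: individual-level rules in, emergent macro-level pattern out. Research models replace the toy rule with empirically grounded behaviour (daily routines, awareness of places, and so on), which is why they need the rich data sources listed below.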

For example, we might want to know how changes to the road layout might affect crime rates in certain areas. By modelling the activities of individuals who might try to commit a crime, we can see how altering the urban environment influences how people move around the city, the types of houses that they become aware of, and consequently which places have the greatest risk of becoming the targets of burglary.

To fully realise the goal of simulating cities in this way, models need a huge amount of data. For example, to model the daily flow of people around a city, we need to know what kinds of things people spend their time doing, where they do them, who they do them with, and what drives their behaviour.

Without good-quality, high-resolution data, we have no way of knowing whether our models are producing realistic results. Big data could offer researchers a wealth of information to meet these twin needs. The kinds of data that are exciting urban modellers include:

  • Electronic travel cards that tell us how people move around a city.
  • Twitter messages that provide insight into what people are doing and thinking.
  • The density of mobile telephones that hint at the presence of crowds.
  • Loyalty and credit-card transactions to understand consumer behaviour.
  • Participatory mapping of hitherto unknown urban spaces, such as Open Street Map.

These data can often be refined to the level of a single person. As a result, models of urban phenomena no longer need to rely on assumptions about the population as a whole – they can be tailored to capture the diversity of a city full of individuals, who often think and behave differently from one another….(More)

Open government: a new paradigm in social change?


Rosie Williams: “In a recent speech to the Australia and New Zealand School of Government (ANZSOG) annual conference, technology journalist and academic Suelette Dreyfus explained the growing ‘information asymmetry’ that characterises the present-day relationship between government and citizenry.

According to Dreyfus:

‘Big Data makes government very powerful in its relationship with the citizen. This is even more so with the rise of intelligent systems, software that increasingly trawls, matches and analyses that Big Data. And it is moving toward making more decisions once made by human beings.’

The role of technology in the delivery of government services gives much food for thought in terms of both its implications for potential good and the potential dangers it may pose. The concept of open government is an important one for the future of policy and democracy in Australia. Open government has at its core a recognition that the world has changed, that the ways people engage and who they engage with has transformed in ways that governments around the world must respond to in both technological and policy terms.

As described in the ANZSOG speech, the change within government in how it uses technology is well underway; however, in many regards we are at the very beginning of understanding and implementing the potential of data and technology in providing solutions to many of our shared problems. Australia’s pending membership of the Open Government Partnership is integral to how Australia responds to these challenges. Membership of the multi-lateral partnership requires the Australian government to create a National Action Plan based on consultation and demonstrate our credentials in the areas of Fiscal Transparency, Access to Information, Income and Asset Disclosure, and Citizen Engagement.

What are the implications of the National Action Plan for policy consultation, formulation, implementation and evaluation? In relative terms, Australia’s history with open government is fairly recent. Policies on open data have seen the roll out of data.gov.au – a repository of data published by government agencies and made available for re-use in efforts such as the author’s own financial transparency site OpenAus.

In this way citizen activity and government come together for the purposes of achieving open government. These efforts express a new paradigm in government and activism where the responsibility for solving the problems of democracy is shared between government and the people, as opposed to the government ‘solving’ the problems of a passive, receptive citizenry.

As the famous whistle-blowers have shown, citizens are no longer passive but this new capability also requires a consciousness of the responsibilities and accountability that go along with the powers newly developed by citizen activists through technological change.

The opening of data and communication channels in the formulation of public policy provides a way forward to create both a better informed citizenry and also better informed policy evaluation. When new standards of transparency are applied to wicked problems what shortcomings does this highlight?

This question was tested with my recent request for a basic fact missing from relevant government research and reviews but key to social issues of homelessness and domestic violence….(More)”

The Human Face of Big Data


A film by Sandy Smolan [56 minutes]: “Big Data is defined as the real-time collection, analysis, and visualization of vast amounts of information. In the hands of data scientists this raw information is fueling a revolution which many people believe may have as big an impact on humanity going forward as the Internet has over the past two decades. It enables us to sense, measure, and understand aspects of our existence in ways never before possible.

The Human Face of Big Data captures an extraordinary revolution sweeping, almost invisibly, through business, academia, government, healthcare, and everyday life. It’s already enabling us to provide a healthier life for our children. To provide our seniors with independence while keeping them safe. To help us conserve precious resources like water and energy. To alert us to tiny changes in our health, weeks or years before we develop a life-threatening illness. To peer into our own individual genetic makeup. To create new forms of life. And soon, as many predict, to re-engineer our own species. And we’ve barely scratched the surface…

This massive gathering and analyzing of data in real time is allowing us to address some of humanity’s biggest challenges. Yet, as Edward Snowden and the release of the NSA documents have shown, the accessibility of all this data can come at a steep price….(More)”

New Human Needs Index fills a data void to help those in need


Scott W. Allard at Brookings: “My 2009 book, “Out of Reach,” examined why it can be hard for poor families to get help from the safety net. One critical barrier is the lack of information about local program resources and nonprofit social service organizations. Good information is key to finding help, but also important if we are to target resources effectively and assess whether program investments have been successful.

As I prepared data for the book in 2005, my research team struggled to compile useful information about services and programs in the three major metro areas at the center of the study. We grappled with out-of-date print directories, incomplete online listings, bad addresses, disconnected phone numbers, and inaccurate information about the availability of services. It wasn’t clear families experiencing hardship could easily find the help they needed. It also wasn’t clear how potential volunteers or donors could know where to direct their energies, or whether communities were deploying adequate and relevant safety-net resources. In the book’s conclusion, however, I was optimistic things would get better. A mix of emerging technology, big data systems, and a generation of young entrepreneurs would certainly close these information gaps over the next several years.

Recently, I embarked upon an effort to again identify the social service organizations operating in one of the book’s original study sites. To my surprise, the work was much harder this time around. Print directories are artifacts of the past. Online referral tools provided only spotty coverage. Addresses and service information can still be quite out of date. In many local communities, it felt as if there was less information available now than a decade ago.

Lack of data about local safety net programs, particularly nonprofit organizations, has long been a problem for scholars, community advocates, nonprofit leaders, and philanthropists. Data about providers and populations served are expensive to collect, update, and disseminate. There are no easy ways to monetize data resources or find regular revenue streams to support data work. There are legal obstacles and important concerns about confidentiality. Many organizations don’t have the resources to do much analytic or learning work.

The result is striking. We spend tens of billions of dollars on social services for low-income households each year, but we have only the vaguest ideas of where those dollars go, what impact they have, and where unmet needs exist.

Into this information void steps the Salvation Army and the Lilly Family School of Philanthropy at Indiana University with a possible path forward. Working together and with an advisory board of scholars, the Salvation Army and the Lilly School have created a real-time Human Needs Index drawn from service provision tracking systems maintained by more than 7,000 Salvation Army sites nationwide. The index provides useful insight into consumption of an array of emergency services (e.g., food, shelter, clothing) at a given place and point in time across the entire country…(More)”

How Big Data Could Open The Financial System For Millions Of People


But that’s changing as the poor start leaving data trails on the Internet and on their cell phones. Now that data can be mined for what it says about someone’s creditworthiness, likeliness to repay, and all that hardcore stuff lenders want to know.

“Every time these individuals make a phone call, send a text, browse the Internet, engage social media networks, or top up their prepaid cards, they deepen the digital footprints they are leaving behind,” says a new report from the Omidyar Network. “These digital footprints are helping to spark a new kind of revolution in lending.”

The report, called “Big Data, Small Credit,” looks at the potential to expand credit access by analyzing mobile and smartphone usage data, utility records, Internet browsing patterns and social media behavior….

“In the last few years, a cluster of fast-emerging and innovative firms has begun to use highly predictive technologies and algorithms to interrogate and generate insights from these footprints,” the report says.

“Though these are early days, there is enough to suggest that hundreds of millions of mass-market consumers may not have to remain ‘invisible’ to formal, unsecured credit for much longer.”…(More)

Toward a manifesto for the ‘public understanding of big data’


Mike Michael and Deborah Lupton in Public Understanding of Science: “….we sketch a ‘manifesto’ for the ‘public understanding of big data’. On the one hand, this entails such public understanding of science and public engagement with science and technology–tinged questions as follows: How, when and where are people exposed to, or do they engage with, big data? Who are regarded as big data’s trustworthy sources, or credible commentators and critics? What are the mechanisms by which big data systems are opened to public scrutiny? On the other hand, big data generate many challenges for public understanding of science and public engagement with science and technology: How do we address publics that are simultaneously the informant, the informed and the information of big data? What counts as understanding of, or engagement with, big data, when big data themselves are multiplying, fluid and recursive? As part of our manifesto, we propose a range of empirical, conceptual and methodological exhortations. We also provide Appendix 1 that outlines three novel methods for addressing some of the issues raised in the article….(More)”

Open data, open mind: Why you should share your company data with the world


Mark Samuels at ZDNet: “If information really is the lifeblood of modern organisations, then CIOs could create huge benefits from opening their data to new, creative pairs of eyes. Research from consultancy McKinsey suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data: that is, taking previously proprietary data (often starting with public sector data) and opening up access.

So, should your business consider giving outsiders access to insider information? ZDNet speaks to three experts.

More viewpoints can mean better results

Former Tullow Oil CIO Andrew Marks says debates about the potential openness of data in a private sector context are likely to be dominated by one major concern: information security.

“It’s a perfectly reasonable debate until people start thinking about privacy,” he says. “Putting information at risk, both in terms of customer data and competitive advantage, will be a risk too far for many senior executives.”

But what if CIOs could allay c-suite peers’ concerns and create a new opportunity? Marks points to the Goldcorp Challenge, which saw the mining specialist share its proprietary geological data to let outside experts pick likely spots for mining. The challenge, which included prize money of $575,000, helped identify more than 110 sites, 50 per cent of which were previously unknown to the company. The value of gold found through the competition exceeded $6bn. Marks wonders whether other firms could take similarly brave steps.

“There is a period of time when information is very sensitive,” he says. “Once the value of data starts to become finite, then it might be beneficial for businesses to open the doors and to let outsiders play with the information. That approach, in terms of gamification, might lead to the creation of new ideas and innovations.”…

Marks says these projects help prove that, when it comes to data, more is likely to mean different – and possibly better – results. “Whether using big data algorithms or the human touch, the more viewpoints you bring together, the more you increase the chances of success and reduce risk,” he says.

“There is, therefore, always likely to be value in seeking an alternative perspective. Opening access to data means your firm is going to get more ideas, but CIOs and other senior executives need to think very carefully about what such openness means for the business, and the potential benefits.”….Some leading firms are already taking steps towards openness. Take Christina Scott, chief product and information officer at the Financial Times, who says the media organisation has used data analysts to help push the benefits of information-led insight across the business.

Her team has democratised data in order to make sure that all parts of the organisation can get the information they need to complete their day-to-day jobs. Scott says the approach is best viewed as an open data strategy, but within the safe confines of the existing enterprise firewall. While the tactic is internally focused currently, Scott says the FT is keen to find ways to make the most of external talent in the future.

“We’re starting to consider how we might open data beyond the organisation, too,” she says. “Our data holds a lot of value and insight, including across the metadata we’ve created. So it would be great to think about how we could use that information in a more open way.” Part of the FT’s business includes trade-focused magazines. Scott says opening the data could provide new insight to its B2B customers across a range of sectors. In fact, the firm has already dabbled at a smaller scale.

“We’ve run hackathons, where we’ve exposed our APIs and given people the chance to come up with some new ideas,” she says. “But I don’t think we’ve done as much work on open data as we could. And I think that’s the direction in which better organisations are moving. They recognise that not all innovation is going to happen within the company.”…

CIO Omid Shiraji is another IT expert who recognises that there is a general move towards a more open society. Any executive who expects to work within a tightly defined enterprise firewall is living in cloud cuckoo land, he argues. More to the point, they will miss out on big advantages.

“If you can expose your sources to a range of developers, you can start to benefit from massive innovation,” he says. “You can get really big benefits from opening your data to external experts who can focus on areas that you don’t have the capability to develop internally.”

Many IT leaders would like to open data to outside experts, suggests Shiraji. For CIOs who are keen to expose their sources, he suggests letting small-scale developers take a close look at in-house data silos in an attempt to discover what relationships might exist and what advantages could accrue….(More)”