Unleashing the power of big data to guide precision medicine in China


Article by Yvaine Ye in Nature: “Precision medicine in China was given a boost in 2016 when the government included the field in its 13th five-year economic plan. The policy blueprint, which defined the country’s spending priorities until 2020, pledged to “spur innovation and industrial application” in precision medicine alongside other areas such as smart vehicles and new materials.

Precision medicine is part of the Healthy China 2030 plan, also launched in 2016. The idea is to use the approach to tackle some major health-care challenges the country faces, such as rising cancer rates and issues related to an ageing population. Current projections suggest that, by 2040, 28% of China’s population will be over 60 years old.

Following the announcement of the five-year plan, China’s Ministry of Science and Technology (MOST) launched a precision-medicine project as part of its National Key Research and Development Program. MOST has invested about 1.3 billion yuan (US$200.4 million) in more than 100 projects from 2016 to 2018. These range from finding new drug targets for chronic diseases such as diabetes to developing better sequencing technologies and building a dozen large population cohorts comprising hundreds of thousands of people from across China.

China’s population of 1.4 billion people means the country has great potential for using big data to study health issues, says Zhengming Chen, an epidemiologist and chronic-disease researcher at the University of Oxford, UK. “The advantage is especially prominent in the research of rare diseases, where you might not be able to have a data set in smaller countries like the United Kingdom, where only a handful of cases exist,” says Chen, who leads the China Kadoorie Biobank, a chronic-disease initiative that launched in 2004. It recruited more than 510,000 adults from 10 regions across China in its first 4 years, collecting data through questionnaires and by recording physical measurements and storing participants’ blood samples for future study. So far, the team has investigated whether some disease-related lifestyle factors that have been identified in the West apply to the Chinese population. They have just begun to dig into participants’ genetic data, says Chen.

Another big-data precision-medicine project launched in 2021, after Huijun Yuan, a physician who has been researching hereditary hearing loss for more than two decades, founded the Institute of Rare Diseases at West China Hospital in Chengdu, Sichuan province, in 2020. By 2025, the institute plans to set up a database of 100,000 people from China who have rare conditions, including spinal muscular atrophy and albinism. It will contain basic health information and data relating to biological samples, such as blood for gene sequencing. Rare diseases are hard to diagnose, because their incidences are low. But the development of technologies such as genetic testing and artificial intelligence driven by big data is providing a fresh approach to diagnosing these rare conditions, and could pave the way for therapies…(More)”.

Dynamic World


About: “The real world is as dynamic as the people and natural processes that shape it. Dynamic World is a near realtime 10m resolution global land use land cover dataset, produced using deep learning, freely available and openly licensed. It is the result of a partnership between Google and the World Resources Institute, to produce a dynamic dataset of the physical material on the surface of the Earth. Dynamic World is intended to be used as a data product for users to add custom rules with which to assign final class values, producing derivative land cover maps.

Key innovations of Dynamic World

  1. Near realtime data. Over 5000 Dynamic World images are produced every day, whereas traditional approaches to building land cover data can take months or years to produce. As a result of leveraging a novel deep learning approach, based on Sentinel-2 Top of Atmosphere, Dynamic World offers global land cover updates every 2-5 days depending on location.
  2. Per-pixel probabilities across 9 land cover classes. A major benefit of an AI-powered approach is that the model looks at an incoming Sentinel-2 satellite image and, for every pixel in the image, estimates the degree of tree cover, how built up a particular area is, or the extent of snow coverage if there has been a recent snowstorm, for example (a minimal usage sketch follows this list).
  3. Ten meter resolution. As a result of the European Commission’s Copernicus Programme making European Space Agency Sentinel data freely and openly available, products like Dynamic World are able to offer 10m resolution land cover data. This is important because quantifying data in higher resolution produces more accurate results for what’s really on the surface of the Earth…(More)”.
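
The “custom rules” workflow described above can be sketched in a few lines of Python with the Earth Engine API. This is a minimal, hedged example rather than an official recipe: the collection ID, band names, and threshold below are assumptions based on the public Earth Engine catalog and should be checked against the current documentation.

```python
# Minimal sketch of the "custom rules" workflow: take Dynamic World per-pixel
# class probabilities and derive a binary built-up map with a user-chosen
# threshold. Collection ID and band names are assumptions and may differ.
import ee

ee.Initialize()

region = ee.Geometry.Rectangle([103.6, 1.2, 104.1, 1.5])  # example bounding box

# Assumed collection ID for Dynamic World in the Earth Engine catalog.
dw = (ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1')
        .filterBounds(region)
        .filterDate('2022-01-01', '2022-03-01'))

# Average the per-pixel probability of the 'built' class over the period.
built_prob = dw.select('built').mean()

# Custom rule: call a pixel "built-up" when its mean probability exceeds 0.5.
built_mask = built_prob.gt(0.5).selfMask()

# Export the derivative land cover map, e.g. to Drive at 10 m resolution.
task = ee.batch.Export.image.toDrive(
    image=built_mask, description='built_up_mask', region=region, scale=10)
task.start()
```

Here the rule is a simple probability threshold; a derivative map could instead combine several class probabilities or vary the threshold by region.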

Impediment of Infodemic on Disaster Policy Efficacy: Insights from Location Big Data


Paper by Xiaobin Shen, Natasha Zhang Foutz, and Beibei Li: “Infodemics impede the efficacy of business and public policies, particularly in disastrous times when high-quality information is in the greatest demand. This research proposes a multi-faceted conceptual framework to characterize an infodemic and then empirically assesses its impact on the core mitigation policy of a recent prominent disaster, the COVID-19 pandemic. Analyzing a half million records of COVID-related news media and social media, as well as 0.2 billion records of location data, via a multitude of methodologies, including text mining and spatio-temporal analytics, we uncover a number of interesting findings. First, the volume of COVID information has an inverted-U-shaped impact on individuals’ compliance with the lockdown policy. That is, a smaller volume encourages policy compliance, whereas an overwhelming volume discourages compliance, revealing the negative ramifications of excessive information about a disaster. Second, novel information boosts policy compliance, signifying the value of offering original and distinctive, instead of redundant, information to the public during a disaster. Third, misinformation exhibits a U-shaped influence unexplored by the literature, deterring policy compliance until a larger amount surfaces, diminishing its informational value and escalating public uncertainty. Overall, these findings demonstrate the power of information technology, such as media analytics and location sensing, in disaster management. They also illuminate the significance of strategic information management during disasters and the imperative need for cohesive efforts across governments, media, technology platforms, and the general public to curb future infodemics…(More)”.
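
The inverted-U and U-shaped effects described in the abstract can be probed, in the simplest case, by adding a quadratic term to a regression. The sketch below is purely illustrative and is not the authors’ estimation strategy; the variable names and synthetic data are hypothetical.

```python
# Illustrative sketch (not the paper's actual model): testing for an
# inverted-U relationship between information volume and policy compliance
# by adding a quadratic term to a regression. Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
volume = rng.uniform(0, 10, n)  # daily COVID-related information volume (arbitrary units)
compliance = 0.8 * volume - 0.06 * volume**2 + rng.normal(0, 0.5, n)  # synthetic inverted-U outcome

df = pd.DataFrame({'volume': volume, 'compliance': compliance})

# An inverted U shows up as a positive linear term and a negative quadratic term.
model = smf.ols('compliance ~ volume + I(volume ** 2)', data=df).fit()
print(model.params)  # expect volume > 0 and I(volume ** 2) < 0
```

A significant positive linear coefficient paired with a negative quadratic coefficient is the usual signature of an inverted U; the mirror pattern would suggest the U-shaped influence the authors report for misinformation.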

Decoding human behavior with big data? Critical, constructive input from the decision sciences


Paper by Konstantinos V. Katsikopoulos and Marc C. Canellas: “Big data analytics employs algorithms to uncover people’s preferences and values, and support their decision making. A central assumption of big data analytics is that it can explain and predict human behavior. We investigate this assumption, aiming to enhance the knowledge basis for developing algorithmic standards in big data analytics. First, we argue that big data analytics is by design atheoretical and does not provide process-based explanations of human behavior; thus, it is unfit to support deliberation that is transparent and explainable. Second, we review evidence from interdisciplinary decision science, showing that the accuracy of complex algorithms used in big data analytics for predicting human behavior is not consistently higher than that of simple rules of thumb. Rather, it is lower in situations such as predicting election outcomes, criminal profiling, and granting bail. Big data algorithms can be considered as candidate models for explaining, predicting, and supporting human decision making when they match, in transparency and accuracy, simple, process-based, domain-grounded theories of human behavior. Big data analytics can be inspired by behavioral and cognitive theory….(More)”.
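
The comparison the authors describe, complex algorithms versus simple rules of thumb, can be illustrated with a toy benchmark. The sketch below pits a unit-weight tallying heuristic against a gradient-boosting model on synthetic data; it illustrates the type of comparison, not the evidence reviewed in the paper.

```python
# Sketch of the comparison the paper describes: a simple tallying heuristic
# (count unit-weighted cues) versus a more complex learned model. Data here
# are synthetic; this is only an illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(1000, 5))  # five binary cues per case
y = (X.sum(axis=1) + rng.normal(0, 1, 1000) > 2.5).astype(int)  # noisy outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Simple rule of thumb: predict 1 if a majority of cues are present.
tally_pred = (X_te.sum(axis=1) >= 3).astype(int)

# Complex model fitted to the training data.
gbm = GradientBoostingClassifier().fit(X_tr, y_tr)

print('tallying heuristic:', accuracy_score(y_te, tally_pred))
print('gradient boosting :', accuracy_score(y_te, gbm.predict(X_te)))
```

In noisy, sparse settings like those the authors cite, the transparent heuristic often matches the complex model, which is the paper’s benchmark for when big data algorithms merit trust.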

Police surveillance and facial recognition: Why data privacy is an imperative for communities of color


Paper by Nicol Turner Lee and Caitlin Chin: “Governments and private companies have a long history of collecting data from civilians, often justifying the resulting loss of privacy in the name of national security, economic stability, or other societal benefits. But it is important to note that these trade-offs do not affect all individuals equally. In fact, surveillance and data collection have disproportionately affected communities of color under both past and current circumstances and political regimes.

From the historical surveillance of civil rights leaders by the Federal Bureau of Investigation (FBI) to the current misuse of facial recognition technologies, surveillance patterns often reflect existing societal biases and build upon harmful and vicious cycles. Facial recognition and other surveillance technologies also enable more precise discrimination, especially as law enforcement agencies continue to make misinformed predictive decisions around arrest and detainment that disproportionately impact marginalized populations.

In this paper, we present the case for stronger federal privacy protections with proscriptive guardrails for the public and private sectors to mitigate the high risks associated with the development and procurement of surveillance technologies. We also discuss the role of federal agencies in addressing the purposes and uses of facial recognition and other monitoring tools under their jurisdiction, as well as increased training for state and local law enforcement agencies to prevent the unfair or inaccurate profiling of people of color. We conclude the paper with a series of proposals that lean either toward clear restrictions on the use of surveillance technologies in certain contexts or toward greater accountability and oversight mechanisms, including audits, policy interventions, and more inclusive technical designs…(More)”.

Researcher Helps Create Big Data ‘Early Alarm’ for Ukraine Abuses


Article by Chris Carroll: “From searing images of civilians targeted by shelling to detailed accounts of sick children and their families fleeing nearby fighting to seek medical care, journalists have created a kaleidoscopic view of the suffering that has engulfed Ukraine since Russia invaded—but the news media can’t be everywhere.

Social media practically can be, however, and a University of Maryland researcher is part of a U.S.-Ukrainian multi-institutional team that’s harvesting data from Twitter and analyzing it with machine-learning algorithms. The result is a real-time system that provides a running account of what people in Ukraine are facing, constructed from their own accounts.

The project, Data for Ukraine, has been running for about three weeks, and has shown itself able to surface important events a few hours ahead of Western or even Ukrainian media sources. It focuses on four areas: humanitarian needs, displaced people, civilian resistance and human rights violations. In addition to simply showing spikes of credible tweets about certain subjects the team is tracking, the system also geolocates tweets—essentially mapping where events take place.

“It’s an early alarm system for human rights abuses,” said Ernesto Calvo, professor of government and politics and director of UMD’s Inter-Disciplinary Lab for Computational Social Science. “For it to work, we need to know two basic things: what is happening or being reported, and who is reporting those things.”

Calvo and his lab focus on the second of those two requirements and have constructed a “community detection” system to identify key nodes of Twitter users from which to draw data. Other team members with expertise in Ukrainian society and politics provided him with a list of about 400 verified users who actively tweet on relevant topics. Then Calvo, who honed his approach analyzing social media from political and environmental crises in Latin America, and his team expanded and deepened the collection, drawing on connections and followers of the initial list so that millions of tweets per day now feed the system.

Nearly half of the captured tweets are in Ukrainian, 30% are in English and 20% are in Russian. Knowing who to exclude—accounts started the day before the invasion, for instance, or with few long-term connections—is key, Calvo said…(More)”.
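
The two steps Calvo describes, expanding a curated seed list through follow ties and then detecting communities of users, can be sketched with networkx. The edge list, account handles, and exclusion threshold below are hypothetical; this is an illustration of the general approach, not the team’s actual pipeline.

```python
# Illustrative sketch of the two steps described above: (1) expand a seed list
# of accounts through their follow ties, and (2) run community detection to
# find key clusters of users. Handles and edges are hypothetical.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical directed follow graph: (follower, followed) pairs.
edges = [('seed_account_1', 'reporter_a'), ('reporter_a', 'ngo_b'),
         ('seed_account_2', 'ngo_b'), ('ngo_b', 'reporter_a'),
         ('new_account', 'seed_account_1')]
G = nx.DiGraph(edges)

seeds = {'seed_account_1', 'seed_account_2'}  # curated verified accounts

# Step 1: expand the seed set to accounts within one follow hop.
expanded = set(seeds)
for s in seeds:
    expanded |= set(G.successors(s)) | set(G.predecessors(s))

# Step 2: detect communities on the undirected expanded subgraph.
subgraph = G.subgraph(expanded).to_undirected()
communities = greedy_modularity_communities(subgraph)
for i, c in enumerate(communities):
    print(f'community {i}: {sorted(c)}')

# A simple exclusion rule in the spirit of the article: drop accounts with
# very few long-term connections (here, degree below 2).
filtered = [n for n in expanded if subgraph.degree(n) >= 2]
```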

The ethical imperative to identify and address data and intelligence asymmetries


Article by Stefaan Verhulst in AI & Society: “The insight that knowledge, resulting from having access to (privileged) information or data, is power is more relevant today than ever before. The data age has redefined the very notion of knowledge and information (as well as power), leading to a greater reliance on dispersed and decentralized datasets as well as to new forms of innovation and learning, such as artificial intelligence (AI) and machine learning (ML). As Thomas Piketty (among others) has shown, we live in an increasingly stratified world, and our society’s socio-economic asymmetries are often grafted onto data and information asymmetries. As we have documented elsewhere, data access is fundamentally linked to economic opportunity, improved governance, better science and citizen empowerment. The need to address data and information asymmetries—and their resulting inequalities of political and economic power—is therefore emerging as among the most urgent ethical challenges of our era, yet often not recognized as such.

Even as awareness grows of this imperative, society and policymakers lag in their understanding of the underlying issue. Just what are data asymmetries? How do they emerge, and what form do they take? And how do data asymmetries accelerate information and other asymmetries? What forces and power structures perpetuate or deepen these asymmetries, and vice versa? I argue that it is a mistake to treat this problem as homogenous. In what follows, I suggest the beginning of a taxonomy of asymmetries. Although closely related, each one emerges from a different set of contingencies, and each is likely to require different policy remedies. The focus of this short essay is to start outlining these different types of asymmetries. Further research could deepen and expand the proposed taxonomy as well as help define solutions that are contextually appropriate and fit for purpose….(More)”.

Big data, computational social science, and other recent innovations in social network analysis


Paper by David Tindall, John McLevey, Yasmin Koop-Monteiro, Alexander Graham: “While sociologists have studied social networks for about one hundred years, recent developments in data, technology, and methods of analysis provide opportunities for social network analysis (SNA) to play a prominent role in the new research world of big data and computational social science (CSS). In our review, we focus on four broad topics: (1) Collecting Social Network Data from the Web, (2) Non-traditional and Bipartite/Multi-mode Networks, including Discourse and Semantic Networks, and Social-Ecological Networks, (3) Recent Developments in Statistical Inference for Networks, and (4) Ethics in Computational Network Research…(More)”
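
One of the topics the review covers, bipartite (two-mode) networks, is easy to illustrate: people are linked to the events or documents they participate in, and the two-mode graph is projected onto a one-mode person network. The sketch below uses networkx with purely illustrative node names.

```python
# Small sketch of a bipartite (two-mode) network of people and events,
# projected onto a one-mode person network. Names are illustrative only.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
people = ['p1', 'p2', 'p3']
events = ['e1', 'e2']
B.add_nodes_from(people, bipartite=0)
B.add_nodes_from(events, bipartite=1)
B.add_edges_from([('p1', 'e1'), ('p2', 'e1'), ('p2', 'e2'), ('p3', 'e2')])

# Project onto the person side: two people are tied if they share an event,
# with the edge weight counting the number of shared events.
person_net = bipartite.weighted_projected_graph(B, people)
print(person_net.edges(data=True))  # e.g. ('p1', 'p2', {'weight': 1})
```

The same projection logic underlies the discourse, semantic, and social-ecological networks the review discusses, where the second mode is a concept, document, or ecological entity rather than an event.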

The Use of Artificial Intelligence as a Strategy to Analyse Urban Informality


Article by Agustina Iñiguez: “Within the Latin American and Caribbean region, it has been recorded that at least 25% of the population lives in informal settlements. Given that their expansion is one of the major problems afflicting these cities, a project supported by the IDB proposes how new technologies can contribute to identifying and detecting these areas in order to intervene in them and help reduce urban informality.

Informal settlements, also known as slums, shantytowns, camps or favelas, depending on the country in question, are uncontrolled settlements on land where, in many cases, the conditions for a dignified life are not in place. Made up of self-built dwellings, these sites are generally the result of a continually growing housing deficit.

For decades, the possibility of collecting information about the Earth’s surface through satellite imagery has been contributing to the analysis and production of increasingly accurate and useful maps for urban planning. In this way, one can see not only the growth of cities, but also the speed at which they are growing and the characteristics of their buildings.

Advances in artificial intelligence facilitate the processing of large amounts of information. When a satellite or aerial image is taken of a neighbourhood where a municipal team has previously demarcated informal areas, the image is processed by an algorithm that identifies the characteristic visual patterns of the area observed from space. The algorithm then identifies other areas with similar characteristics in other images, automatically recognising the districts where informality predominates. It is worth noting that while satellites are able to report both where and how informal settlements are growing, specialised equipment and processing infrastructure are also required…(More)”
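
The workflow described above, training on patches from imagery where a municipal team has demarcated informal areas and then scoring patches from other images, can be sketched as a supervised classifier. The example below uses random arrays in place of real satellite patches and a random-forest model chosen for brevity; it is a schematic of the approach, not the model used in the IDB-supported project.

```python
# Schematic sketch of the workflow: train a classifier on image patches from a
# neighbourhood where informal areas were demarcated by a municipal team, then
# score patches from other images. Arrays stand in for real satellite imagery.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Pretend each sample is a flattened 16x16-pixel RGB patch (768 features).
labelled_patches = rng.random((200, 16 * 16 * 3))
labels = rng.integers(0, 2, 200)  # 1 = demarcated as informal, 0 = formal

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(labelled_patches, labels)

# Apply the trained model to patches cut from a new satellite image.
new_patches = rng.random((50, 16 * 16 * 3))
informal_probability = clf.predict_proba(new_patches)[:, 1]
print(informal_probability[:5])
```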

The Staggering Ecological Impacts of Computation and the Cloud


Essay by Steven Gonzalez Monserrate: “While in technical parlance the “Cloud” might refer to the pooling of computing resources over a network, in popular culture, “Cloud” has come to signify and encompass the full gamut of infrastructures that make online activity possible, everything from Instagram to Hulu to Google Drive. Like a puffy cumulus drifting across a clear blue sky, refusing to maintain a solid shape or form, the Cloud of the digital is elusive, its inner workings largely mysterious to the wider public, an example of what MIT cybernetician Norbert Wiener once called a “black box.” But just as the clouds above us, however formless or ethereal they may appear to be, are in fact made of matter, the Cloud of the digital is also relentlessly material.

To get at the matter of the Cloud we must unravel the coils of coaxial cables, fiber optic tubes, cellular towers, air conditioners, power distribution units, transformers, water pipes, computer servers, and more. We must attend to its material flows of electricity, water, air, heat, metals, minerals, and rare earth elements that undergird our digital lives. In this way, the Cloud is not only material, but is also an ecological force. As it continues to expand, its environmental impact increases, even as the engineers, technicians, and executives behind its infrastructures strive to balance profitability with sustainability. Nowhere is this dilemma more visible than in the walls of the infrastructures where the content of the Cloud lives: the factory libraries where data is stored and computational power is pooled to keep our cloud applications afloat….

To quell this thermodynamic threat, data centers overwhelmingly rely on air conditioning, a mechanical process that refrigerates the gaseous medium of air, so that it can displace or lift perilous heat away from computers. Today, power-hungry computer room air conditioners (CRACs) or computer room air handlers (CRAHs) are staples of even the most advanced data centers. In North America, most data centers draw power from “dirty” electricity grids, especially in Virginia’s “data center alley,” the site of 70 percent of the world’s internet traffic in 2019. To cool, the Cloud burns carbon, what Jeffrey Moro calls an “elemental irony.” In most data centers today, cooling accounts for greater than 40 percent of electricity usage….(More)”.
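
The cooling figure cited above can be related to the industry’s standard power usage effectiveness (PUE) metric with a small worked example. The 40 percent cooling share comes from the essay; the remaining split between IT load and other overhead is an assumption for illustration.

```python
# A small worked example relating the cooling figure above to the standard
# power usage effectiveness (PUE) metric. The 40% cooling share comes from the
# essay; the split between IT load and other overhead is illustrative.
total_power_kw = 1000.0                    # hypothetical facility draw
cooling_kw = 0.40 * total_power_kw         # cooling share cited in the essay
other_overhead_kw = 0.05 * total_power_kw  # assumed lighting, power losses, etc.
it_load_kw = total_power_kw - cooling_kw - other_overhead_kw

pue = total_power_kw / it_load_kw          # PUE = total facility power / IT power
print(f'PUE ≈ {pue:.2f}')                  # ≈ 1.82 under these assumptions
```

Under these assumptions, nearly one watt of overhead is burned for every watt that reaches a server, which is the “elemental irony” the essay describes.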