Researchers Develop Faster Way to Replace Bad Data With Accurate Information


NCSU Press Release: “Researchers from North Carolina State University and the Army Research Office have demonstrated a new model of how competing pieces of information spread in online social networks and the Internet of Things (IoT). The findings could be used to disseminate accurate information more quickly, displacing false information about anything from computer security to public health….

In their paper, the researchers show that a network’s size plays a significant role in how quickly “good” information can displace “bad” information. However, a large network is not necessarily better or worse than a small one. Instead, the speed at which good data travels is primarily affected by the network’s structure.

A highly interconnected network can disseminate new data very quickly. And the larger the network, the faster the new data will travel.

However, in networks that are connected primarily by a limited number of key nodes, those nodes serve as bottlenecks. As a result, the larger this type of network is, the slower the new data will travel.

The researchers also identified an algorithm that can be used to assess which point in a network would allow you to spread new data throughout the network most quickly.

“Practically speaking, this could be used to ensure that an IoT network purges old data as quickly as possible and is operating with new, accurate data,” Wenye Wang says.

“But these findings are also applicable to online social networks, and could be used to facilitate the spread of accurate information regarding subjects that affect the public,” says Jie Wang. “For example, we think it could be used to combat misinformation online.”…(More)”

Full paper: “Modeling and Analysis of Conflicting Information Propagation in a Finite Time Horizon”
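The press release identifies but does not describe the seeding algorithm. As a purely illustrative sketch of the underlying idea (choose the node from which an update reaches every other node in the fewest hops), the snippet below uses networkx and minimum eccentricity as a rough proxy for propagation speed; the paper's actual method may differ.

```python
# A minimal sketch (not the paper's algorithm): choose the seed node whose
# farthest neighbor is closest, using breadth-first hop distance as a crude
# proxy for how quickly corrected data would reach the rest of the network.
import networkx as nx


def best_seed_node(graph: nx.Graph):
    """Return (node, worst_case_hops) for the node with minimum eccentricity."""
    best_node, best_ecc = None, float("inf")
    for node in graph.nodes:
        # Longest shortest path from this node to any other node.
        ecc = max(nx.single_source_shortest_path_length(graph, node).values())
        if ecc < best_ecc:
            best_node, best_ecc = node, ecc
    return best_node, best_ecc


if __name__ == "__main__":
    # A hub-and-spoke network: the hub is both the bottleneck described above
    # and the fastest place to inject new, accurate data.
    hub_and_spoke = nx.star_graph(50)
    print(best_seed_node(hub_and_spoke))  # (0, 1): seeding the hub reaches everyone in one hop
```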

The Fate of the News in the Age of the Coronavirus


Michael Luo at the New Yorker: “The shift to paywalls has been a boon for quality journalism. Instead of chasing trends on search engines and social media, subscription-based publications can focus on producing journalism worth paying for, which has meant investments in original reporting of all kinds. A small club of élite publications has now found a sustainable way to support its journalism, through readers instead of advertisers. The Times and the Post, in particular, have thrived in the Trump era. So have subscription-driven startups, such as The Information, which covers the tech industry and charges three hundred and ninety-nine dollars a year. Meanwhile, many of the free-to-read outlets still dependent on ad revenue—including former darlings of the digital-media revolution, such as BuzzFeed, Vice, HuffPost, Mic, Mashable, and the titles under Vox Media—have labored to find viable business models.

Many of these companies attracted hundreds of millions of dollars in venture funding, and built sizable newsrooms. Even so, they’ve struggled to succeed as businesses, in part because Google and Facebook take in the bulk of the revenue derived from digital advertising. Some sites have been forced to shutter; others have slashed their staffs and scaled back their journalistic ambitions. There are free digital news sites that continue to attract outsized audiences: CNN and Fox News, for instance, each draw well over a hundred million visitors a month. But the news on these sites tends to be commodified. Velocity is the priority, not complexity and depth.

A robust, independent press is widely understood to be an essential part of a functioning democracy. It helps keep citizens informed; it also serves as a bulwark against the rumors, half-truths, and propaganda that are rife on digital platforms. It’s a problem, therefore, when the majority of the highest-quality journalism is behind a paywall. In recent weeks, recognizing the value of timely, fact-based news during a pandemic, the Times, The Atlantic, the Wall Street Journal, the Washington Post, and other publications—including The New Yorker—have lowered their paywalls for portions of their coronavirus coverage. But it’s unclear how long publishers will stay committed to keeping their paywalls down, as the state of emergency stretches on. The coronavirus crisis promises to engulf every aspect of society, leading to widespread economic dislocations and social disruptions that will test our political processes and institutions in ways far beyond the immediate public-health threat. With the misinformation emanating from the Trump White House, the need for reliable, widely accessible information and facts is more urgent than ever. Yet the economic shutdown created by the spread of covid-19 promises to decimate advertising revenue, which could doom more digital news outlets and local newspapers.

It’s easy to underestimate the information imbalance in American society. After all, “information” has never felt more easily available. A few keyboard strokes on an Internet search engine instantly connects us to unlimited digital content. On Facebook, Instagram, and other social-media platforms, people who might not be intentionally looking for news encounter it, anyway. And yet the apparent ubiquity of news and information is misleading. Between 2004 and 2018, nearly one in five American newspapers closed; in that time, print newsrooms have shed nearly half of their employees. Digital-native publishers employ just a fraction of the diminished number of journalists who still remain at legacy outlets, and employment in broadcast-TV newsrooms trails that of newspapers. On some level, news is a product manufactured by journalists. Fewer journalists means less news. The tributaries that feed the river of information have been drying up. There are a few mountain springs of quality journalism; most sit behind a paywall.

A report released last year by the Reuters Institute for the Study of Journalism maps the divide that is emerging among news readers. The proportion of people in the United States who pay for online news remains small: just sixteen per cent. Those readers tend to be wealthier, and are more likely to have college degrees; they are also significantly more likely to find news trustworthy. Disparities in the level of trust that people have in their news diets, the data suggests, are likely driven by the quality of the news they are consuming….(More)”.

A Closer Look at Location Data: Privacy and Pandemics


Assessment by Stacey Gray: “In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive. 

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns. 

In order to help policymakers address these concerns, we provide below a brief explainer guide of the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over….(More)”.
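To make the aggregation request concrete, here is a minimal, hypothetical sketch of one common approach: snapping device pings to a coarse grid and suppressing low-count cells. The grid size and threshold below are assumptions for illustration only, and, as the discussion of "anonymization" above suggests, this kind of aggregation by itself does not guarantee privacy.

```python
# A minimal sketch of "aggregate and anonymize", assuming raw pings arrive as
# (latitude, longitude) pairs. Coordinates are snapped to a coarse grid and
# cells with fewer than MIN_COUNT devices are suppressed. Grid size and
# threshold are illustrative assumptions; real deployments need much more,
# e.g. protection against re-identification across repeated time windows.
from collections import Counter

GRID_DEGREES = 0.01  # roughly 1 km cells at mid-latitudes (assumption)
MIN_COUNT = 10       # suppress any cell reporting fewer devices than this


def aggregate_pings(pings):
    """Map (lat, lon) pings to grid cells and drop sparse, re-identifiable cells."""
    cells = Counter(
        (round(lat / GRID_DEGREES) * GRID_DEGREES,
         round(lon / GRID_DEGREES) * GRID_DEGREES)
        for lat, lon in pings
    )
    return {cell: count for cell, count in cells.items() if count >= MIN_COUNT}


if __name__ == "__main__":
    pings = [(45.5012, -73.5673)] * 12 + [(45.9001, -73.9002)] * 3
    print(aggregate_pings(pings))  # only the 12-device cell survives; the 3-device cell is dropped
```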

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

The US lacks health information technologies to stop COVID-19 epidemic


Niam Yaraghi at Brookings: “The COVID-19 pandemic highlights the crucial importance of health information technology and data interoperability. The pandemic has shattered our common beliefs about the type and scope of health information exchange. It has shown us that the definition of health data should no longer be limited to medical data of patients and instead should encompass a much wider variety of data types from individuals’ online and offline activity. Moreover, the pandemic has proven that healthcare is not local. In an interconnected world, with more individuals traveling long distances than ever before, it is naïve to look at regions in isolation from each other and try to manage public health independently. To efficiently manage a pandemic like this, the scope of health information exchange efforts should not be limited to small geographical regions and instead should be done at least nationally, if not internationally.

HEALTH DATA SHOULD GO BEYOND MEDICAL RECORDS

A wide variety of factors affect one’s overall well-being, a very small fraction of which could be quantified via medical records. We tend to ignore this fact, and try to explain and predict a patient’s condition only based on medical data. Previously, we did not have the technology and knowledge to collect huge amounts of non-medical data and analyze it for healthcare purposes. Now, privacy concerns and outdated regulations have exacerbated the situation and have led to a fragmented data ecosystem. Interoperability, even among healthcare providers, remains a major challenge where exchange and analysis of non-medical data for healthcare purposes almost never happens….(More)”.

Will This Year’s Census Be the Last?


Jill Lepore at The New Yorker: “People have been counting people for thousands of years. Count everyone, beginning with babies who have teeth, decreed census-takers in China in the first millennium B.C.E., under the Zhou dynasty. “Take ye the sum of all the congregation of the children of Israel, after their families, by the house of their fathers, with the number of their names, every male by their polls,” God commands Moses in the Book of Numbers, describing a census, taken around 1500 B.C.E., that counted only men “twenty years old and upward, all that are able to go forth to war in Israel”—that is, potential conscripts.

Ancient rulers took censuses to measure and gather their strength: to muster armies and levy taxes. Who got counted depended on the purpose of the census. In the United States, which counts “the whole number of persons in each state,” the chief purpose of the census is to apportion representation in Congress. In 2018, Secretary of Commerce Wilbur Ross sought to add a question to the 2020 U.S. census that would have read, “Is this person a citizen of the United States?” Ross is a banker who specialized in bankruptcy before joining the Trump Administration; earlier, he had handled cases involving the insolvency of Donald Trump’s casinos. The Census Bureau objected to the question Ross proposed. Eighteen states, the District of Columbia, fifteen cities and counties, the United States Conference of Mayors, and a coalition of non-governmental organizations filed a lawsuit, alleging that the question violated the Constitution.

Last year, United States District Court Judge Jesse Furman, in an opinion for the Southern District, found Ross’s attempt to add the citizenship question to be not only unlawful, and quite possibly unconstitutional, but also, given the way Ross went about trying to get it added to the census, an abuse of power. Furman wrote, “To conclude otherwise and let Secretary Ross’s decision stand would undermine the proposition—central to the rule of law—that ours is a ‘government of laws, and not of men.’ ” There is, therefore, no citizenship question on the 2020 census.

All this, though, may be by the bye, because the census, like most other institutions of democratic government, is under threat. Google and Facebook, after all, know a lot more about you, and about the population of the United States, or any other state, than does the U.S. Census Bureau or any national census agency. This year may be the last time that a census is taken door by door, form by form, or even click by click….

In the ancient world, rulers counted and collected information about people in order to make use of them, to extract their labor or their property. Facebook works the same way. “It was the great achievement of eighteenth- and nineteenth-century census-takers to break that nexus and persuade people—the public on one side and their colleagues in government on the other—that states could collect data on their citizens without using it against them,” Whitby writes. It is among the tragedies of the past century that this trust has been betrayed. But it will be the error of the next if people agree to be counted by unregulated corporations, rather than by democratic governments….(More)”.

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration


Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.
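As a purely illustrative sketch of the technique the authors describe, the snippet below fetches a hypothetical public roster page and extracts a few fields from an HTML table; the project's actual sites, fields, and tooling are not specified in the excerpt.

```python
# A purely illustrative sketch of web scraping: fetch a public page and pull
# structured fields out of an HTML table. The URL and table layout below are
# hypothetical placeholders, not the project's actual county jail websites.
import requests
from bs4 import BeautifulSoup


def scrape_roster(url):
    """Fetch a (hypothetical) public roster page and return its rows as dicts."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    for row in soup.select("table.roster tr")[1:]:  # skip the header row
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) >= 3:
            records.append({"name": cells[0], "booking_date": cells[1], "charge": cells[2]})
    return records


if __name__ == "__main__":
    for record in scrape_roster("https://example.org/county-jail/roster"):
        print(record)
```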

Techlash? America’s Growing Concern with Major Technology Companies


Press Release: “Just a few years ago, Americans were overwhelmingly optimistic about the power of new technologies to foster an informed and engaged society. More recently, however, that confidence has been challenged by emerging concerns over the role that internet and technology companies — especially social media — now play in our democracy.

A new Knight Foundation and Gallup study explores how much the landscape has shifted. This wide-ranging study confirms that, for Americans, the techlash is real, widespread, and bipartisan. From concerns about the spread of misinformation to election interference and data privacy, we’ve documented the deep pessimism of folks across the political spectrum who believe tech companies have too much power — and that they do more harm than good. 

Despite their shared misgivings, Americans are deeply divided on how best to address these challenges. This report explores the contours of the techlash in the context of the issues currently animating policy debates in Washington and Silicon Valley. Below are the main findings from the executive summary….

  • 77% of Americans say major internet and technology companies like Facebook, Google, Amazon and Apple have too much power.
  • Americans are equally divided among those who favor (50%) and oppose (49%) government intervention that would require internet and technology companies to break up into smaller companies.
  • Americans do not trust social media companies much (44%) or at all (40%) to make the right decisions about what content should or should not be allowed on online platforms.
  • However, they would still prefer the companies (55%) to make those decisions rather than the government (44%). …(More)”

Milwaukee’s Amani Neighborhood Uses Data to Target Traffic Safety and Build Trust


Article by Kassie Scott: “People in Milwaukee’s Amani neighborhood are using data to identify safety issues and build relationships with the police. It’s a story of community-engaged research at its best.

In 2017, the Milwaukee Police Department received a grant under the federal Byrne Criminal Justice Innovation program, now called the Community Based Crime Reduction Program, whose purpose is to bridge the gap between practitioners and researchers and advance the use of data in making communities safer. Because of its close ties in the Amani neighborhood, the Dominican Center was selected to lead this initiative, known as the Amani Safety Initiative, and they partnered with local churches, the district attorney’s office, LISC-Milwaukee, and others. To support the effort with data and coaching, the police department contracted with Data You Can Use.

Together with Data You Can Use, the Amani Safety Initiative team first implemented a survey to gauge perceptions of public safety and police legitimacy. Neighborhood ambassadors were trained (and paid) to conduct the survey themselves, going door to door to gather the information from nearly 300 of their neighbors. The ambassadors shared these results with their neighborhood during what they called “data chats.” They also printed summary survey results on door hangers, which they distributed throughout the neighborhood.

Neighbors and community organizations were surprised by the survey results. Though violent crime and mistrust in the police were commonly thought to be the biggest issues, the data showed that residents were most concerned about traffic safety. Ultimately, residents decided to post slow-down signs in intersections.

This project stands out for letting the people in the neighborhood lead the way. Neighbors collected data, shared results, and took action. The partnership between neighbors, police, and local organizations shows how people can drive decision-making for their neighborhood.

The larger story is one of social cohesion and mutual trust. Through participating in the initiative and learning more about their neighborhood, Amani neighbors built stronger relationships with the police. The police began coming to neighborhood community meetings, which helped them build relationships with people in the community and understand the challenges they face….(More)”.

Is Your Data Being Collected? These Signs Will Tell You Where


Flavie Halais at Wired: “Alphabet’s Sidewalk Labs is testing icons that provide “digital transparency” when information is collected in public spaces….

As cities incorporate digital technologies into their landscapes, they face the challenge of informing people of the many sensors, cameras, and other smart technologies that surround them. Few people have the patience to read through the lengthy privacy notice on a website or smartphone app. So how can a city let them know how they’re being monitored?

Sidewalk Labs, the Google sister company that applies technology to urban problems, is taking a shot. Through a project called Digital Transparency in the Public Realm, or DTPR, the company is demonstrating a set of icons, to be displayed in public spaces, that shows where and what kinds of data are being collected. The icons are being tested as part of Sidewalk Labs’ flagship project in Toronto, where it plans to redevelop a 12-acre stretch of the city’s waterfront. The signs would be displayed at each location where data would be collected—streets, parks, businesses, and courtyards.

Data collection is a core feature of the project, called Sidewalk Toronto, and the source of much of the controversy surrounding it. In 2017, Waterfront Toronto, the organization in charge of administering the redevelopment of the city’s eastern waterfront, awarded Sidewalk Labs the contract to develop the waterfront site. The project has ambitious goals: It says it could create 44,000 direct jobs by 2040 and has the potential to be the largest “climate-positive” community—removing more CO2 from the atmosphere than it produces—in North America. It will make use of new urban technology like modular street pavers and underground freight delivery. Sensors, cameras, and Wi-Fi hotspots will monitor and control traffic flows, building temperature, and crosswalk signals.

All that monitoring raises inevitable concerns about privacy, which Sidewalk aims to address—at least partly—by posting signs in the places where data is being collected.

The signs display a set of icons in the form of stackable hexagons, derived in part from a set of design rules developed by Google in 2014. Some describe the purpose for collecting the data (mobility, energy efficiency, or waste management, for example). Others refer to the type of data that’s collected, such as photos, air quality, or sound. When the data is identifiable, meaning it can be associated with a person, the hexagon is yellow. When the information is stripped of personal identifiers, the hexagon is blue…(More)”.
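To illustrate the taxonomy described above, here is a hypothetical encoding of one sign's disclosures, with the purpose, data type, and identifiability (which determines the hexagon's color) captured as a small data structure. This is a reader's sketch, not Sidewalk Labs' actual DTPR schema.

```python
# A hypothetical encoding of one sign's disclosures, following the taxonomy
# described above. This is a reader's sketch, not Sidewalk Labs' DTPR schema.
from dataclasses import dataclass


@dataclass
class DataCollectionIcon:
    purpose: str        # e.g. "mobility", "energy efficiency", "waste management"
    data_type: str      # e.g. "photos", "air quality", "sound"
    identifiable: bool  # can the data be associated with a person?

    @property
    def color(self) -> str:
        # Identifiable data gets a yellow hexagon; de-identified data gets blue.
        return "yellow" if self.identifiable else "blue"


crosswalk_sign = [
    DataCollectionIcon("mobility", "photos", identifiable=True),
    DataCollectionIcon("energy efficiency", "air quality", identifiable=False),
]

for icon in crosswalk_sign:
    print(f"{icon.purpose}: {icon.data_type} -> {icon.color} hexagon")
```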