Dissecting racial bias in an algorithm used to manage the health of populations


Paper by Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan in Science: “Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts….(More)”.
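The audit the authors describe (comparing realized health across groups at equal algorithmic risk scores) can be illustrated with a small calibration check. The data below is entirely synthetic and the field names are hypothetical, chosen only to mirror the paper's finding that one group is sicker than the other at the same score:

```python
import random
from collections import defaultdict

random.seed(0)

# Synthetic patient records: (group, algorithm risk score in [0, 1),
# number of active chronic conditions). Purely illustrative data.
patients = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    score = random.random()
    # Simulate the paper's finding: at the same score, group B is sicker.
    base = score * 5
    conditions = base + (1.5 if group == "B" else 0.0) + random.gauss(0, 0.5)
    patients.append((group, score, max(0.0, conditions)))

# Bin patients by risk-score decile and compare mean sickness per group.
bins = defaultdict(lambda: defaultdict(list))
for group, score, conditions in patients:
    bins[int(score * 10)][group].append(conditions)

for b in sorted(bins):
    means = {g: sum(v) / len(v) for g, v in bins[b].items()}
    print(f"score decile {b}: " +
          ", ".join(f"group {g} mean {m:.2f}" for g, m in sorted(means.items())))
```

If an algorithm were calibrated on health rather than cost, the per-decile means would be roughly equal across groups; a persistent gap at every score level is the signature of the bias described above.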

Waze launches data-sharing integration for cities with Google Cloud


Ryan Johnston at StateScoop: “Thousands of cities across the world that rely on externally sourced traffic data from Waze, the route-finding mobile app, will now have access to the data through the Google Cloud suite of analytics tools instead of a raw feed, making it easier for city transportation and planning officials to make data-driven decisions.

Waze said Tuesday that the anonymized data is now available through Google Cloud, with the goal of making curbside management, roadway maintenance and transit investment easier for small to midsize cities that don’t have the resources to invest in enterprise data-analytics platforms of their own. Since 2014, Waze — which became a Google subsidiary in 2013 — has submitted traffic data to its partner cities through its “Waze for Cities” program, but those data sets arrived in raw feeds without any built-in analysis or insights.

While some cities have built their own analysis tools to understand the free data from the company, others have struggled to stay afloat in the sea of data, said Dani Simons, Waze’s head of public sector partnerships.

“[What] we’ve realized is providing the data itself isn’t enough for our city partners or for a lot of our city and state partners,” Simons said. “We have been asked over time for better ways to analyze and turn that raw data into something more actionable for our public partners, and that’s why we’re doing this.”

The data will now arrive automatically integrated with Google’s free data analysis tool, BigQuery, and a visualization tool, Data Studio. Cities can use the tools to analyze up to a terabyte of data and store up to 10 gigabytes a month for free, but they can also choose to continue to use in-house analysis tools, Simons said. 

The integration was also designed with input from Waze’s top partner cities, including Los Angeles; Seattle; and San Jose, California. One of Waze’s private sector partners, Genesis Pulse, which designs software for emergency responders, reported that Waze users identified 40 percent of roadside accidents an average of 4.5 minutes before those incidents were reported to 911 or public safety.

The integration is Waze’s attempt at solving two of the biggest data problems that cities have today, Simons told StateScoop. For some cities in the U.S., Waze is one of several private companies sharing transit data with them. Other cities are drowning in data from traffic sensors, city-owned fleets or private mobility companies….(More)”.

From Transactions Data to Economic Statistics: Constructing Real-Time, High-Frequency, Geographic Measures of Consumer Spending


Paper by Aditya Aladangady et al: “Access to timely information on consumer spending is important to economic policymakers. The Census Bureau’s monthly retail trade survey is a primary source for monitoring consumer spending nationally, but it is not well suited to study localized or short-lived economic shocks. Moreover, lags in the publication of the Census estimates and subsequent, sometimes large, revisions diminish its usefulness for real-time analysis. Expanding the Census survey to include higher frequencies and subnational detail would be costly and would add substantially to respondent burden. We take an alternative approach to fill these information gaps. Using anonymized transactions data from a large electronic payments technology company, we create daily estimates of retail spending at detailed geographies. Our daily estimates are available only a few days after the transactions occur, and the historical time series are available from 2010 to the present. When aggregated to the national level, the pattern of monthly growth rates is similar to the official Census statistics. We discuss two applications of these new data for economic analysis: First, we describe how our monthly spending estimates are useful for real-time monitoring of aggregate spending, especially during the government shutdown in 2019, when Census data were delayed and concerns about the economy spiked. Second, we show how the geographic detail allowed us to quantify in real time the spending effects of Hurricanes Harvey and Irma in 2017….(More)”.
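The pipeline the authors describe (rolling anonymized transactions up to daily, county-level spending, then to national monthly totals for comparison with the Census series) can be sketched roughly as follows; the record layout, FIPS codes and amounts here are invented for illustration:

```python
from collections import defaultdict
from datetime import date

# Hypothetical anonymized transaction records: (date, county_fips, amount).
transactions = [
    (date(2019, 1, 3), "48201", 52.10),   # Harris County, TX
    (date(2019, 1, 3), "48201", 18.75),
    (date(2019, 1, 4), "12086", 99.00),   # Miami-Dade County, FL
    (date(2019, 2, 1), "48201", 41.30),
]

# Daily spending estimates at the county level.
daily = defaultdict(float)
for day, county, amount in transactions:
    daily[(day, county)] += amount

# Aggregate to national monthly totals so that growth rates can be
# benchmarked against the official Census retail trade series.
monthly = defaultdict(float)
for (day, county), total in daily.items():
    monthly[(day.year, day.month)] += total

months = sorted(monthly)
growth = {m2: monthly[m2] / monthly[m1] - 1 for m1, m2 in zip(months, months[1:])}
print(growth)
```

The real series would of course require seasonal adjustment and coverage corrections before it tracks the Census benchmark; the sketch only shows the aggregation structure.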

Toolkit to Help Community Leaders Drive Sustainable, Inclusive Growth


The Mastercard Center for Inclusive Growth: “… is unveiling a groundbreaking suite of tools that will provide local leaders with timely data-driven insights on the current state of and potential for inclusive growth in their communities. The announcement comes as private and public sector leaders gather in Washington for the inaugural Global Inclusive Growth Summit.

For the first time, the new Inclusive Growth Toolkit brings together a clear, simple view of social and economic growth in underserved communities across the U.S., at the census-tract level. It was created in response to growing demand from community leaders for more evidence-based insights to help them steer impact-investment dollars to locally led economic development initiatives, unlock the potential of neighborhoods, and improve quality of life for all.

The initial design of the toolkit is focused on driving sustainable growth for the 37+ million people living in the 8,700+ Qualified Opportunity Zones (QOZs) throughout the United States. This comprehensive picture reveals that neighborhoods can look very different and may require different types of interventions to achieve successful and sustainable growth.

The Inclusive Growth Toolkit includes:

  • The Inclusive Growth Score – an interactive online map where users can view measures of inclusion and growth and then download a PDF Scorecard for any of the QOZs at census tract level.

  • A deep-dive analytics consultancy service that provides community leaders with customized insights to inform policy decisions, prospectus development, and impact-investor discussions….(More)”.

Data Ownership: Exploring Implications for Data Privacy Rights and Data Valuation


Hearing by the Senate Committee on Banking, Housing and Urban Affairs: “…As a result of an increasingly digital economy, more personal information is available to companies than ever before. Private companies are collecting, processing, analyzing and sharing considerable data on individuals for all kinds of purposes.

There have been many questions about what personal data is being collected, how it is being collected, with whom it is being shared and how it is being used, including in ways that affect individuals’ financial lives.

Given the vast amount of personal information flowing through the economy, individuals need real control over their personal data. This Committee has held a series of data privacy hearings exploring possible frameworks for facilitating privacy rights for consumers. Nearly all have included references to data as a new currency or commodity.

The next question, then, is who owns it? There has been much debate about the concept of data ownership, the monetary value of personal information and its potential role in data privacy…..The witnesses will be: 

  1. Mr. Jeffrey Ritter, Founding Chair, American Bar Association Committee on Cyberspace Law; External Lecturer
  2. Mr. Chad Marlow, Senior Advocacy and Policy Counsel, American Civil Liberties Union
  3. Mr. Will Rinehart, Director of Technology and Innovation Policy, American Action Forum
  4. Ms. Michelle Dennedy, Chief Executive Officer, DrumWave Inc.

Rethinking Encryption


Jim Baker at Lawfare: “…Public safety officials should continue to highlight instances where they find that encryption hinders their ability to effectively and efficiently protect society so that the public and lawmakers understand the trade-offs they are allowing. To do this, the Justice Department should, for example, file an annual public report describing, as best it can, the continuing nature and scope of the going dark problem. If necessary, it can also file a classified annual report with the appropriate congressional committees.

But, for the reasons discussed above, public safety officials should also become among the strongest supporters of widely available strong encryption.

I know full well that this approach will be a bitter pill for some in law enforcement and other public safety fields to swallow, and many people will reject it outright. It may make some of my former colleagues angry at me. I expect that some will say that I’m simply joining others who have left the government and switched sides on encryption to curry favor with the tech sector in order to get a job. That is wrong. My dim views about cybersecurity risks, China and Huawei are essentially the same as those that I held while in government. I also think that my overall approach on encryption today—as well as my frustration with Congress—is generally consistent with the approach I had while I was in government.

I have long said—as I do here—that encryption poses real challenges for public safety officials; that any proposed technical solution must properly balance all of the competing equities; and that (absent an unlikely definitive judicial ruling as a result of litigation) Congress must change the law to resolve the issue. What has changed is my acceptance of, or perhaps resignation to, the fact that Congress is unlikely to act, as well as my assessment that the relevant cybersecurity risks to society have grown disproportionately over the years when compared with other risks….(More)”.

Should Consumers Be Able to Sell Their Own Personal Data?


The Wall Street Journal: “People around the world are confused and concerned about what companies do with the data they collect from their interactions with consumers.

A global survey conducted last fall by the research firm Ipsos gives a sense of the scale of people’s worries and uncertainty. Roughly two-thirds of those surveyed said they knew little or nothing about how much data companies held about them or what companies did with that data. And only about a third of respondents on average said they had at least a fair amount of trust that a variety of corporate and government organizations would use the information they had about them in the right way….

Christopher Tonetti, an associate professor of economics at Stanford Graduate School of Business, says consumers should own and be able to sell their personal data. Cameron F. Kerry, a visiting fellow at the Brookings Institution and former general counsel and acting secretary of the U.S. Department of Commerce, opposes the idea….

YES: It Would Encourage Sharing of Data—a Plus for Consumers and Society…Data isn’t like other commodities in one fundamental way—it doesn’t diminish with use. And that difference is the key to why consumers should own the data that’s created when they interact with companies, and have the right to sell it.

NO: It Would Do Little to Help Consumers, and Could Leave Them Worse Off Than Now…

But owning data will do little to help consumers’ privacy—and may well leave them worse off. Meanwhile, consumer property rights would create enormous friction for valid business uses of personal information and for the free flow of information we value as a society.

In our current system, consumers reflexively click away rights to data in exchange for convenience, free services, connection, endorphins or other motivations. In a market where consumers could sell or license personal information they generate from web browsing, ride-sharing apps and other digital activities, is there any reason to expect that they would be less motivated to share their information? …(More)”.

Computers have an unlikely origin story: the 1890 census


David Lindsey Roberts at FastCompany: “The U.S. Constitution requires that a population count be conducted at the beginning of every decade.

This census has always been charged with political significance and continues to be. That’s clear from the controversy over the conduct of the upcoming 2020 census.

But it’s less widely known how important the census has been in developing the U.S. computer industry, a story that I tell in my new book, Republic of Numbers: Unexpected Stories of Mathematical Americans Through History....

The only use of the census clearly specified in the Constitution is to allocate seats in the House of Representatives. More populous states get more seats.

A minimalist interpretation of the census mission would require reporting only the overall population of each state. But the census has never confined itself to this.

A complicating factor emerged right at the beginning, with the Constitution’s distinction between “free persons” and “three-fifths of all other persons.” This was the Founding Fathers’ infamous mealy-mouthed compromise between those states with a large number of enslaved persons and those states where relatively few lived.

The first census, in 1790, also made nonconstitutionally mandated distinctions by age and sex. In subsequent decades, many other personal attributes were probed as well: occupational status, marital status, educational status, place of birth, and so on….

John Shaw Billings, a physician assigned to assist the Census Office with compiling health statistics, had closely observed the immense tabulation efforts required to deal with the raw data of 1880. He expressed his concerns to a young mechanical engineer assisting with the census, Herman Hollerith, a recent graduate of the Columbia School of Mines.

On September 23, 1884, the U.S. Patent Office recorded a submission from the 24-year-old Hollerith, titled “Art of Compiling Statistics.”

By progressively improving the ideas of this initial submission, Hollerith would decisively win an 1889 competition to improve the processing of the 1890 census.

The technological solutions devised by Hollerith involved a suite of mechanical and electrical devices….After his census success, Hollerith went into business selling this technology. The company he founded would, after he retired, become International Business Machines—IBM. IBM led the way in perfecting card technology for recording and tabulating large sets of data for a variety of purposes….(More)”

AI script finds bias in movies before production starts


Springwise: “The GD-IQ (Geena Davis Inclusion Quotient) Spellcheck for Bias analysis tool reviews film and television scripts for equality and diversity. Geena Davis, the founder of the Geena Davis Institute on Gender in Media, recently announced a yearlong pilot programme with Walt Disney Studios. The Spellcheck for Bias tool will be used throughout the studio’s development process.

Funded by Google, the GD-IQ uses audio-visual processing technologies from the University of Southern California Viterbi School of Engineering together with Google’s machine learning capabilities. 

The tool’s analysis reveals the percentages of representation and dialogue broken down into categories of gender, race, LGBTQIA and disability representation. The analysis also highlights non-gender-identified speaking characters, which could help improve equality and diversity.

Designed to help identify unconscious bias before it becomes a publicly consumed piece of media, the tool also ranks the sophistication of the characters’ vocabulary and their relative level of power within the story.
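A tally of the kind such a tool reports (share of dialogue by character attribute) can be approximated once a script has been parsed; the mini-script and tags below are hypothetical and are not the GD-IQ's actual internals, which also weigh vocabulary sophistication and character power:

```python
from collections import Counter

# Hypothetical parsed script: (character, gender tag, line of dialogue).
# A real pipeline would extract these from the screenplay with NLP.
script = [
    ("AVA", "female", "We leave at dawn."),
    ("MARCUS", "male", "The bridge is out, so we take the river."),
    ("AVA", "female", "Then we take the river."),
    ("DISPATCHER", "unspecified", "All units respond."),
]

# Count dialogue words per gender tag to estimate share of dialogue.
words_by_gender = Counter()
for character, gender, line in script:
    words_by_gender[gender] += len(line.split())

total = sum(words_by_gender.values())
for gender, words in words_by_gender.most_common():
    print(f"{gender}: {100 * words / total:.1f}% of dialogue words")
```

Characters tagged "unspecified" here correspond to the non-gender-identified speaking roles the tool flags as casting opportunities.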

The first study of film and television representation using the GD-IQ examined the top 200 grossing, non-animated films of 2014 and 2015. Unsurprisingly, the more diverse and equal a film’s characters were, the more money the film earned. …(More)”.

Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques


Paper by Gabriela V. Angeles et al: “Road traffic accidents are among the principal causes of traffic congestion, causing loss of life, damage to health and the environment, economic losses and material damage. Traditional studies of road traffic accidents in urban zones require a very large investment of time and money; additionally, their results are often not current.

However, nowadays in many countries, crowdsourced GPS-based traffic and navigation apps have emerged as a low-cost source of information for studying road traffic accidents and the urban congestion they cause. In this article we identify the zones, roads and specific times in Mexico City (CDMX) in which the largest number of road traffic accidents were concentrated during 2016. We built a database compiling information obtained from the social network known as Waze.

The methodology employed was Knowledge Discovery in Databases (KDD) to discover patterns in the accident reports, applying data mining techniques with the help of Weka. The selected algorithms were Expectation Maximization (EM), to obtain the ideal number of clusters for the data, and k-means as the grouping method. Finally, the results were visualized with the QGIS geographic information system….(More)”.
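The grouping step the paper describes (clustering accident reports by location with k-means) can be sketched in plain Python; the coordinates below are made up, and the cluster count is fixed at 2 rather than selected with EM as in the paper:

```python
import math
import random

random.seed(1)

# Hypothetical accident reports as (longitude, latitude) pairs,
# loosely scattered around two areas of a city.
reports = ([(random.gauss(-99.13, 0.01), random.gauss(19.43, 0.01)) for _ in range(50)] +
           [(random.gauss(-99.18, 0.01), random.gauss(19.36, 0.01)) for _ in range(50)])

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, recompute centroids."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(reports, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(f"hotspot near {centroid[0]:.3f}, {centroid[1]:.3f}: {len(cluster)} reports")
```

Each resulting centroid marks an accident hotspot that could then be mapped in QGIS, as the authors do with the Waze reports.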