Leveraging Social Media Data for Emergency Preparedness and Response


Report by the National Academies of Sciences, Engineering, and Medicine: “Most state departments of transportation (DOTs) use social media to broadcast information and monitor emergencies, but few rely heavily on social media data. The most common barriers to using social media for emergencies are personnel availability and training, privacy issues, and data reliability.

NCHRP Synthesis 610: Leveraging Social Media Data for Emergency Preparedness and Response, from TRB’s National Cooperative Highway Research Program, documents state DOT practices that leverage social media data for emergency preparedness, response, and recovery…(More)”.

How Statisticians Should Grapple with Privacy in a Changing Data Landscape


Article by Joshua Snoke and Claire McKay Bowen: “Suppose you had a data set that contained records of individuals, including demographics such as their age, sex, and race. Suppose also that these data contained additional in-depth personal information, such as financial records, health status, or political opinions. Finally, suppose that you wanted to glean relevant insights from these data using machine learning, causal inference, or survey sampling adjustments. What methods would you use? What best practices would you ensure you followed? Where would you seek information to help guide you in this process?…(More)”

To Save Society from Digital Tech, Enable Scrutiny of How Policies Are Implemented


Article by Ido Sivan-Sevilla: “…there is little discussion about how to create accountability when implementing tech policies. Decades of research exploring policy implementation across diverse areas consistently show how successful implementation allows policies to be adapted and involves crucial bargaining. But this is rarely understood in the tech sector. For tech policies to work, those responsible for enforcement and compliance should be overseen and held to account. Otherwise, as history shows, tech policies will struggle to fulfill the intentions of their policymakers.

Scrutiny is required for three types of actors. First are regulators, who convert promising tech laws into enforcement practices but are often ill-equipped for their mission. My recent research found that across Europe, the rigor and methods of national privacy regulators tasked with enforcing the European Union’s GDPR vary greatly. The French data protection authority, for instance, proactively monitors for privacy violations and strictly sanctions companies that overstep; in contrast, Bulgarian authorities monitor passively and are hesitant to act. Reflecting on the first five years of the GDPR, Max Schrems, the chair of privacy watchdog NOYB, found authorities and courts reluctant to enforce the law, and companies free to take advantage: “It often feels like there is more energy spent in undermining the GDPR than in complying with it.” Variations in resources and technical expertise among regulators create regulatory arbitrage that the regulated eagerly exploit.

Tech companies are the second type of actor requiring scrutiny. Service providers such as Google, Meta, and Twitter, along with lesser-known technology companies, mediate digital services for billions around the world but enjoy considerable latitude on how and whether they comply with tech policies. Civil society groups, for instance, uncovered how Meta was trying to bypass the GDPR and use personal information for advertising…(More)”.

Attacks on Tax Privacy: How the Tax Prep Industry Enabled Meta to Harvest Millions of Taxpayers’ Sensitive Data


Congressional Report: “The investigation revealed that:

  • Tax preparation companies shared millions of taxpayers’ data with Meta, Google, and other Big Tech firms: The tax prep companies used computer code – known as pixels – to send data to Meta and Google. While most websites use pixels, it is particularly reckless for online tax preparation websites to use them on webpages where tax return information is entered unless further steps are taken to ensure that the pixels do not access sensitive information. TaxAct, TaxSlayer, and H&R Block confirmed that they had used the Meta Pixel, and had been using it “for at least a couple of years” and all three companies had been using Google Analytics (GA) for even longer.
  • Tax prep companies shared extraordinarily sensitive personal and financial information with Meta, which used the data for diverse advertising purposes: TaxAct, H&R Block, and TaxSlayer each revealed, in response to this Congressional inquiry, that they shared taxpayer data via their use of the Meta Pixel and Google’s tools. Although the tax prep companies and Big Tech firms claimed that all shared data was anonymous, the FTC and experts have indicated that the data could easily be used to identify individuals, or to create a dossier on them that could be used for targeted advertising or other purposes. 
  • Tax prep companies and Big Tech firms were reckless about their data sharing practices and their treatment of sensitive taxpayer data: The tax prep companies indicated that they installed the Meta and Google tools on their websites without fully understanding the extent to which they would send taxpayer data to these tech firms, without consulting with independent compliance or privacy experts, and without full knowledge of Meta’s use of and disposition of the data. 
  • Tax prep companies may have violated taxpayer privacy laws by sharing taxpayer data with Big Tech firms: Under the law, “a tax return preparer may not disclose or use a taxpayer’s tax return information prior to obtaining a written consent from the taxpayer,” – and they failed to do so when it came to the information that was turned over to Meta and Google. Tax prep companies can also turn over data to “auxiliary service providers in connection with the preparation of a tax return.” But Meta and Google likely do not meet the definition of “auxiliary service providers” and the data sharing with Meta was for advertising purposes – not “in connection with the preparation of a tax return.”…(More)”.
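
The first finding above turns on how tracking pixels work: a pixel is a small snippet of code that, when a page loads or an event fires, requests a URL from the analytics provider with event data packed into the query string. The Python sketch below is purely illustrative; the endpoint, pixel ID, event name, and form fields are hypothetical and are not Meta's or Google's actual parameters. It simply shows how values entered on a page can end up encoded in the request a pixel sends.

    from urllib.parse import urlencode

    # Hypothetical form fields on a tax-prep page (illustrative values only).
    form_state = {
        "filing_status": "married_filing_jointly",
        "adjusted_gross_income": "87500",
        "refund_amount": "2310",
        "dependents": "2",
    }

    # A pixel endpoint is typically a tiny image or script URL; requesting it
    # transmits whatever is packed into the query string to the analytics
    # provider. The endpoint and pixel ID below are placeholders, not real
    # Meta or Google values.
    PIXEL_ENDPOINT = "https://analytics.example.com/tr"

    def build_pixel_request(pixel_id: str, event: str, payload: dict) -> str:
        """Return the beacon URL a page would request when the event fires."""
        params = {"id": pixel_id, "ev": event}
        params.update({f"cd[{key}]": value for key, value in payload.items()})
        return f"{PIXEL_ENDPOINT}?{urlencode(params)}"

    # Everything encoded in this URL leaves the user's browser when the pixel loads.
    print(build_pixel_request("123456789", "FormSubmit", form_state))

Whether anything sensitive leaves the page depends entirely on what the site operator packs into that payload, which is the discretion the report says the tax prep companies exercised recklessly.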

Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology


Paper by Nikhil Agarwal, Alex Moehring, Pranav Rajpurkar & Tobias Salz: “While Artificial Intelligence (AI) algorithms have achieved performance levels comparable to human experts on various predictive tasks, human experts can still access valuable contextual information not yet incorporated into AI predictions. Humans assisted by AI predictions could outperform both humans alone and AI alone. We conduct an experiment with professional radiologists that varies the availability of AI assistance and contextual information to study the effectiveness of human-AI collaboration and to investigate how to optimize it. Our findings reveal that (i) providing AI predictions does not uniformly increase diagnostic quality, and (ii) providing contextual information does increase quality. Radiologists do not fully capitalize on the potential gains from AI assistance because of large deviations from the benchmark Bayesian model with correct belief updating. The observed errors in belief updating can be explained by radiologists’ partially underweighting the AI’s information relative to their own and not accounting for the correlation between their own information and AI predictions. In light of these biases, we design a collaborative system between radiologists and AI. Our results demonstrate that, unless the documented mistakes can be corrected, the optimal solution involves assigning cases either to humans or to AI, but rarely to a human assisted by AI…(More)”.
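
The benchmark of “correct belief updating” mentioned in the abstract is easiest to see in a toy model. The sketch below is not the authors' model, data, or estimates; it assumes Gaussian signals with made-up noise levels and a made-up error correlation, and compares the mean squared error of a correct Bayesian combination of the radiologist's and the AI's signals against a reader who underweights the AI or ignores the correlation between the two.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical parameters, purely illustrative (not the paper's estimates).
    TAU = 1.0                  # prior std of the case's true severity theta
    SIG_H, SIG_A = 0.8, 0.6    # noise std of the radiologist's and the AI's signals
    RHO = 0.5                  # correlation between the two signals' errors
    N = 100_000

    theta = rng.normal(0.0, TAU, N)
    cov = np.array([[SIG_H**2, RHO * SIG_H * SIG_A],
                    [RHO * SIG_H * SIG_A, SIG_A**2]])
    noise = rng.multivariate_normal([0.0, 0.0], cov, N)
    x_h, x_a = theta + noise[:, 0], theta + noise[:, 1]   # human and AI signals

    def estimate(x_h, x_a, assumed_cov, ai_weight_scale=1.0):
        """Posterior mean of theta under a Gaussian prior, given both signals.
        Passing a diagonal assumed_cov ignores the error correlation;
        ai_weight_scale < 1 mimics underweighting the AI's information."""
        ones = np.ones(2)
        prec = np.linalg.inv(assumed_cov)
        w = prec @ ones                      # per-signal weights before normalization
        w[1] *= ai_weight_scale
        post_prec = 1.0 / TAU**2 + ones @ prec @ ones
        return (w[0] * x_h + w[1] * x_a) / post_prec

    mse = lambda est: float(np.mean((est - theta) ** 2))

    print("MSE, correct Bayesian update   :", round(mse(estimate(x_h, x_a, cov)), 3))
    print("MSE, underweighting the AI     :", round(mse(estimate(x_h, x_a, cov, 0.5)), 3))
    print("MSE, ignoring error correlation:", round(mse(estimate(x_h, x_a, np.diag(np.diag(cov)))), 3))
    print("MSE, AI prediction used alone  :", round(mse(x_a), 3))

In this toy setup both biases erode the gains from combining the signals, with heavy underweighting leaving the combination roughly as accurate as the AI alone, which is in the spirit of the paper's finding that assisted readings fall short of the Bayesian benchmark.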

How Good Are Privacy Guarantees? Platform Architecture and Violation of User Privacy


Paper by Daron Acemoglu, Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian & Asuman Ozdaglar: “Many platforms deploy data collected from users for a multitude of purposes. While some are beneficial to users, others are costly to their privacy. The presence of these privacy costs means that platforms may need to provide guarantees about how and to what extent user data will be harvested for activities such as targeted ads, individualized pricing, and sales to third parties. In this paper, we build a multi-stage model in which users decide whether to share their data based on privacy guarantees. We first introduce a novel mask-shuffle mechanism and prove it is Pareto optimal—meaning that it leaks the least about the users’ data for any given leakage about the underlying common parameter. We then show that under any mask-shuffle mechanism, there exists a unique equilibrium in which privacy guarantees balance privacy costs and utility gains from the pooling of user data for purposes such as assessment of health risks or product development. Paradoxically, we show that as users’ value of pooled data increases, the equilibrium of the game leads to lower user welfare. This is because platforms take advantage of this change to reduce privacy guarantees so much that user utility declines (whereas it would have increased with a given mechanism). Even more strikingly, we show that platforms have incentives to choose data architectures that systematically differ from those that are optimal from the user’s point of view. In particular, we identify a class of pivot mechanisms, linking individual privacy to choices by others, which platforms prefer to implement and which make users significantly worse off…(More)”.
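
Without reproducing the paper's formal mask-shuffle mechanism, the intuition behind “mask, then shuffle” can be sketched in a few lines: each user's report is perturbed by a noise term, the noise terms are constructed to cancel in aggregate, and the perturbed reports are shuffled so they can no longer be linked to the users who submitted them. Everything below, including the data, the noise scale, and the choice of the mean as the aggregate of interest, is a hypothetical illustration of that intuition rather than the mechanism analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical user data: each value is a common parameter plus personal variation.
    common_parameter = 3.0
    users = common_parameter + rng.normal(0.0, 1.0, size=10)

    # Masking: add per-user noise terms that sum to zero, so the aggregate
    # (here, the mean) is preserved exactly while individual values are obscured.
    raw_masks = rng.normal(0.0, 5.0, size=users.size)
    masks = raw_masks - raw_masks.mean()          # enforce zero-sum masks
    masked = users + masks

    # Shuffling: permute the masked reports so they can no longer be linked
    # back to the users who submitted them.
    shuffled = rng.permutation(masked)

    print("true mean of user data  :", users.mean().round(4))
    print("mean of shuffled reports:", shuffled.mean().round(4))   # identical
    print("one user's true value   :", users[0].round(4))
    print("that user's report      :", masked[0].round(4))         # heavily masked

The aggregate, standing in here for the underlying common parameter, is preserved exactly, while any single report says little about the individual behind it.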

Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data


Proceedings from the National Academies of Sciences, Engineering, and Medicine: “Artificial intelligence (AI), facial recognition, and other advanced computational and statistical techniques are accelerating advancements in the life sciences and many other fields. However, these technologies and the scientific developments they enable also hold the potential for unintended harm and malicious exploitation. To examine these issues and to discuss practices for anticipating and preventing the misuse of advanced data analytics and biological data in a global context, the National Academies of Sciences, Engineering, and Medicine convened two virtual workshops on November 15, 2022, and February 9, 2023. The workshops engaged scientists from the United States, South Asia, and Southeast Asia through a series of presentations and scenario-based exercises to explore emerging applications and areas of research, their potential benefits, and the ethical issues and security risks that arise when AI applications are used in conjunction with biological data. This publication highlights the presentations and discussions of the workshops…(More)”.

Building the Democracy We Need for the Twenty-First Century


Toolkit by Hollie Russon Gilman, Grace Levin, and Jessica Tang: “This toolkit situates collaborative governance, also known as “co-governance,” within a framework for building community that sees civic education, relationship building, and leadership development as essential first steps toward an effective and sustained participatory process. It offers key takeaways and best practices from effective, ongoing collaborative governance projects between communities and decision makers. The best of these projects shift decision-making power to the hands of communities to make room for more deliberation, consensus, and lasting change. Building on the lessons of successful case studies from across the United States, including Georgia, Kentucky, New York, and Washington, this toolkit aims to support local leaders inside and outside government as they navigate and execute co-governance models in their communities…(More)”.

Shifting the Culture of Public Procurement Can Help Improve Digital Public Services


Article by Sarah Forland: “Advancing digital public infrastructure (DPI) that strengthens the delivery of public services requires shifting the culture around how governments design, develop, and implement digital solutions. At the foundation of this work is public procurement—a unique and often overlooked avenue for improving digital public services…

To reconceptualize public procurement, stakeholders need to collaborate to improve shared accountability, build mutual trust, and create better outcomes for public service delivery. In October 2022, DIGI worked with experts across the field to identify five core opportunity areas for change and highlighted personal narratives with advice on how to get there, including insight from some of the panelist organizations and experts…

1. View procurement as part of the innovation process.

Rather than focusing primarily on risk avoidance and compliance, public servants should integrate procurement into the innovation process. Jurisdictions can adopt goal-oriented, modular contracting practices or performance-based contracts by fostering collaboration among various stakeholders. This approach allows for agile, iterative, and flexible solution development, placing emphasis on outcome-based solutions.

2. Start with the goal, then work toward the most effective solution, rather than prescribing a solution.

Jurisdictions can create an environment that encourages vendors to propose a variety of innovative solutions through a request for proposals (RFP) that explicitly outlines objectives, success indicators, and potential failure points. This process can serve as a design exercise for vendors, enabling jurisdictions to select the proposal that most effectively aligns with their identified goals.

3. Center diversity, equity, inclusion, and access (DEIA) throughout procurement.

Delivering people-centered outcomes through civic solutions requires intentional DEIA practices. On the backend, this can include increasing RFP availability and access to new vendors—especially women- and minority-owned businesses. In addition, requiring human-centered design and community input can help ensure that those who will interact with a digital solution can do so effectively, easily, and safely…(More)”.

How to Regulate AI? Start With the Data


Article by Susan Ariel Aaronson: “We live in an era of data dichotomy. On one hand, AI developers rely on large data sets to “train” their systems about the world and respond to user questions. These data troves have become increasingly valuable and visible. On the other hand, despite the import of data, U.S. policy makers don’t view data governance as a vehicle to regulate AI.  

U.S. policy makers should reconsider that perspective. As an example, the European Union and more than 30 other countries provide their citizens with a right not to be subject to automated decision making without explicit consent. Data governance is clearly an effective way to regulate AI.

Many AI developers treat data as an afterthought, but how AI firms collect and use data can tell you a lot about the quality of the AI services they produce. Firms and researchers struggle to collect, classify, and label data sets that are large enough to reflect the real world, but then don’t adequately clean (remove anomalies or problematic data) and check their data. Also, few AI developers and deployers divulge information about the data they use to train AI systems. As a result, we don’t know if the data that underlies many prominent AI systems is complete, consistent, or accurate. We also don’t know where that data comes from (its provenance). Without such information, users don’t know if they should trust the results they obtain from AI. 

The Washington Post set out to document this problem. It collaborated with the Allen Institute for AI to examine Google’s C4 data set, a large, widely used training data set built from data scraped by bots from 15 million websites. Google then filters the data, but it understandably can’t filter the entire data set.

Hence, this data set provides sufficient training data, but it also presents major risks for those firms or researchers who rely on it. Web scraping is generally legal in most countries as long as the scraped data isn’t used to cause harm to society, a firm, or an individual. But the Post found that the data set contained swaths of data from sites that sell pirated or counterfeit goods, which the Federal Trade Commission views as harmful. Moreover, to be legal, the scraped data should not include personal data obtained without user consent or proprietary data obtained without firm permission. Yet the Post found large amounts of personal data in the data set as well as some 200 million instances of copyrighted data denoted with the copyright symbol.

Reliance on scraped data sets presents other risks. Without careful examination of the data sets, the firms relying on that data and their clients cannot know if it contains incomplete or inaccurate data, which in turn could lead to problems of bias, propaganda, and misinformation. But researchers cannot check data accuracy without information about data provenance. Consequently, the firms that rely on such unverified data are creating some of the AI risks regulators hope to avoid. 
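
As a rough illustration of what such an audit involves, the sketch below screens a handful of made-up scraped records for copyright notices, obvious personal-data patterns, and missing provenance. It is not the Post's or the Allen Institute's methodology, and the field names, example records, and regular expressions are hypothetical and far cruder than what auditing a corpus like C4 would require.

    import re

    # Hypothetical scraped records; a real corpus like C4 would hold billions of tokens.
    records = [
        {"url": "https://example-blog.com/post", "text": "© 2021 Example Media. All rights reserved."},
        {"url": "", "text": "Contact me at jane.doe@example.org or 555-867-5309."},
        {"url": "https://example-forum.net/thread/42", "text": "Ordinary discussion text."},
    ]

    COPYRIGHT = re.compile(r"©|\(c\)\s*\d{4}", re.IGNORECASE)
    EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
    PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

    def screen(record):
        """Return a list of reasons a record needs review before training use."""
        flags = []
        if COPYRIGHT.search(record["text"]):
            flags.append("copyright notice")
        if EMAIL.search(record["text"]) or PHONE.search(record["text"]):
            flags.append("possible personal data")
        if not record["url"]:
            flags.append("missing provenance")
        return flags

    for r in records:
        print(r["url"] or "<no source>", "->", screen(r) or ["no flags"])

Even screening this simple depends on each record carrying a source URL, which is exactly the provenance information the article notes is often missing.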

It makes sense for Congress to start with data as it seeks to govern AI. There are several steps Congress could take…(More)”.