Without appropriate metadata, data-sharing mandates are pointless


Article by Mark A. Musen: “Last month, the US government announced that research articles and most underlying data generated with federal funds should be made publicly available without cost, a policy to be implemented by the end of 2025. That’s atop other important moves. The European Union’s programme for science funding, Horizon Europe, already mandates that almost all data be FAIR (that is, findable, accessible, interoperable and reusable). The motivation behind such data-sharing policies is to make data more accessible so others can use them to both verify results and conduct further analyses.

But just getting those data sets online will not bring the anticipated benefits: few data sets will really be FAIR, because most will be unfindable. What’s needed are policies and infrastructure to organize metadata.

Imagine having to search for publications on some topic — say, methods for carbon reclamation — while being able to use only the article titles (no keywords, abstracts or search terms). That’s essentially the situation for finding data sets. If I wanted to identify all the deposited data related to carbon reclamation, the task would be futile. Current metadata often contain only administrative and organizational information, such as the name of the investigator and the date when the data were acquired.

What’s more, for scientific data to be useful to other researchers, metadata must sensibly and consistently communicate the essentials of the experiments — what was measured, and under what conditions. As an investigator who builds technology to assist with data annotation, I find it frustrating that, in the majority of fields, the metadata standards needed to make data FAIR don’t even exist.

Metadata about data sets typically lack experiment-specific descriptors. If present, they’re sparse and idiosyncratic. An investigator searching the Gene Expression Omnibus (GEO), for example, might seek genomic data sets containing information on how a disease or condition manifests itself in young animals or humans. Performing such a search requires knowledge of how the age of individuals is represented — which, in the GEO repository, could be age, AGE, age (after birth), age (years), Age (yr-old) or dozens of other possibilities. (Often, such information is missing from data sets altogether.) Because the metadata are so ad hoc, automated searches fail, and investigators waste enormous amounts of time manually sifting through records to locate relevant data sets, with no guarantee that most (or any) can be found…(More)”.
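To make the search problem concrete, here is a minimal, hypothetical sketch (not taken from Musen's article) of what an automated tool has to do when a single attribute such as age hides behind dozens of ad hoc field names. The example records, field names and the extract_age helper are illustrative assumptions, not real GEO structures.

```python
import re

# Hypothetical metadata records, mimicking the ad hoc field names that appear
# in repository submissions (age, AGE, age (after birth), Age (yr-old), ...).
records = [
    {"age": "4"},
    {"AGE": "6 weeks"},
    {"age (after birth)": "2 days"},
    {"Age (yr-old)": "3"},
    {"strain": "C57BL/6"},   # no age information recorded at all
]

# Treat any key containing the word "age" as an age field.
AGE_KEY = re.compile(r"\bage\b", re.IGNORECASE)

def extract_age(record):
    """Return the raw value of the first age-like field, or None if absent."""
    for key, value in record.items():
        if AGE_KEY.search(key):
            return value
    return None

print([extract_age(r) for r in records])
# ['4', '6 weeks', '2 days', '3', None] -- even once the field is found, the
# values mix units and formats, which is the gap that community metadata
# standards (agreed field names, units, controlled vocabularies) would close.
```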

The New ADP National Employment Report


Press Release: “The new ADP National Employment Report (NER) launched today in collaboration with the Stanford Digital Economy Lab. Earlier this spring, the ADP Research Institute paused the NER in order to refine the methodology and design of the report. Part of that evolution was teaming up with data scientists at the Stanford Digital Economy Lab to add a new perspective and rigor to the report. The new report uses fine-grained, high-frequency data on jobs and wages to deliver a richer and more useful analysis of the labor market.

Let’s take a look at some of the key changes with the new NER, along with the new ADP® Pay Insights Report.

It’s independent. The key change is that the new ADP NER is an independent measure of the US labor market, rather than a forecast of the BLS monthly jobs number. The jobs report and pay insights are based on anonymized and aggregated payroll data from more than 25 million US employees across 500,000 companies. The new report focuses solely on ADP’s clients and private-sector change…(More)”.

Measuring Small Business Dynamics and Employment with Private-Sector Real-Time Data


Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real-time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exits of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency and geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely in the beginning of the pandemic than employment of larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver of both results, thus underlining the importance of properly correcting for sample churn…(More)”.

Closing the Data Divide for a More Equitable U.S. Digital Economy


Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly


Report by Hugh Grant-Chapman, and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy techniques and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in the context of the goals of a given data release…(More)”.
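As a concrete illustration of those trade-offs, here is a minimal sketch (not from the report) of the Laplace mechanism, one standard differential privacy technique for releasing an aggregate count. The epsilon value and the client count are hypothetical.

```python
import numpy as np

def dp_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query changes by at most 1 when one individual's record is added
    or removed (sensitivity = 1), so Laplace noise with scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: publishing how many clients used a service last month.
true_count = 1_283   # assumed figure, for illustration only
epsilon = 0.5        # smaller epsilon -> more noise, stronger privacy
print(round(dp_count(true_count, epsilon)))
```

Smaller values of epsilon add more noise, strengthening privacy at the cost of accuracy in the published statistic; synthetic data shifts a similar trade-off onto a generative model fitted to the confidential records. Both require exactly the kind of context-specific judgment the report recommends.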

OSTP Issues Guidance to Make Federally Funded Research Freely Available Without Delay


The White House: “Today, the White House Office of Science and Technology Policy (OSTP) updated U.S. policy guidance to make the results of taxpayer-supported research immediately available to the American public at no cost. In a memorandum to federal departments and agencies, Dr. Alondra Nelson, the head of OSTP, delivered guidance for agencies to update their public access policies as soon as possible to make publications and research funded by taxpayers publicly accessible, without an embargo or cost. All agencies will fully implement updated policies, including ending the optional 12-month embargo, no later than December 31, 2025.

This policy will likely yield significant benefits on a number of key priorities for the American people, from environmental justice to cancer breakthroughs, and from game-changing clean energy technologies to protecting civil liberties in an automated world.

For years, President Biden has been committed to delivering policy based on the best available science, and to working to ensure the American people have access to the findings of that research. “Right now, you work for years to come up with a significant breakthrough, and if you do, you get to publish a paper in one of the top journals,” said then-Vice President Biden in remarks to the American Association for Cancer Research in 2016. “For anyone to get access to that publication, they have to pay hundreds, or even thousands, of dollars to subscribe to a single journal. And here’s the kicker — the journal owns the data for a year. The taxpayers fund $5 billion a year in cancer research every year, but once it’s published, nearly all of that taxpayer-funded research sits behind walls. Tell me how this is moving the process along more rapidly.” The new public access guidance was developed with the input of multiple federal agencies over the course of this year, to enable progress on a number of Biden-Harris Administration priorities.

“When research is widely available to other researchers and the public, it can save lives, provide policymakers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society,” said Dr. Alondra Nelson, head of OSTP. “The American people fund tens of billions of dollars of cutting-edge research annually. There should be no delay or barrier between the American public and the returns on their investments in research.”…(More)”.

Design in the Civic Space: Generating Impact in City Government


Paper by Stephanie Wade and Jon Freach: “When design in the private sector is used as a catalyst for innovation, it can produce insight into human experience, awareness of equitable and inequitable conditions, and clarity about needs and wants. But when design is applied in government, the complicated nature of the civic arena means that public sector servants need to learn and apply design in ways that are specific to the complex ecosystem of long-standing social challenges they face, adopting new mindsets, methods, and ways of working that challenge established practices in a bureaucratic environment.

Design offers tools to help navigate the ambiguous boundaries of these complex problems and improve the city’s organizational culture so that it delivers better services to residents and the communities they live in. For the new practitioner in government, design can seem exciting, inspiring, hopeful, and fun because, over the past decade, it has quickly become a popular and novel way to approach city policy and service design. In the early part of the learning process, people often report that using design helps them visualize their thoughts, spark meaningful dialogue, and find connections between problems, data, and ideas. But for some, when the going gets tough and the ambiguity of overlapping, long-standing civic problems, with their many stakeholders, causes, and effects, begins to surface, design practices can seem ineffective, illogical, slow, confusing, and burdensome.

This paper will explore the highs and lows of using design in local government to help cities innovate. The authors, who have worked together to conceive, create, and deliver innovation training to over 100 global cities through multiple innovation programs, in the United States Federal Government, and in higher education, share examples from their fieldwork supported by the experiences of city staff members who have applied design methods in their jobs. Readers will discover how design works to catalyze innovative thinking in the public sector, reframe complex problems, center opportunities in resident needs, especially among those residents who have historically been excluded from government decision-making, make sensemaking a cultural norm and idea generation a ritual in otherwise traditional bureaucratic cultures, and work through the ambiguity of contemporary civic problems to generate measurable impact for residents. They will also learn why design sometimes fails to deliver its promise of innovation in government and see what happens when its language, mindsets, and tools make it hard for city innovation teams to adopt and apply…(More)”.

U.S. Government Effort to Tap Private Weather Data Moves Along Slowly


Article by Isabelle Bousquette: “The U.S. government’s six-year-old effort to improve its weather forecasting ability by purchasing data from private-sector satellite companies has started to show results, although the process is moving more slowly than anticipated.

After a period of testing, the National Oceanic and Atmospheric Administration, a scientific, service and regulatory arm of the Commerce Department, began purchasing data from two satellite companies, Spire Global Inc. of Vienna, Va., and GeoOptics Inc. of Pasadena, Calif.

The weather data from these two companies fills gaps in coverage left by NOAA’s own satellites, the agency said. NOAA also began testing data from a third company this year.

Beyond these companies, new entrants to the field offering weather data based on a broader range of technologies have been slow to emerge, the agency said.

“We’re getting a subset of what we hoped,” said Dan St. Jean, deputy director of the Office of System Architecture and Advanced Planning at NOAA’s Satellite and Information Service.

NOAA’s weather forecasts help the government formulate hurricane evacuation plans and make other important decisions. The agency began seeking out private sources of satellite weather data in 2016. The idea was to find a more cost-effective alternative to funding NOAA’s own satellite constellations, the agency said. It also hoped to seed competition and innovation in the private satellite sector.

It isn’t yet clear whether there is a cost benefit to using private data, in part because the relatively small number of competitors in the market has made it challenging to determine a steady market price, NOAA said.

“All the signs in the nascent ‘new space’ industry indicated that there would be a plethora of venture capitalists wanting to compete for NOAA’s commercial pilot/purchase dollars. But that just never materialized,” said Mr. St. Jean…(More)”.

Artificial intelligence was supposed to transform health care. It hasn’t.


Article by Ben Leonard and Ruth Reader: “Artificial intelligence is spreading into health care, often as software or a computer program capable of learning from large amounts of data and making predictions to guide care or help patients.

Investors see health care’s future as inextricably linked with artificial intelligence. That’s obvious from the cash pouring into AI-enabled digital health startups, including more than $3 billion in the first half of 2022 alone and nearly $10 billion in 2021, according to a Rock Health investment analysis commissioned by POLITICO.

And no wonder, considering the bold predictions technologists have made. At a conference in 2016, Geoffrey Hinton, British cognitive psychologist and “godfather” of AI, said radiologists would soon go the way of typesetters and bank tellers: “People should stop training radiologists now. It’s just completely obvious that, within five years, deep learning is going to do better.”

But more than five years since Hinton’s forecast, radiologists are still training to read image scans. Instead of replacing doctors, health system administrators now see AI as a tool clinicians will use to improve everything from their diagnoses to billing practices. AI hasn’t lived up to the hype, medical experts said, because health systems’ infrastructure isn’t ready for it yet. And the government is just beginning to grapple with its regulatory role.

“Companies come in promising the world and often don’t deliver,” said Bob Wachter, head of the department of medicine at the University of California, San Francisco. “When I look for examples of … true AI and machine learning that’s really making a difference, they’re pretty few and far between. It’s pretty underwhelming.”

Administrators say algorithms — the software that processes data — from outside companies don’t always work as advertised because each health system has its own technological framework. So hospitals are building out engineering teams and developing artificial intelligence and other technology tailored to their own needs.

But it’s slow going. Research based on job postings shows health care lagging behind every industry except construction in adopting AI…(More)”.

Architectures of Participation


Essay by Jay Lloyd and Annalee Saxenian: “Silicon Valley’s dynamism during the final three decades of the twentieth century highlighted the singular importance of social and professional networks to innovation. Since that time, contemporary and historical case studies have corroborated the link between networks and the pace of technological change. These studies have shown that networks of networks, or ecosystems, that are characterized by a mix of collaboration and competition, can accelerate learning and problem-solving.

However, these insights about networks, collaboration, and ecosystems remain surprisingly absent from public debates about science and technology policy. Since the end of World War II, innovation policy has targeted economic inputs such as funding for basic scientific research and a highly skilled workforce (via education, training, and/or immigration), as well as support for commercialization of technology, investments in information technology, and free trade. Work on national systems of innovation, by contrast, seeks to define the optimal ensembles of institutions and policies. Alternatively, policy attention is focused on achieving efficiencies and scale by gaining control over value chains, especially in critical industries such as semiconductors. Antitrust advocates have attributed stalled technological innovation to monopolistic concentration among large firms, arguing that divestiture or regulation is necessary to reinvigorate competition and speed gains for society. These approaches ignore the lessons of network research, potentially threatening the very ecosystems that could unlock competitive advantages. For example, attempts to strengthen value chains risk cutting producers off from global networks, leaving them vulnerable to shifting markets and technology and weakening the wider ecosystem. Breaking up large platform firms may likewise undermine less visible internal interdependencies that support innovation, while doing nothing to encourage external collaboration. 

How might the public sector promote and strengthen important network connections in a world of continuous flux? This essay reexamines innovation policy through the lens of the current era of cloud computing, arguing that the public sector has a regulatory role as well as a nurturing one to play in fostering innovation ecosystems…(More)”.