Without appropriate metadata, data-sharing mandates are pointless


Article by Mark A. Musen: “Last month, the US government announced that research articles and most underlying data generated with federal funds should be made publicly available without cost, a policy to be implemented by the end of 2025. That’s atop other important moves. The European Union’s programme for science funding, Horizon Europe, already mandates that almost all data be FAIR (that is, findable, accessible, interoperable and reusable). The motivation behind such data-sharing policies is to make data more accessible so others can use them to both verify results and conduct further analyses.

But just getting those data sets online will not bring anticipated benefits: few data sets will really be FAIR, because most will be unfindable. What’s needed are policies and infrastructure to organize metadata.

Imagine having to search for publications on some topic — say, methods for carbon reclamation — but being able to use only the article titles (no keywords, abstracts or search terms). That’s essentially the situation for finding data sets. If I wanted to identify all the deposited data related to carbon reclamation, the task would be futile. Current metadata often contain only administrative and organizational information, such as the name of the investigator and the date when the data were acquired.

What’s more, for scientific data to be useful to other researchers, metadata must sensibly and consistently communicate essentials of the experiments — what was measured, and under what conditions. As an investigator who builds technology to assist with data annotation, I find it frustrating that, in the majority of fields, the metadata standards needed to make data FAIR don’t even exist.

Metadata about data sets typically lack experiment-specific descriptors. If present, they’re sparse and idiosyncratic. An investigator searching the Gene Expression Omnibus (GEO), for example, might seek genomic data sets containing information on how a disease or condition manifests itself in young animals or humans. Performing such a search requires knowledge of how the age of individuals is represented — which, in the GEO repository, could be age, AGE, age (after birth), age (years), Age (yr-old) or dozens of other possibilities. (Often, such information is missing from data sets altogether.) Because the metadata are so ad hoc, automated searches fail, and investigators waste enormous amounts of time manually sifting through records to locate relevant data sets, with no guarantee that most (or any) can be found…(More)”.
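To make the problem concrete, here is a minimal sketch (not from the article) of the field-name normalization a search tool would need just to recognize the many spellings of "age" in repository metadata; the record contents and key spellings below are hypothetical, adapted from the variants cited above.

```python
import re

# Hypothetical metadata records, with "age" fields spelled the way the
# article describes them appearing in GEO submissions.
RECORDS = [
    {"AGE": "6 weeks", "investigator": "Smith"},
    {"age (years)": "2", "date": "2014-03-01"},
    {"Age (yr-old)": "0.5"},
    {"tissue": "liver"},  # age never recorded at all
]

# Permissive pattern that matches field names such as
# "age", "AGE", "age (after birth)", "age (years)", "Age (yr-old)", ...
AGE_KEY = re.compile(r"^\s*age\b", re.IGNORECASE)

def extract_age(record: dict) -> str | None:
    """Return the raw value of the first age-like field, or None if absent."""
    for key, value in record.items():
        if AGE_KEY.match(key):
            return value
    return None

for rec in RECORDS:
    print(extract_age(rec))  # "6 weeks", "2", "0.5", None
```

Even permissive matching like this only papers over the problem: it cannot recover records where age was never captured, and it says nothing about inconsistent units or value formats, which is why shared metadata standards are the real fix.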

The New ADP National Employment Report


Press Release: “The new ADP National Employment Report (NER) launched today in collaboration with the Stanford Digital Economy Lab. Earlier this spring, the ADP Research Institute paused the NER in order to refine the methodology and design of the report. Part of that evolution was teaming up with data scientists at the Stanford Digital Economy Lab to add a new perspective and rigor to the report. The new report uses fine-grained, high-frequency data on jobs and wages to deliver a richer and more useful analysis of the labor market.

Let’s take a look at some of the key changes with the new NER, along with the new ADP® Pay Insights Report.

It’s independent. The key change is that the new ADP NER is an independent measure of the US labor market, rather than a forecast of the BLS monthly jobs number. The jobs report and pay insights are based on anonymized and aggregated payroll data from more than 25 million US employees across 500,000 companies. The new report focuses solely on ADP’s clients and private-sector change…(More)”.

Measuring Small Business Dynamics and Employment with Private-Sector Real-Time Data


Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real-time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exits of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency / geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely in the beginning of the pandemic than employment of larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver for both results, thus underlining the importance of properly correcting for sample churn…(More)”.
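To illustrate the churn-correction idea in the abstract above, the hypothetical sketch below labels a panel exit as a likely closing only when matched outside signals (foot traffic, search, or social-media activity) also go quiet after the exit date. The paper estimates closing probabilities from matched Safegraph, Google, and Facebook records rather than applying a hard rule, so treat this as a simplified stand-in with invented record fields.

```python
from dataclasses import dataclass

@dataclass
class SampleExit:
    """A business that stops appearing in the payroll panel (hypothetical record)."""
    business_id: str
    last_week_in_sample: int
    activity_observed_after_exit: bool  # e.g., foot traffic or posts seen later

def classify_exit(exit_record: SampleExit) -> str:
    """Label a sample exit as a likely closing or as sample churn.

    Simplified rule: if matched external sources still show activity after
    the business leaves the panel, the exit is churn, not a true closing.
    """
    return "sample churn" if exit_record.activity_observed_after_exit else "likely closing"

exits = [
    SampleExit("biz-001", last_week_in_sample=12, activity_observed_after_exit=True),
    SampleExit("biz-002", last_week_in_sample=12, activity_observed_after_exit=False),
]
print([classify_exit(e) for e in exits])  # ['sample churn', 'likely closing']
```

The same logic runs in reverse for sample entries: an establishment that already shows outside activity before it first appears in the panel is churn rather than a genuine opening.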

Closing the Data Divide for a More Equitable U.S. Digital Economy


Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly


Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy techniques and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in context of the goals of a given data release…(More)”.
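As a concrete illustration of one of the techniques named above, a released count can be protected with the Laplace mechanism, the standard building block of differential privacy. The sketch below is illustrative only; the count and the privacy parameter are hypothetical choices, not guidance from the report.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing one individual changes a counting query by at most 1,
    so noise drawn from Laplace(0, 1/epsilon) gives epsilon-differential privacy.
    """
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
# Hypothetical release: 412 program participants in a county, epsilon = 0.5
print(round(dp_count(412, epsilon=0.5, rng=rng), 1))
```

Smaller values of epsilon add more noise and give stronger privacy, which is exactly the kind of benefit-versus-trade-off judgment the report asks agencies to make in the context of each data release.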

OSTP Issues Guidance to Make Federally Funded Research Freely Available Without Delay


The White House: “Today, the White House Office of Science and Technology Policy (OSTP) updated U.S. policy guidance to make the results of taxpayer-supported research immediately available to the American public at no cost. In a memorandum to federal departments and agencies, Dr. Alondra Nelson, the head of OSTP, delivered guidance for agencies to update their public access policies as soon as possible to make publications and research funded by taxpayers publicly accessible, without an embargo or cost. All agencies will fully implement updated policies, including ending the optional 12-month embargo, no later than December 31, 2025.

This policy will likely yield significant benefits on a number of key priorities for the American people, from environmental justice to cancer breakthroughs, and from game-changing clean energy technologies to protecting civil liberties in an automated world.

For years, President Biden has been committed to delivering policy based on the best available science, and to working to ensure the American people have access to the findings of that research. “Right now, you work for years to come up with a significant breakthrough, and if you do, you get to publish a paper in one of the top journals,” said then-Vice President Biden in remarks to the American Association for Cancer Research in 2016. “For anyone to get access to that publication, they have to pay hundreds, or even thousands, of dollars to subscribe to a single journal. And here’s the kicker — the journal owns the data for a year. The taxpayers fund $5 billion a year in cancer research every year, but once it’s published, nearly all of that taxpayer-funded research sits behind walls. Tell me how this is moving the process along more rapidly.” The new public access guidance was developed with the input of multiple federal agencies over the course of this year, to enable progress on a number of Biden-Harris Administration priorities.

“When research is widely available to other researchers and the public, it can save lives, provide policymakers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society,” said Dr. Alondra Nelson, head of OSTP. “The American people fund tens of billions of dollars of cutting-edge research annually. There should be no delay or barrier between the American public and the returns on their investments in research.”…(More)”.

Design in the Civic Space: Generating Impact in City Government


Paper by Stephanie Wade and Jon Freach: “When design in the private sector is used as a catalyst for innovation it can produce insight into human experience, awareness of equitable and inequitable conditions, and clarity about needs and wants. But when we think of applying design in a government complex, the complicated nature of the civic arena means that public sector servants need to learn and apply design in ways that are specific to the complex ecosystem of long-standing social challenges they face, and learn new mindsets, methods, and ways of working that challenge established practices in a bureaucratic environment.

Design offers tools to help navigate the ambiguous boundaries of these complex problems and improve the city’s organizational culture so that it delivers better services to residents and the communities they live in. For the new practitioner in government, design can seem exciting, inspiring, hopeful, and fun because, over the past decade, it has quickly become a popular and novel way to approach city policy and service design. In the early part of the learning process, people often report that using design helps visualize their thoughts, spark meaningful dialogue, and find connections between problems, data, and ideas. But for some, when the going gets tough, when the ambiguity of overlapping and long-standing complex civic problems, a large number of stakeholders, causes, and effects begin to surface, design practices can seem ineffective, illogical, slow, confusing, and burdensome.

This paper will explore the highs and lows of using design in local government to help cities innovate. The authors, who have worked together to conceive, create, and deliver innovation training to over 100 global cities through multiple innovation programs, in the United States Federal Government, and in higher education, share examples from their fieldwork supported by the experiences of city staff members who have applied design methods in their jobs. Readers will discover how design works to catalyze innovative thinking in the public sector, reframe complex problems, center opportunities in resident needs, especially among those residents who have historically been excluded from government decision-making, make sensemaking a cultural norm and idea generation a ritual in otherwise traditional bureaucratic cultures, and work through the ambiguity of contemporary civic problems to generate measurable impact for residents. They will also learn why design sometimes fails to deliver its promise of innovation in government and see what happens when its language, mindsets, and tools make it hard for city innovation teams to adopt and apply…(More)”.

U.S. Government Effort to Tap Private Weather Data Moves Along Slowly


Article by Isabelle Bousquette: “The U.S. government’s six-year-old effort to improve its weather forecasting ability by purchasing data from private-sector satellite companies has started to show results, although the process is moving more slowly than anticipated.

After a period of testing, the National Oceanic and Atmospheric Administration, a scientific, service and regulatory arm of the Commerce Department, began purchasing data from two satellite companies, Spire Global Inc. of Vienna, Va., and GeoOptics Inc. of Pasadena, Calif.

The weather data from these two companies fills gaps in coverage left by NOAA’s own satellites, the agency said. NOAA also began testing data from a third company this year.

Beyond these companies, new entrants to the field offering weather data based on a broader range of technologies have been slow to emerge, the agency said.

“We’re getting a subset of what we hoped,” said Dan St. Jean, deputy director of the Office of System Architecture and Advanced Planning at NOAA’s Satellite and Information Service.

NOAA’s weather forecasts help the government formulate hurricane evacuation plans and make other important decisions. The agency began seeking out private sources of satellite weather data in 2016. The idea was to find a more cost-effective alternative to funding NOAA’s own satellite constellations, the agency said. It also hoped to seed competition and innovation in the private satellite sector.

It isn’t yet clear whether there is a cost benefit to using private data, in part because the relatively small number of competitors in the market has made it challenging to determine a steady market price, NOAA said.

“All the signs in the nascent ‘new space’ industry indicated that there would be a plethora of venture capitalists wanting to compete for NOAA’s commercial pilot/purchase dollars. But that just never materialized,” said Mr. St. Jean…(More)”.

Architectures of Participation


Essay by Jay Lloyd and Annalee Saxenian: “Silicon Valley’s dynamism during the final three decades of the twentieth century highlighted the singular importance of social and professional networks to innovation. Since that time, contemporary and historical case studies have corroborated the link between networks and the pace of technological change. These studies have shown that networks of networks, or ecosystems, that are characterized by a mix of collaboration and competition, can accelerate learning and problem-solving.

However, these insights about networks, collaboration, and ecosystems remain surprisingly absent from public debates about science and technology policy. Since the end of World War II, innovation policy has targeted economic inputs such as funding for basic scientific research and a highly skilled workforce (via education, training, and/or immigration), as well as support for commercialization of technology, investments in information technology, and free trade. Work on national systems of innovation, by contrast, seeks to define the optimal ensembles of institutions and policies. Alternatively, policy attention is focused on achieving efficiencies and scale by gaining control over value chains, especially in critical industries such as semiconductors. Antitrust advocates have attributed stalled technological innovation to monopolistic concentration among large firms, arguing that divestiture or regulation is necessary to reinvigorate competition and speed gains for society. These approaches ignore the lessons of network research, potentially threatening the very ecosystems that could unlock competitive advantages. For example, attempts to strengthen value chains risk cutting producers off from global networks, leaving them vulnerable to shifting markets and technology and weakening the wider ecosystem. Breaking up large platform firms may likewise undermine less visible internal interdependencies that support innovation, while doing nothing to encourage external collaboration. 

How might the public sector promote and strengthen important network connections in a world of continuous flux? This essay reexamines innovation policy through the lens of the current era of cloud computing, arguing that the public sector has a regulatory role as well as a nurturing one to play in fostering innovation ecosystems…(More)”.

Culver City, Calif., Uses AR to Showcase Stormwater Project


Article by Julia Edinger: “Culver City, Calif., and Trigger XR have teamed up to enhance a stormwater project by adding an interactive augmented reality experience.

Government agencies have been seeing the value of augmented and virtual reality for improved training and accessibility in recent years. Now, governments are launching innovative projects to help educate and engage residents — from a project in Charlotte, N.C., that revives razed Black neighborhoods to efforts to animate parks in Buffalo, N.Y., and Fairfax, Va.

For Culver City, an infrastructure project’s signage will bring the project to life with an augmented reality experience that educates the public on both the project itself and the city’s history…

…as is the case with many infrastructure projects, a big portion of the action would happen out of sight, motivating the project team to include “interpretive signage” that explains the purpose of the project through an interactive, virtual experience, Sean Singletary, the city’s senior civil engineer, explained in a written response…

The AR experience will soon be available for visitors, who will be able to learn about the project by reading the information on the signs — printed in both Spanish and English — or by scanning the QR code to dig deeper.

There are six different “experiences” in augmented reality that users can participate in. In one experience, users can visualize the stormwater project that exists beneath their feet or watch images of the city’s history float past them as if they were walking through a museum. Another features a turtle that is native to Ballona Creek, which will swim around users as informational text boxes about the turtle’s history and keeping the creek clean pop up to enhance the experience…(More)”.