A Prehistory of Social Media


Essay by Kevin Driscoll: “Over the past few years, I’ve asked dozens of college students to write down, in a sentence or two, where the internet came from. Year after year, they recount the same stories about the US government, Silicon Valley, the military, and the threat of nuclear war. A few students mention the Department of Defense’s ARPANET by name. Several get the chronology wrong, placing the World Wide Web before the internet or expressing confusion about the invention of email. Others mention “tech wizards” or “geniuses” from Silicon Valley firms and university labs. No fewer than four students have simply written, “Bill Gates.”

Despite the internet’s staggering scale and global reach, its folk histories are surprisingly narrow. This mismatch reflects the uncertain definition of “the internet.” When nonexperts look for internet origin stories, they want to know about the internet as they know it, the internet they carry around in their pockets, the internet they turn to, day after day. Yet the internet of today is not a stable object with a single, coherent history. It is a dynamic socio-technical phenomenon that came into being during the 1990s, at the intersection of hundreds of regional, national, commercial, and cooperative networks—only one of which was previously known as “the internet.” In short, the best-known histories describe an internet that hasn’t existed since 1994. So why do my students continue to repeat stories from 25 years ago? Why haven’t our histories kept up?

The standard account of internet history took shape in the early 1990s, as a mixture of commercial online services, university networks, and local community networks mutated into something bigger, more commercial, and more accessible to the general public. As hype began to build around the “information superhighway,” people wanted a backstory. In countless magazines, TV news reports, and how-to books, the origin of the internet was traced back to ARPANET, the computer network created by the Advanced Research Projects Agency during the Cold War. This founding mythology has become a resource for advancing arguments on issues related to censorship, national sovereignty, cybersecurity, privacy, net neutrality, copyright, and more. But with only this narrow history of the early internet to rely on, the arguments put forth are similarly impoverished…(More)”.

Cloud labs and remote research aren’t the future of science – they’re here


Article by Tom Ireland: “Cloud labs mean anybody, anywhere can conduct experiments by remote control, using nothing more than their web browser. Experiments are programmed through a subscription-based online interface – software then coordinates robots and automated scientific instruments to perform the experiment and process the data. Friday night is Emerald’s busiest time of the week, as scientists schedule experiments to run while they relax with their families over the weekend.

There are still some things robots can’t do, for example lifting giant carboys (containers for liquids) or unwrapping samples sent by mail, and there are a few instruments that just can’t be automated. Hence the people in blue coats, who look a little like pickers in an Amazon warehouse. It turns out that they are, in fact, mostly former Amazon employees.

Plugging an experiment into a browser forces researchers to translate the exact details of every step into unambiguous code

Emerald originally employed scientists and lab technicians to help the facility run smoothly, but they were creatively stifled with so little to do. Poaching Amazon employees has turned out to be an improvement. “We pay them twice what they were getting at Amazon to do something way more fulfilling than stuffing toilet paper into boxes,” says Frezza. “You’re keeping someone’s drug-discovery experiment running at full speed.”

Further south in the San Francisco Bay Area are two more cloud labs, run by the company Strateos. Racks of gleaming life science instruments – incubators, mixers, mass spectrometers, PCR machines – sit humming inside large Perspex boxes known as workcells. The setup is arguably even more futuristic than at Emerald. Here, reagents and samples whizz to the correct workcell on hi-tech magnetic conveyor belts and are gently loaded into place by dextrous robot arms. Researchers’ experiments are “delocalised”, as Strateos’s executive director of operations, Marc Siladi, puts it…(More)”.
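
The pull quote above is the technically interesting part: every step of an experiment has to be translated into unambiguous, machine-readable instructions before any robot can run it. As a purely hypothetical illustration of what that translation might look like (the step names, parameters, and dilution logic below are invented for this sketch, not any vendor's actual protocol language), a routine serial dilution could be encoded like this:

```python
# Hypothetical sketch: a serial-dilution protocol written as unambiguous,
# machine-readable steps of the kind a cloud lab's scheduler could hand to
# its robots. All names and parameters here are invented for illustration;
# real platforms define their own protocol languages.
from dataclasses import dataclass

@dataclass
class TransferStep:
    source: str        # labware/well to aspirate from
    destination: str   # labware/well to dispense into
    volume_ul: float   # exact volume in microlitres
    mix_cycles: int    # aspirate/dispense mixes after the transfer

def serial_dilution(stock: str, plate: str, dilution_factor: int,
                    n_wells: int, diluent_volume_ul: float = 200.0) -> list[TransferStep]:
    """Spell out every transfer in a serial dilution across one plate row."""
    # Volume to carry over so each well is diluted by `dilution_factor`,
    # assuming wells are pre-filled with `diluent_volume_ul` of diluent.
    carry_ul = diluent_volume_ul / (dilution_factor - 1)
    steps = [TransferStep(stock, f"{plate}/A1", carry_ul, mix_cycles=3)]
    for i in range(1, n_wells):
        steps.append(TransferStep(f"{plate}/A{i}", f"{plate}/A{i + 1}",
                                  carry_ul, mix_cycles=3))
    return steps

# Nothing is left to a technician's judgement: every container, volume,
# and mixing action is explicit.
for step in serial_dilution(stock="sample-42", plate="plate-7",
                            dilution_factor=2, n_wells=8):
    print(step)
```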

A journey toward an open data culture through transformation of shared data into a data resource


Paper by Scott D. Kahn and Anne Koralova: “The transition to open data practices is straightforward, albeit surprisingly challenging to implement, largely due to cultural and policy issues. A general data sharing framework is presented along with two case studies that highlight these challenges and offer practical solutions that can be adjusted depending on the type of data collected, the country in which the study is initiated, and the prevailing research culture. Embracing the constraints imposed by data privacy considerations, especially for biomedical data, must be emphasized for data outside of the United States until data privacy law(s) are established at the Federal and/or State level…(More).”

The Technology Fallacy


Book by Gerald C. Kane, Anh Nguyen Phillips, Jonathan R. Copulsky and Garth R. Andrus on “How People Are the Real Key to Digital Transformation:…

Digital technologies are disrupting organizations of every size and shape, leaving managers scrambling to find a technology fix that will help their organizations compete. This book offers managers and business leaders a guide for surviving digital disruptions—but it is not a book about technology. It is about the organizational changes required to harness the power of technology. The authors argue that digital disruption is primarily about people and that effective digital transformation involves changes to organizational dynamics and how work gets done. A focus only on selecting and implementing the right digital technologies is not likely to lead to success. The best way to respond to digital disruption is by changing the company culture to be more agile, risk tolerant, and experimental.

The authors draw on four years of research, conducted in partnership with MIT Sloan Management Review and Deloitte, surveying more than 16,000 people and conducting interviews with managers at such companies as Walmart, Google, and Salesforce. They introduce the concept of digital maturity—the ability to take advantage of opportunities offered by the new technology—and address the specifics of digital transformation, including cultivating a digital environment, enabling intentional collaboration, and fostering an experimental mindset. Every organization needs to understand its “digital DNA” in order to stop “doing digital” and start “being digital.”

Digital disruption won’t end anytime soon; the average worker will probably experience numerous waves of disruption during the course of a career. The insights offered by The Technology Fallacy will hold true through them all…(More)”.

Smart Streetlights are Casting a Long Shadow Over Our Cities


Article by Zhile Xie: “This is not a surveillance system—nobody is watching it 24 hours a day,” said Erik Caldwell, director of economic development in San Diego, in an interview where he was asked whether the wide deployment of “smart” streetlights had turned San Diego into a surveillance city. Innocuous at first glance, this statement demonstrates the pernicious impact of artificial intelligence on new “smart” streetlight systems. Caldwell’s framing implies that centralized human viewing is what makes a streetlight a surveillance instrument. Yet the absence of human supervision signals enhanced, not diminished, capacity. Smart sensors can process and communicate environmental information that does not present itself in a visual format and does not rely on human interpretation. On the one hand, they reinforce the streetlight’s function as a surveillance instrument, a role historically associated with light and visibility. On the other hand, in tandem with a wide range of sensors embedded in our everyday environment, they also enable for-profit data extraction on a vast scale, under the auspices of a partnership between local governments and tech corporations.

The streetlight was originally designed as a surveillance device and has been refined to that end ever since. Its association with surveillance and security can be found as early as 400 BC. Citizens of Ancient Rome began installing an oil lamp in front of every villa to prevent tripping and deter theft, and an enslaved person would be designated to watch the lamp—lighting was already paired with the notion of control through slavery. As Wolfgang Schivelbusch has detailed in his book Disenchanted Night, street lighting also emerged in medieval European cities alongside practices of policing. Only designated watchmen who carried a torch and a weapon were allowed to be out on the street. This ancient connection between security and visibility has been the basis of the wide deployment of streetlights in modern cities. Moreover, as Edwin Heathcote has explained in a recent article for the Architectural Review, gas streetlights were first introduced to Paris during Baron Haussmann’s restructuring of the city between 1853 and 1870, which was designed in part to prevent revolutionary uprisings. The invention of electric light bulbs in the late nineteenth century in Europe triggered new fears and imaginations around the use of streetlights for social control. For instance, in his 1894 dystopian novel The Land of the Changing Sun, W.N. Harben envisions an electric-optical device that makes possible 24-hour surveillance over the entire population of an isolated country, Alpha. The telescopic system is aided by an artificial “sun” that lights up the atmosphere all year round, along with networked observatories across the land that capture images of their surroundings, which are transmitted to a “throne room” for inspection by the king and police…(More)”.

Without appropriate metadata, data-sharing mandates are pointless


Article by Mark A. Musen: “Last month, the US government announced that research articles and most underlying data generated with federal funds should be made publicly available without cost, a policy to be implemented by the end of 2025. That’s atop other important moves. The European Union’s programme for science funding, Horizon Europe, already mandates that almost all data be FAIR (that is, findable, accessible, interoperable and reusable). The motivation behind such data-sharing policies is to make data more accessible so others can use them to both verify results and conduct further analyses.

But just getting those data sets online will not bring anticipated benefits: few data sets will really be FAIR, because most will be unfindable. What’s needed are policies and infrastructure to organize metadata.

Imagine having to search for publications on some topic — say, methods for carbon reclamation — but you could use only the article titles (no keywords, abstracts or search terms). That’s essentially the situation for finding data sets. If I wanted to identify all the deposited data related to carbon reclamation, the task would be futile. Current metadata often contain only administrative and organizational information, such as the name of the investigator and the date when the data were acquired.

What’s more, for scientific data to be useful to other researchers, metadata must sensibly and consistently communicate essentials of the experiments — what was measured, and under what conditions. As an investigator who builds technology to assist with data annotation, I find it frustrating that, in the majority of fields, the metadata standards needed to make data FAIR don’t even exist.

Metadata about data sets typically lack experiment-specific descriptors. If present, they’re sparse and idiosyncratic. An investigator searching the Gene Expression Omnibus (GEO), for example, might seek genomic data sets containing information on how a disease or condition manifests itself in young animals or humans. Performing such a search requires knowledge of how the age of individuals is represented — which, in the GEO repository, could be age, AGE, age (after birth), age (years), Age (yr-old) or dozens of other possibilities. (Often, such information is missing from data sets altogether.) Because the metadata are so ad hoc, automated searches fail, and investigators waste enormous amounts of time manually sifting through records to locate relevant data sets, with no guarantee that most (or any) can be found…(More)”.
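
Musen's GEO example shows why automated search fails: even recovering a single attribute such as age means anticipating dozens of idiosyncratic spellings and unit conventions. The sketch below, a hypothetical illustration that reuses the field variants cited in the article rather than GEO's actual schema or tooling, makes that burden concrete:

```python
import re
from typing import Optional

# Hypothetical sketch of the normalization burden the article describes:
# mapping the many ad hoc spellings of "age" in repository metadata onto a
# single value in years. The variant keys echo the GEO examples above; this
# is not GEO's actual schema or tooling.
AGE_KEY = re.compile(r"^age\b", re.IGNORECASE)

def extract_age_years(sample_metadata: dict[str, str]) -> Optional[float]:
    """Return the sample's age in years if any recognizable age field exists."""
    for key, value in sample_metadata.items():
        if not AGE_KEY.match(key.strip()):
            continue  # not an age-like field (e.g. "tissue", "strain")
        number_match = re.search(r"[\d.]+", value)
        if number_match is None:
            continue  # an age field with no numeric value
        number = float(number_match.group())
        text = f"{key} {value}".lower()
        if "week" in text:          # crude unit handling for illustration
            return number / 52.0
        if "month" in text:
            return number / 12.0
        return number               # assume years when no unit is given
    return None                     # age is missing altogether, as is common

print(extract_age_years({"Age (yr-old)": "4"}))               # 4.0
print(extract_age_years({"AGE": "6 weeks"}))                  # ~0.115
print(extract_age_years({"age (after birth)": "18 months"}))  # 1.5
print(extract_age_years({"strain": "C57BL/6"}))               # None
```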

The New ADP National Employment Report


Press Release: “The new ADP National Employment Report (NER) launched today in collaboration with the Stanford Digital Economy Lab. Earlier this spring, the ADP Research Institute paused the NER in order to refine the methodology and design of the report. Part of that evolution was teaming up with data scientists at the Stanford Digital Economy Lab to add a new perspective and rigor to the report. The new report uses fine-grained, high-frequency data on jobs and wages to deliver a richer and more useful analysis of the labor market.

Let’s take a look at some of the key changes with the new NER, along with the new ADP® Pay Insights Report.

It’s independent. The key change is that the new ADP NER is an independent measure of the US labor market, rather than a forecast of the BLS monthly jobs number. The jobs report and pay insights are based on anonymized and aggregated payroll data from more than 25 million US employees across 500,000 companies. The new report focuses solely on ADP’s clients and private-sector change…(More)”.

Measuring Small Business Dynamics and Employment with Private-Sector Real-Time Data


Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real-time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exits of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency and geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely in the beginning of the pandemic than employment of larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver for both results, thus underlining the importance of properly correcting for sample churn…(More)”.
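
The paper's central adjustment, distinguishing true closings and openings from sample churn by matching exits and entries against outside activity signals, can be caricatured in a few lines. The sketch below is stylized: the signal names and probability weights are hypothetical placeholders, not the authors' estimated values:

```python
from dataclasses import dataclass

# Stylized sketch of the paper's key adjustment: an establishment that leaves
# the Homebase-style panel but still shows outside signs of activity is
# treated as sample churn rather than a true closing. The signal names and
# probability weights are hypothetical placeholders, not the authors' estimates.
@dataclass
class ExitRecord:
    establishment_id: str
    foot_traffic_active: bool   # e.g. visit data still observed after the exit
    listing_still_open: bool    # e.g. an online listing still marked "open"

def closing_probability(record: ExitRecord) -> float:
    """Rough probability that a sample exit is a genuine business closing."""
    if record.foot_traffic_active and record.listing_still_open:
        return 0.05   # almost certainly churn: the business is still operating
    if record.foot_traffic_active or record.listing_still_open:
        return 0.40   # mixed signals
    return 0.90       # no outside sign of activity: likely a true closing

def expected_closings(exits: list[ExitRecord]) -> float:
    """Expected number of true closings among all observed sample exits."""
    return sum(closing_probability(e) for e in exits)

exits = [
    ExitRecord("est-001", foot_traffic_active=True, listing_still_open=True),
    ExitRecord("est-002", foot_traffic_active=False, listing_still_open=False),
]
print(expected_closings(exits))  # 0.95 expected closings out of two exits
```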

Closing the Data Divide for a More Equitable U.S. Digital Economy


Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly


Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy techniques and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in context of the goals of a given data release…(More)”.
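
As a minimal sketch of one of the techniques named above, the snippet below adds Laplace noise to aggregate counts before release, the standard mechanism for an epsilon-differentially-private counting query; the epsilon value and enrollment counts are illustrative only:

```python
import numpy as np

# Minimal sketch of one technique named above: releasing aggregate counts with
# Laplace noise, the standard mechanism for an epsilon-differentially-private
# counting query (sensitivity 1). The epsilon value and counts are illustrative.
def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Return the count plus Laplace noise with scale 1 / epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(seed=0)
program_enrollment_by_zip = {"19104": 212, "19139": 47, "19143": 8}

released = {zip_code: round(dp_count(count, epsilon=0.5, rng=rng), 1)
            for zip_code, count in program_enrollment_by_zip.items()}
print(released)  # noisy counts; small cells gain the most relative protection
```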