The Risks of Empowering “Citizen Data Scientists”


Article by Reid Blackman and Tamara Sipes: “Until recently, the prevailing understanding of artificial intelligence (AI) and its subset machine learning (ML) was that expert data scientists and AI engineers were the only people who could push AI strategy and implementation forward. That was a reasonable view. After all, data science generally, and AI in particular, is a technical field requiring, among other things, expertise that takes many years of education and training to acquire.

Fast forward to today, however, and the conventional wisdom is rapidly changing. The advent of “auto-ML” — software that provides methods and processes for creating machine learning code — has led to calls to “democratize” data science and AI. The idea is that these tools enable organizations to invite and leverage non-data scientists — say, domain data experts, team members very familiar with the business processes, or heads of various business units — to propel their AI efforts.
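The article doesn’t describe how auto-ML tools work internally, but the core loop (automatically fitting several candidate models and keeping whichever scores best on held-out data) can be sketched in a few lines of pure Python. This is an illustrative toy, not any specific auto-ML product; the candidate models and data are made up:

```python
import random
import statistics

# Toy "auto-ML" loop: given (x, y) data, automatically try several
# candidate models and keep the one with the lowest held-out error.
# Candidates and data are illustrative only.

def fit_mean(xs, ys):
    # baseline model: always predict the mean of y
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # ordinary least squares for y = a*x + b
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(xs, ys, candidates):
    # split into train/validation, score each candidate, return the best
    split = int(0.8 * len(xs))
    tr_x, va_x = xs[:split], xs[split:]
    tr_y, va_y = ys[:split], ys[split:]
    best_name, best_model, best_mse = None, None, float("inf")
    for name, fit in candidates.items():
        model = fit(tr_x, tr_y)
        mse = statistics.fmean((model(x) - y) ** 2
                               for x, y in zip(va_x, va_y))
        if mse < best_mse:
            best_name, best_model, best_mse = name, model, mse
    return best_name, best_model

random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]
name, model = auto_select(xs, ys, {"mean": fit_mean, "linear": fit_linear})
print(name)  # the data is linear, so the linear candidate should win
```

Real auto-ML systems search far larger model and hyperparameter spaces, but the shape of the automation is the same, and so is the risk the authors raise: the loop happily returns a “best” model whether or not the data or validation scheme actually supports the business question.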

In theory, making data science and AI more accessible to non-data scientists (including technologists who are not data scientists) can make a lot of business sense. Centralized and siloed data science units can fail to appreciate the vast array of data the organization has and the business problems that it can solve, particularly in multinational organizations with hundreds or thousands of business units distributed across several continents. Moreover, those in the weeds of business units know the data they have and the problems they’re trying to solve, and can, with training, see how that data can be leveraged to solve those problems. The opportunities are significant.

In short, with great business insight, augmented with auto-ML, can come great analytic responsibility. At the same time, we cannot forget that data science and AI are, in fact, very difficult, and there’s a very long journey from having data to solving a problem. In this article, we’ll lay out the pros and cons of integrating citizen data scientists into your AI strategy and suggest methods for optimizing success and minimizing risks…(More)”.

The Department of Everything


Article by Stephen Akey: “How do you find the life expectancy of a California condor? Google it. Or the gross national product of Morocco? Google it. Or the final resting place of Tom Paine? Google it. There was a time, however—not all that long ago—when you couldn’t Google it or ask Siri or whatever cyber equivalent comes next. You had to do it the hard way—by consulting reference books, indexes, catalogs, almanacs, statistical abstracts, and myriad other printed sources. Or you could save yourself all that time and trouble by taking the easiest available shortcut: You could call me.

From 1984 to 1988, I worked in the Telephone Reference Division of the Brooklyn Public Library. My seven or eight colleagues and I spent the days (and nights) answering exactly such questions. Our callers were as various as New York City itself: copyeditors, fact checkers, game show aspirants, journalists, bill collectors, bet settlers, police detectives, students and teachers, the idly curious, the lonely and loquacious, the park bench crazies, the nervously apprehensive. (This last category comprised many anxious patients about to undergo surgery who called us for background checks on their doctors.) There were telephone reference divisions in libraries all over the country, but this being New York City, we were an unusually large one with an unusually heavy volume of calls. And if I may say so, we were one of the best. More than one caller told me that we were a legend in the world of New York magazine publishing…(More)”.

10 profound answers about the math behind AI


Article by Ethan Siegel: “Why do machines learn? Even in the recent past, this would have been a ridiculous question, as machines — i.e., computers — were only capable of executing whatever instructions a human programmer had programmed into them. With the rise of generative artificial intelligence (AI), however, machines truly appear to be gifted with the ability to learn, refining their answers based on continued interactions with both human and non-human users. Large language model-based AI programs, such as ChatGPT, Claude, and Gemini, are now so widespread that they’re replacing traditional tools, including Google searches, in applications all across the world.

How did this come to be? How did we so swiftly come to live in an era where many of us are happy to turn over aspects of our lives that traditionally needed a human expert to a computer program? From financial to medical decisions, from quantum systems to protein folding, and from sorting data to finding signals in a sea of noise, many programs that leverage artificial intelligence (AI) and machine learning (ML) are far better at these tasks than even the greatest human experts.

In his new book, Why Machines Learn: The Elegant Math Behind Modern AI, science writer Anil Ananthaswamy explores all of these aspects and more. I was fortunate enough to do a question-and-answer interview with him, and here are the 10 most profound responses he was generous enough to give…(More)”.

The MAGA Plan to End Free Weather Reports


Article by Zoë Schlanger: “In the United States, as in most other countries, weather forecasts are a freely accessible government amenity. The National Weather Service issues alerts and predictions, warning of hurricanes and excessive heat and rainfall, all at the total cost to American taxpayers of roughly $4 per person per year. Anyone with a TV, smartphone, radio, or newspaper can know what tomorrow’s weather will look like, whether a hurricane is heading toward their town, or if a drought has been forecast for the next season. Even if they get that news from a privately owned app or TV station, much of the underlying weather data are courtesy of meteorologists working for the federal government.

Charging for popular services that were previously free isn’t generally a winning political strategy. But hard-right policymakers appear poised to try to do just that should Republicans gain power in the next term. Project 2025—a nearly 900-page book of policy proposals published by the conservative think tank the Heritage Foundation—states that an incoming administration should all but dissolve the National Oceanic and Atmospheric Administration, under which the National Weather Service operates… NOAA “should be dismantled and many of its functions eliminated, sent to other agencies, privatized, or placed under the control of states and territories,” Project 2025 reads. … “The preponderance of its climate-change research should be disbanded,” the document says. It further notes that scientific agencies such as NOAA are “vulnerable to obstructionism of an Administration’s aims,” so appointees should be screened to ensure that their views are “wholly in sync” with the president’s…(More)”.

Gen AI: too much spend, too little benefit?


Article by Jason Koebler: “Investment giant Goldman Sachs published a research paper about the economic viability of generative AI which notes that there is “little to show for” the huge amount of spending on generative AI infrastructure and questions “whether this large spend will ever pay off in terms of AI benefits and returns.” 

The paper, called “Gen AI: too much spend, too little benefit?” is based on a series of interviews with Goldman Sachs economists and researchers, MIT professor Daron Acemoglu, and infrastructure experts. The paper ultimately questions whether generative AI will ever become the transformative technology that Silicon Valley and large portions of the stock market are currently betting on, but says investors may continue to get rich anyway. “Despite these concerns and constraints, we still see room for the AI theme to run, either because AI starts to deliver on its promise, or because bubbles take a long time to burst,” the paper notes. 

Goldman Sachs researchers also say that AI optimism is driving large growth in stocks like Nvidia and other S&P 500 companies (the largest companies in the stock market), but say that the stock price gains we’ve seen are based on the assumption that generative AI is going to lead to higher productivity (which necessarily means automation, layoffs, lower labor costs, and higher efficiency). These stock gains are already baked in, Goldman Sachs argues in the paper: “Although the productivity pick-up that AI promises could benefit equities via higher profit growth, we find that stocks often anticipate higher productivity growth before it materializes, raising the risk of overpaying. And using our new long-term return forecasting framework, we find that a very favorable AI scenario may be required for the S&P 500 to deliver above-average returns in the coming decade.”…(More)”.

Doing science backwards


Article by Stuart Ritchie: “…Usually, the process of publishing such a study would look like this: you run the study; you write it up as a paper; you submit it to a journal; the journal gets some other scientists to peer-review it; it gets published – or if it doesn’t, you either discard it, or send it off to a different journal and the whole process starts again.

That’s standard operating procedure. But it shouldn’t be. Think about the job of the peer-reviewer: when they start their work, they’re handed a full-fledged paper, reporting on a study and a statistical analysis that happened at some point in the past. It’s all now done and, if not fully dusted, then in a pretty final-looking form.

What can the reviewer do? They can check that the analysis makes sense, sure; they can recommend that new analyses be done; they can even, in extreme cases, make the original authors go off and collect some entirely new data in a further study – maybe the data the authors originally presented just aren’t convincing or don’t represent a proper test of the hypothesis.

Ronald Fisher described the study-first, review-later process in 1938:

To consult the statistician [or, in our case, peer-reviewer] after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

Clearly this isn’t the optimal, most efficient way to do science. Why don’t we review the statistics and design of a study right at the beginning of the process, rather than at the end?

This is where Registered Reports come in. They’re a new (well, new-ish) way of publishing papers where, before you go to the lab, or wherever you’re collecting data, you write down your plan for your study and send it off for peer-review. The reviewers can then give you genuinely constructive criticism – you can literally construct your experiment differently depending on their suggestions. You build consensus—between you, the reviewers, and the journal editor—on the method of the study. And then, once everyone agrees on what a good study of this question would look like, you go off and do it. The key part is that, at this point, the journal agrees to publish your study, regardless of what the results might eventually look like…(More)”.

Enhancing human mobility research with open and standardized datasets


Article by Takahiro Yabe et al: “The proliferation of large-scale, passively collected location data from mobile devices has enabled researchers to gain valuable insights into various societal phenomena. In particular, research into the science of human mobility has become increasingly important thanks to its interdisciplinary applications in urban planning, transportation engineering, public health, disaster management, and economic analysis. Researchers in the computational social science, complex systems, and behavioral science communities have used such granular mobility data to uncover universal laws and theories governing individual and collective human behavior. Moreover, computer science researchers have focused on developing computational and machine learning models capable of predicting complex behavior patterns in urban environments. Prominent papers include pattern-based and deep learning approaches to next-location prediction and physics-inspired approaches to flow prediction and generation.
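To give a concrete sense of the simplest kind of “pattern-based” next-location prediction the excerpt mentions, here is a minimal sketch (not any model from the literature): a first-order Markov model that, given the current location, predicts the most frequent next location observed in a trajectory. The trajectory is an invented toy routine:

```python
from collections import Counter, defaultdict

# First-order Markov next-location prediction, as a minimal sketch.
# Real next-location models in the literature are far richer; this
# only illustrates the "predict the most common transition" idea.

def train_transitions(trajectory):
    # count how often each location is followed by each other location
    counts = defaultdict(Counter)
    for here, nxt in zip(trajectory, trajectory[1:]):
        counts[here][nxt] += 1
    return counts

def predict_next(counts, here):
    # return the most frequently observed successor, or None if unseen
    if here not in counts:
        return None
    return counts[here].most_common(1)[0][0]

# A toy daily routine repeated over several weeks (invented data)
days = ["home", "cafe", "office", "office", "gym"] * 20
model = train_transitions(days)
print(predict_next(model, "home"))  # "cafe": the only observed successor
```

Even this trivial baseline makes the excerpt’s privacy point tangible: a short transition table reconstructed from raw traces already encodes someone’s daily routine, which is exactly why large open mobility datasets are hard to release.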

Regardless of the research problem of interest, human mobility datasets often come with substantial limitations. Existing publicly available datasets are often small, limited to specific transport modes, or geographically restricted, owing to the lack of open-source and large-scale human mobility datasets caused by privacy concerns…(More)”.

AI-Ready FAIR Data: Accelerating Science through Responsible AI and Data Stewardship


Article by Sean Hill: “Imagine a future where scientific discovery is unbound by the limitations of data accessibility and interoperability. In this future, researchers across all disciplines — from biology and chemistry to astronomy and social sciences — can seamlessly access, integrate, and analyze vast datasets with the assistance of advanced artificial intelligence (AI). This world is one where AI-ready data empowers scientists to unravel complex problems at unprecedented speeds, leading to breakthroughs in medicine, environmental conservation, technology, and more. The vision of a truly FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready data ecosystem, underpinned by Responsible AI (RAI) practices and the pivotal role of data stewards, promises to revolutionize the way science is conducted, fostering an era of rapid innovation and global collaboration…(More)”.

The societal impact of Open Science: a scoping review


Report by Nicki Lisa Cole, Eva Kormann, Thomas Klebel, Simon Apartis and Tony Ross-Hellauer: “Open Science (OS) aims, in part, to drive greater societal impact of academic research. Government, funder and institutional policies state that it should further democratize research and increase learning and awareness, evidence-based policy-making, the relevance of research to society’s problems, and public trust in research. Yet, measuring the societal impact of OS has proven challenging and synthesized evidence of it is lacking. This study fills this gap by systematically scoping the existing evidence of societal impact driven by OS and its various aspects, including Citizen Science (CS), Open Access (OA), Open/FAIR Data (OFD), Open Code/Software and others. Using the PRISMA Extension for Scoping Reviews and searches conducted in Web of Science, Scopus and relevant grey literature, we identified 196 studies that contain evidence of societal impact. The majority concern CS, with some focused on OA, and only a few addressing other aspects. Key areas of impact found are education and awareness, climate and environment, and social engagement. We found no literature documenting evidence of the societal impact of OFD and limited evidence of societal impact in terms of policy, health, and trust in academic research. Our findings demonstrate a critical need for additional evidence and suggest practical and policy implications…(More)”.

Real Chaos, Today! Are Randomized Controlled Trials a good way to do economics?


Article by Maia Mindel: “A few weeks back, there was much social media drama about a paper titled “Social Media and Job Market Success: A Field Experiment on Twitter” (2024) by Jingyi Qiu, Yan Chen, Alain Cohn, and Alvin Roth (recipient of the 2012 Nobel Prize in Economics). The study posted job market papers by economics PhDs, and then assigned prominent economists (who had volunteered) to randomly promote half of them on their profiles (more detail on this paper in a bit).

The “drama” in question was generally: “it is immoral to throw dice around on the most important aspect of a young economist’s career”, versus “no it’s not”. This, of course, awakened interest in a broader subject: Randomized Controlled Trials, or RCTs.

R.C.T. T.O. G.O.

Let’s go back to the 1600s – bloodletting was a common way to cure diseases. Did it work? Well, the physician Joan Baptista van Helmont had an idea: randomly divvy up a few hundred invalids into two groups, one of which got bloodletting applied, and one that didn’t.

While it’s not clear this experiment ever happened, it sets up the basic principle of the randomized controlled trial: to study the effects of a treatment (in a medical context, a medicine; in an economics context, a policy), a sample group is divided in two: the control group, which does not receive the treatment, and the treatment group, which does. The modern randomized controlled (or control) trial has three “legs”: it’s randomized because who’s in each group is chosen at random, it’s controlled because there’s a group that doesn’t get the treatment to serve as a counterfactual, and it’s a trial because you’re not deploying the treatment “at scale” just yet.
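That logic is easy to simulate. The sketch below (with made-up numbers, assuming a purely additive treatment effect) randomly splits a sample into treatment and control groups and recovers the true effect from the simple difference in group means:

```python
import random
import statistics

# Minimal simulation of the RCT logic described above.
# Numbers are illustrative, not from any real study.
random.seed(42)

def run_trial(n, true_effect):
    # each subject has an unobserved baseline outcome
    baselines = [random.gauss(50, 10) for _ in range(n)]
    # randomization: shuffle the indices, then split in half
    idx = list(range(n))
    random.shuffle(idx)
    treated = set(idx[: n // 2])
    # treated subjects get the (assumed additive) treatment effect
    outcomes = [b + (true_effect if i in treated else 0.0)
                for i, b in enumerate(baselines)]
    t_mean = statistics.fmean(outcomes[i] for i in treated)
    c_mean = statistics.fmean(outcomes[i] for i in range(n)
                              if i not in treated)
    return t_mean - c_mean  # estimated treatment effect

est = run_trial(10_000, true_effect=5.0)
print(round(est, 1))  # close to 5.0: randomization balances the groups
```

The point of the simulation is the one the article makes next: because assignment is random, the baseline differences between the two groups wash out on average, so the difference in means isolates the treatment effect rather than whatever made people select into treatment.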

Why could it be important to randomly select people for economic studies? Well, you want the only difference, on average, between the two groups to be whether or not they get the treatment. Consider military service: it’s regularly trotted out that drafting kids would reduce crime rates. Is this true? Well, the average person who is exempted from the draft could be systematically different from the average person who isn’t – for example, people who volunteer could be from wealthier families who are more patriotic, or poorer families who need certain benefits; or they could have physical disabilities that impede their labor market participation, or be wealthier university students who get a deferral. But because many countries use lotteries to allocate draftees versus non-draftees, you can get a group of people who are randomly assigned to the draft, and who on average should be similar enough to each other. One study in particular, about Argentina’s mandatory military service through pretty much all of the 20th century, finds that being conscripted raises the crime rate relative to people who didn’t get drafted through the lottery. This doesn’t mean that soldiers have higher crime rates than non-soldiers, because of selection issues – but it does provide pretty good evidence that getting drafted is not good for your non-criminal prospects…(More)”.