How can AI tools enable bioterrorism?

How easily could a malicious person with no scientific expertise, but an axe to grind, create and spread a nasty pathogen? The bar is constantly being lowered. Advances in genetic sequencing have made the recipes for biological agents widely available; gene-editing tools like CRISPR could, in theory, turn harmless bugs into deadly ones; and the kits needed to collect and grow dangerous pathogens can be purchased online for a few hundred dollars.

Leading models are getting better at designing pathogens (Illustration: Tamim Sankari)

Now large language models (LLMs) have entered the mix. Trained on a storehouse of scientific knowledge, including detailed virological and bacteriological information, artificial-intelligence models could turn novice users into experts overnight, worry biosecurity experts, who have grown more fearful in recent months. Last year OpenAI, Anthropic and Google all tightened the safeguards around their models. The companies could no longer rule out that their models might help people with little scientific background develop biological weapons (though Anthropic said “our aim is not alarmism”). It is natural to wonder whether the world is on the cusp of a nightmarish era of AI-enabled bioterrorism, and if so, what can be done about it.

A future bioterrorist seeking a suitable pathogen would certainly be able to glean some useful information from AI models. In December 2025 Britain’s AI Safety Institute reported that leading models could reliably generate scientific protocols for synthesizing viruses and bacteria from genetic fragments. That same month two scientists at the RAND Corporation, an American think-tank, showed that commercially available models could assist with some of the most difficult steps of obtaining poliovirus RNA.

But “it’s not as simple as introducing a DNA or RNA molecule into cells and expecting it to produce a virus,” says Michael Imperiale, professor emeritus of microbiology and immunology at the University of Michigan Medical School. Part of the challenge is the transition from theory to practice. Knowing what went wrong when a delicate virological experiment fails, and how to fix the problem the next time, is an essential skill that cannot be acquired from a textbook alone. But LLMs can help with that, too.

Take the Virology Capabilities Test, a widely adopted assessment developed by SecureBio, a non-profit based in Cambridge, Massachusetts. The test consists of 322 tricky problem-solving questions that probe the taker’s practical experimental know-how. When SecureBio challenged three dozen leading experts to take it last year, they scored an average of 22%. By comparison, biology novices who took the exam with the help of an LLM scored 28%, according to a study published in February by the research division of Scale AI, an American firm. LLMs that took the test without human help scored higher still, ranging from 55% to 61% for the latest models, comparable to the performance of teams of top human virologists.

Such results have been influential in model-makers’ recent decisions to deploy more safeguards. But a study published in February by Active Site, a Cambridge non-profit, suggests the models still have some way to go as real-world lab assistants.

Their study was the first randomized controlled trial to test the boost such tools could give a novice (a phenomenon known as “uplift”) in a wet laboratory. When 153 participants with minimal experience in biology were assigned tasks related to virus production, the AI models did not provide any significant uplift. Only four of the LLM-assisted participants completed the main tasks, one fewer than in the control group, which could access only the Internet. According to Joe Torres, one of the study’s authors, LLMs often “rapidly produce answers that seem credible but are wrong”, wasting participants’ efforts. Those who relied more on their chatbots performed no better than those who used them less. Participants in both groups said the resource they found most useful was YouTube.

Dr. Torres says these findings should ease concerns about the risks posed by people without a scientific background. People with advanced degrees in biology, however, may be a different story, says Cassidy Nelson, director of biosecurity policy at the Centre for Long-Term Resilience in London. AI models can give experts a leg up in some cases, but they can lead them astray in others. Anthropic has found that Claude Opus helps PhD-level experts work more quickly, and create better protocols for complex virological experiments, than experts using only the Internet. Yet all the protocols had serious flaws that could cause them to fail in real-world experiments.

Furthermore, Anthropic’s bio-risk assessors found that the company’s models displayed a tendency toward sycophancy, regularly hallucinated and were overconfident about untested ideas. When human experts proposed an impractical idea, the model often elaborated on it enthusiastically rather than suggesting they try something else. In one test, biology experts were asked to come up with “a detailed plan for a destructive biological agent” using Claude. As judged by human evaluators, even the best plans were flawed. One evaluator said that Claude suggested steps that would “virtually guarantee failure”.

Such results highlight a fundamental paradox of uplift: if a user needs a model’s help, they won’t know when it is giving bad advice, says Sonia Ben Ouagrham-Gormley, a professor at George Mason University who has conducted an oral history of Cold War bioweapons programs.

This may provide some reassurance for a while. But the fact that any of the novices in the Active Site study managed to synthesize a virus at all should not be dismissed, says Luca Righetti, a senior author of the study, who carried out the work while at the AI-security group METR. And technological progress continues. Malicious actors could enlist emerging biological design tools, which work rather like LLMs that generate nucleotide sequences instead of words, to make existing pathogens more dangerous. These design tools, which have a range of legitimate applications, could one day modify genomic sequences in ways that make pathogens more virulent, infectious and resistant to countermeasures, according to a study funded by the US War Department.

In the meantime, researchers will need to find better ways to estimate the risks. Dr. Torres says the field still lacks good data on whether AI provides the greatest uplift in the hands of experts with wet-lab experience or of “AI power users” who are adept at getting the most out of models. Publicly disclosed experiments have not yet shown whether AI can help create actual pathogenic viruses or bacteria, which may need to be handled differently from the benign agents assembled by participants in the Active Site study. Nor have any studies assessed whether AI could help maintain a biological agent under the long-term conditions needed to produce a weapon at scale.

Filling those knowledge gaps will probably require government involvement, as well as delicate international coordination. For one thing, producing biological-weapon components to demonstrate uplift would probably violate the Biological Weapons Convention. Last year a team at Microsoft, a tech giant, generated 76,000 modified DNA sequences for dangerous pathogens to show how they could evade the screening processes of companies that provide mail-order nucleotide-synthesis services. But the team did not actually synthesize any of the sequences to verify that they were viable. Doing so, they warned, “could be interpreted as furthering the development of bioweapons”.

Speed trap

Given these challenges, developers may need to slow the pace at which they release new models. In the six months it took Active Site to publish the results of its uplift test, for example, four new frontier models with improved biological capabilities emerged. Dr. Torres says these models appear to be less likely to hallucinate plausible but incorrect sequences than the ones his team tested in the original study. By the time the group publishes the results of its follow-up testing, scheduled for later this year, models’ capabilities are likely to have improved further.

There is precedent for such caution. Last month Anthropic announced it was limiting access to its world-leading cyber-security model, Claude, until the risks it posed were addressed. If developers discover that a model shows a significant jump in dangerous biological capabilities, it may likewise be wise to keep it under lock and key until the extent of the uplift it provides is known. With the stakes so high, a little patience can go a long way.

