Will artificial intelligence soon escape human control? (HT Tech)



When Anthropic, an artificial-intelligence lab, debuts on the stock market later this year, it is likely to be one of the largest initial public offerings in history. That is because the company’s Claude chatbot is a darling of coders, who are willing to pay a lot for access. Since the launch of its software-engineering agent, Claude Code, in February 2025, it has become indispensable for many human developers around the world. That includes Anthropic’s own: the company says more than four-fifths of the code it published in May was written by Claude. Before Claude Code launched, the percentage was “low single-digits”.

The latest generation of AI models are such capable coders, engineers and (soon) scientists that many worry they may be among the last models ever created by humans (Pexels)

The quality of the output has improved along with the quantity. A benchmark from METR, a think-tank, shows that as early as 2025 Anthropic’s models could complete tasks that take human engineers a little less than an hour. The company’s latest systems can complete tasks that would take a human engineer more than one working day.

And so when the company, at the top of its game and outpacing the competition, asks the world for “the option to slow or temporarily halt frontier AI development,” as it did on June 5, it would be easy to raise a skeptical eyebrow. What market leader wouldn’t want its competitors to stop trying to catch up?

Yet Anthropic’s leaders, who have worried for years about the possibility of AI slipping out of control and wreaking havoc, seem sincere. The latest generation of AI models are such capable coders, engineers and (soon) scientists that many worry they may be among the last models created by humans. Anthropic co-founder Jack Clark believes there is a 60% chance that, by the end of 2028, an AI system will be able to create its own successor without any human involvement.

That moment will mark the beginning of a process called “recursive self-improvement” (RSI), a closed loop. Version one of a model generates version two, which is faster and more capable; version two begets version three, which is faster and more capable still. The loop continues, and the improvements compound with each iteration. Build one sufficiently capable AI system, and your human engineers will never need to build another. “What may seem like a fantasy story to many people may be a real trend,” says Mr Clark.

No one knows for sure what the results of RSI will be. Because AI, unlike humans, can work tirelessly and continuously, some people think that it will lead to a superintelligent AI in a short time, a “fast take-off”. (This has also been onomatopoeically referred to as “going foom”, a sound meant to evoke an explosive rush of air.) AI doomsayers fear that superintelligence will be beyond human control, and that the start of RSI is the moment when the fate of humanity is handed over to machines. Yet a self-improving AI will probably face limits on its speed, at least initially.

Building models capable of RSI will require automating a range of specialist tasks now performed by humans. At present, data scientists work on the theory of AI and coders put it into practice. Systems engineers build the foundations on which toy models can be scaled up to production size. Others look for new sources of training data, or experiment with ways to generate it from scratch. Alignment and safety teams check that whatever comes out of the training process will not cause harm, intentionally or otherwise.

Not all of those teams are equally amenable to AI assistance, and within each specialization some tasks are easier to automate than others. It won’t be long until a human coder can do his or her job without writing a single line of computer code, but it may be some time until an AI is able to work with a previously unordered collection of scientific papers. It is not always clear how this “jagged frontier” will advance. Designing new algorithms seemed to be one of the safer tasks, until AlphaEvolve, one of Google DeepMind’s models, started doing it in May 2025. It proposed changes to the way Google spreads workloads across its data centers, which recovered 0.7% of the company’s worldwide computing power, and better ways to perform matrix multiplication, which sped up the training of Gemini, the company’s flagship large language model (LLM), by 1%.

For complete RSI, every task in this chain needs to be automated. However, an AI-driven acceleration in research and development (R&D) may be felt even earlier. “As the fraction of AI R&D performed by AI systems increases, the increase in productivity over human-only R&D could increase tenfold, then a hundredfold, then a thousandfold,” according to a report published in January by the Center for Security and Emerging Technology (CSET), a Georgetown University think-tank. In that scenario, it warns, even if some aspects of AI R&D are initially difficult to automate, “the accelerated rate of progress means those barriers will soon be overcome.”

Joy of repetition

Today no AI model can create its own successor. But larger AI models can create smaller models on their own. With human help they can also create other large AI models.

Earlier this year Andrej Karpathy, then an independent researcher who now works for Anthropic, trained a chatbot to match GPT-2, a large language model created by OpenAI in 2019. Building the original took 168 hours of training on 32 chips that were state-of-the-art at the time; Dr. Karpathy achieved the same result in just three hours, using a computer with eight GPUs, the specialized chips used to create AI. With a few more months of work he reduced the training time for his model, nanochat, to just over two hours.

In March, he handed over the task of speeding up the training process to an AI agent called AutoResearch. Within two days the training time had dropped to one hour 48 minutes, and after five days it had dropped to one hour 39 minutes. “I didn’t touch anything,” says Dr. Karpathy. The 18% improvement on human performance is striking because Dr. Karpathy is a particularly brilliant human: he was a founding member of the research team at OpenAI and head of AI at Tesla for five years.
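As a rough check on the figures above, the following Python sketch works through the arithmetic. The function and variable names are illustrative and do not come from Dr. Karpathy’s work. The chip-hours comparison is only indicative, since it assumes the 2019 chips and today’s GPUs can be counted like for like, which they cannot.

```python
def chip_hours(hours: float, chips: int) -> float:
    """Total chip-hours consumed by a training run."""
    return hours * chips


def percent_reduction(before_min: float, after_min: float) -> float:
    """Percentage by which a duration shrank."""
    return 100.0 * (before_min - after_min) / before_min


# Figures quoted in the article.
original_run = chip_hours(hours=168, chips=32)  # GPT-2 in 2019
first_rerun = chip_hours(hours=3, chips=8)      # initial reproduction
chip_hour_ratio = original_run / first_rerun

after_two_days_min = 1 * 60 + 48   # one hour 48 minutes
after_five_days_min = 1 * 60 + 39  # one hour 39 minutes

# The article gives the human baseline only as "just over two hours".
# An 18% reduction ending at 99 minutes implies this starting point:
implied_baseline_min = after_five_days_min / (1 - 0.18)

print(f"chip-hours: {original_run:.0f} vs {first_rerun:.0f} "
      f"(ratio {chip_hour_ratio:.0f}x)")
print(f"implied human baseline: {implied_baseline_min:.1f} minutes")
print(f"gain between day two and day five: "
      f"{percent_reduction(after_two_days_min, after_five_days_min):.1f}%")
```

The implied baseline comes out a little above 120 minutes, which is consistent with the “just over two hours” quoted earlier.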

The changes themselves were modest. The AI agent chose better starting values for the training run, expanded the scope of the LLM’s “attention” window, and noticed that the model’s attention was wandering. None is particularly novel, says Dr. Karpathy. But he had missed them. “They stacked up and really improved nanochat,” he says.

This type of progress is inevitable as models become more capable. Much of the work of building terabyte-sized frontier models is less glamorous than the huge salaries and fancy offices of the AI industry suggest. It involves piecing together layers of an infrastructure stack bought from third parties, debugging the hardware and software set-up, and tweaking the “hyperparameters”, the initial settings of a training run, until the results look solid. Today an AI system can do much of this with little supervision.

But even more subtle intellectual work is nearing automation, says Joe Spisak, a researcher at Reflection AI, a New York-based laboratory that is building open-weight frontier models (meaning their parameters are publicly released). Give a frontier system a rough sketch of an idea for an efficiency gain, and it is able to design an experiment, run tests on a toy model, see what works, and respond with a plan that is ready to be implemented at scale.

AI models can complete tasks like these, which would take humans several hours, in about 30 minutes. Increasingly, humans simply play the role of research director, steering AI to run experiments that the models code, debug, optimize, and monitor themselves. The increase in productivity is fascinating, but also worrying. As humans’ role in the production process shrinks, they may lose control. The end result could be models trained by models to achieve goals set by models, the safety of which is verified only by models.

Some fear calamity. Max Tegmark, a physicist and machine-learning researcher at the Massachusetts Institute of Technology who has devoted much of the last decade to campaigning for AI safety, compares it to a driver hitting the accelerator on a motorway with his eyes closed. The result will be certain destruction unless the driver opens his eyes, he says in an upcoming edition of The Economist’s “Inside Tech” video show. Professor Tegmark offers a number of scenarios in which things could go wrong: powerful AI systems could outcompete humans as decision-makers in government and commerce, leaving humanity vulnerable; they could give supreme power to whoever builds them first, ushering in global totalitarianism; or they may stop caring about humanity at all, and slowly squeeze people out to make room for more data centers and power generation.

Three years ago, Professor Tegmark led a call for a pause in global AI development, arguing that building the then-state-of-the-art GPT-4 was tantamount to traveling blindfolded. This year’s CSET report warned that the systems created by RSI “pose extreme risks. This requires early action now.” It seems Anthropic is now close to agreeing with that prescription.

Hot Chip

There are also a number of physical constraints that, for the time being, will limit the speed at which models can improve themselves. The most important is access to computation. Despite gains in efficiency, new models continue to use more computing power for training than their predecessors, which ties the pace of progress to the pace of data-center construction.

Helen Toner, interim executive director of CSET and lead author of its recent report, says consumer use of AI could also slow down AI-driven R&D. The limited capacity in AI data centers needs to be carefully divided between serving paying customers, training future models, and conducting open-ended R&D. The greater the demand in the first category, the less capacity there will be for the other two in the short term.

Then there is the issue of training data. Recent advances in AI have been in areas where models can teach themselves how to succeed thanks to “verifiable rewards”. A piece of software either runs or it doesn’t; a mathematical proof is either correct or it is not. In such cases, synthetic data, which is generated entirely by models to train other models, can be checked for accuracy and added to the training data without risking the degradation that typically comes with training an AI on its own outputs. It is more difficult to make a model better at creative writing or legal judgment. If models need to learn from the real world, this may also limit the reach of self-improvement.
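The idea of checking synthetic data against a verifiable reward can be illustrated with a toy Python sketch. Everything in it is a hypothetical stand-in: `propose_example` plays the part of a model that is sometimes wrong, and `verify` plays the part of an automatic checker such as a test suite or a proof checker. It is not a description of how any lab builds its training data.

```python
import random


def propose_example(rng: random.Random) -> tuple[int, int, int]:
    """Hypothetical stand-in for a model writing a synthetic
    (a, b, claimed_sum) example. About 30% of the time the claimed
    sum is deliberately wrong, mimicking an unreliable generator."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    claimed_sum = a + b
    if rng.random() < 0.3:
        claimed_sum += rng.choice([-2, -1, 1, 2])
    return a, b, claimed_sum


def verify(example: tuple[int, int, int]) -> bool:
    """The verifiable reward: an exact, automatic check of the claim."""
    a, b, claimed_sum = example
    return a + b == claimed_sum


def build_verified_dataset(n_candidates: int, seed: int = 0) -> list[tuple[int, int, int]]:
    """Generate candidates and keep only those that pass verification."""
    rng = random.Random(seed)
    candidates = [propose_example(rng) for _ in range(n_candidates)]
    return [example for example in candidates if verify(example)]


dataset = build_verified_dataset(1000)
print(f"kept {len(dataset)} of 1000 candidates")
```

The filter works only because `verify` is exact. For creative writing or legal judgment there is no equivalent check to call, which is the difficulty the paragraph above describes.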

“Closing the loop” can be a step on the path to superintelligence and – depending on your disposition – utopia or destruction. But this is not the only step towards rapidly increasing AI capabilities.

