Chinese artificial intelligence (AI) startup Zhipu AI, or Z.ai, has released GLM-5.2, an open-weight model that matches Anthropic’s controversial Mythos-class model in cybersecurity and software vulnerability identification tasks. Researchers who have tested and compared frontier models say Chinese AI companies continue to hold a significant cost advantage, first illustrated by DeepSeek early last year.

American cybersecurity company Semgrep ran the models against an IDOR (Insecure Direct Object Reference) benchmark, which tests for a specific vulnerability in which an application exposes an internal identifier, such as a user ID, and lets it be used to reach data without a permission check. Semgrep noted that GLM-5.2 scored higher (39%) than Anthropic’s Claude Opus 4.6 (32%) and Claude Opus 4.8/4.7 (28%).
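To make the vulnerability class concrete, here is a minimal Python sketch of an IDOR flaw and one possible fix. It is illustrative only: the `Invoice` type, the in-memory `INVOICES` store and both function names are hypothetical, and nothing here is taken from Semgrep's benchmark.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    amount: float


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount=250.0),
    2: Invoice(invoice_id=2, owner_id=200, amount=990.0),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: the client-supplied ID is trusted as-is and the caller's
    # identity is never compared with the record's owner.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Raise the same error for "missing" and "not yours" so that a caller
    # cannot probe which IDs exist.
    if invoice is None or invoice.owner_id != current_user_id:
        raise PermissionError("invoice not found or access denied")
    return invoice
```

In this sketch, user 100 can read user 200's invoice through `get_invoice_vulnerable(100, 2)`, while `get_invoice_checked(100, 2)` raises `PermissionError`. In a real codebase the ownership check is often spread across routes, middleware and data-access layers, which is why finding such flaws takes reasoning across a whole repository.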
The 744-billion-parameter Mixture-of-Experts (MoE) model proved surprisingly strong at reasoning through complex, repository-scale authorisation flaws in code. “Among models given the same minimal prompt and harness, GLM 5.2 a open-weight model, ⅙ the cost of a frontier LLM beat Claude Code at a genuinely difficult security research task,” the researchers note.
Z.ai released GLM-5.2 earlier this month, saying it had optimised the model for what the company called ‘long horizon tasks’, or agentic tasks, while relying less on heavy token usage.
“A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging,” the company noted at the time.
The cost advantage that Chinese AI models have exhibited since DeepSeek took the AI world by storm at the turn of last year remains intact. In terms of approximate token costs, Z.ai’s GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens. In comparison, Anthropic’s Claude Opus 4.8 costs developers and enterprises around $5 and $25 for the same usage, respectively.
“GLM 5.2, with no scaffolding at all, beat Claude Code by seven points (39% vs. 32%). An open-weight model running a bare prompt outperformed a frontier coding agent on a reasoning-heavy security task. And it did so cheaply! At GLM 5.2’s pricing, the open-weight run cost roughly $0.17 per vulnerability found,” they add.
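As a rough illustration of what the quoted list prices mean in practice, the sketch below compares the cost of a single run on both models. The prices are the approximate per-million-token figures given above. The workload of 2 million input tokens and 500,000 output tokens is a made-up assumption, not a figure from the Semgrep test, and the function name is hypothetical.

```python
# Approximate prices in US dollars per million tokens, as quoted in the article.
PRICES_PER_MILLION = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the token cost in US dollars for one run on the given model."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

if __name__ == "__main__":
    glm = run_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
    opus = run_cost("Claude Opus 4.8", INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"GLM-5.2: ${glm:.2f}")              # GLM-5.2: $5.00
    print(f"Claude Opus 4.8: ${opus:.2f}")     # Claude Opus 4.8: $22.50
    print(f"Ratio: {opus / glm:.1f}x")         # Ratio: 4.5x
```

The ratio depends on the mix of tokens: on these prices the gap is about 3.6 times for input and about 5.7 times for output, so output-heavy agentic runs sit closer to the upper figure.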
The other LLMs in this test included MiniMax M3, Kimi K2.7 Code, OpenAI GPT-5.5 and DeepSeek v4.
That said, GLM isn’t always superior to models from Anthropic and OpenAI in more general tasks. The result is, however, an indication that Chinese AI models have steadily narrowed the gap in average capabilities with other AI companies.
GLM-5.2 continues to rank among the 10 most-used AI models on AI marketplace OpenRouter’s LLM usage leaderboard, sitting alongside models from Anthropic, DeepSeek, Xiaomi and Tencent.
Unlike Anthropic’s Claude models or OpenAI’s GPT, open-weight models such as GLM-5.2 can be downloaded and modified, which means users can fine-tune them for specific tasks, operate them without relying on a commercial provider, and even remove safety guardrails. This will raise concerns about open models such as GLM being used to mount cyberattacks, since threat actors will have tools as powerful as those available to cyber defenders.
Z.ai founder Jie Tang has already publicly stated his intent to release another open-source model, one that will compete directly with Anthropic’s Fable 5, the first “Mythos” model, before the end of this year.




