DeepSeek Unveils New AI Reasoning Method Ahead of Anticipated Model Launch
- tech360.tv
- 14 hours ago
- 2 min read
Chinese AI start-up DeepSeek has introduced a new technique to enhance the reasoning capabilities of large language models, as anticipation builds for the release of its next-generation model.

In collaboration with researchers from Tsinghua University, DeepSeek developed a dual-method approach that combines generative reward modelling (GRM) and self-principled critique tuning. The method aims to guide AI models to deliver faster and more accurate responses aligned with human preferences.
The resulting DeepSeek-GRM models achieved competitive performance compared with existing public reward models, according to a paper published Friday on arXiv, an online scientific paper repository.
Reward modelling is a process used to align AI outputs with human values and expectations. DeepSeek’s new approach integrates this with a self-assessment mechanism to improve model reasoning.
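The combination described above can be illustrated with a highly simplified sketch: score a candidate response against human-preference criteria, then apply a self-critique pass that adjusts the score. All criteria, weights, and function names here are invented for illustration; DeepSeek's actual GRM generates textual critiques and principles with a large language model rather than keyword rules.

```python
# Toy illustration of reward modelling with a self-critique pass.
# Everything here (criteria, weights, helper names) is hypothetical.

def base_reward(response: str, criteria: dict[str, float]) -> float:
    """Weighted score: how many preference criteria the response satisfies."""
    return sum(w for kw, w in criteria.items() if kw in response.lower())

def self_critique(response: str, score: float) -> float:
    """Second pass: down-weight responses that assert without justification."""
    if "because" not in response.lower():
        return score * 0.5  # unjustified answers are penalised
    return score

# Invented preference criteria standing in for learned human preferences.
criteria = {"accurate": 1.0, "because": 0.5}

candidates = [
    "The answer is 42.",
    "The answer is 42 because the docs define it as accurate.",
]
scores = [self_critique(c, base_reward(c, criteria)) for c in candidates]
best = candidates[scores.index(max(scores))]
```

In a real system both passes would be model-generated judgements rather than keyword checks, but the shape is the same: a reward signal aligned with preferences, refined by the model critiquing its own scoring.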
The company plans to open source the GRM models, though no timeline has been provided.
The announcement comes amid growing speculation about the release of DeepSeek-R2, the successor to its R1 reasoning model. Reuters reported last month that the new model could launch as early as this month.
DeepSeek has not confirmed the release date. A customer service account reportedly denied the rumour in a group chat with business clients, according to Chinese media.
Founded in 2023 by entrepreneur Liang Wenfeng, DeepSeek has gained global attention for its cost-efficient models. Its R1 model was noted for rivalling leading AI systems in performance.
In March, the company released DeepSeek-V3-0324, an upgrade of its V3 model with improved reasoning, stronger front-end web development capabilities, and enhanced Chinese writing proficiency.
In February, DeepSeek open-sourced five code repositories and pledged transparency in its development process. Liang also published a technical study on “native sparse attention,” a method to improve data processing efficiency in large language models.
Liang, who also founded DeepSeek’s parent company High-Flyer Quant, participated in a February symposium hosted by Chinese President Xi Jinping. The event highlighted DeepSeek as a symbol of China’s technological resilience amid US efforts to limit its AI development.
- DeepSeek introduced a new AI reasoning method combining GRM and self-principled critique tuning
- The resulting GRM models achieved competitive performance against existing public reward models
- The company plans to open source the models but has given no timeline
Source: SCMP