
Liang Wenfeng, once the head of a quantitative hedge fund in China, transformed his career by betting on artificial intelligence research. He invested considerable resources, acquiring 10,000 Nvidia chips and assembling a team of brilliant young researchers. Two years later, this bold bet led to the launch of DeepSeek, an initiative that quickly captured the attention of the global tech industry.
On January 20, the DeepSeek lab released an open-source model that has drawn intense interest from Silicon Valley. According to a company document, DeepSeek-R1 outperforms leading models such as OpenAI's in mathematics and reasoning, posing a serious challenge to Western AI giants.
DeepSeek's success illustrates an unexpected effect of US technological restrictions. With limited access to advanced chips, many Chinese companies have focused on applications rather than basic models. However, DeepSeek has taken a different approach: improving the architecture of its AI models to more efficiently utilize limited resources.
"DeepSeek excels at software-driven resource optimization, an approach that fosters collaborative innovation," says Marina Zhang, an associate professor at the University of Technology Sydney.
The Origin of DeepSeek
DeepSeek began as Fire-Flyer, a research arm of High-Flyer, a successful Chinese quantitative hedge fund. Founded in 2015, High-Flyer accumulated GPUs for financial analytics until Liang decided in 2023 to redirect those resources to building advanced AI models.
DeepSeek has adopted a distinctive recruitment strategy, targeting young graduates from prestigious institutions like Peking University and Tsinghua University who are eager to prove themselves. "We chose researchers with no industry experience but with an innovative mindset," said Liang.
Innovation under Pressure
US restrictions on advanced chips have pushed DeepSeek to develop more efficient training methods. "They've optimized inter-chip communication and implemented mixture-of-experts strategies," says Wendy Chang of the Mercator Institute for China Studies. Their latest model is so efficient that it required only a tenth of the computing power needed to train Meta's Llama 3.1.
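The mixture-of-experts idea Chang refers to can be illustrated with a minimal toy sketch: instead of running every token through one large network, a small gating function routes each token to a single "expert" sub-network, so only a fraction of the model's weights are active per token. This is a generic, hypothetical illustration of top-1 routing, not DeepSeek's actual implementation; all names and sizes here are invented for the example.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 4  # toy value; real MoE models use many more
DIM = 8          # toy hidden dimension

# Each "expert" is a small linear map; in a real MoE these are large FFN blocks.
experts = [[[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# The gate scores each token against each expert.
gate = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def moe_forward(token):
    """Route a token to its top-1 expert; only that expert's weights are used."""
    scores = softmax([sum(w * x for w, x in zip(row, token)) for row in gate])
    best = max(range(NUM_EXPERTS), key=lambda i: scores[i])
    out = [sum(w * x for w, x in zip(row, token)) for row in experts[best]]
    # Scale the output by the gate probability, as in top-1 (Switch-style) routing.
    return [scores[best] * v for v in out], best

token = [random.gauss(0, 1) for _ in range(DIM)]
output, chosen = moe_forward(token)
```

Because only one expert runs per token, the compute cost per token stays roughly constant even as more experts (and thus total parameters) are added, which is one way sparse architectures stretch limited hardware.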
Sharing DeepSeek's innovations openly has strengthened its global reputation. "They demonstrate that advanced models can be built with fewer resources by optimizing training methods," Chang concludes.
US restrictions may prove ineffective at containing the advancement of Chinese AI, as alternative strategies like DeepSeek's are proving successful.
Source: Wired