Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Today, DeepSeek is one of the only leading artificial intelligence companies in China that does not rely on funding from technology giants such as Baidu, Alibaba, or ByteDance.
According to Liang, when he put together DeepSeek's research team, he was not looking for experienced engineers to build a consumer product. Instead, he focused on doctoral students from China's top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had been published in top journals and had won awards at international academic conferences, but they lacked industry experience, according to the Chinese technology publication QBitAI.
“Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr in 2023. The hiring strategy helped create a collaborative company culture in which employees were free to use ample computing resources to pursue unconventional research projects. It is a starkly different way of working from established internet companies in China, where teams are often competing for resources. (A recent example: a former intern, the winner of a prestigious academic award no less, was accused of sabotaging his colleagues' work in order to hoard more computing resources for his team.)
Liang said students can be a better fit for high-investment, low-profit research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires was that DeepSeek was created “to solve the hardest questions in the world.”
Experts say the fact that these young researchers were educated almost entirely in China adds to their drive. “This younger generation also embodies a sense of patriotism, particularly as they navigate US restrictions and choke points in critical hardware and software technologies,” Zhang explains. “Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to advancing China’s position as a global innovation leader.”
In October 2022, the United States government began putting in place export controls that severely restricted Chinese artificial intelligence companies from accessing advanced chips such as Nvidia’s H100. The move posed a problem for DeepSeek. The company had started out with a stockpile of 10,000 H100s, but it needed more to compete with companies such as OpenAI and Meta. “The problem we are facing has never been funding, but the export control on advanced chips,” Liang told 36Kr in a second interview in 2024.
DeepSeek had to come up with more efficient ways to train its models. “They optimized their model architecture using a battery of engineering tricks: custom communication schemes, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says an analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”
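To make the memory point concrete, here is a minimal arithmetic sketch of how shrinking the size of each stored value reduces memory use. It assumes that "reducing the size of fields" refers to storing numbers in fewer bits. The value count and the bit widths below are illustrative assumptions, not figures reported for DeepSeek.

```python
def storage_bytes(num_values, bits_per_value):
    """Bytes needed to store num_values numbers at the given bit width."""
    return num_values * bits_per_value // 8

# Hypothetical count of stored values, chosen only for illustration.
num_values = 1_000_000_000

for bits in (32, 16, 8):
    gib = storage_bytes(num_values, bits) / 2**30
    print(f"{bits}-bit values: {gib:.2f} GiB")
```

Halving the bit width halves the memory needed for the same number of values, which is why narrower fields leave more room on each chip.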
DeepSeek also made significant progress on multi-head latent attention (MLA) and mixture-of-experts, two technical designs that make DeepSeek's models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek's latest model is so efficient that it required one-tenth of the computing power of Meta's comparable Llama 3.1 model to train, according to the research institution Epoch AI.
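The general idea behind a mixture-of-experts layer can be shown in a few lines: a gate scores every expert, but only the top-scoring few are actually evaluated for a given input, so most of the model sits idle on any one step. The following is a toy sketch of generic top-k routing, not DeepSeek's implementation. The function names, the eight scalar "experts," and the gate scores are all hypothetical.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Evaluate only the selected experts and return their weighted sum."""
    routed = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, len(routed)

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)]
# Hypothetical gate scores for a single input.
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, experts_used = moe_forward(1.0, experts, gate_scores, k=2)
print(f"{experts_used} of {len(experts)} experts evaluated, output = {output:.4f}")
```

In this toy setup only two of the eight experts run for the input, which is the source of the compute savings: the total parameter count can grow without every parameter being used on every step.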
DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global artificial intelligence research community. For many Chinese artificial intelligence companies, developing open source models is the only way to catch up with their Western counterparts, because open models attract more users and contributors, which in turn helps the models grow. “They have now demonstrated that cutting-edge models can be built using less money, though still a lot of it, and that the current norms of model-building leave plenty of room for optimization,” says Zhang. “We are sure to see many more attempts in this direction going forward.”
The news could spell trouble for the current US export controls, which focus on creating bottlenecks in computing resources. “Existing estimates of how much artificial intelligence computing power China has, and what it can achieve with that power, could be upended,” says Zhang.