DeepSeek Unveils DeepSeek-R1, Which Beats OpenAI's o1 in Reasoning

Chinese AI lab DeepSeek, which recently made waves with DeepSeek-V3, is back with DeepSeek-R1, a massive language model with robust reasoning capabilities. In areas such as mathematics, programming, and general knowledge, the new model, built on a mixture-of-experts architecture, is on par with OpenAI's frontier model o1. DeepSeek-R1 also reportedly costs 90 to 95 percent less to use than o1.
One way to think of DeepSeek-R1 is as an AI assistant that can answer your questions and help solve a wide range of problems. It is the latest open-source reasoning model from DeepSeek, a Chinese AI startup that made headlines last month with its free and open-source model DeepSeek-V3, which blew away competitors like Meta and OpenAI at a fraction of the cost.
What is DeepSeek-R1?
DeepSeek's latest model is a state-of-the-art reasoning model designed to push AI systems' analytical and problem-solving capacities. According to the accompanying technical report, the release consists of two models: DeepSeek-R1-Zero and DeepSeek-R1.
The DeepSeek-R1-Zero variant skips supervised fine-tuning entirely, relying solely on reinforcement learning (RL) during training. DeepSeek-R1 builds on what R1-Zero demonstrated, improving both reasoning ability and readability with a cold-start phase on curated data followed by multi-stage RL.
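DeepSeek's technical report attributes much of this to Group Relative Policy Optimization (GRPO), an RL method that scores a group of sampled answers against one another rather than training a separate value model. Below is a minimal sketch of the group-relative advantage computation at the heart of GRPO, assuming only a list of scalar rewards for answers sampled from one prompt; the function name and example rewards are illustrative, not DeepSeek's code:

```python
# Sketch of GRPO's group-relative advantage: sample several answers to one
# prompt, score each with a rule-based reward (e.g., "did the math check out?"),
# then normalize each reward against the group's mean and standard deviation.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers; only the last two pass the correctness check.
print(group_relative_advantages([0.0, 0.0, 1.0, 1.0]))
# -> roughly [-0.87, -0.87, 0.87, 0.87]
```

Answers that beat the group average get positive advantages and are reinforced; answers below it are pushed down, with no learned critic required.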
How does the model perform?
DeepSeek-R1 has shown outstanding performance across a variety of benchmarks. On AIME 2024, a mathematics benchmark, the model scored 79.8 percent (Pass@1), comparable to OpenAI's o1. On MATH-500, another math benchmark, DeepSeek-R1 outperformed most of its competitors with an accuracy of 93 percent.
On Codeforces, a competitive programming benchmark, the model ranked in the 96th percentile of human participants, further evidence that it can code at an expert level. For general knowledge, DeepSeek-R1 scored 90.8 percent on MMLU and 71.5 percent on GPQA Diamond. And on AlpacaEval 2.0, a benchmark that evaluates models' writing and question-answering ability, DeepSeek-R1 posted an impressive win rate of 87.6 percent.
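The Pass@1 figure above is the standard pass@k metric evaluated at k = 1: the probability that a single sampled answer is correct. For reference, here is a small sketch of the unbiased pass@k estimator popularized by OpenAI's Codex paper; this is general evaluation code, not anything specific to DeepSeek:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples drawn per problem,
    c of them correct, k the number of attempts the metric allows."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 12 correct -> pass@1 = 0.75
print(pass_at_k(16, 12, 1))
```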
Thanks to its ability to solve complicated reasoning and mathematical problems, DeepSeek-R1 could power tutoring systems and other advanced education tools. Software developers can take advantage of its strengths in code generation and debugging, reflected in its excellent coding benchmark results. Researchers can benefit from the model's strengths in long-context understanding and question answering.
Because R1 is a reasoning model, it effectively checks its own work, helping it avoid some of the pitfalls that commonly trip up models. Reasoning models typically take anywhere from a few extra seconds to several minutes longer than standard non-reasoning models to reach a solution. The upside is that they tend to be more accurate in domains like mathematics, physics, and other sciences.
According to DeepSeek's technical report, R1 has 671 billion parameters, though its mixture-of-experts design activates only about 37 billion of them for any given token. In general, models with more parameters tend to outperform those with fewer, since parameter count is a rough proxy for a model's problem-solving ability.
Even at 671 billion parameters, the full R1 is not the only option: DeepSeek has also published "distilled" versions of the model ranging from 1.5 billion to 70 billion parameters, and the smallest can run on a laptop. The full R1 requires beefier hardware, but it is available through DeepSeek's API at a price reportedly 90 to 95 percent lower than OpenAI's o1.
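Because DeepSeek exposes an OpenAI-compatible API, querying R1 looks much like querying any other chat model. A minimal sketch, assuming the `openai` Python package, a `DEEPSEEK_API_KEY` environment variable, and the base URL and `deepseek-reasoner` model name given in DeepSeek's public documentation:

```python
import os
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; only the base URL and model name change.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek-R1, per DeepSeek's docs
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)

# Per DeepSeek's docs, R1 returns its chain of thought separately
# from the final answer.
print(response.choices[0].message.reasoning_content)  # reasoning trace
print(response.choices[0].message.content)            # final answer
```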
R1 does have a notable drawback. Because it is a Chinese model, its responses are subject to review by China's internet regulator to ensure they "embody core socialist values." R1 won't discuss topics like Tiananmen Square or Taiwan's independence, for example.
Many Chinese AI systems, reasoning models included, decline to answer questions touching on Xi Jinping's government or other subjects likely to anger Chinese regulators.
R1 arrived just days after the outgoing Biden administration proposed stricter export regulations and restrictions on AI technology for Chinese businesses. If implemented in their current form, the new rules would place even tighter limits on the semiconductor technology and models Chinese companies need to build advanced AI systems.
Last week, OpenAI urged the U.S. government to support American AI research and development. In a policy paper, the company expressed concern that Chinese AI models could match or surpass its own. In an interview with The Information, Chris Lehane, OpenAI's VP of policy, specifically named High-Flyer Capital Management, DeepSeek's corporate parent, as an organization of concern.
DeepSeek, Alibaba, and the Chinese unicorn Moonshot AI are three labs in China that have developed models they claim rival o1. (It should be noted that DeepSeek was the first; it previewed R1 in late November.) Dean Ball, an artificial intelligence researcher at George Mason University, wrote on X that the trend suggests Chinese AI labs will continue to be "fast followers" of U.S. models.
“The impressive performance of DeepSeek’s distilled models […] means that very capable reasoners will continue to proliferate widely and be runnable on local hardware,” Ball wrote, “far from the eyes of any top-down control regime.”
DeepSeek r1 takeaways for policy:

1. Chinese labs will likely continue to be fast followers in terms of reaching similar benchmark performance to US models.

2. The impressive performance of DeepSeek's distilled models (smaller versions of r1) means that very capable reasoners…

— Dean W. Ball (@deanwball), January 20, 2025