Is Claude’s Coding Crown Slipping?
The latest leaderboard from the large language model arena is out, and DeepSeek’s new R1 model has snagged the top spot in web programming, edging out Claude Opus 4.
This is significant, given that Claude Opus 4 has been widely considered the “world’s most powerful coding model.”
So, what’s the deal with DeepSeek R1-0528, the model that managed to dethrone Claude Opus 4 in coding?
At first glance, the name might suggest a minor update. In reality, that’s far from the case.
Its performance on LiveCodeBench puts it nearly on par with OpenAI’s o3-high, leading some industry watchers to speculate it might be the rumored R2.
Clearly, both of these models are forces to be reckoned with in the coding arena!
But enough talk: let’s get hands-on with DeepSeek R1-0528 and see what it can really do.
DeepSeek R1-0528 is now available on DeepSeek’s official website and in its apps.
We started with the web version.
Test 1: Creating a Solar System Animation App
The prompt:
“Create an animated solar system application, using web search.”
After just 49 seconds of deliberation, DeepSeek R1-0528 produced the code for a Python program.
Running this code in VS Code resulted in a functional, albeit somewhat rudimentary, animation.
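For a sense of scale, a basic animation like this fits in a few dozen lines. The sketch below is not DeepSeek’s output; it is a minimal illustration of the same idea, assuming matplotlib is installed, with planet radii and orbital periods loosely scaled for display rather than physical accuracy.

```python
# Minimal solar-system animation sketch (not DeepSeek's generated code).
# Assumes matplotlib is installed; radii/periods are loosely scaled for display.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# name, orbital radius (~AU), orbital period (~years), marker size
PLANETS = [
    ("Mercury", 0.39, 0.24, 4),
    ("Venus",   0.72, 0.62, 6),
    ("Earth",   1.00, 1.00, 6),
    ("Mars",    1.52, 1.88, 5),
]

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_aspect("equal")
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
ax.plot(0, 0, "o", color="gold", markersize=12)  # the Sun

# Draw a static orbit circle and one moving point per planet.
points = []
for name, r, _, size in PLANETS:
    theta = np.linspace(0, 2 * np.pi, 200)
    ax.plot(r * np.cos(theta), r * np.sin(theta), lw=0.5, color="gray")
    point, = ax.plot([], [], "o", markersize=size, label=name)
    points.append(point)
ax.legend(loc="upper right", fontsize=8)

def update(frame):
    # Advance each planet along its circular orbit; angular speed ~ 1/period.
    for (name, r, period, _), point in zip(PLANETS, points):
        angle = 2 * np.pi * frame / (100 * period)
        point.set_data([r * np.cos(angle)], [r * np.sin(angle)])
    return points

anim = FuncAnimation(fig, update, frames=600, interval=30, blit=True)
plt.show()
```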
However, when given more detailed instructions, the results significantly improved.
The prompt: “Use Three.js to simulate the solar system, displaying planet names when hovering over them.”
In just 34 seconds, DeepSeek R1-0528 laid out its design, wrote the code, and produced a far more dynamic result.
This iteration was animated, interactive, and frankly, on another level.
Test 2: Front-End Web Development
Next, we tasked DeepSeek with creating a website thematically focused on AGI, using the following prompt:
“Design a webpage centered on Artificial General Intelligence (AGI), incorporating ‘Knowledge Sharing,’ ‘Community,’ and ‘Future Creation’ concepts. Each section should include corresponding icons and concise descriptions, with a modern, tech-forward design that highlights the innovation and collaborative spirit of AGI. Employ HTML, CSS, and JavaScript to implement interactive features and visual effects.”
After 23 seconds, DeepSeek R1-0528 generated a block of HTML code that was immediately functional.
Test 3: A Tetris Game
Finally, we tested the model’s capabilities with a prompt in English:
“Create a full-featured version of Tetris with beautiful graphics and controls.”
DeepSeek R1-0528 came up with a Python-based response in a swift 12 seconds.
The result: while the code did produce a version of Tetris, it was riddled with bugs and severely lacking in interactive elements.
Deciding to see if the model could improve on the initial attempt, we prompted it to try again… only to be disappointed.
The revised version of the game still failed to run properly (with pieces frequently going through walls) and didn’t incorporate the interactive controls we requested.
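For context on the wall bug: a working Tetris loop validates every move against the board bounds and the stack of locked blocks before applying it, and the generated game appeared to skip exactly this step. The snippet below is not the model’s code, just a minimal, hypothetical sketch of such a check, assuming the board is a 2D grid of 0/1 cells and a piece is a list of (row, col) offsets.

```python
# Hypothetical sketch of the bounds/collision check a Tetris game needs
# before moving or rotating a piece (not DeepSeek's generated code).

BOARD_ROWS, BOARD_COLS = 20, 10

def can_place(board, piece_cells, row_off, col_off):
    """Return True if the piece fits at the given offset.

    board       -- 2D list (BOARD_ROWS x BOARD_COLS) of 0 (empty) / 1 (filled)
    piece_cells -- list of (row, col) offsets describing the piece shape
    """
    for r, c in piece_cells:
        rr, cc = r + row_off, c + col_off
        # Reject moves that leave the board (the check whose absence
        # lets pieces slide through walls).
        if cc < 0 or cc >= BOARD_COLS or rr >= BOARD_ROWS:
            return False
        # Reject overlaps with already-locked blocks (cells above the top are ignored).
        if rr >= 0 and board[rr][cc]:
            return False
    return True

def try_move(board, piece_cells, row_off, col_off, d_row, d_col):
    """Apply a move only if the destination is valid; otherwise keep the piece put."""
    if can_place(board, piece_cells, row_off + d_row, col_off + d_col):
        return row_off + d_row, col_off + d_col
    return row_off, col_off

# Example: an L-piece sitting against the left wall refuses to move further left.
board = [[0] * BOARD_COLS for _ in range(BOARD_ROWS)]
l_piece = [(0, 0), (1, 0), (2, 0), (2, 1)]
print(try_move(board, l_piece, 0, 0, 0, -1))  # -> (0, 0), blocked by the wall
```

With a check like this in place, a move into a wall is simply rejected instead of silently applied.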
To summarize: DeepSeek’s new R1, as an open-source model, shows significant gains in coding, though it still has room for improvement.
On the plus side, it’s more accessible to the average user.
One More Thing…
Beyond its programming strengths, DeepSeek R1 has also been recognized as the current top-performing open-source text model: it ranks sixth overall on the leaderboard and, released under the MIT license, leads all open-source models.
In specific subfields, it ranks fourth in “hard prompts” and fifth in mathematics – making the model a serious contender among open-source options.
But here’s an interesting development: Kimi’s new model recently claimed the state of the art in open-source coding.
Kimi-Dev, an open-source coding model with just 72B parameters, scored 60.4% on SWE-bench Verified, surpassing the previous best open-source results.
That means it outperforms even the latest DeepSeek R1 on this coding benchmark, and it holds up impressively against proprietary models as well.
The future is uncertain (doge)!
Original article, Author: Tobias. If you wish to reprint this article, please indicate the source: https://aicnbc.com/2687.html