Thomas Wolf of Hugging Face on Dario's "On DeepSeek and Export Controls"


Thomas Wolf of Hugging Face on Dario's "On DeepSeek and Export Controls" blog post, as always read by the A-OK voice clone.

I finally took the time to go over Dario's essay on DeepSeek and export controls, and to be honest it was quite painful to read.

And I say this as a great admirer of Anthropic and a big user of Claude (see the P.S. at the end). The first half of the essay reads like a lengthy attempt to justify that closed-source models are still significantly ahead of DeepSeek.

However, it mostly refers to internal unpublished evals, which limits the credit you can give it, and statements like "DeepSeek-V3 is close to SOTA models and stronger on some very narrow tasks" transforming into the general conclusion "DeepSeek-V3 is actually worse than those U.S. frontier models, let's say by approximately two times on the scaling curve" left me generally doubtful. The same applies to the takeaway that all the discoveries and efficiency improvements of DeepSeek had been discovered long ago by closed-model companies, a statement mostly resulting from a comparison of DeepSeek's openly published ~$6 million training cost with some vague "few tens of millions" on Anthropic's side, without providing much more detail. I have no doubt that the Anthropic team is extremely talented, and I've regularly shared how impressed I am with Sonnet 3.5, but this long-winded comparison of open research with vague closed research and undisclosed evals has left me less convinced of their lead than I was before reading it.

Even more frustrating was the second half of the essay, which dives into the US-China race scenario and totally misses the point that the DeepSeek model is open weights and largely open knowledge thanks to its detailed tech report. And feel free to follow Hugging Face's Open-R1 reproduction project for the remaining non-public part: the synthetic dataset.

If both the DeepSeek and Anthropic models had been closed source, then yes, the arms race interpretation could have made sense. But having one of the models freely and widely available for download, with a detailed scientific report, renders the whole closed-source arms race argument artificial and unconvincing in my opinion. Here's the thing: open source knows no borders, both in its usage and in its creation.

Every company in the world, be it in Europe, Africa, South America, or the USA, can now directly download and use DeepSeek without sending data to a specific country (China, for instance) or depending on a specific company or server for running the core part of its technology.
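As a minimal sketch of what that download looks like in practice with the huggingface_hub client (the file patterns are illustrative, chosen because the full DeepSeek-V3 checkpoint is hundreds of gigabytes; drop them to fetch the complete weights):

from huggingface_hub import snapshot_download

# Pull open weights straight from the Hugging Face Hub: no gatekeeper,
# no data sent anywhere beyond the download request itself.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3",
    allow_patterns=["config.json", "tokenizer*"],  # illustrative subset only
)
print(f"Downloaded to: {local_dir}")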

And just like most open-source libraries in the world are typically built by contributors from all over the world, we've already seen several hundred derivative models on the Hugging Face Hub, created everywhere in the world by teams adapting the original model to their specific use cases and explorations.
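For a rough sense of that spread, the Hub can be queried directly; a minimal sketch with huggingface_hub, where the search term is an assumption and a plain text search only approximates the Hub's base-model lineage tracking:

from huggingface_hub import HfApi

# Sample the most-downloaded models matching the DeepSeek-R1 name; a text
# search catches quantizations, finetunes, and adaptations alike.
api = HfApi()
for model in api.list_models(search="DeepSeek-R1", sort="downloads", limit=10):
    print(model.id)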

What's more, with the Open-R1 reproduction and the DeepSeek paper, the coming months will clearly see many open-source reasoning models released by teams from all over the world. Just today, two other teams, Allen AI in Seattle and Mistral in Paris, both independently released open-source models, Tülu and Mistral Small 3, which are already challenging the new state of the art, with Allen AI indicating that its Tülu model surpasses the performance of DeepSeek-V3. And the scope is even much broader than this geographical aspect.

Here is the thing we don't talk about nearly enough: open source will be more and more essential for our safety. As AI becomes central to our lives, resiliency will increasingly become a very important element of this technology.

Today we're dependent on Internet access for almost everything. Without access to the Internet, we lose all our social media news feeds and can't order a taxi, book a restaurant, or reach someone on WhatsApp. Now imagine an alternate world to ours where all the data transiting through the Internet had to go through a single company's data centers. The day this company suffered a single outage, the whole world would basically stop spinning. Picture the recent CrowdStrike outage magnified a million-fold.

Soon, as AI assistants and AI technology permeate our whole lives to simplify many of our online and offline tasks, we and the companies using AI will depend more and more on this technology for our daily activities, and we will similarly start to find annoying, or even painful, any downtime in these AI assistants due to outages.

The optimal way to avoid such future downtime situations will be to build resilience deep into our technological chain. Open source has many advantages, like shared training costs, tunability, control, ownership, and privacy. But one of its most fundamental virtues in the long term, as AI becomes deeply embedded in our world, will likely be its strong resilience. It is one of the most straightforward and cost-effective ways to distribute compute across many independent providers, and even to run models locally and on-device with minimal complexity.
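As a sketch of how low that complexity already is, assuming the small distilled DeepSeek-R1 checkpoint as the model (any open checkpoint on the Hub works the same way), fully local inference with transformers is a few lines:

from transformers import pipeline

# Fully local generation: after the one-time weight download, nothing leaves
# the machine. The 1.5B distilled variant fits on commodity hardware.
generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)
out = generator("Open source knows no borders because", max_new_tokens=60)
print(out[0]["generated_text"])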

More than national pride and competition, I think it's time to start thinking globally about the challenges and social changes that AI will bring everywhere in the world. And open-source technology is likely our most important asset for safely transitioning to a resilient digital future where AI is integrated into all aspects of society.

P.S. Claude is my default LLM for complex coding. I also love its character, with hesitations and pondering, like a prelude to the chain of thought of more recent reasoning models such as DeepSeek's generations.

Creators and Guests

A-OK (Host)
An infinite number of monkeys on an infinite number of "typewriters" banging away on keys in hopes of getting the next-token predictor agent going.