Thomas Wolf of Hugging Face on Dario's "On DeepSeek and Export Controls"
Thomas Wolf of Hugging Face on Dario's "On DeepSeek and Export Controls" blog post,
as always read by the AOK voice clone.
Finally took time to go over Dario's essay on DeepSeek and export controls,
and to be honest it was quite painful to read.
And I say this as a great admirer of Anthropic and a big user
of Claude Sonnet. The first half of the essay reads
like a lengthy attempt to justify that closed-source
models are still significantly ahead of DeepSeek.
However, it mostly refers to internal, unpublished evals,
which limits the credit you can give it, and statements like "DeepSeek-V3
is close to SOTA
models and stronger on some very narrow tasks"
transforming into the general conclusion that "DeepSeek-V3
is actually worse than those U.S. frontier models, let's say
by approximately two times on the scaling curve" left me generally
doubtful. The same applies to the takeaway that all the discoveries
and efficiency improvements of DeepSeek had been discovered long ago by closed-model
companies, a statement mostly resulting from a comparison of DeepSeek's
openly published roughly $6 million training cost with some vague
"few tens of millions of dollars" on Anthropic's side, without much more
detail being provided. I have no doubts that the Anthropic team
is extremely talented and I've regularly shared how impressed I
am with Sonnet 3.5, but this long-winded
comparison of open research with vague
closed research and undisclosed evals has left me less convinced
of their lead than I was before reading it. Even more
frustrating was the second half of the essay, which dives into
the US-China race scenario and totally misses the point
that the DeepSeek model is open-weights and largely
open knowledge due to its detailed tech report. And feel
free to follow Hugging Face's Open-R1 reproduction project for
the remaining non-public part, the synthetic dataset.
If both DeepSeek's and Anthropic's models had been closed-source,
yes, the arms-race interpretation could have made sense. But having one
of the models freely and widely available for download,
with a detailed scientific report, renders the whole closed-source
arms-race competition argument artificial and unconvincing
in my opinion. Here's the thing: open source
knows no borders, both in its usage and its creation.
Every company in the world, be it in Europe, Africa, South America or
the USA, can now directly download and use DeepSeek without
sending data to a specific country, China for instance, or depending
on a specific company or server for running the core part of its technology.
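To make that concrete, here's a minimal sketch of what "download and use" means in practice, assuming the transformers library and a small distilled DeepSeek checkpoint from the Hub (the full V3/R1 weights need a multi-GPU node); the model id and prompt are only illustrative.

```python
# Minimal sketch: pull the weights once from the Hugging Face Hub, then run
# generation entirely on your own machine, with no remote API in the loop.
# The checkpoint below is a small distilled DeepSeek model; the full-size
# weights would need a multi-GPU node.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # illustrative small checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Why do open weights improve resilience?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Once the weights are cached locally, nothing in this loop depends on anyone else's servers.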
And just like most open source libraries in the world are typically
built by contributors from all over the world,
we've already seen several hundred derivative models on
the Hugging Face Hub created everywhere in the world by
teams adapting the original model to their specific use cases and
explorations.
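As a rough illustration of how visible this ecosystem already is, here's a tiny sketch, assuming the huggingface_hub client library; the search term and limit are arbitrary, not a precise filter for derivatives.

```python
# Minimal sketch: list some community models on the Hugging Face Hub that
# reference DeepSeek-R1 (fine-tunes, quantizations, distillations, ...),
# sorted by downloads. Search term and limit are illustrative.
from huggingface_hub import list_models

for model in list_models(search="DeepSeek-R1", sort="downloads", limit=20):
    print(model.id)
```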
What's more, with the Open-R1 reproduction and the DeepSeek paper, the coming months
will clearly see many open source reasoning models being released
by teams from all over the world. Just today, two other
teams, Allen AI in Seattle and Mistral
in Paris, both independently released open-source
models, Tülu and Mistral Small 3,
which are already challenging the new state of the art, with
Allen AI indicating that its Tülu model surpasses the
performance of DeepSeek V3. And the scope is
even much broader than this geographical aspect.
Here is the thing we don't talk nearly enough about: open source
will be more and more essential for our safety. As
AI becomes central to our lives,
resiliency will increasingly become a very important element of
this technology. Today we're dependent on Internet
access for almost everything. Without access to the Internet
we lose all our social media and news feeds, can't order
a taxi, book a restaurant or reach someone on WhatsApp.
Now imagine an alternate world to ours where all the data
transiting through the Internet would have to go through a single company's
data centers. The day this company suffers a single outage,
the whole world would basically stop spinning. Picture the recent CrowdStrike
outage magnified a millionfold. Soon, as AI assistants
and AI technology permeate our whole lives to
simplify many of our online and offline tasks,
we and the companies using AI will start to depend
more and more on this technology for our daily activities,
and we will similarly start to find any
outage-induced downtime in these AI assistants
annoying or even painful. The best way to avoid future
downtime situations will be to build resilience
deep in our technological chain. Open source
has many advantages, like shared training costs,
tunability, control, ownership,
privacy. But one of its most fundamental virtues in the
long term, as AI becomes deeply embedded in
our world, will likely be its strong resilience.
It is one of the most straightforward and cost-effective ways to distribute
compute across many independent providers and even to run models
locally and on-device with minimal complexity.
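Here's a small sketch of the kind of resilience I mean, using the openai Python client against OpenAI-compatible endpoints; the provider URLs, keys, and served model name are placeholders, not real services. Because the weights are open, the same model can sit behind several independent providers, or behind a server on your own hardware, and a client can simply fail over between them.

```python
# Minimal sketch: the same open-weights model served by several independent
# OpenAI-compatible endpoints (hosted providers or a local server), with a
# naive failover loop so no single company's data center is a hard dependency.
# All endpoints, keys, and the model name are placeholders.
from openai import OpenAI

ENDPOINTS = [
    ("https://provider-a.example/v1", "KEY_A"),   # hypothetical hosted provider
    ("https://provider-b.example/v1", "KEY_B"),   # another independent provider
    ("http://localhost:8000/v1", "not-needed"),   # local server running the same weights
]

def chat(prompt: str) -> str:
    last_error = None
    for base_url, api_key in ENDPOINTS:
        try:
            client = OpenAI(base_url=base_url, api_key=api_key)
            response = client.chat.completions.create(
                model="deepseek-r1",  # whatever name the server exposes
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as err:  # outage, rate limit, network failure...
            last_error = err
    raise RuntimeError(f"all endpoints failed: {last_error}")

print(chat("Summarize why open weights make AI infrastructure more resilient."))
```

The point is not this particular loop but the fact that, with open weights, every entry in that list is interchangeable.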
More than national pride and competition, I think it's time to
start thinking globally about the challenges and social
changes that AI will bring everywhere in the world. And open
source technology is likely our most important asset for safely transitioning
to a resilient digital future where AI is integrated
into all aspects of society. P.S.
Claude is my default LLM for complex coding.
I also love its character, with hesitations and pondering, like a
prelude to the chain of thought of more recent reasoning models like DeepSeek's
generations.
