Disclaimer: The views and opinions expressed in this blog are entirely my own and do not necessarily reflect the views of my current or any previous employer. This blog may also contain links to other websites or resources. I am not responsible for the content on those external sites or any changes that may occur after the publication of my posts.
End Disclaimer
I wasn’t going to write anything on DeepSeek because everybody already has, but I’ve been asked enough times by people since the release that I’ll put my thoughts down in writing. (First mistake)
Everyone is giving you their 2 cents. I’m going to give you 1 cent more for a total of My 3 Cents (future autobiography title).
Everything about this story seems excessive, so I thought I’d add to it (1 cent more).
Background
On January 10th, 2025, a year-old Chinese startup called DeepSeek released a chatbot based on their large language model, DeepSeek-R1. This, in and of itself, was not news- AI companies have been releasing versions of their models at some frequency for a while now. What was news, however, was that the model reportedly cost ~$5-$6 million to build and was roughly on par with OpenAI's state-of-the-art reasoning model, o1. That ~$6 million represented, by some accounts, a 50x reduction in the cost of building a state-of-the-art LLM. Things percolated for a week or two, but it's important to remember that things were, for the most part, really quiet. People seemed to be more or less digesting the information and recognizing it as an impressive improvement in model methodology, training, and cost reduction.
But then, all of a sudden…
Sunday, January 26th: the DeepSeek app is the 1st or 2nd most downloaded free app in the US iOS App Store.
That Sunday night, the futures markets are down, various tech pundits are calling this AI’s “Sputnik moment”, and the bellwether of all AI stocks, Nvidia, is down 11% in the premarket.
The markets were freaking out. This is it everybody- NVDA finally cracked. It’s over.
NVDA would end the day that Monday down ~17%, a loss of ~$589 billion in market cap. The single biggest one-day loss of any company…in history.
The sheer level of reaction to DeepSeek has been amazing.
Hyperbole.
Hysteria.
DeepSeek is impressive. Among its purported advances is a demonstration that pure reinforcement learning (RL), without an initial supervised fine-tuning stage, can develop sophisticated reasoning capabilities, with further gains possible through strategic use of a small amount of supervised data.
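To make that concrete: the R1 report describes driving the RL stage with simple rule-based rewards (an accuracy check plus a format check) rather than a learned reward model. Here's a minimal sketch of that idea in Python; the exact tag names and weights are my own illustrative assumptions, not DeepSeek's actual values:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy sketch of a rule-based RL reward in the spirit of the
    DeepSeek-R1 report: score a completion on (a) whether it wraps
    its reasoning in the expected tags and (b) whether the final
    answer matches a known reference. Tag names and weights here
    are illustrative assumptions, not DeepSeek's actual recipe."""
    reward = 0.0
    # Format reward: reasoning must appear inside <think> tags.
    if re.search(r"<think>.+?</think>", completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: the answer inside <answer> tags must exactly
    # match the reference (real pipelines use looser matching).
    answer = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    if answer and answer.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward
```

The appeal of this setup is that for verifiable domains (math, code), the reward signal is essentially free: no human labels and no reward model to train or to reward-hack.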
The reaction to the model’s release may be even more telling than its ostensible advancements.
The magnitude of this reaction calls to mind Buffett's "only when the tide goes out do you discover who's been swimming naked," and the "pay no attention to the man behind the curtain" scene in The Wizard of Oz.
Some takeaways…
Nobody Knows Anything
Nobody saw this coming. No insiders, no one with their ear to the ground, no friends and family.
Nobody. (e.g.- I know I now have at least one thing in common with Sam Altman.)
The announcement comes on the heels of the $500B Stargate (Oracle, OpenAI, SoftBank) press-conference fanfare, geared toward AI infrastructure and compute investments.
DeepSeek comes out shortly thereafter and shows that a company can compete with state of the art models at some significant-seeming discount to the going expense rate.
Purposeful timing (probably) of the announcement by DeepSeek.
So for the people working at OpenAI on GPT 4.345 or whatever version they're on, and the researchers at Anthropic on Claude 3.875, etc.- are they thinking this is par for the course, an expected movement along the AI cost-reduction curve, or is this more of an existential-dread moment?
Nobody knows.
Sudden change happens.
Things fall apart. The falcon cannot hear the falconer. The (data) centre cannot hold. (I'll see myself out.)
You know the drill.
Great for the AI community and open source
I used to have this thought while commuting on the train or subway, looking around at all the people giving 20% attention to an app on their phone, fixing their makeup, or doing that half-asleep thing where their head lurches back up as they start to doze.
All that disparate, unfocused energy.
What if those people took the 17 or 35 or 95 minutes of their commute and collectively worked on something together?
What hard problems could all those humans solve?
That’s what the open source AI community is doing right now- the amount of work that’s come out in the time since DeepSeek’s release is amazing and equally deserving of slapping on some hyperbole.
The collective wisdom of the crowds, throughout the world, no breaks, no sleep, working 24/7- iterating on every other person’s work.
The compounding effect this has on the rate of change is incredible and difficult to comprehend. Does it nudge us closer to AGI? No idea, but open source (the human collective, not the open-weight models released by private and public companies- looking at you too, Zuck) has always been the white hats in this story. We'll need them going forward. They are the John Connor resistance to Skynet (hyperbolic, yes, but why stop now?).
Not so good for current companies working on foundational models
If your current business model is selling access to your LLMs and someone comes out with a model that is, say, 80% as good, for free- that goes a long way toward destabilizing your business model.
It's going to be hard for OpenAI, Anthropic, et al., not to cut prices for their models. The silver lining is that it should foster more innovation in the US to catch up, and to differentiate in order to de-commoditize their pricing.
Lots of Questions Remain
Here are some off the top of my head, rapid fire, each of which I'll answer with one highly insufficient word:
Did DeepSeek distill (a student model learns from the output of a teacher model) their model from OpenAI (against its terms of service) and/or from open source models like Llama or Mistral? Yes
Did DeepSeek outright steal swaths of o1? (People have been comparing the output from incorrect answers on each, and they're really similar.) Maybe
Was NVDA's ~17% drop an overreaction? Companies still need GPUs. Scaling still matters. Progress will still require millions of chips. Yes (Disclaimer- this should in no way be perceived as investment advice. I'm a troglodyte.)
Did DeepSeek train their models using only H800s (a slower chip NVDA has been selling to Chinese companies because the US throttled Chinese access to high-end chips)? No. If so- isn't that a good thing, relatively speaking, for companies possessing more powerful chips? Probably
Was DeepSeek really built for so much less than current US company models? No. To quote Anthropic's CEO Dario Amodei: "DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)" and "DeepSeek's total spend as a company (as distinct from spend to train an individual model) is not vastly different from US AI labs."
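For the curious, the "distillation" in the first question above is typically implemented by training the student to match the teacher's softened output distribution. A minimal NumPy sketch of the classic soft-label objective (Hinton-style knowledge distillation; the temperature value is an arbitrary choice, and this is an illustration of the general technique, not DeepSeek's actual pipeline):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence between the teacher's and student's
    temperature-softened distributions -- the soft-label distillation
    objective. The T**2 factor keeps gradient magnitudes comparable
    across temperatures, as in the original formulation."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl)) * temperature ** 2
```

The loss is zero when the student exactly reproduces the teacher's distribution and grows as they diverge; in the LLM setting the "teacher output" can also just be sampled text (the student is fine-tuned on teacher-generated completions), which is the cheaper variant people suspect was used against API-only models.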
One more question and answer before I leave you-
Slow down? NEVER