Transcript
Claims
  • Unknown A
    One of the really cool things about this job is that when something like this happens, I get to talk to everyone, and everyone wants to talk. And I feel like I've talked to, maybe not everyone, but most of the top people in AI. And there are definitely takes all over the map on DeepSeek, but I feel like I've started to put together a synthesis based on hearing from the top people in the field. It was a bit of a freakout. I mean, it's rare that a model release is going to be a global news story or cause a trillion dollars of market-cap decline in one day. And so it is interesting to think about why this was such a potent news story. And I think it's because there are two things about that company that are different.
    (0:00:00)
  • Unknown A
    One is that obviously it's a Chinese company rather than an American company, so you have the whole China-versus-US competition. And the other is that it's an open-source company, or at least R1 is an open-source model, so you've got the whole open-source-versus-closed-source debate. And if you take either one of those things out, it probably wouldn't have been such a big story, but I think the combination of these things got a lot of people's attention. A huge part of TikTok's audience, for example, is international. Some of them like the idea that the U.S. may not win the AI race, that the U.S. is kind of getting a comeuppance here, and I think that fueled some of the early attention on TikTok. Similarly, there are a lot of people who are rooting for open source, or who have animosity towards OpenAI.
    (0:00:42)
  • Unknown A
    And so they were kind of rooting for this idea that, oh, there's this open-source model that's going to give away what OpenAI has done at 1/20th the cost. So I think all of these things provided fuel for the story. Now I think the question is, okay, what should we make of this? I mean, I think there are things that are true about the story and then things that are not true or should be debunked. The, let's call it, true thing here is that if you had said to people a few weeks ago that the second company to release a reasoning model along the lines of o1 would be a Chinese company, I think people would have been surprised by that. So I think there was a surprise. And just to kind of back up for people, there are two major kinds of AI models.
    (0:01:30)
  • Unknown A
    There's the base LLM, like GPT-4o, or the DeepSeek equivalent, V3, which they launched a month ago. And that's basically like a smart PhD: you ask it a question, it gives you an answer. Then there are the new reasoning models, which are based on reinforcement learning, a separate process as opposed to pre-training. And o1 was the first model released along those lines. You can think of a reasoning model as a smart PhD who doesn't give you a snap answer but actually goes off and does the work. You can give it a much more complicated question, and it'll break that complicated problem into a set of smaller problems and then go step by step to solve it. That's called chain of thought. Right. And so the new generation of agents that are coming is based on this idea of chain of thought: that an AI model can sequentially perform tasks and figure out much more complicated problems.
    (0:02:15)
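    A minimal sketch of the chain-of-thought decomposition described above. The llm() function and the prompts are hypothetical placeholders standing in for any model API; this is not DeepSeek's or OpenAI's actual interface.

    ```python
    # Hypothetical sketch of chain-of-thought decomposition: break a hard
    # question into sub-problems, solve them in order, then synthesize.
    # llm() is a placeholder for a call to any language model.

    def llm(prompt: str) -> str:
        """Placeholder for a real model call (assumption, not a real API)."""
        raise NotImplementedError("wire this to an actual model endpoint")

    def solve_with_chain_of_thought(question: str) -> str:
        # Step 1: ask the model to break the problem into numbered sub-problems.
        plan = llm(f"Break this problem into numbered sub-problems:\n{question}")
        steps = [line for line in plan.splitlines() if line.strip()]

        # Step 2: solve each sub-problem in order, feeding earlier work forward.
        work: list[str] = []
        for step in steps:
            answer = llm(
                f"Question: {question}\n"
                f"Work so far: {' '.join(work)}\n"
                f"Now solve this sub-problem: {step}"
            )
            work.append(answer)

        # Step 3: synthesize the step-by-step work into a final answer.
        return llm(f"Given this reasoning: {' '.join(work)}\nFinal answer to: {question}")
    ```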
  • Unknown A
    So OpenAI was the first to release this type of reasoning model. Google has a similar model they're working on called Gemini 2.0 Flash Thinking; they've released kind of an early prototype of this called Deep Research 1.5. Anthropic has something, but I don't think they've released it yet. So other companies have similar models to o1, either in the works or in some sort of private beta. But DeepSeek was really the next one after OpenAI to release the full public version of it, and moreover they open-sourced it. And so this created a pretty big splash. And I think it was legitimately surprising to people that the next big company to put out a reasoning model like this would be a Chinese company, and moreover that they would open-source it, give it away for free. And I think the API access is something like 1/20th the cost.
    (0:03:13)
  • Unknown A
    So all of these things really did drive the news cycle, and I think for good reason, because if you had asked most people in the industry a few weeks ago how far behind China is on AI models, they would have said six to twelve months. And now I think they might say something more like three to six months, because o1 was released about four months ago and R1 is comparable to that. So I think it's definitely moved up people's timeframes for how close China is on AI. Now, we should take on the claim that they only did this for $6 million. On this one, I'm with Palmer Luckey and Brad Gerstner and others, and I think this has been pretty much corroborated by everyone I've talked to: that number should be debunked. So first of all, it's very hard to validate a claim about how much money went into the training of this model.
    (0:04:06)
  • Unknown A
    It's not something that we can empirically discover. But even if you accepted at face value that $6 million was the cost of the final training run, when the media is hyping up these stories saying that this Chinese company did it for $6 million and these dumb American companies did it for a billion, it's not an apples-to-apples comparison. Right? I mean, if you were to make the apples-to-apples comparison, you would need to compare the final training-run cost of DeepSeek to that of OpenAI or Anthropic. And what the founder of Anthropic said, and what I think Brad has said, being an investor in OpenAI and having talked to them, is that the final training-run cost was more in the tens of millions of dollars about nine or ten months ago. And so it's not six million versus a billion.
    (0:05:05)
  • Unknown B
    Okay, so a billion-dollar number might include all the hardware they bought, the years they put into it, a holistic number as opposed to the training-run number.
    (0:05:55)
  • Unknown A
    Yeah, it's not just running it. It's not fair to compare, let's call it a soup-to-nuts number, a fully loaded number, from American AI companies to the final training run by the Chinese company.
    (0:06:03)
  • Unknown C
    But real quick, Sacks, you've got an open-source model, and the white paper they put out is very specific about what they did to make it and the results they got out of it. I don't think they give the training data, but you could start to stress-test what they've already put out there and see if you can do it that cheap, essentially.
    (0:06:15)
  • Unknown A
    Like I said, I think it is hard to validate the number. But let's just assume that we give them credit for the $6 million number. My point is less that they couldn't have done it, but just that we need to be comparing like to like. So if, for example, you're going to look at the fully loaded cost of what it took DeepSeek to get to this point, then you would need to look at what the R&D cost has been to date for all the models, all the experiments, and all the training runs they've done. Right? And the compute cluster that they surely have. So Dylan Patel, who's a leading semiconductor analyst, has estimated that DeepSeek has about 50,000 Hopper GPUs. Specifically, he said they have about 10,000 H100s, 10,000 H800s, and 30,000 H20s.
    (0:06:38)
  • Unknown B
    Now the cost, Sacks: is that DeepSeek, or is it DeepSeek plus the hedge fund?
    (0:07:28)
  • Unknown A
    DeepSeek plus the hedge fund. But it's the same founder, right? And by the way, that doesn't mean they did anything illegal, right? Because the H100s were banned under export controls in 2022, and then the H800s in 2023. But this founder was very farsighted; he was very ahead of the curve, and through his hedge fund he was using AI to basically do algorithmic trading. So he bought these chips a while ago. In any event, you add up the cost of a compute cluster with 50,000-plus Hoppers and it's going to be over a billion dollars. So this idea that you've got this scrappy company that did it for only $6 million is just not true. They have a substantial compute cluster that they use to train their models, and frankly, that doesn't count any chips they might have beyond the 50,000, chips they might have obtained in violation of export restrictions, which obviously they're not going to admit to.
    (0:07:32)
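    As a rough back-of-the-envelope check on the "over a billion dollars" figure, using Dylan Patel's GPU-count estimates from above. The per-GPU prices below are illustrative assumptions, not figures from the episode:

    ```python
    # Back-of-the-envelope cluster cost from the estimated GPU counts.
    # Unit prices are assumed for illustration, not reported numbers.
    gpus = {
        "H100": (10_000, 30_000),  # (count, assumed $/GPU)
        "H800": (10_000, 30_000),
        "H20":  (30_000, 12_000),
    }
    total = sum(count * unit_price for count, unit_price in gpus.values())
    print(f"~${total / 1e9:.2f}B in GPUs alone")  # ~$0.96B, before networking,
    # power, and datacenter buildout, which push the all-in figure past $1B.
    ```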
  • Unknown A
    And we just don't know. We don't really know the full extent of what they have. So I just think it's worth pointing out that that part of the story got overhyped.
    (0:08:33)
  • Unknown B
    It's hard to know what's fact and what's fiction. Everybody who's on the outside guessing has their own incentive, right? So if you're a semiconductor analyst, that effectively is massively bullish for Nvidia; you want it to be true that it wasn't possible to train on $6 million. Obviously, if you're the person that makes an alternative that's that disruptive, you want it to be true that it was trained on $6 million. All of that, I think, is speculation. The thing that struck me was how different their approach was, and TK just mentioned this. But if you dig into not just the original DeepSeek white paper but also some subsequent papers they've published that refine some of the details, I do think this is a case, and Sacks, you can tell me if you disagree, where necessity was the mother of invention.
    (0:08:43)
  • Unknown B
    So I'll give you two examples where I just read these things and I was like, man, these guys are really clever. The first is, as you said, let's put a pin in whether they distilled o1, which we can talk about in a second. But at the end of the day, these guys were like, well, how am I going to do this reinforcement learning thing? They invented a totally different algorithm. There was the orthodoxy, this thing called PPO, that everybody used. And they were like, no, we're going to use something else, I think it's called GRPO. It uses a lot less compute and memory and it's highly performant. So maybe they were constrained, Sacks, practically speaking, by some amount of compute, and that caused them to find this, which you may not have found if you had a total surplus of compute availability.
    (0:09:37)
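    A rough sketch of the group-relative advantage idea at the heart of GRPO as described in DeepSeek's papers: instead of training a separate value (critic) network the way PPO does, rewards are normalized within a group of sampled answers to the same prompt. This is a simplified illustration, not the authors' implementation:

    ```python
    import statistics

    def group_relative_advantages(rewards: list[float]) -> list[float]:
        """GRPO-style advantages: score each sampled answer relative to the
        group of answers to the same prompt. Dropping PPO's learned critic
        is where the memory savings come from. Simplified sketch."""
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
        return [(r - mean) / std for r in rewards]

    # Example: four sampled answers to one prompt, scored by a reward model.
    # Answers above the group mean get positive advantages, below get negative.
    print(group_relative_advantages([0.9, 0.2, 0.5, 0.4]))
    ```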
  • Unknown B
    And then the second thing that was crazy: everybody is used to building models and compiling through CUDA, which is Nvidia's proprietary language, which I've said a couple of times is their biggest moat, but it's also the biggest threat vector for lock-in. And these guys worked totally around CUDA and did something called PTX, which goes right to the bare metal and is controllable; it's effectively like writing assembly. Now, the only reason I'm bringing these up is that we, meaning the West, with all the money that we've had, didn't come up with these ideas. And I think part of why we didn't come up with them is not that we're not smart enough, but that we weren't forced to, because the constraints didn't exist. And so I just wonder how we make sure we learn this principle. Meaning, when the AI company wakes up and rolls out of bed and some VC gives them $200 million, maybe that's not the right answer for a Series A or a seed.
    (0:10:23)
  • Unknown B
    And maybe the right answer is $2 million, so that they do these DeepSeek-like innovations. Constraint makes for great art. What do you think, Friedberg, when you're looking at this?
    (0:11:20)
  • Unknown D
    Well, I think it also enables a new class of investment opportunity. Given the low cost and the speed, it really highlights that maybe the opportunity to create value doesn't really sit at that level in the value chain, but further upstream. Balaji made a comment on Twitter today that was pretty funny. I think it was about the wrapper. Yeah, he's like, turns out the wrapper may be the moat.
    (0:11:31)
  • Unknown B
    The moat. Which is true.
    (0:11:55)
  • Unknown D
    At the end of the day, if model performance continues to improve and get cheaper, and it's so competitive that it commoditizes much faster than anyone even thought, then the value is going to be created somewhere else in the value chain. Maybe it's not the wrapper, maybe it's the user. And maybe, by the way, here's an important point: maybe it's further out in the economy. You know, when electricity production took off in the United States, it's not like the companies making all the electricity made a lot of money. It's the rest of the economy that accrued a lot of the value.
    (0:11:58)