Nobody knows for sure who is going to make all the money when it comes to artificial intelligence. Will it be the incumbent tech giants? Will it be startups? What will the business models look like? It's all up in the air. One thing is clear though — AI requires a lot of computing power, and that means demand for semiconductors. So far, Nvidia has been a huge winner in the space, with its chips powering both the training of AI models (like ChatGPT) and the inference (the results of a query). But others want in on the action as well. So how big will this market be? Can other companies gain a foothold and "chip away" at Nvidia's dominance? On this episode we speak with Bernstein semiconductor analyst Stacy Rasgon about this rapidly growing space and who has a shot to win it. This transcript has been lightly edited for clarity.
Key insights from the pod:
The basic math behind AI — 4:55
Why is Nvidia so dominant? — 6:05
How does the tech actually work? — 9:42
What is a neural network? — 14:38
How big can this market get? — 20:15
How are Google and Microsoft and Amazon competing? — 28:37
Where is Intel on AI? — 36:00
What does progress look like from here? — 42:58
Where are we in the cycle for semiconductor investors? — 49:06
---
Joe Weisenthal: (00:10)
Hello, and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.
Tracy Alloway: (00:15)
And I'm Tracy Alloway.
Joe: (00:16)
Tracy, I'm not sure if you've heard anyone talking about it, but have you heard about this sort of ‘AI thing’ people have been discussing?
Tracy: (00:25)
Oh, you know what? I discovered this really cool new thing called ChatGPT.
Joe: (00:29)
Oh, yeah, I saw that website too.
Tracy: (00:30)
Yeah. Have you tried it?
Joe: (00:32)
I tried it. I had it write a poem for me. It's pretty cool technology. We should probably learn more about it.
Tracy: (00:39)
Yeah, I think we should.
Ok, obviously we're being facetious and joking, but everyone has been talking about AI and these new sort of natural language interfaces that allow you to ask questions or generate all different types of texts and things like that. It feels like everyone is very excited about that space.
Joe: (01:02)
Almost every conversation...
Tracy: (01:03)
To put it mildly...
Joe: (01:05)
I went out with some friends that I hadn't seen in a long time. I was at a bar last night. And the conversation turned to AI within like two minutes. And everyone's talking about the experiments they did, but yes, there is a lot.
It's basically like this wall of noise. And everyone's been talking about it, actually, except us. Because I don't think we have done, as far as I can recall, an actual AI episode, and we don't want to just add to the noise and do another sort of chin-stroking round. But obviously there's a lot there for us to discuss.
Tracy: (01:34)
Totally. And I'm sure this will be the first of many episodes, but one of the ways that it fits into sort of classic Odd Lots lore is via semiconductors, right? If you think about what ChatGPT, for instance, is doing, it's taking words and transforming them into numbers, and then spitting those words back out at you. And the thing that enables it to do that is semiconductors, chips, right?
Joe: (02:02)
So here are the four things I think I know about this. One, training the AI models so that they can do that is a computationally intensive process. Two, each query is much more computationally intensive than, say, a Google search. Three, the company that's absolutely crushing the space and printing money because of this is Nvidia. And four, there's a general scarcity of computing power, so that even if you and I were brilliant mathematicians and AI theorists, etc., if we wanted to start a ChatGPT competitor, just getting access to the computing power in order to do that would not be trivial. Even if we had tons of money.
Tracy: (02:46)
I’m going to buy an out of business crypto mine and take all the chips out of that...
Joe: (02:50)
Tracy, they've already been bought. Someone already got there. But that's it. That's basically the extent of my understanding of the nexus between AI and chips. And I suspect there's more to know than just those facts.
Tracy: (03:01)
Well, I also think having a conversation about semiconductors and AI is yes a really good way to understand the underlying technology of both those things. So that's what I'm hoping for, out of this conversation.
Joe: (03:14)
All right. Well, as you mentioned, we've done lots of chips episodes in the past, so we are going to go back to the future or something like that. We are going to go back to our first chips guest, the first one we had when we started exploring chips, sometime maybe in early 2021. We are going to be speaking with Stacy Rasgon, managing director and senior analyst of US semiconductors and semiconductor capital equipment at Bernstein Research — someone who is great at breaking all this stuff down and who has been doing a lot of research on this question. So Stacy, thank you so much for coming back on Odd Lots.
Stacy Rasgon: (03:49)
I am so happy to be back. Thank you so much for having me.
Joe: (03:52)
Right, so I'm gonna start with just sort of, not even a business question, but a sort of semiconductor design question, which is this company Nvidia. For years I just sort of knew them as, they were the company that made graphics cards for video games, and then for a while it was like “oh and they're also good for crypto mining”. And they were very popular for a while in Ethereum mining when it used proof of work. And now my understanding is everyone wants their chips for AI purposes, and we'll get into all that.
But just to start, what is it about the design of their chips that makes them naturally suited for these other things? What is it about a company that started in graphics cards that makes its chips naturally suited for something like AI, in a way that the chips from other chip makers, like say an Intel, apparently are not?
Stacy: (04:45)
Yeah, so let me step back...
Joe: (04:49)
If the question is totally flawed in its premise, then feel free to say “your question is totally flawed.”
Stacy: (04:55)
Let me step back. So I'd say the idea of using compute in artificial intelligence has obviously been around for a long, long time. And actually the AI industry has been through a number of what they call "AI winters" over the years, where people would get really excited about this, and then they would do work, and then it would just turn out it wasn't working. And pretty much it was just because the compute capacity and capabilities of the hardware at the time weren't really up to the task. And so interest would wane and we'd go through this winter period.
A while back — I don't know, 10, 15 years ago, whenever it was — it was sort of discovered that the types of calculations that are used for neural networks and machine learning turn out to be very similar to the kinds of mathematics that are used for graphics processing and graphics rendering.
As it turns out, it's primarily matrix multiplication. And we'll probably get into it on this call a little bit in terms of how these machine learning models and everything actually work, but at the end of the day, it really comes down to really, really large amounts of matrix multiplication in parallel operations. And as it turned out, the GPU, the graphics processing unit, was quite suitable for that.
Joe: (06:07)
Before you go on and maybe we'll get into this in hour three of this conversation — no, we're not going to go that long — but what is matrix multiplication?
Stacy: (06:16)
So I don't know how many of your listeners here have had linear algebra or anything. But a matrix is just like an array of numbers. Think about like a square array of numbers, okay? And matrix multiplication is when I've got two of these arrays and I'm multiplying them together. And it's not as simple as the kind of multiplication that maybe you're typically used to, but it can be done.
And it turns out these arrays can be really big. And there's like lots and lots of operations that need to happen. And this stuff needs to happen quite rapidly. And again, I'm grossly simplifying here for the listeners. But when you're working through these kinds of machine learning models, that's really what you're doing.
It's a bunch of different matrices, a bunch of different arrays of numbers that contain all of the different parameters and things. And by the way, we should probably step back a bit and talk about what we actually mean when we talk about machine learning and models and all these kinds of things.
But at the end of the day, you have these really large arrays of numbers that have to get multiplied together, in many cases over and over again, many, many times. And it turns into a very large compute problem. And it's something that the GPU architecture can actually do really, really efficiently, much more efficiently than you could on a traditional CPU. And so, as it turns out, the GPU has become a good architecture for this.
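To make that concrete, here is a minimal sketch, in plain NumPy, of the kind of matrix multiplication being described. The sizes are arbitrary and chosen only for illustration; nothing here is specific to Nvidia or to any particular model.

```python
# A minimal sketch of the core operation behind neural network math:
# multiplying large arrays (matrices) of numbers together.
# The sizes below are arbitrary, purely to show the scale of the arithmetic.
import numpy as np

a = np.random.rand(4096, 4096).astype(np.float32)  # e.g. a batch of activations
b = np.random.rand(4096, 4096).astype(np.float32)  # e.g. one layer's weights

c = a @ b  # one matrix multiplication: roughly 2 * 4096^3, or about 137 billion arithmetic operations

print(c.shape)  # (4096, 4096)
```

A GPU is built to spread exactly this kind of arithmetic across thousands of parallel units at once, which is why an architecture designed for graphics transferred so naturally to machine learning.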
Now, what Nvidia has done on top of this, not only with having the hardware, is they've also built a really massive software ecosystem around all of this. Their software is called Cuda. Think about it as kind of like the software, the programming environment. Like the parallel programming environment for these GPUs. And they've layered all kinds of other libraries and SDKs and everything on top of that, which actually makes this relatively easy to use and to deploy and to deliver.
And so they've built up not just the hardware, but also the software around this. And it's given them a really, really massive gap versus a lot of the other competitors that are now trying to get into this market as well. And it's funny, if you look at Nvidia as a stock, today, this morning, it's about, I don't know, $260 or $270 a share. This was a $10 to $20 stock forever.
And frankly, they did a 4-for-1 stock split recently. So that'd be more like a $2.50 to $5 stock on today's basis, for years and years and years. And just the magnitude of the growth that we've had with these guys over the last five or 10 years, particularly around their data center business and artificial intelligence and everything, has just been quite remarkable.
And so the earnings have gone through the roof and clearly the multiple that you're placing on those earnings has gone through the roof because, you know, the view is that the opportunity here is massive and that we're early and there's a lot of runway ahead of us. And the stock, I mean, it's had its ups and downs, but in general it's been a home run.
Tracy: (09:08)
I definitely want to ask you about where we are in the sort of semiconductor stock price cycle. But before we get into that, you know, I will also bite on the really basic question that you already alluded to, but how does machine learning/AI actually work? You mentioned this idea of, I guess processing a bunch of data in parallel, versus, I guess, old style computing where it would be sequential, but talk to us about what is actually happening here. And how does it fit into the semiconductor space?
Stacy: (09:42)
You bet. So let me first abstract this up and I'll give you a really contrived example, just sort of simplistically about what's going on. And then we can go a little bit more into the actual details of what's happening. But let's imagine you want to have some kind of a neural network, by the way, machine learning is typically done with something called a neural network. And I'll talk about what that is in a moment.
And let's just imagine, for example, you want to build an artificial intelligence, a neural network to recognize pictures of cats. So let's imagine I've got this black box sitting in front of me and it's got a slot on one side where I'm taking pictures and I'm feeding them in. It's got a display on the other side, which tells me, “yes, it's a cat,” or “no, it's not.”
And on the side of the box, there are a billion knobs that you can turn. And they'll change various parameters of this model that right now are inside the black box. Don't worry about what those parameters are, but there are knobs that can change them. And so effectively, when you're training this thing, what you have is this big black box, and you need to train it to do a specific task. That's what I'm going to talk about in a moment. That's called training.
And then once it's trained, you need to use it for whatever task you've trained it for. That task is called inference. So you’ve got to do training and inference. So the training, here's what we got. I got my box with a slot and the display and a billion knobs.
So what I do for the training process effectively is I take a known picture. So I know if it's a cat or not. I feed it into the box and I look at the display and it tells me, “yes, it's a cat,” or “it's not,” and it probably gets it wrong. And so then what I do is I turn some of the knobs and I feed another picture in, and then I turn some of the knobs.
And I'm basically tuning all of the parameters and sort of measuring how accurate this network is at recognizing "is this a picture of a cat or is it not?" And I keep feeding in known pictures, known data sets, and I keep playing with all the knobs until the accuracy of the thing is wherever I want it to be.
So yes, it's decided that now it's very good at recognizing “is this a picture of a cat or is it not?” Now at that point, my model, my box is trained. I now lock all of those knobs in place. I don't move them anymore. And now I use it. Now I can just feed in pictures and it'll tell me, “yes, it's a cat,” or “no it's not.”
And that's really what the process of training this model is about. It's about varying all of the parameters. And by the way, these models can have billions or hundreds of billions or even more parameters that can be changed. And that's the process of training. You're basically trying to optimize this sort of situation. I'm changing the parameters a little bit at a time such that I can optimize the response of this thing, such that I can get the performance, the accuracy of the network, to be high.
So that's the training process. And it is very compute intensive. Because you can imagine if I've got a billion different knobs that I'm turning, I'm trying to optimize the output, that takes a lot of compute. The inference process, once it's all said and done, is much less compute intensive because I'm not changing anything. I'm just applying the network as it is to whatever data that I'm feeding in. At that point, I'm not changing anything, but I may be doing a lot more. The difference is with the inference, I may be using it all the time, whereas once I've trained the model, I've trained it. So it's more like a one and done versus a continual use sort of thing.
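As a very rough illustration of the "knobs" analogy, here is a toy sketch of that train-then-freeze-then-infer structure. This is our own contrived example, not how any production model is trained; real systems use gradients (the backpropagation Stacy describes below) and billions of parameters rather than random nudges and three.

```python
# Toy illustration of training (tune the knobs on known examples)
# versus inference (freeze the knobs and apply the model to new data).
import numpy as np

rng = np.random.default_rng(0)

# Labeled data: each example is 3 numbers; label 1 means "cat", 0 means "not a cat".
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1) > 0).astype(float)   # a made-up labeling rule standing in for real labels

w = rng.normal(size=3)                  # the "knobs", initially random

def accuracy(knobs):
    preds = (X @ knobs > 0).astype(float)   # feed the pictures in, read the display
    return (preds == y).mean()

# Training: nudge the knobs and keep any change that doesn't hurt accuracy.
for _ in range(500):
    candidate = w + rng.normal(scale=0.1, size=3)
    if accuracy(candidate) >= accuracy(w):
        w = candidate

# Inference: the knobs are now locked; just apply the model to a new example.
new_example = rng.normal(size=3)
print("cat" if new_example @ w > 0 else "not a cat")
```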
Joe: (13:04)
Since we're getting into sort of the economics of training versus inference, is there any way to get a sense of, let's say Tracy and I start OddLotsGPT as a competitor to ChatGPT, a competitor to OpenAI, what are we looking at in terms of just the scale? How much are we spending on compute for the training part? And then how much are our recurring costs in terms of inference?
And then I'm also just curious, I know you said the inference is much cheaper, but how much cheaper is it versus say, asking Google a question? How much more expensive is it? How much more expensive is a ChatGPT query or an OddLotsGPT query versus just a normal Google search?
Stacy: (13:49)
Yeah, when I say cheaper, it's like for any given single use, right? Again, if I've got like a hundred billion different inference activities, maybe it's not cheap.
Joe: (13:58)
It's still expensive.
Stacy: (14:00)
Yeah. But first I want to talk a bit more about what's going on; that was my big abstract, contrived example. If I go just a little bit deeper about what this thing is, let's talk just briefly about a neural network, and then I will get to the question. But it kind of influences it. So what is a neural net? If I were to draw a representation of a neural network for you, what I would do is I would have a bunch of circles. Each of those circles would be a neuron. And I wish I was there, I could draw a picture for you, but imagine like some...
Joe: (14:27)
Send a picture after you're done, send a picture and we'll run it with the episode. We'll run it with the episode.
Stacy: (14:32)
Okay I can do that.
Joe: (14:35)
Your hand drawn explanation of how a neural network looks.
Stacy: (14:38)
These are very easy though. These are very easy to find. But anyways, imagine I've got like a group of circles. I've got like a column.
You know, in column one I've got like three circles. And then in column two I've got, I don't know, three or four circles. In column three, I've got some circles. These are my neurons. And imagine I've got arrows that are connecting each circle in one row to all of the circles in the next row. Those are my connections between my neurons. So you can see it looks kind of like a network. And within each circle I've got what's called an activation function.
So what each circle does is it takes an input, the arrows that are coming into it, and it has to decide based on those inputs, "Do I send an output out the other side or not?"
So there's a certain threshold: if the inputs reach some amount of threshold, the neuron will fire, just like a neuron in your brain. Okay. Each neuron can have more than one input coming in from more than one neuron in the previous layer (these rows of circles are called layers, by the way). And the neuron can weight those different inputs differently.
It can say, "You know, from this one neuron, I'm going to give that a 50% weight. And from the other neuron I'll only weight it at 20%. I'm not going to take the full signal." So those are called the weights of the network. And so each neuron has inputs coming in and outputs going out, and each of those inputs and outputs will have a weight associated with it.
So those are — remember I talked about those knobs, those parameters — those weights are one set of parameters. And then within each neuron, there's basically a certain threshold: with all those signals coming in, when you add them up, if they reach a certain threshold, then the neuron fires. So that threshold is called the bias. And you can tune that. I can have a really sensitive neuron, where I don't need a lot of signal coming in to make it fire. Or I can have a neuron that's less sensitive, where I need a lot of signal coming in for it to fire. That's called the bias.
That's also a parameter. So those are the parameters. The structure of the network itself, the number of neurons and the number of layers and everything, is set, and then you're trying to determine these weights and biases. And again, just to level set: ChatGPT, which everyone is getting excited about, has 175 billion separate parameters that get set during the training process. Okay. So that's kind of what's going on.
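To make the circles-and-arrows picture concrete, here is a minimal forward pass through a tiny fully connected network. The layer sizes are made up, and this is only meant to show how weights, biases, and activations fit together; counting the parameters shows how quickly the "knobs" add up.

```python
# A tiny neural network forward pass: each neuron sums its weighted inputs,
# adds its bias (firing threshold), and passes the result through an activation.
# Layer sizes are arbitrary; a real large language model has billions of parameters.
import numpy as np

rng = np.random.default_rng(0)

sizes = [3, 4, 1]   # 3 input neurons -> 4 hidden neurons -> 1 output neuron

# One weight per arrow between layers, one bias per neuron.
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

def forward(x):
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, x @ W + b)   # weighted sum plus bias, then a simple "fire or not" activation
    return x

n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
print(n_params)                             # 3*4 + 4 + 4*1 + 1 = 21 knobs for this toy network
print(forward(np.array([0.5, -1.0, 2.0])))  # the "display" on the side of the box
```

Scale the same structure up to many layers with thousands of neurons each and you arrive at parameter counts in the hundreds of billions.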
Tracy: (17:19)
Before you talk about economics, can I just ask, so one of the things about the technology is it's supposed to be iterative, right? It's learning as it goes along. Can you talk just briefly maybe about how it's incorporating new inputs as it develops?
Stacy: (17:37)
Yeah. So when you train, let's talk about training now. So when you train the network, it happens on a static dataset. Okay. So you have to start with a dataset. And in terms of ChatGPT, it has a large corpus of data that it was trained on. Basically a lot of data from the internet and from other sources, right?
Joe: (17:59)
Basically we've trained the smartest...
Tracy: (18:01)
It was the whole of the internet wasn’t it?
Joe: (18:03)
But also I think, a lot of Reddit. Right? So we’ve trained the greatest brain of all time. It's Reddit-pilled.
Tracy: (18:11)
Now it talks like a 17 year old boy?
Stacy: (18:14)
So there's a lot of data. And so you asked like sort of how does that data get incorporated into this? I’m worried about getting too complicated, I don’t want to get too complicated. Let me talk about how standard training works and then we can talk about ChatGPT, because that uses a different kind of model. It's called a transformer model.
When I'm training this, what happens is I feed this stuff through; there's a process called backpropagation. Basically what you do is you sort of feed this stuff through the network itself, and then you work it backwards. What you're doing is you're measuring the output against a known response. That's my cat picture: is it a cat or is it not a cat? I'm trying to minimize the difference between the two, because I want it to be accurate, right?
So what you sort of do is you run a certain step through the network, right? You measure the output against the "known," what it should be. And then there's a process that's called backpropagation, where what you're doing is you're actually calculating what are called the gradients of all of these things. You're basically looking at the rate of change of these different parameters. And you sort of work the network backwards. And that gradient that you're calculating kind of tells you how much to adjust each parameter. So you work it back and then you work it forward again, and then you work it backward, and then you work it forward and you work it backward.
And you do that until you've converged, until the network itself is as accurate as you want it to be. Again, I'm grossly simplifying here. I'm trying to keep this as high level as possible. It's not easy, but that's kind of what you're doing. And just in terms of the amount of training: for ChatGPT, they've actually released all the details of the network, like how many layers and what the dimensions are and the parameters, all this stuff. So we can do this math. It turns out to take about three times 10 to the 23rd operations to train it. That's 300 sextillion operations it took to train ChatGPT.
Now, in terms of how much it costs. So ChatGPT — they kind of said this — was trained on 10,000 Nvidia V100s, what they call the Volta chip. That's a chip that's several years old for Nvidia, but it was trained on supposedly about 10,000 of these. And we did some of this math ourselves; I was coming out more like 3,000 or 4,000, but there's a ton of other assumptions you have to make in here. 10,000 seems to be the right order of magnitude for that part. Those parts at the time cost about, you know, I don't know, 8,000 bucks apiece. And so the number that was kind of tossed around was something like $80 million to train ChatGPT one time.
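For the curious, the back-of-envelope version of that arithmetic looks roughly like the sketch below. The chip count, per-chip price, and operation count are the ballpark figures cited in the conversation; the throughput and utilization numbers are our own illustrative assumptions, not Bernstein's model.

```python
# Rough reproduction of the training-cost arithmetic discussed above.
# total_ops, n_gpus, and price_per_gpu are ballpark figures cited in the episode;
# the throughput and utilization figures below are illustrative assumptions only.
total_ops = 3e23          # ~3 x 10^23 operations to train the model
n_gpus = 10_000           # reported number of V100-class GPUs
price_per_gpu = 8_000     # ballpark dollars per chip

hardware_cost = n_gpus * price_per_gpu
print(f"~${hardware_cost / 1e6:.0f} million of GPUs")        # ~$80 million

# How long might one training run take on that cluster? (Assumptions below.)
peak_ops_per_gpu = 125e12   # assume ~125 teraflops of mixed-precision peak per V100-class GPU
utilization = 0.3           # assume ~30% of peak is actually achieved in practice

seconds = total_ops / (n_gpus * peak_ops_per_gpu * utilization)
print(f"~{seconds / 86_400:.0f} days of training")           # on the order of a week or two
```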
Joe: (20:49)
$80 million doesn't seem like that much to me.
Stacy: (20:52)
Well...
Joe: (20:53)
Like, I get it. But there are a lot of companies that could spend $80 million.
Stacy: (20:57)
I actually agree with you. We're jumping ahead. But my take is that for large language models, and we can talk about these different things, but for large language models, like ChatGPT I actually think inference is a bigger opportunity and you're kind of getting to the heart of it. It's because inference scales directly. The more queries I run, the more...
Joe: (21:13)
You train once and that's done and that's $80 million.
Stacy: (21:17)
Or even if you're training more than once. And again, to your question, Tracy, you can add to the data set and then retrain it. Let's say I'm training it every two weeks; that'd be training it like 24, 25 times a year. But I've got the infrastructure in place already. And so the training TAM will be more around how many different entities actually develop these models, how many models they each develop, and how often they train those models.
And importantly, how big do the models get? Because this is one of the things: ChatGPT is big, but GPT4, which they've released now, is even bigger. They haven't talked about specs, but I wouldn't be surprised; GPT4 is rumored to have over a trillion parameters, and it very well might. And we're very early into this. These models are going to keep getting bigger and bigger and bigger. And so that's how I think the training market, the training TAM, will be growing. It's a function of the number of entities training these models, how many trainings they're doing every year, and the size of these models. And the models will get big.
Joe: (22:17)
So let's get into it, but in your view the big money is going to be made on the inference. So let's talk about that. Talk about what happens then, and your sort of sense of the size, or I don't know. Talk to us about the inference part and the economics of that?
Stacy: (22:33)
You bet. So ChatGPT and these large language models, it's a new type of model. It's called a transformer model. And there's a bunch of compute steps that have to happen. There's also a step in there that helps it capture the relationship between the words. By the way, if you've ever used ChatGPT, you know you type a query into a box and it returns a response.
So that query is broken into what are called tokens. You can think about a token as kind of like a word or a piece of a word. Okay, the transformer model has something that's called a self-attention mechanism. And what that does is it captures the relationship between those different tokens in the input sequence, based on the training data that it has.
And that's how it knows what to do. What it's really doing is predictive text. It knows, based on this query, I'm going to start the response with this word. And based on this word and this query and my dataset, I know these other words typically follow, and it kind of constructs the response from that. And so our math suggests that for a typical query response, like, you know, 500 tokens or maybe 2,000 words, it was something like 400 quadrillion operations needed to accomplish something like that.
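For readers who want to see what "self-attention" means mechanically, here is a heavily simplified sketch of scaled dot-product attention with a single head and tiny made-up dimensions. Real transformer layers stack many of these with learned weights, positional information, and much larger sizes; this is only meant to show how each token ends up weighting its relationship to every other token.

```python
# A stripped-down sketch of self-attention: every token scores its relationship
# to every other token, and its output becomes a weighted mix of all tokens.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

seq_len, d = 5, 8                        # 5 tokens, 8-dimensional embeddings (toy sizes)
x = rng.normal(size=(seq_len, d))        # embeddings for the tokens in the input sequence

# In a real model these projection matrices are learned during training;
# here they are random, just to keep the sketch self-contained.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)            # how strongly each token relates to each other token
attn = softmax(scores, axis=-1)          # turn the scores into weights that sum to 1
out = attn @ V                           # each token's output mixes information from all tokens

print(attn.shape, out.shape)             # (5, 5) attention map, (5, 8) outputs
```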
And so you can size this up, because I know, for like an Nvidia GPU, and you can do it for different GPUs, how many operations per second each GPU can run, and I know ballpark how much these GPUs cost. And so then you've got to assume, "Well, okay, how many queries per day are you going to do?"
And you can come up with a number. And I mean, frankly, the number can be as big as you want. It depends on how many queries. But I think a TAM, you know, at least in the multiple tens of billions of dollars is not unreasonable, if not more.
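A stylized version of that sizing exercise might look like the following. Only the operations-per-response figure comes from the conversation; the query volume, GPU throughput, utilization, and GPU price are our own assumptions, included purely to show the shape of the calculation.

```python
# Stylized inference-TAM arithmetic. Only ops_per_response is a figure cited above;
# every other number is an illustrative assumption, and changing any of them
# moves the answer by an order of magnitude in either direction.
ops_per_response = 4e17      # ~400 quadrillion operations per ~2,000-word response
queries_per_day = 1e8        # assume 100 million queries a day
ops_per_gpu_per_sec = 1e15   # assume ~1 petaflop/s of usable low-precision throughput per GPU
utilization = 0.3            # assume only a fraction of peak is achieved in practice
gpu_price = 20_000           # assumed dollars per data-center GPU

ops_per_day = ops_per_response * queries_per_day
gpus_needed = ops_per_day / (ops_per_gpu_per_sec * utilization * 86_400)

print(f"~{gpus_needed:,.0f} GPUs, ~${gpus_needed * gpu_price / 1e9:.0f} billion of silicon")
```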
And just to level set, I mean, it gets to your Google question. Google does about 10 billion searches a day, give or take. I think a lot of people have been looking at that level as kind of the end-all be-all for where this could go. I'll be honest, I understand why people — especially internet investors — are concerned that large language models and things like ChatGPT can start to disrupt search.
I'm not exactly sure that search is the right proxy. Personally, it feels kind of limiting to me. I mean, I've watched a little too much Star Trek, I guess, but you could imagine you have like a virtual assistant in the ceiling that I'm calling out to. And you know, it doesn't have to be just search on my screen. I could have it in my car, right? I could call up American Airlines and change my airline tickets, and it's a chatbot that's talking to me.
So this could be very big. And by the way, I think the one problem with this type of calculation is that it's kind of static. The cost is sort of an output rather than an input, and I think to drive adoption, costs will come down. And we've already seen that. Nvidia has a new product that's called Hopper, which is like two generations past those V100s that I was talking about, past the Volta generation.
The cost per query, or the cost per training run, on Hopper is much lower than on Volta, because it's much more efficient. That's a good thing though. It's TAM-accretive; it will drive adoption. Nvidia actually has products specifically designed to do this kind of thing, and Hopper has specific blocks on it that actually help with the training and inference on these kinds of large language models. And so I actually think over time, as the efficiency gets better and better, you're going to drive adoption more and more. I think this is a big thing. And remember, we're still really early. ChatGPT only showed up in November, right?
Tracy: (25:57)
Yeah. It's crazy, isn't it?
Stacy: (25:59)
It's really early still.
Tracy: (26:00)
Just on that note, can you draw directly the connection between the software and the hardware here? Because I think, at this point, probably everyone listening has tried ChatGPT and you're used to seeing it as a sort of, you know, it’s an interface on the internet and you type stuff into it and it spits something out.
But where do the semiconductors actually come in when we're talking about crunching these enormous data sets? And what makes, you kind of touched on this a little bit with Nvidia, but what makes a semiconductor better at doing AI versus more traditional computational processes?
Stacy: (26:39)
Yeah, you bet. So to answer that second question, I think AI is really much more around parallel processing. And in particular, it's this kind of matrix math. It's a single class of calculations that these things do very, very efficiently and do very well. And they do them much more efficiently than a CPU, which performs a little more serially versus in parallel. You just couldn't run this stuff on CPUs.
By the way, don't get me wrong, you do run some of it on CPUs. We've been talking about inference on large language models, but there are all kinds of inference workloads that range from very simplistic to very, very complex. Again, my cat recognition example was very simplistic. Something like this, or frankly something like autonomous driving, is an inference activity, but a hugely computationally intense inference activity.
And so there's still a lot of inference today that actually happens on CPUs. In fact, most inference today actually happens on CPUs. But I'd say the types of things that you're trying to do are getting more and more complex, and CPUs are getting less and less viable for that kind of math. And so that's kind of the difference between GPUs and other types of parallel offerings versus a CPU. I should say, by the way, GPUs are not the only way to do this. Google, for example, has their own AI chips. They call it a TPU, a Tensor Processing Unit.
Joe: (27:57)
One thing I really like about talking to Stacy, or two things, actually. A) I think he comes up with better versions of our questions than we do, which I really like...
Tracy: (28:07)
“One thing about the question you just asked but you didn’t actually ask...”
Joe: (28:09)
He's always like, “Alright, that's a good question, but let me actually reframe the question to get a better response.” So I appreciate that. And he also anticipates, because I literally, like on my computer right now, I had “Google Cloud Tensor Processing Unit.” Because that was my next question. And also in part, because I think yesterday The Information reported that Microsoft is also doing this. So talk to us about that.
Stacy: (28:34)
Yeah, you bet. By the way, this is not new. Google has been doing their own chips for seven or eight years. It is not new. But they have what they call a TPU and they use it extensively for their own internal workloads. Absolutely. Amazon has their own chips. They have a training chip that's called, kind of hysterically, Trainium. They have an inference chip called Inferentia. Microsoft apparently is working on their own.
My feeling is every hyperscaler is working on their own chip, particularly for their own internal workloads. And that is an area where, you know, we talked about Nvidia's software moat. Google doesn't need Nvidia's software moat. They're not running Cuda. They're just running TensorFlow and doing their thing. They don't need Cuda.
Anything, however, that is facing an end customer, like an enterprise on a public cloud, like a customer going to AWS and renting, you know, compute power, that tends to be GPUs, because those customers just don't have Google's sophistication. They really do need the software ecosystem that's built around this. So for example, I can go to Google Cloud and I can actually rent a TPU instance. It can be done. Nobody really does it.
And actually, if you look at how they're priced, it's typically even more expensive than the way that Google prices GPUs on Google Cloud. It's similar for Amazon and others. And so I do think that all the hyperscalers are working on their own, and there is certainly a place for that, especially for their own internal workloads. But for anything that's facing a customer, that Nvidia GPU ecosystem has really kind of sprung up.
Joe: (30:09)
So just to clarify, because that point is really interesting: again, if Tracy and I want to launch OddLotsGPT, part of the issue would be not necessarily the hardware, the silicon, but actually that Nvidia's software suite built around it would make it much easier for us to sort of start out and build on Nvidia for training our model?
Stacy: (30:38)
Yes, it would. And they've built a lot. And it's funny, you can go listen to Nvidia's announcements at their analyst days and things, and they're as much about software as they are about hardware. So not only have they continued to extend the basic Cuda ecosystem, they've layered all kinds of other application-specific things on top of it. So they've got what they call Rapids, which is for enterprise machine learning. They've got a library package called Isaac, which is for automation and robotics.
They've got a package called Clara, which is specifically for medical imaging and diagnostics. They've got something called cuQuantum, which is actually for quantum computer simulations. They've got something for drug discovery. So they're layering all these things on top, right? Depending on your application. They've got internal teams that are working on this, not just throwing the software out there.
They've got people there that can actually help and come along with it. And they're making other things easier too. So they actually just launched a cloud service, and this is with Oracle and Google and Microsoft, where they'll do like a fully provisioned Nvidia AI supercomputer in the cloud.
Because they sell these AI servers, and they can cost hundreds of thousands of dollars apiece. Now, if you want, you can just go to Oracle Cloud or Google Cloud or whatever, and you can sort of rent a fully provisioned Nvidia supercomputer sitting in the cloud. All you've got to do is access it right through a web browser. They'll make it super easy.
Tracy: (32:00)
This was going to be my next question actually. So I take the point about software, but what do the AI super computers actually look like nowadays? Is there a physical thing in a giant data center somewhere? Or are they mostly like cloud-based or what does this look like? Walk us through the ecosystem.
Stacy: (32:18)
Nvidia sells something they call the DGX. It's a box. I mean, it's, I don't know... two feet? I don't know what the dimensions are. Two feet by two feet or something like that. It's got eight GPUs and two CPUs and a bunch of memory and a bunch of networking. They've got their own networking; they bought a company called Mellanox a while back that did networking hardware. So it's got a bunch of proprietary networking. That's something else we haven't talked about: it's not enough just to have the compute. These models are so big they don't fit on a single GPU, so you have to be able to network all this stuff together.
And so they've got networking in there, and then they have this box. And then you can stack a whole bunch of boxes together. Nvidia has their own internal supercomputer; it's fairly high on the top 500 list. They call it Selene. It's a bunch of these DGX-style servers effectively all just stacked together. And in terms of what they sell for, their prior generation was called Ampere, and that box sold for $199,000. I don't believe they've released pricing on the Hopper version, but I know the Hopper GPU costs two to three times what Ampere cost in the prior generation.
Joe: (33:28)
So this raises a separate question to me, which is, okay, there's the price, and it exists, and you could theoretically go and use Google's Tensor-based cloud. But is it available? Because I sort of get the impression that for some of the technology that people want to use, it's not available at any price and that there is an actual scarcity. Is that real or not?
Stacy: (33:52)
It seems to be. So with their new generation, which is called Hopper, which like I said has characteristics that make it very attractive, especially for these kinds of ChatGPT-like large language models, supply is tight today. We're at the very beginning of that product cycle. They just launched it, like, in the last couple of quarters.
And so that ramp-up takes time, and it does seem like they are seeing accelerated demand because of this kind of stuff. And so, yeah, I think supply is tight. We've heard stories about GPU shortages at Microsoft and the cloud vendors. And I think there was a Bloomberg story the other day that said these things were selling for like $40,000 on eBay or something. I took a look at some of those listings; they looked a little shady to me. But yeah, it's tight. And you have to remember these parts are very complicated, so the lead times to actually have more made, it takes a while.
Tracy: (34:38)
Wait, so just on this note, I joked about this in the intro, but could I buy like a Bitcoin mining facility and take all that computer processing power and convert it into something that could be used for AI? Is that a possibility?
Stacy: (34:54)
You could… A lot of the Bitcoin stuff, at least, was done with GPUs. Those were still mostly gaming GPUs. People were buying gaming GPUs and repurposing them for Bitcoin and Ethereum mining, mostly Ethereum. They're not nearly as compute-efficient as the data center parts. But I mean, in theory, yeah, you could get gaming GPUs and string them… But it would be prohibitive. And even now, most of that stuff's cleared out, I think, as Joe said. But the math is somewhat similar.
I'd say for these kinds of models, though, again, Hopper, Nvidia's new data center product, has something they call a transformer engine. What it really does is it allows you to do the training at a slightly lower precision: it lets you do it at 8-bit floating point versus 16-bit. So it lets you get higher performance. And then there's another process, a conversion process that sometimes has to happen when you go from training to inference, called quantization. With these transformer engines, you don't have to do that. So it increases the efficiency, which you wouldn't get by picking some random GPUs.
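As a quick, generic illustration of what lower precision and quantization buy you (this is not how Nvidia's transformer engine works, just a naive example of trading bits for size): storing the same weights with fewer bits cuts memory and bandwidth needs at the cost of some rounding error.

```python
# Naive post-training quantization: squeeze 32-bit weights into 8-bit integers.
# This is a generic illustration only, not Nvidia's FP8 transformer engine.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1_000_000).astype(np.float32)    # trained weights at 32-bit precision

scale = np.abs(weights).max() / 127.0                      # map the largest weight to +/-127
w_int8 = np.round(weights / scale).astype(np.int8)         # quantized weights
w_restored = w_int8.astype(np.float32) * scale             # de-quantized values used in the math

print(weights.nbytes // 1024, "KB at float32")             # ~3906 KB
print(w_int8.nbytes // 1024, "KB at int8")                 # ~976 KB, a 4x reduction
print(float(np.abs(weights - w_restored).max()))           # worst-case rounding error introduced
```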
Joe: (35:58)
Where is Intel in this story?
Stacy: (36:00)
So let's talk about the other competitive options that are out there, right? Okay. So we talked about some of the captive silicon at the hyperscalers. That is there and it is real, and they're all building their own, and they've been doing it forever, and it hasn't slowed anything down in the slightest, because we're still early and the opportunity is big.
By the way, I will say, I don't worry so much about competition at this point because, think about it, Nvidia's run rate in their data center business right now is something like $15 billion a year. It's growing, but that's where it is. So Jensen, Nvidia's CEO, likes to throw out big numbers, and he threw out, I think he said, for silicon and hardware TAM in the data center, a TAM over time of $300 billion. And it seemed kind of crazy, although I would say it's seeming a little less crazy every day. But if you thought the TAM was $300 billion or $200 billion or $100 billion or whatever, and they're run-rating at $15 billion, there's tons of headroom. Competition doesn't really matter.
And that's what we've seen. We've seen competition, but there's so much opportunity, like, who cares, right? Versus, if you thought it was a $20 billion TAM, they would have a problem already today. So that's why I don't worry too much, because I think the opportunity is still very large relative to where they're running the business today.
In terms of other competitors though, yes. So you mentioned Intel, but let's talk about AMD first, because AMD actually makes GPUs. They make data center GPUs; they just don't sell very many of them. Their current product is something called the MI250, and they've sold basically de minimis amounts. And in fact, when the China sanctions were put on (we didn't talk about that, but the US has stopped allowing high-end AI chips to be shipped to China, right?), the MI250 part was on the list, but it didn't affect them at all because they weren't selling any. So their sales were zero.
They've got another product coming out in the fall that's called the MI300, and people have been getting kind of excited about AMD; they've been sort of looking to play it as kind of like the poor man's Nvidia. I'll be honest, I don't think it's the poor man's Nvidia. Nvidia's been doing, you know, close to $4 billion a quarter in data center revenues. I don't know that I see anything like that with the MI300. AMD, as far as I can tell, has not even released any sort of specifications for what it looks like at this point.
But that is an option. And some people would say, and there's maybe some truth to this, that if you want an alternative, AMD will present an alternative, and if the opportunity's really that big, they'll get some. They'll probably get some.
Then you have Intel. So Intel's got a few things. On their CPUs, their current version is called Sapphire Rapids. It has AI-specific accelerators for inference, not so much maybe for this kind of stuff, but for general inference activities. They're trying to play up the capabilities of their CPU on that. Fine. And why are they doing that? It's because their accelerator roadmap isn't so good. So they have a GPU roadmap, the code name for it was Ponte Vecchio, and they've kind of gutted that roadmap. The follow-on product was something called Rialto Bridge, which they've since canceled.
And one of the Ponte Vecchio products they recently just canceled as well. And Ponte Vecchio originally was designed for the Aurora supercomputer, and it was massively late. So they took, how much was it, something like a $300 million charge, I think at the end of 2021. It was either the end of 2020 or the end of 2021. Basically they gave it away, it was so late. So that's how late they were. They also have another product: they bought an Israeli AI company called Habana, and Habana has a product called Gaudi. It's not a GPU exactly, but it's a specific accelerator technology. And Amazon bought some of them, and they sell a little bit, but again, versus Intel's total revenues, it's de minimis. So they're not really there.
There's also a bunch of startups. And the problem with most of the startups is their story tends to be something like, you know, "We have a product that's 10 times as good as Nvidia." And the issue is, with every generation, Nvidia has something that's 10 times as good as the prior Nvidia. And they have the software ecosystem that goes with it. By the way, neither AMD nor Intel nor most of the startups have anything remotely resembling Nvidia's software. So that's another huge issue that all of them are facing.
There are a few startups that have had some niche success. The one that's probably gotten the most attention is called Cerebras. And their whole thing is they make a chip. Imagine taking a 300-millimeter silicon wafer and inscribing a square on it. That's their chip. It's like one chip per wafer. And so you can put very large models onto these chips, and they've been deploying them for those kinds of things. But again, the software becomes an issue.
But they've had a little bit of success. There are some other names out there; you've got Groq and some others, I think, that are still around. And then there's a company called Tenstorrent, which is interesting not because of what they're doing so far, because it's early, but because it's run now by Jim Keller. And do you guys know who Jim Keller is?
Tracy: (40:51)
I do not.
Stacy: (40:52)
Jim Keller is sort of like a star chip designer. He designed Apple's first custom processor. He designed AMD's Zen and Epyc roadmaps that they've been taking a lot of share with. He was even at Tesla for a while, and at Intel. And so he's now running Tenstorrent. And they do RISC-V, which is another type of architecture, and they do an AI chip. So Jim is running that.
Tracy: (41:14)
So can I just ask, based on that, I mean how CapEx intensive is developing chips that are well suited for AI versus other types of chips? And then secondly, where do the improvements come from or what are the improvements focused on? Is it speed or scale given the data sets involved and the parallel processes that you described?
Stacy: (41:42)
Yeah, so it's a few things. In terms of CapEx intensity, these are mostly design companies, so they don't have a lot of CapEx. It's certainly R&D intensive, so maybe that's what you're getting at. Nvidia spends many billions of dollars a year on R&D. And Nvidia has a little bit of an advantage too, because it's effectively the same architecture between data center and gaming.
So they've got other volume, effectively, to amortize some of those investments over. Although now, I mean this year, data center is probably 60% of Nvidia's revenues. So data center is sort of the center of gravity for Nvidia now. But it's very R&D intensive and probably getting more so, and you've got folks all up and down the value chain that are investing: the silicon guys, the cloud guys, the customers, and everything else. But I mean, that's kind of where we are.
In terms of what you're looking for, there are a few things. You're looking for performance. On training, quite often that comes down to 'time to train.' So I've got a model, and some of these models, you could imagine, could historically take weeks or months to train. And that's a problem. You want it to be faster. So if you can get that down to weeks or to days or hours, that would be better. So that's one thing clearly that they work on.
Adjacent to the thing I was talking about, there's something around scale-out. Basically, remember I said you're connecting lots and lots of these chips together. So for example, if I increase the number of chips by 10x, does my training time go down by a factor of 10? Or is it by a factor of two? Ideally you would want linear scaling, right? As I add resources, it scales linearly.
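As a tiny worked example of that scale-out point (the numbers are made up for illustration): the difference between near-linear and poor scaling is the difference between a training run finishing in days versus weeks.

```python
# Illustrative only: how scaling efficiency changes time-to-train.
baseline_days = 30.0      # assume a model takes 30 days to train on 1,000 chips
baseline_chips = 1_000

for chips, efficiency in [(10_000, 1.0), (10_000, 0.5), (10_000, 0.2)]:
    speedup = (chips / baseline_chips) * efficiency   # 10x the chips at 100%, 50%, or 20% scaling efficiency
    print(f"{chips:,} chips at {efficiency:.0%} efficiency: {baseline_days / speedup:.1f} days to train")
```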
Joe: (43:20)
So this kind of gets to my next question, actually. And you know, we can talk with someone else about the various AI fantasy doom scenarios.
Stacy: (43:30)
By the way, I'm not an AI architecture expert. I'm a dumb plasma engineer. So you may want to get an AI expert...
Joe: (43:38)
No, I know. But I am curious, though, because I do think it relates to this question. Which is that, okay, with each one, like GPT5, they're going to keep adding more knobs to the box, etc. Is your perception that the quality of the output is growing exponentially? Or is it the kind of thing where, like, GPT4 has a lot more knobs and they got a big jump from GPT3, and GPT5 will have way more knobs, but it's only going to be marginally better? … What is the shape of the output curve going to look like, and the sort of cost of, you know, these chip developments in terms of getting there…
Stacy: (44:22)
It's a couple of things. So first of all, when you're talking about large language models, accuracy is sort of a nebulous term. Because it's not just accuracy, it's also capability, like what can it do? And we can talk a little bit about what ChatGPT and GPT4 can do. And also, I think as you're going forward and you talk about the trajectories here, it's not just text, right? We're talking text to text. But there's also text to images; has anybody played with DALL-E, where it's generating images from a text prompt? And now we've got video. What is it? Was it mid-summer? Is that what it's called? Midjourney...
Joe: (44:55)
Midjourney, yeah.
Stacy: (44:55)
Midjourney, yeah. So it's creating like video prompts. I mean, so text is just the tip of the iceberg, I think, in terms of what we're going to need.
Joe: (45:08)
But they're never going to get to where they could have three people having a conversation with voices that sound like Tracy, Joe and Stacy, right?
Stacy: (45:17)
Why?
Joe: (45:18)
No, I'm just kidding. It feels like we have one more year on this job.
Stacy: (45:20)
And maybe this gets to capabilities. So one thing with ChatGPT is it's very, very good — this is where I should worry about my job — because it's very good at sounding like it knows what it's talking about when maybe it doesn't. So maybe I should be worried about my job. And accuracy, I think, is a big issue. You have to remember...
Joe: (45:40)
But, so on this accuracy question. I assume, you know, like self-driving cars, when people were really hyped about them 10 years ago, they're like, “Oh, it's 95% solved.” We just need a little bit more. And then it's solved. And 10 years later, it feels like they haven't made any progress on that final 5%.
Stacy: (46:00)
Yeah, I mean, these things are always a power law.
Joe: (46:02)
So this is my question. When we talk about accuracy or these things, are we at the point where is it going to be the kind of thing where it's like, yeah, GPT5 will definitely be better than GPT4, but it will be like 96% of the way there? That’s sort of what I’m trying to get at...
Stacy: (46:18)
Again, let me separate out accuracy from capability. So accuracy: you have to remember, the model has no idea what accurate even means. It doesn't. Remember, these things are not actually intelligent. I know there's a lot of worry about what they call AGI, Artificial General Intelligence, right? I don't think this is it. This is predictive text. That's all.
The model doesn't know if it's spewing bull crap or truth; it has no idea. It's just predicting the next word in the thing. And it's because of what it's trained on. So you need to add on maybe other kinds of things to ensure accuracy, maybe to put guardrails on or things like that. You may need to very carefully curate your input data sets and things like that.
I think that's a problem now. I think it'll get solved. But this has already been an issue. You can take it the other way. I don’t know if it's the converse of it or not, but things like deep fakes. People are deliberately trying to use AI to deceive. I mean, this is just human nature. This is why we have problems. But I think they can work through that.
In terms of capabilities, it's really interesting to look at the responses to a similar prompt between ChatGPT and GPT4. What people are getting out of GPT4 is miles ahead of some of the stuff that ChatGPT, which was trained on the GPT3 model, is delivering in terms of nuance and color and everything else.
I think that's going to continue. We're already at the point where these things can pass the Turing Test. Putting the question of accuracy aside, it can be very difficult to know for some of these things, if you didn't know any better, whether the output was coming from a real person or not. I think it's going to get harder and harder to tell; even if it's not, you know, "really thinking," it's going to be hard for us to tell what's really going on. That has some other interesting implications for what this might mean over the next five to 10 years.
Tracy: (48:35)
Just going back to the stock prices, I mean, we mentioned the Nvidia chart, which is up quite a lot, although it hasn't reached its peak from back in 2021. The SOX index is recovering, but it's still below its prior highs. And Intel, I mean, I won't even mention. But where are we in the semiconductor cycle? Because it feels like, on the one hand, there's talk about excess capacity and orders starting to fall. But on the other hand, there is this real excitement about the future in the form of AI.
Stacy: (49:06)
Yes. So semis in general were pretty lousy last year. They've had a very strong year-to-date performance, and the sector is up, you know, 20% to 22% year to date, quite a bit above the overall market. And the reason is, to your point, we've been in a cycle. Numbers have been coming down, and we may have talked about this last time, I don't remember, but for semiconductor investors, it turns out the best thing is to buy the stocks in general after numbers come down, but before they hit bottom, like if you could buy them right before the last cut, if you could have perfect foresight — you never know when that is.
But I mean, numbers have come down a lot. So forward estimates for the industry peaked last June, and they are down over 30%, like 35%, since that point. It's actually the largest negative earnings revision we've had probably since the financial crisis. And people are looking for, you know, playing the bottoming theme, and hoping that things get better into the second half. We hopefully get China reopening. And you've got markets, and this relates to Intel, like PCs and things, where we've now corrected; we're back on a pre-Covid run rate for PCs versus where we were. And CPUs, which were massively over-shipping at the peak, are now under-shipping. And so we're in that inventory-flush part of the cycle.
And so people have been sort of playing the space for that second-half recovery. Now, all that being said, if you look at the overall industry, if you look at numbers in the second half, they're actually above seasonal. So people are starting to bake that cyclical recovery into the numbers. And if you look at inventories just overall in the space, they are ludicrously high. I've actually never seen them this high before. So we've had some inventory correction, but we may just be getting started there. And if you look at valuations, I think the sector's trading at something like a 30% premium to the S&P 500, which is the largest premium we've had, again, probably since things normalized after the tech bubble, or after the financial crisis at least.
So people have been playing this back-half recovery, but yeah, we'd better get it. As it relates to some of the other names, some of the individual stocks, like you mentioned Intel. It's funny, you guys may not know this, but I just upgraded Intel.
Joe: (51:15)
Oh, how come?
Stacy: (51:17)
The title of the note was ‘We Hate This Call, But It's The Right One’... I desperately would have liked to stay at underperform. And it was not a "we like Intel" call. It was just, I think that they're now under-shipping PCs by a wide margin, and I think for the first time in a while, the second-half street numbers might actually be too low. So it's not like a super compelling call; I felt uncomfortable pushing it. Although they report earnings next week, so I may be kicking myself.
Nvidia, however, hasn't reached its prior peak on a stock price basis. And the reason is the numbers have come down a lot. I mean, let's be honest, the gaming business was inflated significantly by crypto, right? And so that's all come out, right? And then, you know, with data center, we had some impacts from China. China in general was weak, and then we had some of the export controls that they had to work their way around. And so we had some issues there. Now, all of that being said, on graphics cards in gaming, we talked about some of these inventory corrections, and graphics cards actually corrected the most and the most rapidly. So those have already hit bottom and they're growing again.
And Nvidia's got a product cycle there that they just kicked off. The new cards are called Lovelace, and they look really good, especially at the high end, and they're starting to fill out the rest of the stack. So gaming's okay.
And then in data center, again, this generative AI has really caught everybody's fancy. And Nvidia is saying that they're at the beginning of a product cycle in data center. And, you know, they had an event a couple weeks ago, their GTC event, where they basically, I mean directly, said "we're seeing upside from generative AI even now."
So people have been buying Nvidia on that thesis. And the last time the stock hit these peaks, at least in terms of valuation, the issue is we were at the peak of their product cycles and numbers came down. This time valuations kind of went back to where they were at those peaks, but we're at the beginning of the product cycles and numbers are probably going up, not down. So that's why.
Joe: (53:16)
Stacy, I joked at the beginning that we could talk about this for three hours, and I'm sure we could, because it's such a deep area. But that was a great overview of the state of competition, the state of play, and the economics of this, and a very good way for us to start talking about AI stuff more broadly. Thank you so much for coming back on Odd Lots.
Stacy: (53:39)
My pleasure. Anytime you guys want me here, just let me know.
Joe: (53:42)
We’ll have you back next week for Intel.
Tracy: (53:47)
Thanks Stacy!
Joe: (54:01)
I really like talking to Stacy. He's really good at explaining complicated things.
Tracy: (54:07)
Yeah, I know he made a point of saying that he's not an AI expert, but I thought he did a pretty good job of explaining it. I do think the trajectory of all of this — I mean, this is such an obvious thing to say — is going to be really interesting to watch, and how businesses adapt to it. And what's kind of fascinating to me is that we're already seeing that differentiation play out in the market, with Nvidia shares up quite a bit and Intel, which is seen as not as competitive in the space, down quite a bit.
Joe: (54:37)
I was really interested in some of his points about software in particular.
Tracy: (54:43)
Yeah, I hadn't realized that.
Joe: (54:44)
Sometimes I see, like someone will post on Twitter, it's like, “look at this cool thing Nvidia just rolled out where they can make your face look like something else or whatever.” But thinking about how important that is in terms of like, “okay, you and I wanna start an AI company.” We have a model to train. There's going to be a big advantage going with the company that has this huge wealth of libraries and code bases and specific tools around specific industries as opposed to, it seems like where some of the other competitors are, where it's just much more technically challenging to even like use the chips, if they exist, like Google's TPUs.
Tracy: (55:27)
Totally. The other thing that caught my attention. And I know these are very different spaces in many ways, but there's so much of the terminology that's very reminiscent of crypto. So just the idea of like “AI winter” and a “crypto winter.” And you can see, I mean you can see the pivot happening right now from like crypto people moving into AI. So that's going to be interesting to watch play out — how much of it is hype, a classic sort of Gartner Hype Cycle versus the real thing.
Joe: (55:58)
But you know, two things I think would be interesting. It'd be interesting to go back to past AI summers. Like, what were some past periods in which people thought we'd made this breakthrough, and then what happened? So that might be interesting.
And then the other thing is, look, you know, in 2023, I have never actually found a reason I've felt compelled to use a blockchain for something. And I get use out of ChatGPT almost every day. So for example, we recently did an episode on lending, and I was like, "Oh, what is the difference, structurally, between the leveraged loan market and the private debt market?" And I thought this might be an interesting question for ChatGPT. And I got a very useful, clean answer from it that I couldn't have gotten perhaps as easily from a Google search. So I do think some of these hype cycles are really useful, but I am already, in my daily life, getting use out of this technology in a way that I cannot say for anything related to, like, Web3.
Tracy: (57:08)
No, that is very true. And you know, the fact that this only came out a few months ago and everyone has been talking about it and experimenting with it kind of speaks for itself. Shall we leave it there?
Joe: (57:18)
Let’s leave it there.
You can follow Stacy Rasgon on Twitter at @srasgon.