Here Are the Actual Mechanics Behind Powering AI


Artificial intelligence is all the rage right now, and most of the investor excitement has so far been focused on the companies providing the hardware and computing power to actually run this new technology. So how does it all work, and what does it actually take to run these complex models? On this episode, we speak with Brannin McBee, co-founder of CoreWeave, which provides cloud computing services based on GPUs, the type of chip pioneered by Nvidia that has now become immensely popular for generative AI. He walks us through the infrastructure involved in powering AI, how difficult it is to get chips right now, who has them, and how the landscape might change in the future. This transcript has been lightly edited for clarity.

Key insights from the pod:
What does CoreWeave do in the space? — 5:38
What does building infrastructure for AI entail? — 6:57
Shortages at existing data centers — 10:28
Connecting GPUs vs CPUs — 14:07
Relationship with Nvidia — 17:57
Scale of compute needed for AI — 20:31
How quickly can hyperscalers ramp up? — 24:17
Competition from other chips — 26:40
Competing with hyperscalers — 31:06
Pivot from Ethereum mining — 35:21
---
Joe Weisenthal (00:10):
Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.

Tracy Alloway (00:15):
And I'm Tracy Alloway.

Joe (00:16):
Tracy, have you looked at [the] Nvidia stock chart lately? And by lately, I don't mean over the last two years, I mean just like over the last two weeks or two months.

Tracy (00:24):
I don't need to look at it because everyone keeps talking about it, so I know what's happening.

Joe (00:28):
You know what I'm pretty happy about? You know, we did that episode like two months ago with Stacy Rasgon, and we were like, "What's up with Nvidia? I know it's at the center of the AI chips boom and whatever.” And then we did that episode and it came out. And then a week later, they just knocked it out of the park.

Tracy (00:48):
The stock took off.

Joe (00:49):
Yeah. So, you know,

Tracy (00:50):
We were early?

Joe (00:51):
We were at least a good two weeks early on that.


Tracy (00:54):
Yeah. Hey, two weeks matters.

Joe (00:56):
I'll take it. I'll take it. So clearly something that – and we talked about this with Stacy – something that Nvidia has is everyone's trying to buy it, everyone's trying to get it. But then it raises the next question of like, okay, but what is that market like? How do you buy a chip?

Tracy (01:12):
Yeah. How do you buy a chip? And then I guess what do you actually do with it once you have it? Because my impression is that for a lot of these AI applications, the way you use the chips, the way you set up the data centers is very, very different to what we've seen in the past.

And I think also what Nvidia is doing now is kind of different, but maybe we can get into this with our guests. My impression is they're trying to create sort of like a holistic approach for customers where they provide not just the hardware, but also some services to go along with it.

Joe (01:45):
Yes. Right. And like all the software and Stacy talked about that with the Cuda ecosystem.

Tracy (01:50):
Cuda, that was it.

Joe (01:50):
How dominant that is. But right, what do you do with it? How do you get one? What would we do, Tracy, if a big pallet of Nvidia chips wound up here?

Tracy (02:01):
Joe, you want to know a secret?

Joe (02:03):
Yeah?

Tracy (02:03):
My basement is filled with H100 chips. Just got a pile of them. It came with the house.

Joe (02:08):
It was on that ship that was stuck on the Chesapeake and instead of getting your couch, you got...

Tracy (02:13):
I just got a pallet of H100s.

Joe (02:15):
Yeah, of course. We're manifesting that into reality. So anyway, how this world works. So essentially, like, the trading and dealing of the hottest commodity in the world, right? Which is these advanced chips for AI. And how that works, and who can get one, I still think is a sort of mystery that we need to delve further into.

Tracy (02:38):
I agree. And there's also a lot of excitement around it right now, for the obvious reason that everyone is really into generative AI.

Joe (02:46):
Yes.

Tracy (02:46):
And Nvidia stock is exploding, as we already talked about. But we're also seeing a lot of previous, I guess, consumers of chips, like the crypto miners, start to pivot into the space. And I'd be curious to see what they're doing in it as well, and how much of that is just desperation versus a real business opportunity.

Joe (03:08):
And the video game market!

Tracy (03:09):
Oh, totally. I forgot about video games.

Joe (03:11):
Which was the other thing. For years I thought of Nvidia as the video game company. Because they had their logo on Xbox.

Tracy (03:17):
And how realistic is that pivot? What proportion of those types of chips can be used for AI now?

Joe (03:24):
Well, I'm very excited. We do have, I believe, the perfect guest. We are going to be speaking with Brannin McBee. He is the chief strategy officer and co-founder of CoreWeave, which is a specialized cloud services provider that's basically providing this sort of high-volume compute to AI-type companies. They recently raised over $400 million and have been in this space for a little while. So Brannin, thank you so much for coming on Odd Lots.

Brannin McBee (03:51):
Thanks for the opportunity, guys. Really excited to chat with you all today.

Joe (03:54):
So if Tracy and I – I don't know why they would do this – but if, like, some VC was like, you know, "We want you to do OddLotsGPT, we want you to, like, base a large language model off of all the work you've done. We want you to compete with OpenAI." And they gave us, like, I don't know, some hundred-million-dollar raise. If they said, go do your startup, could I call Nvidia and buy chips? Would I be able to, like, get in the door there?

Brannin (04:22):
Gosh, I mean I think you and everyone else is asking that question. And you're going to have a huge problem doing that right now. It's mostly just around how much in demand this infrastructure became, right? I mean, you could argue it's one of the most critical pieces of information technology resources on the planet right now. And suddenly everyone needs it.

And, you know, I like to contextualize it in that the pace of software adoption for AI is one of the fastest adoption curves we've ever seen, right? Like you're hitting these milestones faster than any other software platform previously. And now all of a sudden you're asking for infrastructure built to keep up with that, right? A space that traditionally takes more time. And it has created this massive supply-demand imbalance just on in-place infrastructure today. And not all the infrastructure is available to purchase and it's an issue that is going to be ongoing for a bit as well we think.

Tracy (05:23):
So can I ask the basic question, which is: CoreWeave, what do you do exactly? Joe mentioned the capital raise, which I think has you valued at something like $2 billion. So congrats. But what exactly are you doing here?

Brannin (05:38):
Yeah, thank you. So CoreWeave is a specialized cloud service provider that is focused on highly parallelizable workloads. So we build and operate the world's most performant GPU infrastructure at scale and predominantly serve three sectors. That's the artificial intelligence sector, the media and entertainment sector, and the computational chemistry sector.

So we specialize in building this infrastructure at supercomputer scale. It's like quite literally, you know, 16,000 GPU fabric. And we can get into all the details and how complex that is, but we build that so that entities can come in and train these next-generation foundation machine-learning models on.

And, you know, we found ourselves in a spot where we can do that better than literally anyone else in the market and do it on a timeline that's faster. We’re, I think, the only entity with H100 available to clients at scale globally today.


Tracy (06:32):
So you have an actual basement full of H100 chips. Well, you know, when you say infrastructure – “we help clients build out the infrastructure” – help us conceptualize this. What does the infrastructure for this type of AI actually look like, and how does it differ from infrastructure for other types of large-scale technology projects?

Brannin (06:57):
Yeah, totally. So, you know, I think during the last Nvidia quarterly earnings call, Jensen put this in a really great way in the Q&A session. He said that we were in the first year of a decade-long modernization of the data center. Making the data center intelligent, right?

You can kind of suggest that the last generation, or the 2010s data center, was composed of CPU compute, storage, and these things that, you know, didn't really work together that intelligently. And the way that Nvidia has positioned itself is to make it a smart data center – smart routing of data, of packets, of different pieces of infrastructure in there – that's all focused on how you expand the throughput and communicability of, and between, pieces of infrastructure, right? It's this amazingly different approach to data center deployments.

And so the way that we're building it – and we're working with Nvidia infrastructure – we design everything to a DGX reference spec. A DGX is Nvidia's spec for, like, how do you draw the most performance possible out of Nvidia infrastructure, with all the ancillary components associated with it.

So all this stuff is going into what's qualified as a Tier 3 or a Tier 4 data center. We co-locate within these things. So we're not quite building in a basement even though in our past history, we certainly, you know, had a time doing that. But this is within, you know, just amazing co-location sites that are operated by our partners such as Switch, right?

So a Tier 3 or Tier 4 site is something that's qualified based on its ability to serve workloads with an extremely high uptime. So we're talking like a 99.999% uptime rate. And that's guaranteed by its power redundancy, its internet redundancy and its security. And then ultimately, like, its connectivity to the internet backbone, right? So as a first step, you're housed within these data centers that are just critical parts of the internet infrastructure. And then from there you start building out the servers within there. And I can go into that detail.
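A quick back-of-envelope aside: here is what a 99.999% ("five nines") uptime guarantee works out to in allowable downtime. This is just arithmetic on the figure Brannin cites, not anything specific to CoreWeave's data centers.

```python
# Back-of-envelope: how much downtime per year does a given uptime allow?
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Minutes of downtime per year permitted at a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

for uptime in (99.9, 99.99, 99.999):
    print(f"{uptime}% uptime -> ~{allowed_downtime_minutes(uptime):.1f} minutes/year down")

# 99.999% ("five nines") allows roughly 5.3 minutes of downtime per year,
# versus about 8.8 hours per year at 99.9%.
```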

Joe (09:10):
So you mentioned, actually, I want to define some terms. Can you just really quickly before we move on: Tier 3, Tier 4 – what do you mean by this?

Brannin (09:18):
Yeah, so Tier 3, Tier 4. This all goes back to like the quality of the data center that you're in. It's all about the reliability and uptime that you should be able to achieve out of that data center. It's another way to qualify the services around it. So like power, you get redundant power, right? Like multiple power services in case one goes offline, there's another one. You get, you know, redundant cooling. You get redundant internet connectivity. It's all these services that like have extra fail safes that allow for you to operate at the highest uptime and security level possible.

Joe (09:52):
Is higher tier better? Like Tier 3, 4, is that better than Tier 1 and 2?

Brannin (09:56):
That's correct.

Joe (09:57):
Okay. So quick follow-up question then. You know, we're interested in, like, okay, where the rubber hits the road, the scarcity is here. Let's say Tracy miraculously opens her basement and there really are, you know, all these pallets of Nvidia chips there. Is there capacity at the data centers right now? She's like, "You know what? We want to co-locate with you. You guys have great power, you're pretty well connected to the internet. You have good security guards. It's operated 24/7. We want to set something up. Is there space there?"

Brannin (10:28):
Yeah, it's a fantastic question. It's an issue that didn't really pop up until really in the last eight weeks or so.

Joe (10:36):
Whoa! It's really happening that fast?

Brannin (10:38):
It's happening that fast, Joe.

Tracy (10:42):
So the two-week lead time on Nvidia is very important, Joe.

Joe (10:45):
You're right, you're right. Wow. So wait, what happened? Describe 16 weeks ago versus eight weeks ago.

Brannin (10:54):
Sure. Even last year, right? So this is a space – the data center space, co-location space – that's been fairly chronically under-invested in because the hyperscalers just built out their own data centers instead.

But what's happened is the infrastructure changed. The type of compute that we're putting in these data centers, it's different than the last generation, right? So we're predominantly focused on GPU compute instead of CPU compute. And GPU compute, it's about four times more power dense than CPU compute. And that throws the data center planning into chaos, right? Because ultimately, let's say you have a 10,000 square foot room in the data center, right? And you have a certain amount of power, let's call it a hundred units of power that go into that 10,000 square feet. Well, because I'm four times more power dense, it means that now I take those a hundred units of power, but I only require about 25% of that data center footprint. Or in other words, 2,500 square feet within that 10,000 square foot footprint.

So that then leads to two problems. Not only is the space in the data center being used inefficiently now – because you theoretically have to run more power into the data center to use that full 10,000 square feet, due to the power density delta – but now you have cooling issues, right? Because you designed that footprint to be able to cool 10,000 square feet spread out across that entire area. But now you're dropping all that power into a much smaller space.

Joe (12:21):
Sorry, I just want to back up because this is extremely interesting, so I just want to get the detail right and then move on. Given a set amount of power – a hundred units of power – what you are saying is that with this next generation of compute, that's now only sufficient for a quarter of the data center? In other words, to power the whole space, you really would need like 4X the power?

Brannin (12:53):
That's accurate. And the complication really arises out of the cooling that's required from that, right? So if you imagine you could cool a 10,000 square foot space and you design for that, that's one thing. But now if you have to cool in a much more dense area, that's a different type of cooling requirement. And so that's led to this issue where there's only a certain subset of Tier 3 and 4 data centers across the US that are currently designed for, or can quickly be designed and changed, to be able to accommodate this new power density issue.

So now, like, if you had all those H100s in your basement, you might not have a place to plug them in. And that's become a pretty big problem for the industry very quickly – it truly has only arisen in the last eight weeks or so, and it's going to persist for a few quarters.
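To make that floor-space arithmetic concrete, here is a minimal sketch using the round numbers from the exchange above: the 10,000 square feet, the hundred units of power, and the roughly 4x power-density multiple. These are illustrative figures from the conversation, not real facility specs.

```python
# Sketch of the power-density arithmetic described above. The room size,
# "100 units" of power, and 4x density multiple are the illustrative round
# numbers from the conversation, not real facility specs.

room_sq_ft = 10_000                      # room sized for CPU-era compute
power_units = 100                        # total power provisioned for that room
cpu_density = power_units / room_sq_ft   # power per sq ft the room was designed for
gpu_density = 4 * cpu_density            # GPU compute is ~4x more power dense

# Floor space those same 100 units of power can feed at GPU density:
usable_sq_ft = power_units / gpu_density
print(usable_sq_ft)                      # 2500.0 -- a quarter of the room

# Power needed to fill the whole room at GPU density:
power_to_fill_room = gpu_density * room_sq_ft
print(power_to_fill_room)                # 400.0 -- 4x the provisioned power
```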

Tracy (13:46):
So you were describing the difference between CPU and GPU. How do you actually connect these different types of chips together? Because I imagine, you know, old data centers, I guess you just have a bunch of like ethernet cables or something like that. But for this type of processing power, do you need something different?

Brannin (14:07):
That's exactly correct, Tracy. So the legacy generalized compute data centers are really what the hyperscalers look like. You know, Amazon, Google, Microsoft, Oracle. They predominantly use something that's called ethernet to connect all the servers together. And the reason you use that was, you know, you don't really need to have high data throughput to connect all these servers together, right? They just need to be able to send some messages back and forth. They talk to each other about what they're working on but they're not necessarily doing highly collaborative tasks that require moving lots of data in between each other. That has changed.

So today, what people are focused on and need to build are these effectively supercomputers, right? And so we refer to the connectivity between them – the network between them – as a fabric, right? It's called a network fabric. So if we're building something to help train like the next generation GPT model, typically clients are coming to us saying, "Hey, I need a 16,000 GPU fabric of H100." So there's about eight GPUs that go into each server. And then you have to run this connectivity between each one of those servers.

But it's now done in a different way, to your point. So we're using an Nvidia technology called InfiniBand, which has the highest data throughput, to connect each of these devices together. And, you know, taking this 16,000 GPU cluster as an example, there are two crazy numbers in here. One is that there are 48,000 discrete connections that need to be made, right? Like plugging one thing in from one computer to another computer – though there are lots of switches and routers in between. But you need to do that 48,000 times. And it takes over 500 miles of fiber optic cabling to do that successfully across the 16,000 GPU cluster. And now again, you're doing that within a small space with a ton of power density, with a ton of cooling, and it's just a completely different way to build this infrastructure.

And it's just because the requirements have changed, right? Like we've moved into this area where we are, you know, designing next generation AI models and it requires a completely different type of compute. And it has just caught the whole sector by surprise. So much so that, you know, it's really challenging to go procure it at the hyperscalers today because they didn't specialize in building it.

And that's where CoreWeave comes in. We only focus on building this type of compute for clients. It's our specialty. We hire all of our engineering around it, all of our research goes into it and it has been a fantastic spot to be. But, you know, our goal at the end of the day is just to be able to get this infrastructure into the hands of end consumers so that they can build the amazing AI companies that everyone's looking forward to using and incorporating into enterprises and software companies.
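For a rough sense of how those cluster numbers fit together, here is a sketch built on the figures quoted above (16,000 GPUs, eight per server, 48,000 connections, 500 miles of fiber). How the connections split between server-to-switch and switch-to-switch links is an assumption for illustration only, not CoreWeave's actual network design.

```python
# Rough scale of a 16,000-GPU training fabric, using the figures quoted in
# the episode. The link breakdown below is an illustrative assumption
# (one fabric link per GPU, plus a multi-tier switch fabric), not an
# actual network design.

gpus = 16_000
gpus_per_server = 8
servers = gpus // gpus_per_server          # 2,000 servers to cable up

total_connections = 48_000                 # discrete connections, per the episode
fiber_miles = 500                          # fiber optic cabling, per the episode

server_side_links = gpus                   # assume one fabric link per GPU
switch_side_links = total_connections - server_side_links

print(servers, server_side_links, switch_side_links)
# -> 2000 servers, ~16,000 server-to-switch links, and ~32,000 more links
#    inside the switch tiers, for 48,000 connections in all.

print(fiber_miles * 5280 / total_connections)
# -> roughly 55 feet of fiber per connection on average.
```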

Tracy (17:21):
You know, you mentioned these special or purpose-built connections that Nvidia is making. And this kind of leads nicely into my next question, which is, what exactly is your relationship with Nvidia? And in order to provide this type of service – vast amounts of processing power that is well-suited to a particular type of technology, in this case AI – do you have to have a really good relationship with Nvidia to make that work? Like do you have to have special access to H100s and other chips?

Brannin (17:57):
It's a great question and I'll try to offer it from Nvidia's perspective. It goes a little bit back to the answer I just provided as well. I would think from Nvidia's seat, what's most important is empowering end users of their compute to be able to access their compute in the most performant variant possible at scale. And to be able to access it quickly, right? Like a new generation comes out, they want to be able to get their hands on it, right?

And we've built CoreWeave around hitting every single one of those check boxes, right? We build it at DGX reference spec, we build it at scale, and we bring it online on a timeline that's, you know, within months of a next generation chipset launch as opposed to the more traditional legacy hyperscalers that take quarters at a time.

So us being in a position to do that has enabled fantastic access within Nvidia. And we have a history of consistently executing on exactly what we say we'll do, right? We underpromise and overdeliver as a business, and I think that's just put us in this place where Nvidia has confidence in allocating infrastructure to us because they know it's going to come online. They know it's going to get to consumers faster than anyone else in the market, and they know it's going to be delivered in the most performant configuration that exists.

Joe (19:19):
You know, as I listened to some of these answers, I keep having these like [imagined scenarios]. Like, you know, there's probably some random industrial company that's traded on the S&P 400 that makes some cooling fluid whose sales are going to be up 10X. So I'm like Googling while we're talking, like, "Oh, what is a company that makes cooling fluid?" Or like "Who is some company that's really good at making these like InfiniBands?" Because it just seems like...

Tracy (19:45):
Invest in HVAC!

Joe (19:46):
Right! Yeah. You know there's going to be some charts...

Tracy (19:51):
The secondary plays.

Joe (19:52):
Yeah. Or tertiary plays that are like 30X up. But, you know, I want to get a sense from you – so it's really changed a lot in the last several months; we see it from Nvidia's results and from what you're describing. How big is the market getting? And I know with AI there's training, where they sort of build the model, and then there's inference, and the inference is how they spit out the results. Can you talk a little bit about what you are seeing in terms of the growth of both of those aspects of AI? Which is bigger and which is growing faster? And how do they compare to the size of the installed compute base that already exists?

Brannin (20:31):
Oh, absolutely. So this is one of my favorite topics because it is just mind-blowing. The scale that's going to be needed to support AI and the scale of this infrastructure. So, okay. So today most of the funding that's going into the AI space is for funding to train next generation foundation models, right? So when a company is raising a bunch of money at the end of the day, most of that money is going into cloud compute to go train this next generation of that model, to build that intellectual property. So they have this model and they can go bring it into the inference market.

And what I would say is we're having a supply-demand issue – like a chip access crunch – in the training phase, where in reality, the scale of the inference market is where all the demand truly is going to sit. So what I'd offer to help contextualize that is – let's take, you know, there are some well-known models in the market today. Let's say there's a trained model in market, and it took about, let's say, 10,000 A100s or so to train. The A100 is the last-generation GPU, but it still applies in terms of relative scale here. So that company that used 10,000 A100s to train their model – our understanding is they're going to need about a million GPUs within one to two years of launch to support the entire inference demand.

Joe (22:00):
Sorry, you could train the model on 10,000 of these chips. 10,000 of these systems, whatever they are. And then if they're actually going to be in the market and sell something or provide some service to make it worthwhile, they're going to need a million?

Brannin (22:15):
A million. And I think that's just within the first two years of launch, Joe. We're talking about something that's going to continue growing afterwards. And so, what does a million GPUs mean? Obviously, right. So, you know, I think it was like the end of last year – all the hyperscalers combined, right? Amazon, Google, Microsoft, Oracle, you could throw CoreWeave in there – there were about, you know, 500,000 GPUs globally, right? Available across those platforms. I'd say at the end of this year it'll be closer to a million or so.

But that's suggesting then that one AI company with one model could consume the entire global footprint of GPUs. And now you start to think “Wait. Aren't there a bunch of other companies training these models in market right now?” And I would say yes, there are. So it could imply that there are, in the short term, the demand of several million GPUs just to support the inference market. And there's just nowhere near enough globally of this infrastructure.

And it's going to be a big challenge for the market as we exit this training phase and move into the productization, or really just the commercialization, of these models. Like, how do you generate revenue off them? And I don't think many people truly understand just the amount of scale and construction that needs to take place. And now you put that in the same framework as the data centers that we were talking about, right? So there's a lack of data center space, there's a lack of chipset supply. It's going to be an issue for years, as we see it.
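As a rough sketch of the training-versus-inference math Brannin lays out, here are his round numbers side by side. The count of competing in-market models is a hypothetical placeholder, not a figure from the episode.

```python
# Training vs. inference demand, using the round numbers quoted above.

training_gpus_per_model = 10_000        # ~10,000 A100s to train one model
inference_gpus_per_model = 1_000_000    # ~1M GPUs within 1-2 years of launch

print(inference_gpus_per_model / training_gpus_per_model)
# -> inference demand is ~100x the training footprint for that one model

global_supply_last_year = 500_000       # hyperscalers + CoreWeave, end of last year
global_supply_end_of_year = 1_000_000   # rough end-of-this-year estimate

# Hypothetical: a handful of frontier models all reaching commercialization.
models_in_market = 4
implied_inference_demand = models_in_market * inference_gpus_per_model
print(implied_inference_demand / global_supply_end_of_year)
# -> several million GPUs of demand, multiples of the projected installed base.
```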

Tracy (23:50):
So when it comes to scale, you know, you keep mentioning the hyperscalers, which is a great term, but people like Amazon, Google, I guess, Microsoft, IBM, etc. What is your impression of how quickly they are able to ramp up in this space? How fast could they react to some of the trends that you've been outlining?

Brannin (24:17):
Yeah, so I can offer what I'm seeing today. You know, the H100s started to be distributed globally to all of us – like, all the entities that have these upper-tier relationships with Nvidia – back in March, right? So we started getting this infrastructure online in April, really scaling in May. And, you know, we have builds going on at 10 data centers across the US right now, and we're delivering it to clients.

The guidance that we're seeing from the hyperscalers is that they're not going to begin delivering scale access to the H100 chipset until late Q3, maybe mid-Q4. And some of them are even beginning to guide into Q1. And it's all driven by the fact that this is just a different type of compute that they're building relative to last generation, right?

You're no longer just running ethernet to your point between all these devices. You're not just plugging in CPU blades, you're having to deal with like totally different data center power density and cooling requirements. You're having to build supercomputers instead with 500 miles of fiber and all these connections. It's just a completely different way to build the cloud. And it's taking them some time to catch up because you have to retrain entire organizations to do this.

So, you know, as of now, I'd say the direct answer is three quarters after a chipset launch, but it might take longer. And I think that's all going to contribute to this just kind of slower ability to scale infrastructure than what's being dictated by the adoption rate of AI software. And it's going to lead to this supply-demand imbalance that will just last for a while.

Tracy (25:57):
You know, you keep mentioning – or we both keep mentioning – the H100 for obvious reasons, but do you look at other chips? Or, what would happen to your own business if, for instance, a new chip was developed that could do the same thing as, or better than, an Nvidia H100? Like for instance, I hear a lot of excitement about some of the stuff that AMD is developing. And I'm not a chips expert, except maybe when it comes to Fritos or Lays, but how big a difference would it make to you if we suddenly saw a different chip manufacturer gain prominence in AI?

Brannin (26:40):
Sure. So I'd offer kind of two broad responses. One, typically when you train a model, you're going to use the same chips for inference on that model as well, right? So GPT-4, for example – it was trained on A100s, and they're predominantly going to use A100s going forward. You might fit some kind of newer-generation, hyper-efficient chips in there, but it's not like you need a GPU with more VRAM on it, right? Like, you're going to need your 40GB or your 80GB VRAM chip, because that's the size of the model that you trained, right? You're not going to need, like, the next multiple generations.

You're not really going to be able to adopt them to change the efficiency of serving that model. So the way we view it is that a chip's lifespan is, like, its first two to three years are spent training models, and then its next four to five years are spent doing inference for those models that it trained. And then within there as well, you do this thing called fine-tuning, which is updating the model with new information, right? Like, how do you keep a model up to date with what's happened on Twitter or what's happened in the media, right? You have to keep retraining it, right? And you'll use those same chips to do that.
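A minimal sketch of why a trained model stays tied to a given class of GPU memory: the weights have to fit somewhere at serving time. The parameter count and precision below are assumptions chosen purely for illustration, not figures from the episode.

```python
# Why a trained model stays tied to a given VRAM class: the weights must fit
# in GPU memory at serving time. Parameter count and precision here are
# illustrative assumptions, not figures from the episode.

params = 70e9                 # hypothetical 70B-parameter model
bytes_per_param = 2           # FP16/BF16 weights

weight_memory_gb = params * bytes_per_param / 1e9
print(weight_memory_gb)       # 140.0 GB of weights alone

gpu_vram_gb = 80              # one 80GB-class card
min_gpus_for_weights = -(-weight_memory_gb // gpu_vram_gb)  # ceiling division
print(min_gpus_for_weights)   # 2.0 -- and that's before KV cache and activations

# So serving means sharding across the same class of hardware the model was
# trained on, which is why the training fleet tends to become the inference
# (and fine-tuning) fleet as well.
```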

So, your question on other chipsets — and this is something that we have a particularly interesting view into, because we have, you know, call it 650 AI clients, right? We're having conversations with them daily to ensure that we are meeting or scaling to their demands. So it gives us a look, six to 12 months into the future, at what type of infrastructure they expect to need. And it's overwhelmingly that people still want access to Nvidia chips. And the reason for this is something that dates back, I think, nearly 15 years, to when Nvidia and Jensen made the decision to open source Cuda and to make this software set accessible to the machine learning community.

And you know, today, if you go to GitHub and you search a machine learning project, they just all reference Cuda drivers, and he's established this utter dominance of ecosystem around his compute within the ML space, really similar to like the X86 instruction set for CPU versus ARM, right? Like X86 is used predominantly. ARM has been trying to find its way into the space for a while now, and it has just really struggled because all the engineers and developers are used to X86. Similar to how all the engineers and developers in the AI space are used to using Cuda.

So it's something that like, obviously AMD is highly incentivized to find a way into the sector, but they just don't have the ecosystem. And it's a huge moat to deal with and, you know, kudos to Nvidia for establishing themselves and having the patience to stick with it and to continue to support that community over the last 15 years. And it's really paying off for them in spades today.

You know, if the demand comes for that infrastructure at some point, we can run other pieces of infrastructure within our data center. But I also find that Nvidia has such an advantage on the competition with not only its GPUs, but all of its components that support the GPUs like the InfiniBand fabric, that it's going to be a really difficult company to displace from the market in terms of the best standard for AI infrastructure.

Joe (30:12):
Can I ask you a question? And I want to ask this politely because it's not intended to be accusatory or anything like that, so I don't want you to hear it as such, but when you're talking about hyperscalers – Amazon, Google, Microsoft, and you know, kind of CoreWeave – and it's like, okay, those are trillion-dollar companies and you're a $2 billion company.

And I know they're all talking about AI, etc. Can you still just, like, explain to me a little bit why they aren't just going to, frankly, steamroll you? Or, let's put it this way – okay, maybe it'll take a few quarters to reevaluate things – but, you know, eventually doesn't this just become a sort of de facto offering from these big companies that have these huge cloud budgets that must be orders of magnitude larger than yours?

Brannin (31:06):
Yeah, yeah. I would really love to be able to have access to their cost of capital, that's for sure. So the way I talk about this is we don't have a silver bullet necessarily, right? I can't point to like a super-secret piece of technology that we put inside of our servers or anything along those lines.

But the way I like to broadly contextualize it, is referencing another sector. And it's that Ford should be able to produce a Model Y, right? Like, they have the budget, they have the people, they have the decades of expertise. But in order to ask them to produce a Model Y, you would have to ask them to foundationally change the way that they produce a vehicle, all the way from research to servicing and that entire mechanism. It's a giant organization. Now you have to go ask that huge organization of people to change the way that they go about producing things.

Joe (32:04):
And I get that, but just to push back a little bit, and this is a theme that comes up in various flavors on Odd Lots a lot, which is that it's really hard to replicate tacit knowledge within a corporation. And we see that with companies that make semiconductor equipment. We see that with companies that make airplanes. We see that with real estate developers that know how to turn an office building into a condo. And so I think this is a deep point, but you know, they are offering AI stuff. I can look at Google right now, there's Cloud AI. And there's Azure AI, and they all have their announcements. So I'm still trying to understand: what is it that you are offering that the hyperscalers aren't? They all say they have AI offerings. So what is the difference between what you have and what they say are, like, their AI compute platforms?

Brannin (32:54):
Absolutely. And this will really depend on how much technical detail you'd like for me to get into, but broadly through infrastructure differentiation, literally using different components to build our cloud. And through software differentiation, we use different pieces of software to operate and optimize our cloud. We're able to deliver a product that's about 40% to 60% more efficient on a workload-adjusted basis than what you find across any of the hyperscalers.

So in other words, if you were to take the same workload or go do the same process at a hyperscaler on the exact same GPU compute versus CoreWeave, we're going to be 40% to 60% more efficient at doing that because of the way that we've configured everything relative to the hyperscalers.

And it comes back to this analogy of why Ford can't produce a Model Y. Again – like, they can. These are trillion dollar companies we're talking about. To your point, they have the budget, they have the personnel, and they certainly have the motivation to do so. But you know, it's not just one singular thing they have to change. It's a completely different way of building their business that they would have to orchestrate.

And — what's the analogy? It's however many miles it takes to turn an aircraft carrier, right? It's going to take them a while to do that. And I think if they do get there at some point – which, you know, I don't disagree with you, they're certainly motivated to – it's going to have taken them some time, literally years, to get there. And they're going to look really similar to us. And meanwhile, I've dominated market share and I've really established my product in the market, and I'll continue to differentiate myself on the software side of the business as well.

Tracy (34:48):
Since we're on the topic of adaptation, can I ask about your own evolution as a company? Because I think I've read that you started out in Ethereum mining, and at one point, I'm pretty sure crypto mining was a substantial, if not the biggest portion of your business, but you have clearly adapted or pivoted into this AI space. So what has that been like? And can you maybe describe some of the trends that you've seen over your history?

Brannin (35:21):
Yes, absolutely. And you're right. We did start within the cryptocurrency space back in 2017 or so, and that was spawned out of just, frankly, curiosity from a group of former commodity traders. So myself and my two co-founders – we ran hedge funds, we ran family offices. We traded in these energy markets, so we were always attracted to supply-demand mechanics.

But what attracted us within cryptocurrency was this arbitrage opportunity that was a permissionless revenue stream, right? I knew the cost of power; I knew what the hardware could generate in terms of revenue using a power input. So it's effectively an arbitrage, right? So we explored that, we had some infrastructure operating literally in our basements, as you said. And then that quickly turned into scaling across warehouses. And at some point in 2018, maybe late 2018, we were the largest Ethereum miner in North America. We were operating over 50,000 GPUs. We represented over 1% of the Ethereum network.

But during that whole time, we just kept coming back to the idea that there's no moat, there's no advantage that we could create for ourselves relative to our competitors, right? Like sure, you could maybe focus on power price and just kind of chase the cheapest power, but that just felt like chasing to the bottom of the bucket, right?

You know, I think an area we could have gone into is producing our own chips, right? Because if you produce your own chips and you run the mining equipment before anyone else has access to it, then you have an advantage for that period. But, you know, we weren't going to go design and fab our own chips. So what we kept coming back to was this GPU compute. Man, like what if it could do other things, right? Like what if we could develop uncorrelated optionality into multiple high-growth markets, right?

And those markets are where we predominantly sit today within artificial intelligence, media and entertainment and computational chemistry. And the original thesis was, well, whenever our compute isn't being allocated into those sectors, we'll just have it mining cryptocurrency and we'll build out this fantastic company that has 100% utilization rate across the infrastructure. Because it could switch immediately from being released from an AI workload into going back into the Ethereum network.

And we did get a brief glimpse of being able to operate that way in 2021 as we had our cloud live and we had AI clients in place, but Ethereum mining effectively ended during the merge in Q3 of 2022.

But I'd say the other thing that we never appreciated was the utter complexity of running a CSP. And that's setting aside the software side of the business – which, in and of itself, we spent about four years developing: the software to build a modern cloud, to do infrastructure orchestration and actually be a cloud service provider. The components themselves that the sector broadly used for crypto mining were these retail-grade GPUs, right? The kind of things that you plug into your desktop to go play...

Tracy (38:36):
Right, the video game cards.

Joe (38:37):
Yeah, they were like selling them on StockX.

Brannin (38:40):
Yes. Yes. It was crazy during that period to get your hands on that infrastructure for crypto mining.

Joe (38:46):
And all the video gamers hated the crypto people, right? Because they're like, "I want to, like, play this game," and they would, like, line up – what was it? At the GameStop and the GeekWire shop and all that. And they couldn't get it because – not you, but crypto people – were getting access to the chips first and getting more value out of them, so they could bid them up.

Brannin (39:05):
We were certainly part of the problem. And that's absolutely correct. But you know, what we found ultimately is those chips, that's not what you run enterprise-grade workloads on. That's not what's supporting, you know, the largest AI companies in the world. And starting in 2019, we stopped buying any of those chips and only focused on purchasing enterprise-grade GPU chipsets – you know, Nvidia has probably about 12 different SKUs that they offer, including A100 and H100 chips – and really oriented our business around that.

So I don't expect to see much repurposing of this kind of older retail-grade GPU equipment that was used for crypto mining, because in crypto mining, you want to buy the cheapest chip that can do the thing, right? That can participate in crypto mining. But there's a huge difference in price between a retail, plug-it-into-your-computer-so-you-can-play-video-games chip and an enterprise-grade one. You can run it 24/7, there's not going to be downtime. You're going to have a low failure rate. Like, there's a large technology difference and there's a large pricing difference between those.

And for crypto mining, you only needed the retail-grade chip. Because, you know, if it went down 2%, 5% of the time as a failure rate, that's not a big deal. But the tolerance, the uptime tolerance, for these enterprise-grade workloads, it's measured in thousandths of a percent. And it's a different type of infrastructure. So we don't expect to see the components really being reused, if at all.

And then the other variable – going back to the very beginning of our conversation – are the data centers in which these are housed. So Joe, to your point earlier, you know, we sit within Tier 3, Tier 4 data centers and that's basically the broad industry standard for being able to serve these kinds of workloads.

The crypto miners sat within Tier 0, Tier 1 data centers. And these things are highly interruptible. They do really interesting things like helping load balance the power markets in places like ERCOT, right? Like, they'll shut down when power prices go too high, and it load balances the grid. But enterprise AI workloads don't have a tolerance for that. Their tolerance, again, is measured in thousandths of a percent in terms of uptime. So not only does the infrastructure from crypto mining not work, but the data centers that they built within don't work either, the way that they're currently configured. Now, they could potentially convert their sites into Tier 3 and Tier 4 data centers.

I'll tell you that in and of itself, that is an extremely challenging task and it takes a lot of proprietary knowledge and industry expertise to do so. It's not just throwing a few fans in a room and a few air conditioning units. It honestly feels like walking into a spaceship.

Joe (41:58):
Tracy, this is an episode – I don't know about you, Tracy – that has like another six follow-on episodes in it. No, seriously, like the whole data center market and the coolant and, you know, the electricity. There are so many different rabbit holes you could go down just with the infrastructure you're talking about.

Tracy (42:15):
For sure. And the estimates that I've seen on repurposing crypto GPUs – I think I've seen like 5% to 15%. So, to Brannin's point. But I'm sure there will be people out there who try.

Brannin (42:32):
You've got to try, right? Because what if it works? Right? If you can make that work, that's amazing. But, you know, coming at this as an entity that was an extremely large operator of that infrastructure and has built one of the largest cloud service providers for AI workloads, I can tell you it's going to be really, really hard to do. Because we've had exposure in both of those places, and at the end of the day, they're just very, very different businesses, both from the type of engineering and developers that you employ, to the infrastructure, to the data centers that you sit within.

Joe (43:04):
So can I just go back, you know, just sort of big picture. And I guess it sort of goes back to like who gets access to what, who gets access to chips? And I imagine that, you know, not only do you need a lot of money to like build a relationship with Nvidia, you also probably need an expectation you're going to be back the next year and back the next year and back the next year, and that you actually, like, have a relationship and so forth.

But I have to imagine planning is really tough when you have this sort of AI, machine learning industry, and then something like ChatGPT comes out and suddenly everyone's like, "Oh, I need to have AI access." Talk to us about the sort of challenge of just planning the build when it can move that fast, and everyone is just sort of guessing how big this market is going to be in two to three years.

Brannin (43:56):
Oh my gosh. It's been utterly insane, right? Go back to last year – you know, the supply chain and your ability to get your hands on components. You would call your OEM. The OEM is the original equipment manufacturer – like, those are the Supermicros, the Gigabytes of the world, who actually build the nodes, build the servers. And you're buying through them, and then they buy the GPUs from Nvidia and put all the components together, right?

So if you called them and said, "Hey, I need this many nodes to be delivered," they'd say, "Great, we will start assembling. It takes us a week to two weeks to get the parts in and assemble them." And then it's another week for them to ship them to you. And then it takes us two to three weeks to plug them in, put them online, get them going.

Now that's completely changed, as you know. The whole supply chain has gotten thrown off. So much so that, you know, Nvidia is fully allocated – like, they've fully sold out their infrastructure through the end of the year, right? You can't call them. You can't call the OEM and just say you need more compute chips. That's not possible.

So much so that, you know, when clients are coming to us today and they're asking for, you know, like a 4,000 GPU cluster to be built for them, we're telling them Q1, and increasingly it's moving towards Q2 at this point because Q1 is starting to get booked up right now. So a lot of time has been added to the process.

And then there are other supply chain variables within there as well. You know, we had a client earlier this year – we were in negotiations with them on the contract, and we really wanted to perform well on timing for it. So we knew, because of our orientation within the supply chain, that there were some critical components that needed to be ordered ahead of time to reduce our time to bringing the infrastructure online. And at that point it was the power supply units and the fans for the nodes that the OEMs were putting together. And if we hadn't done that, it would've been another, I think, eight weeks on top of the build process, just because not all the components would've been there at the same time.

And you're navigating this, you know, amid other kinds of global supply chain disruptions and inflation and all these other things that are going on right now. And it's just an insanely complex task. I think, you know, the generation of software developers and founders that we're working with today were used to being able to go to a cloud service provider and just get whatever infrastructure they needed, right?

You'd go to your hyperscalers and say, "All right, I need this." And it was just there and available. And that just doesn't exist today, because of the pace of demand growth that we've been on and just the lack of infrastructure availability – it's caught everyone by surprise. Again, you're asking infrastructure to keep pace with the fastest adoption of a new piece of software that has ever occurred.

Joe (46:58):
Brannin McBee, CoreWeave, thank you so much. That was a great conversation. Like I said, I always sort of measure the quality of a conversation by, like, do I get seven ideas?

Tracy (47:07):
How many additional episodes come out of it?

Joe (47:08):
That is a pretty good proxy for a good conversation. Do you get eight ideas for future episodes? We got a bunch there. So thank you so much for coming on the podcast.

Brannin (47:16):
Always happy to chat with you guys and thank you for the invite.

Tracy (47:19):
Thanks Brannin!

Joe (47:33):
Tracy, I want to find that company that makes the coolant for the data — no, seriously — for the data centers that allows them to pack more compute and more energy into the space. Because it feels like they're probably going to make a fortune in the next few years.

Tracy (47:46):
Joe, I think you just want to talk to an HVAC contractor that's installing air conditioning.

Joe (47:52):
Yeah. Can we talk to just some random... I love the idea, that’d be such a funny thought, these like really advanced data centers, they're like, "Oh, do we have like a local air conditioning guy who can come in?"

Tracy (48:01):
No, but I imagine — actually that would've been a good question for Brannin, wouldn't it? Like the labor constraints in building and adapting some of those data centers. But there was so much in there. One of the things that I was thinking about was the point about how, well, okay, if you train a model on one type of chip, you're going to keep using that type of chip.

And I guess it's kind of obvious, but it does suggest that there's some stickiness there. Like if you start out using Nvidia H100s, you're going to keep using them, and in fact you're going to consume even more, because the processing power required – the compute required – for the inference is higher than for the actual initial training.

Joe (48:42):
I knew that was the case because Stacy said so as well, but I did not realize quite the scale of how much more. Like, okay, if you train a model and then try to take it to market — productize it, as a business person would say — how much more computing power would you need for the inference aspect? And meanwhile you have to keep training it all the time to keep it up to date with fresh data and stuff like that.

Tracy (49:06):
Yeah, totally. And the other thing that I was thinking about, and again, Stacy mentioned this in our discussion with him as well, but this idea of Nvidia building a kind of large ecosystem around the hardware. So you have the open source software, Cuda, which we talked about a little bit. And then you have these sort of high-touch partnerships with companies like CoreWeave where they're trying to make it as easy as possible for you to use their chips and set them up in a way that works for you. It feels almost like what Bitmain used to do. Do you remember that?

Joe (49:46):
No.

Tracy (49:46):
Maybe they're still doing it. Anyway, but it does feel like they're trying to build this like ecosystem moat around the chip technology.

Joe (49:54):
No, it's absolutely true. And you know, I really do take that point that Brannin made about how every company has its sort of knowledge that cannot be written down on a piece of paper. Which is a Dan Wang point that we've been talking about for years. And so it's like, to your point, you have to use different types of connectors and different types of power and all this stuff. The ease with which any sort of traditional cloud provider or data center provider can, you know, sort of switch to it – it's not trivial, even with lots of money.

Tracy (50:25):
No, but I'm coming away from that conversation thinking like, the big question here is how quickly can those other hyperscalers adapt? And how big a moat can Nvidia build around this business?

Joe (50:38):
And then the other question I have is: what if none of these companies make any money building AI models? I still don't think that's been proven. And so you have this huge boom in like, "Hey, we gotta build an AI model," or "We're going to build, you know, OddLotsGPT for data stuff," and whatever. But it all is somewhat predicated on these companies being successful in making a lot of money. And if they're not, and if it turns out that the monetization of AI products is trickier than expected, then that also raises questions about how long this boom will last.

Tracy (51:09):
I'm sorry Joe, so you're saying that tech companies should make money, is that it? Are you sure?

Joe (51:16):
That's real post-Zirp thinking of me.

Tracy (51:20):
I know. Alright, shall we leave it there?

Joe (51:22):
Let's leave it there.


You can follow Brannin McBee on Twitter at @branninmcbee.