In this keynote speech at Computex, NVIDIA CEO Jensen Huang focused on the company's latest advancements in accelerated computing and artificial intelligence (AI) and their profound impact across industries. He emphasized that the infrastructure required for generative AI will necessitate and drive a complete reformation of the entire computing industry, and that NVIDIA has already accumulated a sizeable installed base to facilitate this transformation.
Huang highlighted NVIDIA's pivotal role in this technological shift, having developed numerous groundbreaking technologies such as the Omniverse simulation platform, CUDA accelerated computing, and NVIDIA Inference Microservices (NIMs). These allow researchers across domains to focus on building domain-specific models and applications without worrying about the underlying technology. Huang painted a vision of a future where AI will be ubiquitous, from customer service agents and digital humans to digital twins and robotics models that understand the laws of physics. He also discussed NVIDIA's GPU roadmap, previewing upcoming larger and more energy-efficient GPU products.
tl;dr
- NVIDIA has developed an “AI Generator” comparable to Tesla’s AC generator, capable of generating tokens (text, images, videos, etc.) that can serve industries worth trillions of dollars.
- NVIDIA CUDA technology can accelerate various tasks, providing extraordinary performance boosts while reducing power consumption and costs, effectively addressing computation inflation.
- The NVIDIA Omniverse platform leverages accelerated computing and AI, and NVIDIA already possesses 350 domain-specific libraries, allowing it to support various industries and markets.
- NIMs are a new software packaging approach. NIMs can build and organize AI teams to handle complex tasks, and can run both in the cloud and on personal computers.
- Future AI models will need to understand the laws of physics, requiring more compute power and larger GPUs. NVIDIA is enhancing reliability, continuing to improve data compression/decompression efficiency, and data transfer efficiency.
Ernest’s Notes:
Recalling the historical experiences of "Apple Mac vs Windows + Intel" and "Apple iOS vs Android", the market is likely to form two or three dominant camps. One camp may have the advantage of tight integration, while the others (at least in the short term) will provide more flexible solutions, with their own constraints.
After roughly ten iteration cycles, the major camps will have penetrated various customers and industries. The services and features they can offer will gradually converge (moving closer to essential needs). At the same time, all of us will also accumulate new problems, setting the stage for the next shift.
Previously, Apple's hardware came first, followed by Apple's software, transitioning from packaged software to apps and SaaS. Now, it is NVIDIA's hardware and software integration. In the future, it is believed that the infrastructure will be rethought as well. It is worth continuously watching for redundancy and operational inefficiency, as those areas present opportunities.
Contents
My Study Notes
Opening
Video: Opening
The opening video is a collaboration of NVIDIA employees from across the globe.
Here is a look behind the scenes…
- All right, let’s get started.
- Good take, good take oh oh.
- Okay, you guys ready?
- Yeah, yeah.
- Everybody thinks we make GPUs.
- but we’re so much more than that.
- This whole keynote is going to be about that.
- Okay, so we'll start at the top with examples of the use cases and then see it in action.
- That’s kind of the flow in such a compelling story.
- I’m super nervous about this.
- Just got to get in the rhythm.
- We're two weeks away. You guys can go really, really make it; we should take a swing at it.
- Yeah, that’s the plan.
- We need to get daily status on that animation.
- Can you mute?
- Cuz I hear myself, sorry.
- What’s the drop date for all the videos?
- It needs to be done on the 28th.
- Did you get all that?
- Safe travels everybody, super excited to see everyone. (EVA AIR)
- See you guys soon.
- Okay bye.
- We're basically moving as fast as the world can absorb technology, so we've got to leap ahead of ourselves. Now the spine, you just have to figure out a way to make it pop, you know what I'm saying.
- You want to, yeah, that kind of thing.
- Okay, thank you. I'm super late. Let's go!
NVIDIA founder and CEO Jensen Huang
- Please welcome to the stage NVIDIA founder and CEO Jensen Huang.
- I am very happy to be back.
- Thank you NTU for letting us use your stadium.
- The last time I was here I received a degree from [Applause] NTU and I gave the "Run don't walk" speech.
- Today we have a lot to cover.
- So I cannot walk, I must run.
- We have a lot to cover.
- I have many things to tell you.
- I’m very happy to be here in Taiwan.
- Taiwan is the home of our treasured partners.
- This is in fact where everything NVIDIA does begins.
- Our partners and ourselves take it to the world.
- Taiwan and our partnership has created the world’s AI infrastructure.
- Today I want to talk to you about several things.
- One, what is happening and the meaning of the work that we do together.
- What is generative AI?
- What is its impact on our industry and on every industry.
- A blueprint for how we will go forward and engage this incredible opportunity and what’s coming next.
- Generative AI and its impact, our blueprint and what comes next.
- These are really really exciting times.
- A restart of our computer industry.
- An industry that you have forged.
- An industry that you have created and now you’re prepared for the next major journey.
- But before we start NVIDIA lives at the intersection of computer graphics, simulations, and artificial intelligence.
- This is our soul.
- Everything that I show you today is simulation.
- It’s math, it’s science, it’s computer science.
- It’s amazing computer architecture.
- None of it’s animated and it’s all homemade.
- This is NVIDIA’s soul and we put it all into this virtual world we called Omniverse.
- Please enjoy.
1️⃣ Omniverse
Video: Omniverse
- (NVIDIA ACE NIM 2) (NVIDIA PhysX 3) (NVIDIA Warp 4) (EmberGen, NeuralVDB 5) (NeuralVDB, SideFX Houdini 6) (Ansys, Omniverse Cloud APIs 7) (Wistron, OpenFOAM, Modulus, Omniverse Cloud APIs 8) (Earth-2 9) (Build a SimReady factory) (BMW Group) (Siemens Digital Industries) (BMW Group) (Mercedes-Benz) (BMW Group) (HD Hyundai, Siemens Teamcenter X with Omniverse Cloud APIs) (Wistron) (NVIDIA Research 10) (Project GR00T 11)
A New Computing Age Is Starting
- I want to speak to you in Chinese, but I have so much to tell you.
- I have to think too hard to speak Chinese, so I have to speak to you in English.
- At the foundation of everything that you saw were two fundamental technologies: accelerated computing and artificial intelligence, running inside Omniverse.
- Those two technologies, those two fundamental forces of computing, are going to reshape the computer industry.
- The computer industry is now some 60 years old. (Computer industry history (last 60 years))
- In a lot of ways, everything that we do today was invented the year after my birth in 1964.
- The IBM System 360 introduced central processing units, general-purpose computing, the separation of hardware and software through an operating system, multitasking, IO subsystems, DMA 12, all kinds of technologies that we use today: architectural compatibility, backwards compatibility, family compatibility, all of the things that we know today about computing largely described in 1964.
- Of course, the PC revolution democratized computing and put it in the hands and the houses of everybody.
- And then in 2007, the iPhone introduced mobile computing and put the computer in our pocket.
- Ever since everything is connected and running all the time through the mobile cloud.
- This last 60 years we saw several, just several, not that many actually, two or three major technology shifts, two or three tectonic shifts in computing where everything changed.
- And we're about to see that happen again.
Accelerate Every Application
- There are two fundamental things that are happening.
- The first is that the processor, the engine by which the computer industry runs on, the central processing unit, the performance scaling has slowed tremendously.
- And yet the amount of computation we have to do is still doubling very quickly, exponentially. (CPU scaling slows… and compute demand grows exponentially.)
- If the data that we need to process continues to scale exponentially but performance does not, we will experience computation inflation. (GPU-accelerated computing, 2006, CUDA)
- And in fact we’re seeing that right now as we speak.
- The amount of data center power that’s used all over the world is growing quite substantially.
- The cost of computing is growing.
- We are seeing computation inflation.
- This of course cannot continue.
- The data is going to continue to increase exponentially and CPU performance scaling will never return.
- There is a better way.
- For almost two decades now we’ve been working on accelerated computing.
- CUDA augments a CPU, offloads and accelerates the work that a specialized processor can do much much better.
- In fact, the performance is so extraordinary that it is very clear now as CPU scaling has slowed and eventually substantially stopped, we should accelerate everything.
- I predict that every application that is processing-intensive will be accelerated, and surely every data center will be accelerated in the near future.
- Now accelerated computing is very sensible.
- It’s very common sense.
- If you take a look at an application and here the 100t means 100 units of time. (The more you buy… the more you save.)
- It could be 100 seconds, it could be 100 hours.
- And in many cases, as you know, we’re now working on artificial intelligence applications that run for 100 days.
- The 1t is code that requires sequential processing where single-threaded CPUs are really quite essential.
- Operating systems control logic really essential to have one instruction executed after another instruction.
- However, there are many algorithms.
- Computer graphics is one that you can operate completely in parallel.
- Computer graphics, image processing, physics simulations, combinatorial optimizations, graph processing, database processing, and of course the very famous linear algebra of deep learning.
- There are many types of algorithms that are very conducive to acceleration through parallel processing.
- So we invented an architecture to do that by adding the GPU to the CPU.
- The specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast.
- And because the two processors can work side by side, they’re both autonomous and they’re both separate and independent.
- That is, we could accelerate what used to take 100 units of time down to one unit of time.
- Well, the speed up is incredible, it almost sounds unbelievable.
- It almost sounds unbelievable, but today I’ll demonstrate many examples for you.
- The benefit is quite extraordinary, a 100 times speed up but you only increase the power by about a factor of three. And you increase the cost by only about 50%.
- We do this all the time in the PC industry. We add a GPU, a $500 GPU, GeForce GPU, to a $1,000 PC and the performance increases tremendously.
- We do this in a data center, a billion-dollar data center. We add $500 million worth of GPUs and all of a sudden it becomes an AI factory.
- This is happening all over the world today.
- Well, the savings are quite extraordinary.
- You're getting 60 times performance per dollar, a 100 times speed up, you only increase your power by 3x, 100 times speed up, you only increase your cost by 1.5x.
- The savings are incredible.
- The savings are measured in dollars.
- It is very clear that many many companies spend hundreds of millions of dollars processing data in the cloud.
- If it was accelerated, it is not unexpected that you could save hundreds of millions of dollars.
- Now why is that?
- Well, the reason for that is very clear.
- We've been experiencing inflation for so long in general-purpose computing.
- Now we have finally determined to accelerate.
- There’s an enormous amount of captured loss that we can now regain.
- A great deal of captured retained waste 13 that we can now relieve out of the system.
- And that will translate into savings.
- Savings in money, savings in energy.
- And that’s the reason why you’ve heard me say the more you buy the more you save.
- And now I’ve shown you the mathematics.
- It is not accurate but it is correct.
- Okay, that’s called CEO math. CEO math is not accurate but it is correct.
- The more you buy the more you save.
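The "CEO math" above can be checked with a quick back-of-the-envelope sketch. This is my own arithmetic using the talk's round numbers (100x speedup, roughly 3x power, roughly 1.5x cost), plus Amdahl's law for the 1t of sequential work mentioned earlier:

```python
# "CEO math" from the talk: ~100x speedup for ~3x power and ~1.5x cost.
def accelerated_economics(speedup=100.0, power_factor=3.0, cost_factor=1.5):
    perf_per_watt = speedup / power_factor   # throughput per unit of power
    perf_per_dollar = speedup / cost_factor  # throughput per unit of cost
    savings = 1.0 - cost_factor / speedup    # saved cost per unit of work
    return perf_per_watt, perf_per_dollar, savings

def amdahl_speedup(total=100.0, serial=1.0, accel_factor=1000.0):
    # With 1t of unavoidable serial work (the "1t" in the talk), even a very
    # large accelerator caps the end-to-end speedup near 100x (Amdahl's law).
    return total / (serial + (total - serial) / accel_factor)

ppw, ppd, savings = accelerated_economics()
print(round(ppw, 1))      # 33.3x performance per watt
print(round(ppd, 1))      # 66.7x performance per dollar (the "60x", rounded)
print(round(savings, 3))  # 0.985 -> roughly 98% savings per unit of work
print(round(amdahl_speedup(), 1))  # ~91x end-to-end with 1% serial work
```

Not accurate, but correct: the percentages match the "97%, 96%, 98% savings" range quoted later in the talk.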
CUDA Libraries
- Well, accelerated computing does deliver extraordinary results but it is not easy.
- Why is it that it saves so much money but people haven’t done it for so long?
- The reason for that is because it’s incredibly hard.
- There is no such thing as software that you can just run through a C compiler and all of a sudden that application runs 100 times faster.
- That is not even logical.
- If it was possible to do that, they would have just changed the CPU to do that.
- You in fact have to rewrite the software. That’s the hard part.
- The software has to be completely rewritten so that you could re-express the algorithms that were written on a CPU so that it could be accelerated, offloaded, accelerated, and run in parallel.
- That computer science exercise is insanely hard.
- Well, we’ve made it easy for the world over the last 20 years.
- Of course, the very famous cuDNN, the deep learning library that processes neural networks.
- We have a library for AI physics that you could use for fluid dynamics and many other applications where the neural network has to obey the laws of physics.
- We have a great new library called Aerial that is a CUDA-accelerated 5G radio so that we can software define and accelerate the telecommunications networks the way that we’ve software-defined the world’s networking, internet.
- And so the ability for us to accelerate that allows us to turn all of telecom into essentially the same type of platform, a computing platform, just like we have in the cloud.
- cuLITHO is a computational lithography platform that allows us to process the most computationally intensive parts of chip manufacturing, making the mask.
- TSMC is in the process of going to production with cuLITHO saving enormous amounts of energy and enormous amounts of money.
- But the goal for TSMC is to accelerate their stack so that they’re prepared for even further advances in algorithms and more computation for deeper and deeper, narrower and narrower transistors.
- Parabricks is our gene sequencing library. It is the highest throughput library in the world for gene sequencing.
- cuOPT is an incredible library for combinatorial optimization: route planning optimization, the traveling salesman problem. It is incredibly complicated; scientists had largely concluded that you needed a quantum computer to do it.
- We created an algorithm that runs on accelerated computing that runs lightning fast.
- 23 world records. We hold every single major world record today.
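To see why brute-force combinatorial optimization blows up (and why an accelerated solver like cuOPT matters), here is a toy traveling-salesman sketch. This is my own illustration with made-up distances, not cuOPT's algorithm:

```python
import itertools

# Brute-force TSP is O(n!): 10 cities already means 362,880 candidate tours.
def tsp_brute_force(dist):
    n = len(dist)
    cities = range(1, n)  # fix city 0 as the start to avoid counting rotations
    best_tour, best_len = None, float("inf")
    for perm in itertools.permutations(cities):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# A tiny symmetric 4-city instance (illustrative numbers only).
dist = [[0, 1, 4, 2],
        [1, 0, 5, 7],
        [4, 5, 0, 6],
        [2, 7, 6, 0]]
tour, length = tsp_brute_force(dist)
print(length)  # 14, the shortest round trip for this instance
```

At this scale the search is instant; the factorial growth is what makes real route-planning instances demand acceleration.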
- cuQUANTUM is an emulation system for a quantum computer.
- If you want to design a quantum computer, you need a simulator to do so.
- If you want to design quantum algorithms, you need a quantum emulator to do so.
- How would you do that?
- How would you design these quantum computers, create these quantum algorithms if the quantum computer doesn’t exist?
- Well, you use the fastest computer in the world that exists today and we call it of course NVIDIA CUDA.
- And on that, we have an emulator that simulates quantum computers.
- It is used by several hundred thousand researchers around the world.
- It is integrated into all the leading frameworks for quantum computing and is used in scientific supercomputing centers all over the world.
- cuDF is an unbelievable library for data processing.
- Data processing consumes the vast majority of cloud spend today.
- All of it should be accelerated.
- cuDF accelerates the major data processing libraries used in the world: Spark, which many of you probably use in your companies, Pandas, a new one called Polars 14, and of course NetworkX, which is a graph processing library.
- And so these are just some examples.
- There are so many more.
- Each one of them had to be created so that we can enable the ecosystem to take advantage of accelerated computing.
- If we hadn't created cuDNN, CUDA alone wouldn't have been usable by the deep learning scientists around the world.
- Because CUDA and the algorithms that are used in TensorFlow and PyTorch, the deep learning algorithms, the separation is too far apart.
- It’s almost like trying to do computer graphics without OpenGL.
- It’s almost like doing data processing without SQL.
- These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries are what it takes and what has made it possible for us to have open so many markets.
- I’ll show you some other examples today.
- Well, just last week Google announced that they've put cuDF in the cloud and accelerated Pandas 15.
- Pandas is the most popular data science library in the world. Many of you in here probably already use Pandas. It’s used by 10 million data scientists in the world, downloaded 170 million times each month. It is the Excel, it is the spreadsheet of data scientists.
- Well, with just one click you can now use Pandas in Colab, which is Google's cloud notebook platform, accelerated by cuDF.
- The speed up is really incredible.
- Let’s take a look.
- That was a great demo, right?
- Didn't take long.
- When you accelerate data processing that fast, demos don’t take long.
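As a rough illustration of the one-click workflow described above: per NVIDIA's RAPIDS documentation, loading the `cudf.pandas` extension in a notebook lets unmodified pandas code run on the GPU. The tiny aggregation below is ordinary CPU pandas; the data and column names are made up for illustration:

```python
import pandas as pd

# In Colab, running this magic in a cell first routes the identical pandas
# code below through cuDF on the GPU, with no code changes:
#   %load_ext cudf.pandas
# Without it, the same code runs on stock CPU pandas.

df = pd.DataFrame({
    "city": ["Taipei", "Taipei", "Hsinchu", "Hsinchu"],
    "sales": [100, 150, 80, 120],
})

# A typical data-science aggregation: total sales per city.
totals = df.groupby("city")["sales"].sum().sort_index()
print(totals.to_dict())  # {'Hsinchu': 200, 'Taipei': 250}
```

The drop-in design is the point: the 10 million Pandas users mentioned above keep their existing code and only change where it runs.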
CUDA Virtuous Cycle
- Okay, well CUDA has now achieved what people call a tipping point, but it’s even better than that.
- CUDA has now achieved a virtuous cycle.
- This rarely happens if you look at history and all the computing architecture, computing platforms.
- In the case of the microprocessor CPU, it has been here for 60 years; at that level it has not fundamentally changed.
- This way of doing computing, accelerated computing, has been around for some time.
- Creating a new platform is extremely hard because it's a chicken and egg problem.
- If there are no developers that use your platform, then of course there will be no users.
- But if there are no users, there is no install base.
- If there’s no install base, developers aren’t interested in it.
- Developers want to write software for a large install base.
- But a large install base requires a lot of applications so that users would create that install base.
- This chicken or the egg problem has rarely been broken and has taken us now 20 years.
- One domain library after another, one acceleration library after another.
- And now we have 5 million developers around the world.
- We serve every single industry from healthcare, financial services, of course the computer industry, automotive industry, just about every major industry in the world, just about every field of science.
- Because there are so many customers for our architecture, OEMs and cloud service providers are interested in building our systems.
- System makers, amazing system makers like the ones here in Taiwan, are interested in building our systems which then takes and offers more systems to the market.
- Which of course creates greater opportunity for us, which allows us to increase our scale, R&D scale, which speeds up the application even more.
- Well, every single time we speed up the application, the cost of computing goes down.
- This is that slide I was showing you earlier.
- 100x speed up translates to 97%, 96%, 98% savings.
- And so when we go from 100x speed up to 200x speed up to 1000x speed up, the savings, the marginal cost of computing continues to fall.
- Well, of course we believe that by reducing the cost of computing incredibly, the market, developers, scientists, inventors will continue to discover new algorithms that consume more and more and more computing.
- So that one day a phase shift happens: the marginal cost of computing becomes so low that a new way of using computers emerges.
- In fact, that’s what we’re seeing now.
- Over the years, we have driven down the marginal cost of computing.
- In the last 10 years, in one particular algorithm, by a million times.
- Well, as a result it is now very logical and very common sense to train large language models with all of the data on the internet.
- Nobody thinks twice.
- This idea that you could create a computer that could process so much data to write its own software, the emergence of artificial intelligence was made possible because of this complete belief that if we made computing cheaper and cheaper and cheaper, somebody’s going to find a great use.
- Well, today CUDA has achieved the virtuous cycle.
- Install base is growing.
- Computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand.
- And now we are at the beginning of something very, very important.
- But before I show you that, I want to show you what would not be possible had we not created CUDA and the modern Big Bang of AI: generative AI. What I'm about to show you would not be possible otherwise.
Earth 2
- This is Earth 2.
- The idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet.
- To better avert disasters or better understand the impact of climate change so that we can adapt better, so that we could change our habits.
- Now this digital twin of Earth is probably one of the most ambitious projects that the world’s ever undertaken.
- And we’re taking large steps every single year and I’ll show you results every single year.
- But this year we made some great breakthroughs.
- Let’s take a look. (All weather visualizations you are about to see are Omniverse simulations - not movies)
- On Monday the storm will veer north again and approach Taiwan.
- There are big uncertainties regarding its path.
- Different paths will have different levels of impact on Taiwan.
- Someday in the near future, we will have continuous weather prediction at every square kilometer on the planet.
- You will always know what the climate’s going to be.
- You will always know and this will run continuously because we trained the AI and the AI requires so little energy.
- This is just an incredible achievement.
- I hope you enjoyed it.
2️⃣ AI Factory
DGX
- The truth is, that was a Jensen AI; that was not me.
- I wrote it, but an AI Jensen had to say it, because of our dedication to continuously improve performance and drive the cost down.
- AI researchers discovered CUDA in 2012.
- That was NVIDIA’s first contact with AI.
- This was a very important day.
- We had the good wisdom to work with the scientists to make it possible for deep learning to happen and AlexNet achieved of course a tremendous computer vision breakthrough.
- But the great wisdom was to take a step back and understand what was the background.
- What is the foundation of deep learning?
- What is its long-term impact?
- What is its potential?
- And we realized that this technology has great potential to scale an algorithm that was invented and discovered decades ago.
- All of a sudden because of more data, larger networks, and very importantly a lot more compute, deep learning was able to achieve what no human algorithm was able to.
- Now imagine if we were to scale up the architecture even more, larger networks, more data, and more compute.
- What could be possible?
- So we dedicated ourselves to reinvent everything.
- After 2012, we changed the architecture of our GPU to add tensor cores.
- We invented NVLink that was 10 years ago.
- Now cuDNN, TensorRT, NCCL (pronounced “Nickel”), we bought Mellanox, TensorRT LLM, the Triton inference server, and all of it came together on a brand new computer.
- Nobody understood, nobody asked for it, nobody understood it, and in fact, I was certain nobody wanted to buy it.
- And so we announced it at GTC, and OpenAI, a small company in San Francisco, saw it and they asked me to deliver one to them.
- I delivered the first DGX, the world's first AI supercomputer, to OpenAI in 2016.
- Well, after that we continued to scale from one AI supercomputer, one AI appliance.
- We scaled it up to large supercomputers, even larger.
- By 2017, the world discovered Transformers so that we could train enormous amounts of data and recognize and learn patterns that are sequential over large spans of time.
- It is now possible for us to train these large language models to understand and achieve a breakthrough in natural language understanding.
- And we kept going after that, we built even larger ones.
- Then in November 2022, trained on thousands, tens of thousands of NVIDIA GPUs in a very large AI supercomputer, OpenAI announced ChatGPT.
- One million users after five days, 100 million after two months.
- The fastest growing application in history.
- And the reason for that is very simple.
- It is just so easy to use and it was so magical to use to be able to interact with a computer like it’s human.
- Instead of being clear about what you want, it’s like the computer understands your meaning.
- It understands your intention.
- Oh, I think here it was asked for the closest night market.
- As you know, the night market is very important to me.
- When I was young, I was I think I was four and a half years old.
- I used to love going to the night market because I just love watching people.
- And so we went, my parents used to take us to the night market.
- And I love, I love going.
- One day, my face, you guys might see that I have a large scar on my face.
- My face was cut because somebody was washing their knife and I was a little kid.
- But my memories of the night market are so deep because of that.
- I used to love, I still love going to the night market.
- And I just need to tell you guys this.
- The Tonghua Night Market is really good because there's a lady, she's been working there for 43 years.
- She’s the fruit lady and it’s in the middle of the street, in the middle between the two.
- Go find her.
- She [Applause] she’s really terrific.
- I think it would be funny after this, all of you go to see her.
- Every year she’s doing better and better.
- Her cart has improved.
- Yeah, I just love watching her succeed.
Learning & Tokens
- Anyways, ChatGPT came along and something is very important in this slide here.
- Let me show you something, this slide.
- Okay, and this slide, the fundamental difference is this.
- Until ChatGPT revealed it to the world, AI was all about perception, natural language understanding, computer vision, speech recognition.
- It's all about perception and detection.
- This was the first time the world saw generative AI.
- It produced tokens, one token at a time and those tokens were words.
- Some of the tokens of course could now be images or charts or tables, songs, words, speech, videos. Those tokens could be anything.
- Anything that you can learn the meaning of.
- It could be tokens of chemicals, tokens of proteins, genes.
- You saw earlier in Earth 2 we were generating tokens of the weather.
- We can learn physics.
- If you can learn physics, you could teach an AI model physics.
- The AI model could learn the meaning of physics and it can generate physics.
- We were downscaling to 1 kilometer not by filtering; it was generating.
- And so we can use this method to generate tokens for almost anything, almost anything of value.
- We can generate steering wheel control for a car.
- We can generate articulation for a robotic arm.
- Everything that we can learn, we can now generate.
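The "producing tokens, one token at a time" idea above can be sketched as a toy autoregressive loop. Everything here (the bigram table, the token names) is hypothetical; a real LLM replaces the lookup table with a trained neural network and a sampling step:

```python
# A toy "model": given the current token, it predicts exactly one next token.
bigram_model = {
    "<start>": "the",
    "the": "night",
    "night": "market",
    "market": "<end>",
}

def generate(model, max_tokens=10):
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        current = model[current]  # the "forward pass": predict the next token
        if current == "<end>":
            break                 # the model decides the sequence is finished
        tokens.append(current)    # emit tokens one at a time
    return tokens

print(generate(bigram_model))  # ['the', 'night', 'market']
```

The same loop shape applies whether the tokens are words, image patches, protein residues, or weather states, which is exactly the generality the talk is pointing at.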
AI Generator
- We have now arrived not at the AI era but a generative AI era.
- But what’s really important is this.
- This computer that started out as a supercomputer has now evolved into a data center and it produces one thing, it produces tokens.
- It’s an AI factory.
- This AI factory is generating, creating, producing something of great value, a new commodity.
- In the late 1890s, Nikola Tesla invented an AC generator.
- We invented an AI generator.
- The AC generator generated electrons.
- NVIDIA's AI generator generates tokens.
- Both of these things have large market opportunities.
- It’s completely fungible in almost every industry and that’s why it’s a new industrial revolution.
- We have now a new factory producing a new commodity for every industry that is of extraordinary value.
- And the methodology for doing this is quite scalable and the methodology of doing this is quite repeatable.
- Notice how quickly so many different AI models, generative AI models are being invented.
- Literally daily every single industry is now piling on.
- For the very first time, the IT industry, which is a $3 trillion industry, is about to create something that can directly serve a hundred trillion dollars of industry.
- No longer just an instrument for information storage or data processing but a factory for generating intelligence for every industry.
- This is going to be a manufacturing industry.
- Not a manufacturing industry of computers but using the computers in manufacturing.
- This has never happened before.
- Quite an extraordinary thing.
- What started with accelerated computing led to AI, led to generative AI, and now an industrial revolution.
- Now the impact to our industry is also quite significant.
- Of course, we could create a new commodity, a new product we call tokens for many industries.
- But the impact of ours is also quite profound.
NIMs
- For the very first time as I was saying earlier in 60 years, every single layer of computing has been changed.
- From CPUs and general-purpose computing, where the computer processes instructions, to accelerated GPU computing.
- Now computers process LLMs, large language models, AI models.
- Whereas the computing model of the past was retrieval-based: almost every time you touch your phone, some pre-recorded text, image, or video is retrieved for you and recomposed by a recommender system based on your habits.
- But in the future, your computer will generate as much as possible, retrieve only what’s necessary.
- And the reason for that is that generating data requires less energy than going to fetch information.
- Generated data also is more contextually relevant.
- It will encode knowledge, it will understand you.
- And instead of "get that information for me" or "get that file for me," you just ask it for an answer.
- And instead of a tool, instead of your computer being a tool that we use, the computer will now generate skills.
- It performs tasks.
- And instead of an industry that is producing software, which was a revolutionary idea in the early 90s.
- Remember the idea that Microsoft created for packaging software revolutionized the PC industry?
- Without packaged software, what would we use the PC to do?
- It drove this industry and now we have a new factory, a new computer.
- And what we will run on top of this is a new type of software.
- We call it NIMs, NVIDIA Inference Microservices.
- Now what happens is the NIM runs inside this factory and this NIM is a pre-trained model.
- It’s an AI.
- Well, this AI is of course quite complex in itself.
- But the computing stack that runs AIs is insanely complex.
- When you go and use ChatGPT, underneath their stack is a whole bunch of software.
- Underneath that prompt is a ton of software.
- And it’s incredibly complex because the models are large, billions to trillions of parameters.
- It doesn’t run on just one computer, it runs on multiple computers.
- It has to distribute the workload across multiple GPUs, tensor parallelism, pipeline parallelism, data parallelism, all kinds of parallelism, expert parallelism, all kinds of parallelism distributing the workload across multiple GPUs.
- Processing it as fast as possible because if you are in a factory, if you run a factory, your throughput directly correlates to your revenues.
- Your throughput directly correlates to the quality of service.
- And your throughput directly correlates to the number of people who can use your service.
- We are now in a world where data center throughput utilization is vitally important.
- It was important in the past, but not vitally important, and people didn't measure it.
- Today every parameter is measured: start time, uptime, utilization, throughput, idle time, you name it.
- Because it’s a factory.
- When something is a factory, its operations directly correlate to the financial performance of the company.
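The parallelism strategies mentioned above can be sketched in miniature. This is a hedged illustration of one flavor of tensor parallelism, with plain Python lists standing in for GPU memory: the weight matrix is split row-wise across "devices", each device computes its slice of the output, and the slices are gathered back together.

```python
# Toy tensor parallelism: split the output dimension of a matrix
# across "devices" (list slices), compute partial results, gather.

def matvec(weights, x):
    # weights is a list of rows; computes weights @ x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def shard_rows(weights, n_devices):
    # Split the rows (output dimension) across devices.
    size = (len(weights) + n_devices - 1) // n_devices
    return [weights[i:i + size] for i in range(0, len(weights), size)]

def tensor_parallel_matvec(weights, x, n_devices):
    partials = [matvec(shard, x) for shard in shard_rows(weights, n_devices)]
    # The all-gather step: concatenate every device's slice.
    return [y for partial in partials for y in partial]
```

Pipeline and data parallelism split the model's layers and the training batch, respectively, in the same spirit.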
- And so we realized that this is incredibly complex for most companies to do.
- So what we did was we created this AI in a box and it contains an incredible amount of software.
- Inside this container is CUDA, cuDNN, TensorRT, Triton for inference services.
- It is cloud-native so that you could auto-scale in a Kubernetes environment.
- It has management services and hooks so that you can monitor your AIs.
- It has common APIs, standard APIs so that you could literally chat with this box.
- You download this NIM and you can talk to it.
- So long as you have CUDA on your computer, which is now of course everywhere.
- It’s in every cloud, available from every computer maker.
- It is available in hundreds of millions of PCs.
- When you download this, you have an AI and you can chat with it like ChatGPT.
- All of the software is now integrated, 400 dependencies all integrated into one.
- We tested each of these pre-trained NIMs against our entire installed base in the cloud.
- All the different versions of Pascal and Ampere and Hopper and all kinds of different versions, I even forget some.
- NIMs, incredible invention.
- This is one of my favorites.
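As a hedged sketch of what "chat with this box" looks like in practice: NIM containers expose an OpenAI-style chat-completions API. The endpoint URL and model name below are illustrative assumptions, and the helpers only build and parse JSON bodies rather than making a real network call.

```python
import json

# Hypothetical local NIM endpoint; real deployments choose their own.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def make_chat_request(model, user_message):
    # Build an OpenAI-style chat-completions request body.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 64,
    })

def extract_reply(response_body):
    # Pull the assistant's text out of an OpenAI-style response.
    return json.loads(response_body)["choices"][0]["message"]["content"]
```

In a live deployment you would POST the request body to the container's URL; the point is that the interface is a standard API, not a bespoke inference stack.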
Models
Ernest's Notes: Human needs to know the mission.
- And of course, as you know, we now have the ability to create large language models and pre-trained models of all kinds.
- And we have all of these various versions, whether it’s language-based or vision-based or imaging-based.
- We have versions that are available for healthcare, digital biology.
- We have versions that are digital humans that I’ll talk to you about.
- And the way you use this, just come to ai.nvidia.com.
- And today we just posted on Hugging Face the Llama 3 NIM, fully optimized.
- It’s available there for you to try and you can even take it with you.
- It’s available to you for free.
- So you could run it in the cloud, run it in any cloud.
- You could download this container, put it into your own data center, and you could host it, make it available for your customers.
- We have, as I mentioned, all kinds of different domains.
- Physics, some of it is for semantic retrieval called RAGs, vision, languages, all kinds of different languages.
- And the way that you use it is connecting these microservices into large applications.
Customer Service Agents
- One of the most important applications in the coming future of course is customer service agents.
- Customer service agents are necessary in just about every single industry.
- It represents trillions of dollars of customer service around the world.
- Nurses are customer service agents, in some ways; the non-prescription, non-diagnostic parts of the job are essentially customer service.
- Customer service for retail, for quick service foods, financial services, insurance.
- Tens and tens of millions of customer service agents can now be augmented by language models and by AI.
- And so these boxes that you see are basically NIMs.
- Some of the NIMs are reasoning agents.
- Given a task, figure out what the mission is, break it down into a plan.
- Some of the NIMs retrieve information.
- Some of the NIMs might do search.
- Some of the NIMs might use a tool like cuOpt that I was talking about earlier.
- They could use a tool that could be running on SAP, and so it has to learn a particular language called ABAP.
- Maybe some NIMs have to do SQL queries.
- And so all of these NIMs are experts that are now assembled as a team.
- So what’s happening, the application layer has been changed.
- What used to be applications written with instructions are now applications that are assembling teams. Assembling teams of AIs.
- Very few people know how to write programs.
- Almost everybody knows how to break down a problem and assemble teams.
- Every company I believe in the future will have a large collection of NIMs.
- And you would bring down the experts that you want, you connect them into a team.
- You don’t even have to figure out exactly how to connect them.
- You just give the mission to an agent, to a NIM, and it figures out how to break the task down and who to give it to.
- And that central, the leader of the application if you will, the leader of the team would break down the task and give it to the various team members.
- The team members would perform their tasks, bring it back to the team leader.
- The team leader would reason about that and present the information back to you just like humans.
- This is in our near future.
- This is the way applications are going to look.
- Now of course we could interact with these large AI services with text prompts and speech prompts.
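A hedged sketch of the team-of-NIMs pattern described above, with plain Python functions standing in for expert microservices and keyword matching standing in for a reasoning model's plan. All names here are invented for illustration.

```python
# Experts stand in for NIMs: retrieval, SQL, and an optimizer.
EXPERTS = {
    "retrieve": lambda task: f"retrieved docs for '{task}'",
    "sql":      lambda task: f"ran SQL for '{task}'",
    "optimize": lambda task: f"cuOpt-style plan for '{task}'",
}

def leader(mission):
    # A real reasoning NIM would plan with an LLM; we fake the plan
    # with keyword routing for illustration.
    plan = [skill for skill in EXPERTS if skill in mission]
    return plan or ["retrieve"]  # default expert when nothing matches

def run_mission(mission):
    # Leader dispatches tasks, collects results, and "reasons" over
    # them (here, just joins them) before answering.
    results = [EXPERTS[skill](mission) for skill in leader(mission)]
    return " | ".join(results)
```

The structural point survives the simplification: the application is an assembled team plus a dispatcher, not a hand-written sequence of instructions.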
Digital Humans
Ernest’s Notes: In the Digital Humans video, there was a cameo by OpenAI. Compared to Google’s Project Astra announced at Google IO 2024, OpenAI seems to have a better grasp of the friendliness, usability, and acceptance of interacting with humans. Remind myself not to simply create very engineering-focused products while neglecting the user experience.
- However, there are many applications where we would like to interact with what is otherwise a humanlike form.
- We call them digital humans.
- NVIDIA has been working on digital human technology for some time.
- Let me show it to you.
- Well before I do that, hang on a second.
- Digital humans have the potential of being a great interactive experience with you.
- They make interactions much more engaging.
- They could be much more empathetic.
- And of course, we have to cross this incredible chasm, this uncanny chasm of realism, so that the digital humans would appear much more natural.
- This is of course our vision.
- This is a vision of where we love to go.
- But let me show you where we are.
- Great to be in Taiwan.
- Before I head out to the night market, let’s dive into some exciting frontiers of digital humans.
- Imagine a future where computers interact with us just like humans can.
- Hi, my name is Sophie and I am a digital human brand ambassador for UneeQ.
- This is the incredible reality of digital humans.
- Digital humans will revolutionize industries from customer service to advertising.
- The possibilities for digital humans are endless.
- Using the scans you took of your current kitchen with your phone, they will be AI interior designers.
- Helping generate beautiful photorealistic suggestions and sourcing the materials and furniture.
- We have generated several design options for you to choose from.
- They will also be AI customer service agents making the interaction more engaging and personalized.
- Or digital healthcare workers who will check on patients providing timely personalized care.
- Um, I did forget to mention to the doctor that I am allergic to penicillin.
- Is it still okay to take the medications?
- The antibiotics you’ve been prescribed, ciprofloxacin and metronidazole, don’t contain penicillin, so it’s perfectly safe for you to take them.
- They will even be AI brand ambassadors setting the next marketing and advertising trends.
- Hi, I’m EMA, Japan’s first virtual model.
- New breakthroughs in generative AI and computer graphics let digital humans see, understand, and interact with us in humanlike ways.
- Hey, from what I can see it looks like you’re in some kind of recording or production setup.
- The foundation of digital humans are AI models built on multilingual speech recognition and synthesis.
- And LLMs that understand and generate conversation.
- The AIs connect to another generative AI to dynamically animate a lifelike 3D mesh of a face.
- And finally, AI models that reproduce lifelike appearances, enabling real-time path-traced subsurface scattering to simulate the way light penetrates the skin, scatters and exits at various points, giving skin its soft and translucent appearance.
- NVIDIA ACE is a suite of digital human technologies packaged as easy to deploy fully optimized microservices or NIMs.
- Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences.
- Nemotron SLM and LLM NIMs to understand our intent and orchestrate other models.
- RIVA speech NIMs for interactive speech and translation.
- Audio2Face and Gesture NIMs for facial and body animation.
- And Omniverse RTX with DLSS for neural rendering of skin and hair.
- ACE NIMs run on NVIDIA GDN, a global network of NVIDIA accelerated infrastructure that delivers low latency digital human processing to over 100 regions.
- [Applause] Pretty incredible.
- Well, ACE runs in the cloud, but it also runs on PCs.
- We had the good wisdom of including Tensor Cores in every RTX GPU.
- So we’ve been shipping AI GPUs for some time, preparing ourselves for this day.
- The reason for that is very simple.
- We always knew that in order to create a new computing platform, you need to install base first.
- Eventually, the application will come.
- If you don’t create the install base, how could the application come?
- And so if you build it, they might not come.
- But if you don’t build it, they cannot come.
- And so we installed every single RTX GPU with tensor core processing.
- And now we have 100 million GeForce RTX AI PCs in the world and we’re shipping 200.
- And this Computex, we’re featuring four new amazing laptops.
- All of them are able to run AI.
- Your future laptop, your future PC will become an AI.
- It will be constantly helping you, assisting you in the background.
- The PC will also run applications that are enhanced by AI.
- Of course, all your photo editing and your writing and your tools, all the things that you use will all be enhanced by AI.
- And your PC will also host applications with digital humans that are AIs.
- And so there are different ways that AIs will manifest themselves and become used in PCs.
- But PCs will become very important AI platforms.
3️⃣ Larger GPUs
- And so where do we go from here?
- I spoke earlier about the scaling of our data centers and every single time we scaled, we found a new phase change.
- When we scaled from DGX into large AI supercomputers, we enabled transformers to be able to train on enormously large data sets.
- Well, what happened was in the beginning, the data was human supervised.
- It required human labeling to train AIs.
- Unfortunately, there is only so much you can human label.
- Transformers made it possible for unsupervised learning to happen.
- Now, transformers just look at an enormous amount of data, or look at an enormous amount of video, or look at an enormous amount of images and it can learn from studying an enormous amount of data, find the patterns and relationships itself.
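The shift from labeled to self-supervised data can be illustrated with the smallest possible "language model": a bigram counter trained on raw tokens, where the training signal is simply the next token in the stream. This is a toy stand-in for what transformers do at scale, not how they actually work.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Self-supervised: the "label" for each token is the next token,
    # so no human annotation is needed.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    # Most frequent continuation seen in the unlabeled data.
    return counts[token].most_common(1)[0][0]
```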
- Well, the next generation of AI needs to be physically based.
- Most of the AIs today don't understand the laws of physics.
- It’s not grounded in the physical world.
- In order for us to generate images and videos and 3D graphics and many physics phenomena, we need AIs that are physically based and understand the laws of physics.
- Well, the way that you could do that is of course learning from video is one source.
- Another way is synthetic data, simulation data.
- And another way is using computers to learn with each other.
- This is really no different than using AlphaGo, having AlphaGo play itself, self-play.
- Between two instances of the same capability playing each other for a very long period of time, they emerge even smarter.
- And so you’re going to start to see this type of AI emerging.
- Well, if the AI data is synthetically generated and using reinforcement learning, it stands to reason that the rate of data generation will continue to advance.
- And every single time data generation grows, the amount of computation that we have to offer needs to grow with it.
- We are about to enter a phase where AIs can learn the laws of physics and understand and be grounded in physical world data.
- And so we expect that models will continue to grow and we need larger GPUs.
Blackwell
Ernest’s Notes: For more comprehensive introduction, analysis, and videos on Blackwell, please see Think in Context: NVIDIA GTC 2024 Keynote with NVIDIA CEO Jensen Huang - Announcing Blackwell. There will also be a Computex-exclusive night-mix version of the video coming up.
- Well, Blackwell was designed for this generation.
- This is Blackwell and has several very important technologies.
- One of course is just the size of the chip.
- We took two of the largest chips that are as large as you can make at TSMC and we connected two of them together with a 10 terabytes per second (10 TB/sec) link between the world’s most advanced dies connecting these two together.
- We then put two of them on a computer node connected with a Grace CPU.
- The Grace CPU could be used for several things.
- In the training situation, it could be used for fast checkpoint and restart.
- In the case of inference and generation, it could be used for storing context memory so that the AI has memory and understands the context of the conversation you would like to have.
- It’s our second generation Transformer Engine.
- Transformer Engine allows us to adapt dynamically to a lower precision based on the precision and the range necessary for that layer of computation.
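As a rough analogy for what dynamic precision adaptation does: pick a scale from the observed range of a tensor, round into a small integer grid, and scale back. Real FP8 Transformer Engine hardware is far more sophisticated; this is just symmetric int8-style rounding in plain Python, included to make the range-based idea concrete.

```python
def quantize(values, levels=127):
    # Choose the scale from the tensor's observed dynamic range.
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    # Map the small-integer grid back to real values.
    return [q * scale for q in quantized]
```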
- This is our second generation GPU that has secure AI, so that you could ask your service provider to protect your AI from theft or tampering.
- This is our fifth generation NVLink.
- NVLink allows us to connect multiple GPUs together and I’ll show you more of that in a second.
- And this is also our first generation with a reliability and availability engine.
- This system, this RAS system, allows us to test every single transistor, flip-flop, memory on chip, memory off chip so that we can in the field determine whether a particular chip is failing.
- The MTBF, the mean time between failure of a supercomputer with 10,000 GPUs, is measured in hours.
- The mean time between failure of a supercomputer with 100,000 GPUs is measured in minutes.
- And so the ability for a supercomputer to run for a long period of time and train a model that could last for several months is practically impossible if we don’t invent technologies to enhance its reliability.
- Reliability would of course enhance its uptime which directly affects the cost.
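A back-of-envelope model for why MTBF shrinks with scale, assuming independent failures: the system-level MTBF is roughly the single-part MTBF divided by the part count. The single-GPU figure below is an illustrative assumption, not an NVIDIA number.

```python
SINGLE_GPU_MTBF_HOURS = 50_000  # assumed for illustration only

def system_mtbf_hours(n_gpus, part_mtbf=SINGLE_GPU_MTBF_HOURS):
    # With independent failures, any one part failing fails the run.
    return part_mtbf / n_gpus
```

Under this assumption, 10,000 GPUs yields an MTBF of a few hours and 100,000 GPUs an MTBF of minutes, which is the scaling behavior the talk describes.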
- And then lastly, decompression engine.
- Data processing is one of the most important things we have to do.
- We added a data compression engine, decompression engine so that we can pull data out of storage 20 times faster than what's possible today.
Blackwell is in production
- Well, all of this represents Blackwell and I think we have one here that’s in production.
- During GTC, I showed you Blackwell in a prototype state.
- The other side, this is why we practice [Laughter].
- (In Chinese) In the US it’s this way (left and right reversed).
- Ladies and gentlemen, this is Blackwell.
- Blackwell is in production.
- Incredible amounts of technology.
- This is our production board.
- This is the most complex, highest performance computer the world’s ever made.
- This is the Grace CPU and you could see each one of these Blackwell dies, two of them connected together.
- You see that it is the largest die, the largest chip the world makes.
- And then we connect two of them together with a 10 terabyte per second link.
- And that makes the Blackwell computer.
- And the performance is incredible, take a look at this.
- You see the computational, the FLOPS, the AI FLOPS, for each generation has increased by a thousand times in eight years.
- Moore’s Law in eight years would deliver something along the lines of, oh I don’t know, maybe 40 or 60 times, and in the last eight years Moore’s Law has delivered a lot less.
- And so just to compare even Moore’s Law at its best of times compared to what Blackwell could do.
- So the amount of computations is incredible and whenever we bring the computation high, the thing that happens is the cost goes down.
- And I’ll show you what we’ve done: we’ve increased computational capability, and the energy used to train GPT-4 (2 trillion parameters, 8 trillion tokens) has gone down by 350 times.
- Well, Pascal would have taken 1,000 gigawatt hours.
- 1,000 gigawatt hours means that it would take a gigawatt data center.
- The world doesn’t have a gigawatt data center but if you had a gigawatt data center it would take a month.
- If you had 100 megawatt data center it would take about a year.
- And so nobody would of course create such a thing.
- And that’s the reason why these large language models, ChatGPT, was impossible only eight years ago.
- By driving performance up and improving energy efficiency along the way, with Blackwell we’ve now taken what used to be 1,000 gigawatt hours down to 3.
- An incredible advance.
- 3 gigawatt hours.
- With 10,000 GPUs, for example, it would take only a few days, 10 days or so.
- So the amount of advance in just eight years is incredible.
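The runtime claims above can be checked with simple division. In this sketch, the 10 MW figure for 10,000 Blackwell-class GPUs is an assumed roughly 1 kW per GPU, not a number from the talk.

```python
def runtime_days(energy_gwh, power_gw):
    # Days needed to spend a given energy budget at a given power.
    return energy_gwh / power_gw / 24

# Pascal-era training budget: 1,000 GWh.
on_1gw = runtime_days(1000, 1.0)     # about 42 days: "a month"
on_100mw = runtime_days(1000, 0.1)   # about 417 days: "about a year"
# Blackwell: 3 GWh on 10,000 GPUs at an assumed ~1 kW each = 10 MW.
blackwell = runtime_days(3, 0.01)    # 12.5 days: "10 days or so"
```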
- Well, this is for inference.
- This is for token generation.
- Our token generation performance has made it possible for us to drive the energy down by 45,000 times.
- 17,000 joules per token.
- That was Pascal: 17,000 joules.
- It’s kind of like two light bulbs, 200 watts, running for two days to generate one token of GPT-4.
- It takes about three tokens to generate one word.
- And so with the amount of energy Pascal needed to generate GPT-4 tokens, a ChatGPT experience with you was practically impossible.
- But now we only use 0.4 joules per token.
- And we can generate tokens at incredible rates and with very little energy.
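The per-token figures imply the improvement ratio directly. This sketch uses roughly 17,000 J per token for Pascal, the value consistent with the quoted roughly 45,000x gain and the 0.4 J per token Blackwell figure.

```python
PASCAL_J_PER_TOKEN = 17_000
BLACKWELL_J_PER_TOKEN = 0.4
TOKENS_PER_WORD = 3  # the talk's rule of thumb

# Ratio of per-token energy: ~42,500x, quoted as roughly 45,000x.
improvement = PASCAL_J_PER_TOKEN / BLACKWELL_J_PER_TOKEN

# Energy per generated word on Blackwell: about 1.2 joules.
joules_per_word = BLACKWELL_J_PER_TOKEN * TOKENS_PER_WORD
```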
- Okay, so Blackwell is just an enormous lead.
- Well, even so, it’s not big enough.
- And so we have to build even larger machines.
DGX
- And so the way that we build it is called DGX.
- So this is our Blackwell chips and it goes into DGX systems.
- That’s why we should practice.
- So this is a DGX Blackwell.
- This is air-cooled, has eight of these GPUs inside.
- Look at the size of the heat sinks on these GPUs.
- About 15 kilowatts, 15,000 watts, and completely air-cooled.
- This version supports x86 and it goes into the infrastructure that we’ve been shipping Hoppers into.
- However, if you would like to have liquid cooling, we have a new system.
- And this new system is based on this board and we call it MGX for modular.
- And this modular system, you won’t be able to see this.
- Can they see this?
- Can you see this?
- Are you okay?
- So this is the MGX system and here’s the two Blackwell boards.
- So this one node has four Blackwell chips.
- These four Blackwell chips, this is liquid-cooled.
- Nine of them, nine of them.
- Well, 72 of these GPUs.
- 72 of these GPUs are then connected together with a new NVLink.
- This is NVLink Switch fifth generation.
- And the NVLink Switch is a technology miracle.
- This is the most advanced switch the world’s ever made.
- The data rate is insane.
- And these switches connect every single one of these Blackwells to each other so that we have one giant 72 GPU Blackwell.
- The benefit of this is that in one domain, one GPU domain, this now looks like one GPU.
- This one GPU has 72 versus the last generation of eight.
- So we increased it by nine times.
- The amount of bandwidth we’ve increased by 18 times.
- The AI FLOPS we’ve increased by 45 times.
- And yet the amount of power is only 10 times.
- This is 100 kilowatts and that is 10 kilowatts.
- And that’s for one.
- Now, of course, you could always connect more of these together and I’ll show you how to do that in a second.
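A quick sanity check on the generational ratios quoted above, and the performance per watt they imply:

```python
GPUS_NOW, GPUS_PREV = 72, 8   # GPUs in one NVLink domain
FLOPS_GAIN, POWER_GAIN = 45, 10  # quoted AI FLOPS and power ratios

gpu_scale = GPUS_NOW / GPUS_PREV       # 9x GPUs per domain
perf_per_watt = FLOPS_GAIN / POWER_GAIN  # 45x FLOPS at 10x power: 4.5x perf/W
```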
- But what’s the miracle is this chip, this NVLink chip.
- People are starting to awaken to the importance of this NVLink chip as it connects all these different GPUs together.
- Because the large language models are so large, it doesn’t fit on just one GPU, it doesn’t fit on just one node.
- It’s going to take an entire rack of GPUs, like this new DGX that I was just standing next to, to hold a large language model that is tens of trillions of parameters large.
- NVLink, which in itself is a technology miracle, is 50 billion transistors, 74 ports at 400 gigabits each, and a cross-sectional bandwidth of 7.2 terabytes per second.
- But one of the important things is that it has mathematics inside the switch so that we can do reductions, which is really important in deep learning right on the chip.
- And so this is what a DGX looks like now.
- And a lot of people ask us, you know, they say, and there’s this confusion about what NVIDIA does and how is it possible that NVIDIA became so big building GPUs.
- And so there’s an impression that this is what a GPU looks like.
- Now, this is a GPU.
- This is one of the most advanced GPUs in the world, but this is a gamer GPU.
- But you and I know that this is what a GPU looks like.
- This is one GPU, ladies and gentlemen, DGX GPU.
- You know, the back of this GPU is the NVLink spine.
- The NVLink spine is 5,000 wires, two miles.
- And it’s right here, this is an NVLink spine.
- And it connects 72 GPUs to each other.
- This is an electrical mechanical miracle.
- The transceivers make it possible for us to drive the entire length in copper.
- And as a result, this switch, the NVSwitch, NVLink switch driving the NVLink spine in copper makes it possible for us to save 20 kilowatts in one rack.
- 20 kilowatts could now be used for processing, just an incredible achievement.
- So this is the NVLink spine.
- [Applause] Wow, I went down today.
- And even this is not big enough.
- Even this is not big enough for AI factories.
- So we have to connect it all together with very high-speed networking.
- Well, we have two types of networking.
- We have InfiniBand, which has been used in supercomputing and AI factories all over the world.
- And it is growing incredibly fast for us.
- However, not every data center can handle InfiniBand because they’ve already invested their ecosystem in Ethernet for too long.
- And it does take some specialty and some expertise to manage InfiniBand switches and InfiniBand networks.
- And so what we’ve done is we’ve brought the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard.
- And the reason for that is this.
- Ethernet was designed for high average throughput because every single node, every single computer, is connected to a different person on the internet.
- And most of the communications is the data center with somebody on the other side of the internet.
- However, deep learning in AI factories, the GPUs are not communicating with people on the internet.
- Mostly, it’s communicating with each other.
- They’re communicating with each other because they’re all collecting partial products and they have to reduce it and then redistribute it.
- Chunks of partial products, reduction, redistribution, that traffic is incredibly bursty.
- And it is not the average throughput that matters, it’s the last arrival that matters.
- Because if you’re reducing, collecting partial products from everybody, if I’m trying to take all of your…
- So it's not the average throughput, it's whoever gives me the answer last.
- Okay, Ethernet has no provision for that.
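The "last arrival" point is just a max-versus-mean distinction; a minimal sketch:

```python
def allreduce_step_time(per_gpu_times):
    # The collective finishes only when the slowest participant
    # delivers its partial product: completion time is the max.
    return max(per_gpu_times)

def average_time(per_gpu_times):
    # Average throughput is what classic Ethernet optimizes for,
    # but it is not what bounds the step.
    return sum(per_gpu_times) / len(per_gpu_times)
```

One straggler makes the whole step slow, even when the average looks healthy.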
- And so there are several things that we had to create.
- We created an end-to-end architecture so that the NIC and the switch can communicate.
- And we applied four different technologies to make this possible.
- Number one, NVIDIA has the world's most advanced RDMA.
- And so now we have the ability to have a network-level RDMA for Ethernet that is incredibly great.
- Number two, we have congestion control.
- The switch does telemetry at all times incredibly fast.
- And whenever the GPUs or the NICs are sending too much information, we can tell them to back off so that it doesn’t create hotspots.
- Number three, adaptive routing.
- Ethernet needs to transmit and receive in order.
- When we see congestion, or ports that are not currently being used, we send packets to the available ports irrespective of ordering, and BlueField on the other end reorders them so they come back in order.
- That adaptive routing is incredibly powerful.
- And then lastly, noise isolation.
- There’s more than one model being trained or something happening in the data center at all times.
- And their noise and their traffic could get into each other and cause jitter.
- And so when the noise of one model training causes the last arrival to end up too late, it really slows down the training.
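A toy sketch of the adaptive-routing idea under stated assumptions: packets carry sequence numbers, are sprayed to whichever port is currently least loaded, and are restored to order on the receiving side (the role the talk assigns to BlueField). The load model here is invented for illustration.

```python
def route(packets, port_loads):
    # Send each packet to the currently least-loaded port.
    assignments, loads = [], list(port_loads)
    for seq, payload in enumerate(packets):
        port = loads.index(min(loads))
        loads[port] += 1
        assignments.append((seq, port, payload))
    return assignments

def reorder(received):
    # Receiver restores original order using sequence numbers,
    # so out-of-order delivery is invisible to the application.
    return [payload for seq, port, payload in sorted(received)]
```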
- Well, overall, remember you have built a 5 billion dollar or 3 billion dollar data center and you’re using this for training.
- If the network utilization was 40% lower and as a result the training time was 20% longer, the 5 billion dollar data center is effectively like a 6 billion dollar data center.
- So the cost is incredible.
- The cost impact is quite high.
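The dollar math in the example above: a 20% longer training run makes the same capital outlay effectively 20% more expensive.

```python
def effective_capex(capex, training_slowdown):
    # If training takes (1 + slowdown) times as long, the same
    # capex buys proportionally less work in the same period.
    return capex * (1 + training_slowdown)
```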
- Ethernet with Spectrum X basically allows us to improve the performance so much that the network is basically free.
- And so this is really quite an achievement.
- We have a whole pipeline of Ethernet products behind us.
- This is Spectrum X800.
- It is 51.2 terabits per second and 256 radix.
- The next one coming, one year from now, is 512 radix, and that’s called Spectrum X800 Ultra.
- And the one after that is X1600.
- But the important idea is this.
- X800 is designed for tens of thousands, tens of thousands of GPUs.
- X800 Ultra is designed for hundreds of thousands of GPUs.
- And X1600 is designed for millions of GPUs.
- The days of millions of GPU data centers are coming.
- And the reason for that is very simple.
- Of course, we want to train much larger models.
- But very importantly, in the future almost every interaction you have with the internet or with a computer will likely have a generative AI running in the cloud somewhere.
- And that generative AI is working with you, interacting with you, generating videos or images or text or maybe a digital human.
- And so you’re interacting with your computer almost all the time and there’s always a generative AI connected to that.
- Some of it is on-prem, some of it is on your device, and a lot of it could be in the cloud.
- These generative AIs will also do a lot of reasoning capability.
- Instead of just one-shot answers, they might iterate on answers so that they improve the quality of the answer before they give it to you.
- And so the amount of generation we’re going to do in the future is going to be extraordinary.
- Let’s take a look at all of this put together now.
- Tonight, this is our first nighttime keynote.
- I want to thank all of you for coming out tonight at 7:00.
- And so what I’m about to show you has a new vibe.
- Okay, there’s a new vibe.
- This is kind of the nighttime keynote vibe.
- So enjoy this.
Video: Blackwell Platform
Ernest’s Notes: Oh yeah! It’s the night mix version! For the original Blackwell video announced at GTC 2024, please see Think in Context: NVIDIA GTC 2024 Keynote with NVIDIA CEO Jensen Huang - Announcing Blackwell.
- [Music] Let’s go, go, go, go, go.
- Okay [Music] [Music]
- Come on, yeah, yeah, yeah, yeah.
- Get it, yeah, get it, yeah, let’s go.
- [Music] [Applause] The more you buy, the more you save.
- [Music] [Applause] Now, you can’t do that on a morning keynote.
- I think that style of keynote has never been done in Computex ever.
- Might be the last.
- Only NVIDIA can pull off that.
- Only I could do that.
- [Applause] Blackwell, of course, is the first generation of NVIDIA platforms launched just as the world realized the generative AI era is here.
- Just as the world realized the importance of AI factories.
- Just at the beginning of this new industrial revolution.
- We have so much support.
- Nearly every OEM, every computer maker, every CSP, every GPU cloud, sovereign clouds, even telecommunication companies, enterprises all over the world.
- The amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts.
- And I want to thank everybody for that.
- We’re not stopping there.
- During this time of incredible growth, we want to make sure that we continue to enhance performance.
- Continue to drive down cost, cost of training, cost of inference, and continue to scale out AI capabilities for every company to embrace.
- The further we drive performance up, the greater the cost decline.
NVIDIA Roadmap
- Hopper platform, of course, was the most successful data center processor probably in history.
- And this is just an incredible, incredible success story.
- However, Blackwell is here.
- And every single platform, as you’ll notice, comprises several things.
- You’ve got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the switch, the NVLink switch that connects all of the GPUs together as large of a domain as we can.
- And whatever we can do, we connect it with very large and very high-speed switches.
- Every single generation, as you’ll see, is not just a GPU, but it’s an entire platform.
- We build the entire platform.
- We integrate the entire platform into an AI factory supercomputer.
- However, then we disaggregate it and offer it to the world.
- And the reason for that is because all of you could create interesting and innovative configurations and all kinds of different styles and fit different data centers and different customers in different places.
- Some of it for edge, some of it for telco, and all of the different innovations are possible if we make the systems open and make it possible for you to innovate.
- And so we design it integrated but we offer it to you disaggregated so that you could create modular systems.
- The Blackwell platform is here.
- Our company is on a one-year rhythm.
- Our basic philosophy is very simple.
- One, build the entire data center scale, disaggregate it, and sell it to you in parts on a one-year rhythm.
- And we push everything to the technology limits.
- Whatever TSMC process technology, we’ll push it to the absolute limits.
- Whatever packaging technology, we’ll push it to the absolute limits.
- Whatever memory technology, we’ll push it to the absolute limits.
- SerDes technology, optics technology, everything is pushed to the limit.
- And then after that, do everything in such a way so that all of our software runs on this entire installed base.
- Software inertia is the single most important thing in computers.
- When a computer is backwards compatible and it’s architecturally compatible with all the software that has already been created, your ability to go to market is so much faster.
- And so the velocity is incredible when we can take advantage of the entire installed base of software that has already been created.
- Well, Blackwell is here.
- Next year is Blackwell Ultra.
- Just as we had H100 and H200, you’ll probably see some pretty exciting new generation from us for Blackwell Ultra.
- Again, push to the limits.
- And the next generation Spectrum switches I mentioned.
- Well, this is the very first time this next click of the roadmap has been revealed, and I’m not sure yet whether I’m going to regret this or not.
- [Applause] We have code names in our company and we try to keep them very secret.
- Oftentimes, most of the employees don’t even know.
- But our next generation platform is called Rubin.
- The Rubin platform.
- I’m not going to spend much time on it.
- I know what’s going to happen.
- You’re going to take pictures of it and you’re going to go look at the fine print.
- And feel free to do that.
- So we have the Rubin platform and one year later we have the Rubin Ultra platform.
- All of these chips that I’m showing you here are all in full development, 100% of them.
- And the rhythm is one year at the limits of technology, all 100% architecturally compatible.
- So this is basically what NVIDIA is building and all of the riches of software on top of it.
- So in a lot of ways, the last 12 years, from that moment of ImageNet and us realizing that the future of computing was going to radically change, to today, is exactly the contrast I was holding up earlier.
- GeForce pre-2012 and NVIDIA today.
- The company has really transformed tremendously.
- And I want to thank all of our partners here for supporting us every step along the way.
- This is the NVIDIA Blackwell platform.
4️⃣ Physical AI
- Let me talk about what’s next.
- The next wave of AI is physical AI.
- AI that understands the laws of physics.
- AI that can work among us.
- And so they have to have a world model, so that they understand how to interpret the world, how to perceive the world.
- They have to of course have excellent cognitive capabilities so they can understand us, understand what we ask, and perform the tasks.
- In the future, robotics is a much more pervasive idea.
- Of course, when I say robotics, humanoid robots are usually what people picture, but that’s not at all the whole picture.
- Everything is going to be robotic.
- All of the factories will be robotic.
- The factories will orchestrate robots and those robots will be building products that are robotic.
- Robots interacting with robots building products that are robotic.
- Well, in order for us to do that, we need to make some breakthroughs.
- And let me show you the video.
Video: Era of Robotics
- The era of robotics has arrived.
- One day everything that moves will be autonomous.
- Researchers and companies around the world are developing robots powered by physical AI.
- Physical AIs are models that can understand instructions and autonomously perform complex tasks in the real world.
- Multimodal LLMs are breakthroughs that enable robots to learn, perceive, and understand the world around them and plan how they’ll act.
- And from human demonstrations, robots can now learn the skills required to interact with the world using gross and fine motor skills.
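The learning-from-demonstration idea above can be sketched in miniature as behavior cloning: record (observation, action) pairs from an expert and fit a policy to them with supervised learning. The sketch below is a toy, not NVIDIA’s actual pipeline; the 1-D gripper, the scripted expert, and the linear policy trained by SGD are all invented for illustration.

```python
import random

def collect_demonstrations(n=200):
    """Toy stand-in for human demonstrations: a scripted 'expert'
    nudges a 1-D gripper toward a target, so action = target - pos."""
    data = []
    for _ in range(n):
        pos = random.uniform(-0.25, 0.25)
        target = random.uniform(-0.25, 0.25)
        data.append(((pos, target), target - pos))
    return data

def behavior_clone(data, lr=0.1, epochs=300):
    """Fit action ~ w0*pos + w1*target + b with plain SGD -- the
    supervised 'learning from demonstration' step in miniature."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (pos, target), act in data:
            err = w0 * pos + w1 * target + b - act
            w0 -= lr * err * pos
            w1 -= lr * err * target
            b -= lr * err
    return w0, w1, b

random.seed(0)
w0, w1, b = behavior_clone(collect_demonstrations())
# The cloned policy should roughly recover the expert's rule
# action = target - pos, i.e. w0 near -1 and w1 near +1.
```

Real systems replace the linear model with a neural network and the scripted expert with teleoperation or video of people, but the supervised structure is the same.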
- One of the integral technologies for advancing robotics is reinforcement learning.
- Just as LLMs need RLHF or reinforcement learning from human feedback to learn particular skills, generative physical AI can learn skills using reinforcement learning from physics feedback in a simulated world.
- These simulation environments are where robots learn to make decisions by performing actions in a virtual world that obeys the laws of physics.
- In these robot gyms, robots can learn to perform complex and dynamic tasks safely and quickly, refining their skills through millions of acts of trial and error.
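A minimal sketch of such a robot gym is tabular Q-learning in a tiny gridworld that obeys simple rules: the agent refines its behavior purely by trial and error against the environment. This is an illustrative toy, not the Omniverse or Isaac training stack; the grid, rewards, and hyperparameters are all invented.

```python
import random

# A toy "robot gym": a 4x4 grid. The robot starts at (0, 0), the goal
# is (3, 3), and cell (1, 1) is an obstacle. Illegal moves leave the
# robot in place; reaching the goal pays +1, every other step costs 0.01.
MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}
OBSTACLE, GOAL, N = (1, 1), (3, 3), 4

def step(state, action):
    dx, dy = MOVES[action]
    nxt = (state[0] + dx, state[1] + dy)
    if not (0 <= nxt[0] < N and 0 <= nxt[1] < N) or nxt == OBSTACLE:
        nxt = state  # bumped into a wall or the obstacle
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning: many acts of trial and error, scaled down."""
    q = {((x, y), m): 0.0 for x in range(N) for y in range(N) for m in MOVES}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            m = random.randrange(4) if random.random() < eps \
                else max(MOVES, key=lambda a: q[(s, a)])
            s2, r, done = step(s, m)
            best_next = max(q[(s2, a)] for a in MOVES)
            q[(s, m)] += alpha * (r + gamma * best_next - q[(s, m)])
            s = s2
            if done:
                break
    return q

def greedy_path(q):
    """Roll out the learned policy without exploration."""
    s, path = (0, 0), [(0, 0)]
    for _ in range(20):
        s, _, done = step(s, max(MOVES, key=lambda a: q[(s, a)]))
        path.append(s)
        if done:
            break
    return path

random.seed(1)
policy_q = train()
```

The physics-feedback version replaces this hand-written `step` with a physically based simulator, but the learning loop keeps the same shape.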
- We built NVIDIA Omniverse as the operating system where physical AI can be created.
- Omniverse is a development platform for virtual world simulation combining real-time physically-based rendering, physics simulation, and generative AI technologies.
- In Omniverse, robots can learn how to be robots.
- They learn how to autonomously manipulate objects with precision, such as grasping and handling objects.
- Or navigate environments autonomously, finding optimal paths while avoiding obstacles and hazards.
- Learning in Omniverse minimizes the sim-to-real gap and maximizes the transfer of learned behavior.
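One common technique for shrinking the sim-to-real gap (the talk does not name it, so this is an assumption) is domain randomization: redraw the simulator’s physics parameters every episode so the learned behavior holds across the whole range, and hopefully in the real world too. The sketch below uses an invented 1-D point-mass task and a random-search trainer purely for illustration.

```python
import random

def rollout(gain, mass, friction, steps=200, dt=0.05, target=1.0):
    """Drive a 1-D point mass to a setpoint with a damped proportional
    controller; return the final distance to the target."""
    x = v = 0.0
    for _ in range(steps):
        control = gain * (target - x) - 2.0 * v   # controller damping
        force = control - friction * v            # environment friction
        v += force / mass * dt                    # semi-implicit Euler
        x += v * dt
    return abs(target - x)

def search_robust_gain(trials=300, episodes=20):
    """Random-search a controller gain, scoring each candidate by its
    WORST error across randomized physics: mass and friction are
    redrawn every episode, mimicking domain randomization in sim."""
    best_gain, best_err = None, float("inf")
    for _ in range(trials):
        gain = random.uniform(0.1, 5.0)
        err = max(rollout(gain,
                          mass=random.uniform(0.5, 2.0),
                          friction=random.uniform(0.2, 1.0))
                  for _ in range(episodes))
        if err < best_err:
            best_gain, best_err = gain, err
    return best_gain, best_err

random.seed(2)
gain, worst_err = search_robust_gain()
```

Because the controller is never allowed to overfit one exact mass or friction value, the transferred behavior degrades gracefully when the real parameters differ from any single simulated setting.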
- Building robots with generative physical AI requires three computers.
- NVIDIA AI supercomputers to train the models.
- NVIDIA Jetson Orin and next-generation Jetson Thor robotic supercomputers to run the models.
- And NVIDIA Omniverse where robots can learn and refine their skills in simulated worlds.
- We build the platforms, acceleration libraries, and AI models needed by developers and companies and allow them to use any or all of the stacks that suit them best.
- The next wave of AI is here.
- Robotics powered by physical AI will revolutionize industries.
Warehouse
- This isn’t the future, this is happening now.
- There are several ways that we’re going to serve the market.
- The first, we’re going to create platforms for each type of robotic system.
- One for robotic factories and warehouses.
- One for robots that manipulate things.
- One for robots that move.
- And one for robots that are humanoid.
- And so each one of these robotic platforms is like almost everything else we do, a computer, acceleration libraries, and pre-trained models.
- Computers, acceleration libraries, pre-trained models.
- And we test everything, we train everything, and integrate everything inside Omniverse.
- Omniverse, as the video was saying, is where robots learn how to be robots.
- Now, of course, the ecosystem of robotic warehouses is really, really complex.
- It takes a lot of companies, a lot of tools, a lot of technology to build a modern warehouse.
- And warehouses are increasingly robotic.
- One of these days, they will be fully robotic.
- And so in each one of these ecosystems, we have SDKs and APIs that are connected into the software industry.
- SDKs and APIs connected into edge AI industry and companies.
- And then also, of course, systems that are designed for PLCs and robotic systems for the ODMs.
- It’s then integrated by integrators, ultimately building warehouses for customers.
- Here we have an example of Kenmec building a robotic warehouse for Giant Group.
Factories
- And then here, now let’s talk about factories.
- Factories have a completely different ecosystem.
- And Foxconn is building some of the world’s most advanced factories.
- Their ecosystem, again, edge computers and robotics software for designing the factories, the workflows, programming the robots, and of course, PLC computers that orchestrate the digital factories and the AI factories.
- We have SDKs that are connected into each one of these ecosystems as well.
- This is happening all over Taiwan.
- Foxconn is building digital twins of their factories.
- Delta is building digital twins of their factories.
- By the way, half is real, half is digital, and the digital half is Omniverse.
- Pegatron is building digital twins of their robotic factories.
- Wistron is building digital twins of their robotic factories.
- And this is really cool.
- This is a video of Foxconn’s new factory.
- Let’s take a look.
- Demand for NVIDIA accelerated computing is skyrocketing as the world modernizes traditional data centers into generative AI factories.
- Foxconn, the world’s largest electronics manufacturer, is gearing up to meet this demand by building robotic factories with NVIDIA Omniverse and AI.
- Factory planners use Omniverse to integrate facility and equipment data from leading industry applications like Siemens Teamcenter X and Autodesk Revit.
- In the digital twin, they optimize floor layout and line configurations and locate optimal camera placements to monitor future operations with NVIDIA Metropolis-powered Vision AI.
- Virtual integration saves planners on the enormous cost of physical change orders during construction.
- The Foxconn teams use the digital twin as the source of truth to communicate and validate accurate equipment layout.
- The Omniverse digital twin is also the robot gym where Foxconn developers train and test NVIDIA Isaac AI applications for robotic perception and manipulation and Metropolis AI applications for sensor fusion.
- In Omniverse, Foxconn simulates two robot AIs before deploying runtimes to Jetson computers on the assembly line.
- They simulate Isaac manipulator libraries and AI models for automated optical inspection, for object identification, defect detection, and trajectory planning to transfer HGX systems to the test pods.
- They simulate Isaac Perceptor-powered AMRs as they perceive and move about their environment with 3D mapping and reconstruction.
- With Omniverse, Foxconn builds their robotic factories that orchestrate robots running on NVIDIA Isaac to build NVIDIA AI supercomputers, which in turn train Foxconn robots. [Music] [Applause]
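The “perceive and move about their environment with 3D mapping and reconstruction” step can be illustrated, in a 2-D simplification with invented names (this is not the Isaac Perceptor API), by folding range scans into a sparse occupancy grid:

```python
import math

def integrate_scan(grid, pose, ranges, max_range=5.0, cell=0.5):
    """Fold one 360-degree range scan into a sparse 2-D occupancy grid,
    a flat-world stand-in for 3-D mapping. Cell values are occupancy
    beliefs: 0.5 unknown, low = free, high = occupied."""
    x0, y0 = pose
    n = len(ranges)
    for i, r in enumerate(ranges):
        theta = 2 * math.pi * i / n
        # March along the beam in half-cell steps, marking free space.
        for s in range(int(min(r, max_range) / (cell / 2))):
            d = s * cell / 2
            key = (int((x0 + d * math.cos(theta)) / cell),
                   int((y0 + d * math.sin(theta)) / cell))
            grid[key] = min(grid.get(key, 0.5), 0.1)
        if r < max_range:  # the beam actually hit something
            key = (int((x0 + r * math.cos(theta)) / cell),
                   int((y0 + r * math.sin(theta)) / cell))
            grid[key] = max(grid.get(key, 0.5), 0.9)
    return grid

# One scan from the origin: the first beam (pointing along +x) hits a
# wall at 2 m; the other seven beams see nothing within range.
grid = integrate_scan({}, (0.0, 0.0), [2.0] + [5.0] * 7)
```

A real AMR stack fuses many such scans across poses (and in 3-D), then plans paths through the cells it believes are free.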
Factories with Three Computers
- So a robotic factory is designed with three computers.
- Train the AI on NVIDIA AI.
- You have the robot, the factory itself, running on the PLC systems that orchestrate the factories.
- And then you of course simulate everything inside Omniverse.
- Well, the robotic arm and the robotic AMRs are also the same way, three computer systems.
- The difference is the two Omniverses will come together, so they’ll share one virtual space.
- When they share one virtual space, that robotic arm will be inside the robotic factory.
- And again, three computers and we provide the computer, the acceleration layers, and pre-trained AI models.
- We’ve connected NVIDIA manipulator and NVIDIA Omniverse with Siemens, the world’s leading industrial automation software and systems company.
- This is really a fantastic partnership and they’re working on factories all over the world.
- Simatic Pick AI now integrates Isaac manipulator, and Simatic Pick AI runs and operates robots from ABB, KUKA, Yaskawa, FANUC, Universal Robots, and Techman.
- So Siemens is a fantastic integration.
- We have all kinds of other integrations.
- Let’s take a look.
- ArcBest is integrating Isaac Perceptor into Vaux Smart Autonomy robots for enhanced object recognition and human motion tracking in material handling.
- BYD Electronics is integrating Isaac manipulator and Perceptor into their AI robots to enhance manufacturing efficiencies for global customers.
- Idealworks is building Isaac Perceptor into their iw.os software for AI robots in factory logistics.
- Intrinsic, an Alphabet company, is adopting Isaac manipulator into their Flowstate platform to advance robot grasping.
- Gideon is integrating Isaac Perceptor into Trey AI-powered forklifts to advance AI-enabled logistics.
- RGo Robotics is adopting Isaac Perceptor into Perception Engine for advanced vision-based AMRs.
- Solomon is using Isaac manipulator AI models in their AcuPick 3D software for industrial manipulation.
- Techman Robot is adopting Isaac Sim and manipulator into TM Flow, accelerating automated optical inspection.
- Teradyne Robotics is integrating Isaac manipulator into PolyScope X for cobots and Isaac Perceptor into MiR AMRs.
- Vention is integrating Isaac manipulator into MachineLogic for AI manipulation.
- Robotics is here. [Music]
- Physical AI is here.
- This is not science fiction.
- It’s being used all over Taiwan and just really, really exciting.
- And that’s the factory, the robots inside, and of course, all the products are going to be robotics.
Drive AV
- There are two very high-volume robotics products.
- One of course is the self-driving car or cars that have a great deal of autonomous capability.
- NVIDIA again builds the entire stack.
- Next year, we’re going to go to production with the Mercedes fleet and after that in 2026, the JLR fleet.
- We offer the full stack to the world.
- But you’re welcome to take whichever parts, whichever layers of our stack you like, as the entire Drive stack is open.
Humanoid Robots
- The next high-volume robotics product that’s going to be manufactured by robotic factories with robots inside will likely be humanoid robots.
- And this has seen great progress in recent years in both the cognitive capability because of foundation models and also the world understanding capability that we’re in the process of developing.
- I’m really excited about this area because obviously the easiest robots to adapt into the world are humanoid robots, because we built the world for us.
- We also have more data to train these robots than other types of robots, because we have the same physique.
- And so the amount of training data we can provide through demonstration and video capabilities is going to be really great.
- And so we’re going to see a lot of progress in this area.
- Well, I think we have some robots that we’d like to welcome.
- There we go, about my size. [Applause]
- And we have some friends to join us.
- So the future of robotics is here.
Closing
- The next wave of AI.
- And of course, you know, Taiwan builds computers with keyboards.
- You build computers for your pocket.
- You build computers for data centers in the cloud.
- In the future, you’re going to build computers that walk and computers that roll around.
- So these are all just computers.
- And as it turns out, the technology is very similar to the technology of building all of the other computers that you already build today.
- So this is going to be a really extraordinary journey for us.
- Well, I want to thank you.
- I have one last video if you don’t mind.
- Something that we really enjoyed making.
- And if you, let’s run it.
- Thank you. [Applause]
- I love you guys.
- Thank you. [Applause]
- Thank you all for coming.
- Have a great Computex.
- Thank you. [Applause]
- [A song plays; the lyrics are only partly intelligible, with refrains about Computex, digital dreams, and digital twins.]
- To all of you in Taiwan, we hope that you like this song.
- Once again, thank you for joining us.
Background Music
- There comes a time when you realize that everything you’ve done leads to this moment.
- And I know you’re scared.
- It’s hard to believe it.
- You better make that jump before you’re gone.
- I’m going all in.
- I’m taking all my cards and putting them in the middle.
- This is just what I want.
- I will risk it all.
- I will risk it all for me.
- I will do it all.
- I will do it all for me.
- I will risk it all.
- I will risk it all for me.
- I will do it all.
- I will do it all for me. [Music]
- When you have the chance, don’t think about it because the push and pull will drive you insane.
- You can come up with a million reasons not to, but all it takes is one to know for sure.
- So I’m going all in.
- I’m taking all my cards and putting them in the middle.
- This is just what I want.
- I will risk it all.
- I will risk it all for me.
- I will do it all.
- I will do it all for me.
- I will risk it all.
- I will risk it all for me.
- I will do it all.
- I will do it all for me. [Music]
- Never give up. [Music]
- Never give up, just hold your hand high till you reach the top.
- I know it’s tough when you’ve had enough.
- Just try to hold on when you want to give up.
- Just never give up. [Music]
- The clock keeps ticking.
- I just close my eyes and manifest. The race is almost finished.
- I just want to see the end.
- I keep lifting when the going gets tough.
- I’mma keep going, nothing slowing me down.
- No one stopping me now, and when my faith is on the ledge, I won’t ever break.
- When they try to push me I stand tall.
- I’m not going to fall.
- Holding on to love, up against the wall, but I’m not giving up.
- Oh stand strong, no one’s going to break me down.
- No one’s going to break me down.
- No one’s going to break me now.
- I’m unbreakable. [Music]
- I keep dreaming even when I’m wide awake.
- I’m living like there’s no tomorrow.
- Never looking back, just down the road ahead.
- I keep pushing, never stop fighting, never stop laughing.
- Nothing slowing me down.
- No one stopping me now.
- Stopping me now, my faith is on the ledge.
- I won’t ever break.
- When they try to push me I stand tall.
- I’m not going to fall.
- Holding on to love.
- Up against the wall but I’m not giving up.
- I’m holding on, standing strong.
- No one’s going to break me down.
- No one’s going to break me down.
- No one’s going to break me now.
- I’m not going to fall.
- Holding on to love up against the wall but I’m not giving up.
- Holding on, standing strong.
- No one’s going to break me down.
- No one’s going to break me down.
- No one’s going to break me down.
- I’m unbreakable. [Music]
- Used to feel locked up inside my head.
- You say I close the door now.
- I know that I’m the only one to open up the window.
- I can’t stop, got to let it out.
- I got the power.
- I got the power, spreading my wings, never coming down.
- There’s so much power.
- There’s so much power in me.
- I got the power.
- I got the power in me.
- I got the power.
- I got the power now.
- I see what I have always been.
- I turn to face my future now.
- I know that I’m the only one to open up the window.
- I can’t stop, got to let it out.
- I got the power.
- I got the power, spread my wings, never coming down.
- There’s so much power.
- There’s so much power in me. [Applause] [Music]
- I got the power.
- I got the power in me.
- I got the power.
- I got the power.
- I got the power.
- I got the power in me.
- I got the power.
- I got the power in me. [Music]
NVIDIA ACE | NVIDIA Developer - NVIDIA ACE is a suite of technologies for bringing digital humans, AI non-player characters (NPCs), and interactive avatars to life with generative AI. ↩︎
PhysX - Wikipedia - PhysX is an open-source realtime physics engine middleware SDK developed by NVIDIA as part of the NVIDIA GameWorks software suite. ↩︎
NVIDIA/warp: A Python framework for high performance GPU simulation and graphics ↩︎
NeuralVDB: High-resolution Sparse Volume Representation using Hierarchical Neural Networks | ACM Transactions on Graphics ↩︎
Houdini | 3D Procedural Software for Film, TV & Gamedev | SideFX ↩︎
Ansys - Wikipedia - Ansys, Inc. is an American multinational company with its headquarters based in Canonsburg, Pennsylvania. It develops and markets CAE/multiphysics engineering simulation software for product design, testing and operation and offers its products and services to customers worldwide. ↩︎
OpenFOAM - Wikipedia - OpenFOAM (Open Field Operation And Manipulation) is a C++ toolbox for the development of customized numerical solvers, and pre-/post-processing utilities for the solution of continuum mechanics problems, most prominently including computational fluid dynamics (CFD). ↩︎
Research at NVIDIA | Advancing the Latest Technology | NVIDIA ↩︎
Project GR00T Robotic Foundation Model | NVIDIA Developer ↩︎
NVIDIA Collaborates with Hugging Face to Simplify Generative AI Model Deployments | NVIDIA Technical Blog ↩︎