(Caption: How much hard preparation work lies behind a delicious meal? Taken at Le Bouchon Ogasawara, Shibuya, Tokyo. Image credit: Ernest.)
Summary (tl;dr)
- TOPS (Trillions of Operations Per Second) is a key metric for the compute capability of AI chips and NPUs, indicating how many trillion operations a processor can execute per second.
- The "frying eggs" analogy gives an intuitive feel for TOPS: an ordinary CPU is like a cook who can only fry one egg at a time, while a high-TOPS AI chip is a super cook who can fry an enormous number of eggs at once.
- TOPS is an important reference for comparing AI chips, but evaluating AI hardware also requires factors such as energy efficiency and memory bandwidth; TOPS figures usually reflect theoretical peaks, so real-world suitability for a given application should be judged together with other metrics.
What Is TOPS (Simple Everyday Version)
TOPS, short for Trillions of Operations Per Second, is an important metric for measuring the compute capability of an artificial intelligence (AI) chip or neural processing unit (NPU). TOPS expresses the maximum number of operations a processor can execute per second, counted in trillions. If compute capability keeps growing, the leading "T" will eventually be replaced by an even larger unit prefix.
We can use an everyday example to understand TOPS more intuitively.
Imagine AI computation as frying eggs, where the data are the eggs being cooked.
An ordinary cook (an ordinary processor, a CPU) may only be able to fry one egg at a time, while a super cook (an AI chip) might fry a trillion eggs at once! TOPS is the yardstick for this "super cook": it tells us how many "data eggs" it can "fry" per second.
TOPS is one of the important metrics for understanding and comparing AI chip performance, but it is not the only one.
When evaluating AI hardware, AI phones, or AI PCs, remember to also consider other factors such as energy efficiency, memory bandwidth, and the software ecosystem. TOPS helps us compare the compute capability of different AI chips and provides a reference point for choosing hardware suited to a specific application.
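As a rough illustration of weighing raw TOPS against another factor, the short sketch below compares a few hypothetical chips by TOPS and by TOPS per watt; all names and numbers are invented for illustration, not taken from the comparison table in this article.

```python
# Hypothetical spec sheet: (name, INT8 TOPS, TDP in watts). Illustrative values only.
chips = [
    ("Chip A", 45, 15),
    ("Chip B", 80, 28),
    ("Chip C", 1000, 500),
]

for name, tops, watts in chips:
    print(f"{name}: {tops} TOPS, {tops / watts:.1f} TOPS/W")

# Ranking by TOPS alone favors Chip C; ranking by TOPS/W favors Chip A,
# which may matter more for a battery-powered AI phone or laptop.
best_raw = max(chips, key=lambda c: c[1])
best_eff = max(chips, key=lambda c: c[1] / c[2])
print("Highest TOPS:", best_raw[0], "| Highest TOPS/W:", best_eff[0])
```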
What Is TOPS (Deep-Dive Version)
Before digging deeper into TOPS, we first need to understand what an "operation" is:
In digital circuits and computer science, an "operation" usually refers to a basic mathematical or logical computation. For an AI chip or NPU, these operations mainly include:
- Floating-point arithmetic: addition, subtraction, multiplication, and division.
- Matrix operations: large-scale matrix multiplication is one of the most common operations in deep learning.
- Vector operations: including the inner (dot) product, outer (cross) product, and so on.
- Activation functions: such as ReLU, Sigmoid, and Tanh.
- Convolutions: used extensively in convolutional neural networks (CNNs).
These operations are usually performed in FP32 (32-bit floating point) or FP16 (16-bit floating point) formats. Some AI chips also support lower-precision formats such as INT8 (8-bit integer) to increase throughput and reduce power consumption; these are typically used for inference.
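To make the operation counting concrete, here is a minimal sketch (assuming the common convention of counting one multiply and one add as two operations) that estimates how many operations a single matrix multiplication involves. The matrix sizes are chosen arbitrarily for illustration.

```python
import numpy as np

def matmul_ops(m: int, k: int, n: int) -> int:
    """Operations in C = A @ B where A is (m, k) and B is (k, n).
    Each output element takes k multiplies and k adds, so 2*m*k*n in total,
    which is the usual convention behind TOPS / FLOPS figures."""
    return 2 * m * k * n

# A single (1, 4096) x (4096, 4096) multiply, roughly one dense layer's worth
print(f"{matmul_ops(1, 4096, 4096):,} operations")   # 33,554,432

# The operation count is the same in FP32, FP16, or INT8; lower precision just
# lets hardware finish more of these operations per cycle, which is why INT8
# TOPS figures are usually the largest on a spec sheet.
a = np.random.randn(1, 4096).astype(np.float16)
b = np.random.randn(4096, 4096).astype(np.float16)
c = a @ b   # the computation the count above refers to
```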
The calculation of TOPS can be simplified as:
TOPS = (operations per clock cycle) × (clock frequency) ÷ 1 trillion (10^12)
For example, if an AI chip can execute 1,000 operations per clock cycle and runs at a clock frequency of 1 GHz, its theoretical peak performance is 1 TOPS:
1,000 operations/cycle × 1 GHz = 1,000 × 10^9 operations/second = 10^12 operations/second = 1 TOPS
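The same arithmetic, written as a small Python sketch; the second example uses a made-up NPU configuration purely to show how the numbers scale.

```python
def peak_tops(ops_per_cycle: float, clock_hz: float) -> float:
    """Theoretical peak TOPS = operations per cycle x clock frequency / 1e12."""
    return ops_per_cycle * clock_hz / 1e12

# The example from the text: 1,000 operations per cycle at 1 GHz -> 1 TOPS
print(peak_tops(1_000, 1e9))        # 1.0

# A hypothetical NPU: 4,096 INT8 MAC units (2 ops per MAC) at 1.5 GHz
print(peak_tops(4_096 * 2, 1.5e9))  # ~12.3 TOPS
```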
Keep the following points in mind when interpreting TOPS:
- TOPS usually represents theoretical peak performance; actual performance may differ because of memory bandwidth, chip architecture, and other factors (see the sketch after this list).
- The TOPS value may differ across operation types (e.g., FP32, FP16, INT8).
- A higher TOPS value does not necessarily mean better results on every AI task, because real performance also depends on software optimization, the characteristics of the specific workload, and so on.
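As a hedged illustration of the gap between peak and achieved performance, the sketch below uses a simple roofline-style estimate with made-up numbers (not taken from the table) to show how memory bandwidth can cap the usable fraction of a chip's advertised TOPS.

```python
# All numbers below are hypothetical, for illustration only.
peak_tops = 50.0               # advertised INT8 peak, in TOPS
memory_bandwidth_gbs = 120.0   # memory bandwidth, in GB/s

# A hypothetical INT8 workload
ops = 8e9          # operations executed per inference
bytes_moved = 2e9  # weights + activations read/written per inference

# Arithmetic intensity: how many operations are done per byte moved
intensity = ops / bytes_moved  # 4 ops/byte

# Roofline-style bound: the achievable rate is the smaller of the compute peak
# and what the memory system can feed (bandwidth x intensity)
compute_bound = peak_tops * 1e12
memory_bound = memory_bandwidth_gbs * 1e9 * intensity
achievable = min(compute_bound, memory_bound)

print(f"achievable ≈ {achievable / 1e12:.2f} TOPS "
      f"({achievable / compute_bound:.0%} of the advertised peak)")
```

With these particular numbers the workload is memory-bound, so only a small fraction of the advertised peak is reachable, which is one reason the "Memory Bandwidth" column in the table below matters as much as the TOPS column.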
TOPS Comparison Table
(The main column to look at is "INT8 Ops". Scroll horizontally to see more comparison data.)
INT8 Ops | FP32 FLOPS | Company Name | Type | Target Market | Product Family | Product Name | Product Generation | Code Name | Release Year | First Used On | Fab Process | CPU | GPU | NPU | Memory Tech | Memory Bandwidth | TDP Base (W) | Remark |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
73 TOPS | n/a | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 365 | n/a | Strix Point | 2024 | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 880M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | - Total 73 TOPS (50 TOPS from NPU). |
80 TOPS | n/a | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 HX 370 | n/a | Strix Point | 2024 | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 890M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | - Total 80 TOPS (50 TOPS from NPU). |
50 TOPS | n/a | AMD | NPU | n/a | Ryzen | XDNA 2 | n/a | AI | 2024 | Ryzen AI 9 HX 370 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
1961.2 TOPS 3922.3 TOPS (with Sparsity) | 122.6 TFLOPS | AMD | GPU | Datacenter | AMD Data Center GPUs (AMD Instinct) | MI300A | n/a | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | HBM3 | 5300 GB/s | 550.0 | n/a |
2614.9 TOPS 5229.8 TOPS (with Sparsity) | 163.4 TFLOPS | AMD | GPU | Datacenter | AMD Data Center GPUs (AMD Instinct) | MI300X | n/a | n/a | 2023 | n/a | XCD: TSMC N5 IOD: TSMC N6 | n/a | n/a | n/a | HBM3 | 5300 GB/s | 750.0 | n/a |
2614.9 TOPS 5229.8 TOPS (with Sparsity) | 163.4 TFLOPS | AMD | GPU | Datacenter | AMD Data Center GPUs (AMD Instinct) | MI325X | n/a | n/a | 2024 | n/a | XCD: TSMC N5 IOD: TSMC N6 | n/a | n/a | n/a | HBM3E | 6000 GB/s | 750.0 | n/a |
n/a | n/a | ARM | IP | n/a | Neoverse | Neoverse E1 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | ARM | IP | n/a | Neoverse | Neoverse N1 | n/a | Ares | 2019 | Ampere Altra, AWS Graviton2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N2 | n/a | Perseus | 2020 | Microsoft Azure Cobalt 100 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N3 | n/a | Hermes | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V1 | n/a | Zeus | 2020 | AWS Graviton3 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | - First announcements came out of Arm’s TechCon 2018 convention in San Jose.
n/a | n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V2 | n/a | n/a | 2022 | NVIDIA Grace, AWS Graviton4, Google Axion | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V3 | n/a | Poseidon | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
825 TOPS ??? | n/a | Alibaba | SoC | Datacenter (AI inference) | Hanguang 含光 | Hanguang 800 | 1 | n/a | 2019 | n/a | TSMC 12nm | n/a | n/a | n/a | n/a | n/a | 280.0 | - 16x PCIe gen4 - SRAM, No DDR |
n/a | n/a | Alibaba | SoC | Datacenter (Infra) | Yitian 倚天 | Yitian 710 | 1 | n/a | 2021 | Alibaba ECS g8m | N5 | 128 Neoverse N2 core | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | Amazon | SoC | Datacenter (Infra) (Scale out) | AWS Graviton | Graviton | 1 | Alpine | 2018 | Amazon EC2 A1 | TSMC 16nm | Cortex A72 | n/a | n/a | DDR4-1600 | 51.2 GB/s | 95.0 | - 32 lanes of PCIe gen3 |
n/a | n/a | Amazon | SoC | Datacenter (Infra) (General Purpose) | AWS Graviton | Graviton 2 | 2 | Alpine+ | 2019 | Amazon EC2 M6g, M6gd, C6g, C6gd, C6gn, R6g, R6gd, T4g, X2gd, G5g, Im4gn, Is4gen, I4g | TSMC 7nm | 128 Neoverse N1 core | n/a | n/a | DDR4-3200 | 204.8 GB/s | 110.0 | - 64 lanes of PCIe gen4 |
n/a | n/a | Amazon | SoC | Datacenter (Infra) (ML, HPC, SIMD) | AWS Graviton | Graviton 3 | 3 | n/a | 2021 | Amazon EC2 C7g, M7g, R7g; with local disk: C7gd, M7gd, R7gd | TSMC 5nm | 64 Neoverse V1 core | n/a | n/a | DDR5-4800 | 307.2 GB/s | 100.0 | - 32 lanes of PCIe gen5 |
n/a | n/a | Amazon | SoC | Datacenter (Infra) | AWS Graviton | Graviton 3E | 3 | n/a | 2022 | Amazon EC2 C7gn, HPC7g | n/a | 64 Neoverse V1 core | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | Amazon | SoC | Datacenter (Infra) (Scale up) | AWS Graviton | Graviton 4 | 4 | n/a | 2023 | Amazon EC2 R8g | n/a | 96 Neoverse V2 core | n/a | n/a | DDR5-5600 | 537.6 GB/s | n/a | - 96 lanes of PCIe gen5 |
63.3 TOPS | 0.97 TFLOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 1 | 1 | n/a | 2018 | Amazon EC2 Inf1 | TSMC 16nm | 16 NeuronCore v1 | n/a | n/a | n/a | 50 GB/s | n/a | n/a
380 TOPS | 2.9 TFLOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 2 | 2 | n/a | 2022 | Amazon EC2 Inf2 | TSMC 5nm | 24 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a
380 TOPS | 2.9 TFLOPS | Amazon | SoC | Datacenter (AI training) | AWS Trainium | Trainium 1 | 1 | n/a | 2020 | Amazon EC2 Trn1 | TSMC 7nm | 32 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a
861 TOPS | 6.57 TFLOPS | Amazon | SoC | Datacenter (AI training) | AWS Trainium | Trainium 2 | 2 | n/a | 2023 | Amazon EC2 Trn2 | TSMC 4nm | 64 NeuronCore v2 | n/a | n/a | n/a | 4,096 GB/s | n/a | n/a
11 TOPS | 748.8 GFLOPS | Apple | SoC | Mobile | A | A14 Bionic | n/a | APL1W01 | 2020 | iPhone 12 | TSMC N5 | Firestorm + Icestorm | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a |
15.8 TOPS | 1.37 TFLOPS | Apple | SoC | Mobile | A | A15 Bionic | n/a | APL1W07 | 2021 | iPhone 13 | TSMC N5P | Avalanche + Blizzard | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a |
17 TOPS | 1.789 TFLOPS | Apple | SoC | Mobile | A | A16 Bionic | n/a | APL1W10 | 2022 | iPhone 14 | TSMC N4P | Everest + Sawtooth | n/a | n/a | LPDDR5-6400 | 51.2 GB/s | n/a | - 6GB LPDDR5 |
35 TOPS | 2.147 TFLOPS | Apple | SoC | Mobile | A | A17 Pro | n/a | APL1V02 | 2023 | iPhone 15 Pro, iPhone 15 Pro Max | TSMC N3B | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | LPDDR5-6400 | 51.2 GB/s | n/a | - 8GB LPDDR5 |
35 TOPS | n/a | Apple | SoC | Mobile | A | A18 | n/a | n/a | 2024 | iPhone 16 | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 5-core | 16-core Neural Engine | n/a | n/a | n/a | n/a |
35 TOPS | n/a | Apple | SoC | Mobile | A | A18 Pro | n/a | n/a | 2024 | iPhone 16 Pro | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | n/a | n/a | n/a | n/a |
11 TOPS | 2.6 TFLOPS | Apple | SoC | Mobile, PC | M | M1 | n/a | APL1102 | 2020 | n/a | TSMC N5 | high-performance “Firestorm” + energy-efficient “Icestorm” | n/a | n/a | LPDDR4X-4266 | 68.3 GB/s | n/a | n/a |
11 TOPS | 10.4 TFLOPS | Apple | SoC | Mobile, PC | M | M1 Max | n/a | APL1105 | 2021 | n/a | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a |
11 TOPS | n/a | Apple | SoC | Mobile, PC | M | M1 Pro | n/a | APL1103 | 2021 | n/a | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a |
22 TOPS | 21 TFLOPS | Apple | SoC | Mobile, PC | M | M1 Ultra | n/a | APL1W06 | 2022 | n/a | TSMC N5 | The M1 Ultra consists of two M1 Max units connected with UltraFusion Interconnect with a total of 20 CPU cores and 96 MB system level cache (SLC). | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a |
15.8 TOPS | 2.863 TFLOPS, 3.578 TFLOPS | Apple | SoC | Mobile, PC | M | M2 | n/a | APL1109 | 2022 | n/a | TSMC N5P | high-performance @3.49 GHz “Avalanche” + energy-efficient @2.42 GHz “Blizzard” | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a |
15.8 TOPS | 10.736 TFLOPS, 13.599 TFLOPS | Apple | SoC | Mobile, PC | M | M2 Max | n/a | APL1111 | 2023 | n/a | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a |
15.8 TOPS | 5.726 TFLOPS, 6.799 TFLOPS | Apple | SoC | Mobile, PC | M | M2 Pro | n/a | APL1113 | 2023 | n/a | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a |
31.6 TOPS | 21.473 TFLOPS, 27.199 TFLOPS | Apple | SoC | Mobile, PC | M | M2 Ultra | n/a | APL1W12 | 2023 | n/a | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a |
18 TOPS | 2.826 TFLOPS, 3.533 TFLOPS | Apple | SoC | Mobile, PC | M | M3 | n/a | APL1201 | 2023 | MacBook Pro | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a |
18 TOPS | 10.598 TFLOPS, 14.131 TFLOPS | Apple | SoC | Mobile, PC | M | M3 Max | n/a | APL1204 | 2023 | n/a | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 307.2 GB/s, 409.6 GB/s | n/a | n/a |
18 TOPS | 4.946 TFLOPS, 6.359 TFLOPS | Apple | SoC | Mobile, PC | M | M3 Pro | n/a | APL1203 | 2023 | n/a | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 153.6 GB/s | n/a | n/a |
38 TOPS | 3.763 TFLOPS | Apple | SoC | Mobile, PC | M | M4 | n/a | APL1206 | 2024 | iPad Pro (7th generation) | TSMC N3E | 10 cores (4 performance + 6 efficiency) | Apple-designed 10-core | 16-core Neural Engine | LPDDR5X-7500 | 120 GB/s | n/a | n/a |
38 TOPS | n/a | Apple | SoC | Mobile, PC | M | M4 Max | n/a | n/a | 2024 | MacBook Pro M4 Max | TSMC N3E | 14 cores (10 performance + 4 efficiency) 16 cores (12 performance + 4 efficiency) | Apple-designed 32-core Apple-designed 40-core | 16-core Neural Engine | LPDDR5X-8533 | 409.6 GB/s (36GB), 546 GB/s (48GB, 64GB, 128GB) | n/a | n/a
38 TOPS | n/a | Apple | SoC | Mobile, PC | M | M4 Pro | n/a | n/a | 2024 | MacBook Pro M4 Pro, Mac mini M4 Pro | TSMC N3E | 12 cores (8 performance + 4 efficiency) 14 cores (10 performance + 4 efficiency) | Apple-designed 16-core Apple-designed 20-core | 16-core Neural Engine | LPDDR5X-8533 | 273 GB/s | n/a | n/a
n/a | n/a | Google | SoC | Datacenter (Infra) | GCP CPU | Axion | n/a | Axion | 2024 | GCP Compute Engine ??? | n/a | ?? Neoverse V2 core | n/a | n/a | n/a | n/a | n/a | n/a
1.6 TOPS | n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G1 | 1 | Whitechapel | 2021 | Pixel 6, Pixel 6 Pro, Pixel 6a | Samsung 5 nm LPE | Octa-core: 2.8 GHz Cortex-X1 (2×) 2.25 GHz Cortex-A76 (2×) 1.8 GHz Cortex-A55 (4×) | Mali-G78 MP20 at 848 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a
n/a | n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G2 | 2 | Cloudripper | 2022 | Pixel 7, Pixel 7 Pro, Pixel 7a, Pixel Fold, Pixel Tablet | Samsung 5 nm LPE | Octa-core: 2.85 GHz Cortex-X1 (2×) 2.35 GHz Cortex-A78 (2×) 1.8 GHz Cortex-A55 (4×) | Mali-G710 MP7 at 850 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a
27 TOPS | n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G3 | 3 | Zuma (Dev Board: Ripcurrent) | 2023 | Pixel 8, Pixel 8 Pro, Pixel 8a | Samsung 4nm LPP | Nona-core: 2.91 GHz Cortex-X3 (1×) 2.37 GHz Cortex-A715 (4×) 1.7 GHz Cortex-A510 (4×) | Mali-G715 MP10 at 890 MHz | Google Edge TPU (Rio) | LPDDR5X | 68.2 GB/s | n/a | n/a
45 TOPS | n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G4 | 4 | Zuma Pro | 2024 | Pixel 9, Pixel 9 Pro | Samsung 4nm LPP | Octa-core: 3.1 GHz Cortex-X4 (1×) 2.6 GHz Cortex-A720 (3×) 1.92 GHz Cortex-A520 (4×) | Mali-G715 MP10 at 940 MHz | n/a | LPDDR5X | n/a | n/a | - 8Gen3 = 45 TOPS, D9300 = 48 TOPS
n/a | n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G5 | 5 | Laguna Beach (Dev Board: Deepspace) | 2025 | Pixel 10, Pixel 10 Pro | TSMC N3 + InFO-POP packaging | n/a | n/a | n/a | n/a | n/a | n/a | n/a
23 TOPS | n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv1 | 1 | n/a | 2015 | n/a | 28nm | n/a | n/a | n/a | DDR3-2133 | 34 GB/s | 75.0 | - The core of TPU: Systolic Array - Matrix Multiply Unit (MXU): a big systolic array - PCIe Gen3 x16
45 TOPS | 3 TFLOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv2 | 2 | n/a | 2017 | n/a | 16nm | n/a | n/a | n/a | n/a | 600 GB/s | 280.0 | - 16GB HBM - BF16
123 TOPS | 4 TFLOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv3 | 3 | n/a | 2018 | n/a | 16nm | n/a | n/a | n/a | n/a | 900 GB/s | 220.0 | n/a
275 TOPS | n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv4 | 4 | n/a | 2021 | n/a | 7nm | n/a | n/a | n/a | n/a | 1,200 GB/s | 170.0 | - 32GB HBM2
393 TOPS | n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv5e | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | 819 GB/s | n/a | n/a
918 TOPS | n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv5p | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | 2,765 GB/s | n/a | n/a
n/a | n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv6? Trillium? | 6 | n/a | 2024 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | 31 TFLOPS | Graphcore | SoC | Datacenter | Colossus | Colossus MK1 GC2 IPU | 1 | n/a | 2017 | n/a | TSMC 16nm | 1216 processor cores | n/a | n/a | n/a | 45,000 GB/s | n/a | n/a |
n/a | 62 TFLOPS | Graphcore | SoC | Datacenter | Colossus | Colossus MK2 GC200 IPU | 2 | n/a | 2020 | n/a | TSMC 7nm | 1472 processor cores | n/a | n/a | n/a | 47,500 GB/s | n/a | n/a |
n/a | n/a | Graphcore | SoC | Datacenter | Colossus | Colossus MK3 (TBD) | 3 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | Intel | SoC | HP Mobile, PC | n/a | n/a | n/a | Arrow Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
120 TOPS | n/a | Intel | SoC | LP Mobile | Core Ultra | Core Ultra | Series 2 | Lunar Lake | 2024 | n/a | TSMC N3B (Compute tile), TSMC N6 (Platform controller tile) | P-core: Lion Cove E-core: Skymont | Xe2 | NPU 4 | n/a | n/a | n/a | - Total 120 TOPS (48 TOPS from NPU 4 + 67 TOPS from GPU + 5 TOPS from CPU).
34 TOPS | n/a | Intel | SoC | Mobile | Core Ultra | Core Ultra | Series 1 | Meteor Lake | 2023 | n/a | Intel 4 (7nm EUV, Compute tile), TSMC N5 (Graphics tile), TSMC N6 (SoC tile, I/O extender tile) | P-core: Redwood Cove E-core: Crestmont | Xe-LPG | NPU 3720 | n/a | n/a | n/a | - Total 34 TOPS (11 TOPS from NPU + 18 TOPS from GPU + 5 TOPS from CPU).
0.5 TOPS | n/a | Intel | NPU | n/a | n/a | NPU 1 | 1 | n/a | 2018 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
7 TOPS | n/a | Intel | NPU | n/a | n/a | NPU 2 | 2 | n/a | 2021 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
11.5 TOPS | n/a | Intel | NPU | n/a | n/a | NPU 3 | 3 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
48 TOPS | n/a | Intel | NPU | n/a | n/a | NPU 4 | 4 | n/a | 2024 | Lunar Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9000 天璣 9000 | 9000 | n/a | 2021 | Redmi K50 Pro, OPPO Find X5 Pro (Dimensity Edition), vivo X80 / X80 Pro (Dimensity Edition) | TSMC N4 | 1× Cortex-X2 @ 3.05 GHz 3× Cortex-A710 @ 2.85 GHz 4× Cortex-A510 @ 1.8 GHz | Mali-G710 MP10 @ 850 MHz | MediaTek APU 590 | n/a | n/a | n/a | - 5G NR Sub-6GHz, LTE
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9000+ 天璣 9000+ | 9000 | n/a | 2022 | Xiaomi 12 Pro (Dimensity Edition), ASUS ROG Phone 6D Ultimate, iQOO Neo 7, OPPO Find N2 Flip | TSMC N4 | 1× Cortex-X2 @ 3.2 GHz 3× Cortex-A710 @ 2.85 GHz 4× Cortex-A510 @ 1.8 GHz | Mali-G710 MC10 | MediaTek APU 590 | n/a | n/a | n/a | - 5G NR Sub-6GHz, LTE
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9200 天璣 9200 | 9000 | n/a | 2022 | vivo X90, vivo X90 Pro OPPO Find X6 OPPO Find N3 Flip | TSMC N4 | 1× Cortex-X3 @ 3.05GHz 3× Cortex-A715 @ 2.85GHz 4× Cortex-A510 @ 1.8GHz | Mali-Immortalis-G715 MP11 @ 981 MHz | MediaTek APU 690 | n/a | n/a | n/a | - 5G NR Sub-6 GHz, 5G mmWave, LTE |
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9200+ 天璣 9200+ | 9000 | n/a | 2023 | iQOO Neo8 Pro, vivo X90s, Redmi K60 Ultra | TSMC N4 | 1× Cortex-X3 @ 3.35 GHz 3× Cortex-A715 @ 3.0 GHz 4× Cortex-A510 @ 2.0 GHz | Mali-Immortalis-G715 MC11 | MediaTek APU 690 | n/a | n/a | n/a | - 5G NR Sub-6 GHz, 5G mmWave, LTE
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9300 天璣 9300 | 9000 | n/a | 2023 | vivo X100, vivo X100 Pro OPPO Find X7 | TSMC N4P | 1× Cortex-X4 @ 3.25 GHz 3× Cortex-X4 @ 2.85 GHz 4× Cortex-A720 @ 2.0 GHz | Mali-Immortalis-G720 MC12 @ 1300 MHz | MediaTek APU 790 | n/a | n/a | n/a | - 5G NR (Sub-6 GHz & mmWave), 4G LTE, quad-band GNSS (BeiDou, Galileo, GLONASS, GPS, NavIC, QZSS), Bluetooth 5.4, Wi-Fi 7 (2x2) |
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9300+ 天璣 9300+ | 9000 | n/a | 2024 | vivo X100S, vivo X100X Pro | TSMC N4P | 1× Cortex-X4 @ 3.4 GHz 3× Cortex-X4 @ 2.85 GHz 4× Cortex-A720 @ 2.0 GHz | Mali-Immortalis-G720 MC12 @ 1300 MHz | MediaTek APU 790 | n/a | n/a | n/a | - 5G NR (Sub-6 GHz & mmWave), 4G LTE, quad-band GNSS (BeiDou, Galileo, GLONASS, GPS, NavIC, QZSS), Bluetooth 5.4, Wi-Fi 7 (2x2) |
n/a | n/a | MediaTek | SoC | Mobile | Dimensity 天璣 | Dimensity 9400 天璣 9400 | 9000 | n/a | 2024 | vivo X200, OPPO Find X8 / Pro | TSMC N3 | 1× Cortex-X925 @ 3.63 GHz 3× Cortex-X4 @ 2.8 GHz 4× Cortex-A725 @ 2.1 GHz | Mali-Immortalis-G925 MC12 @ ??? MHz | n/a | n/a | n/a | n/a | n/a |
n/a | n/a | Microsoft | SoC | Datacenter (Infra) | Azure Cobalt | Cobalt 100 | 1 | n/a | 2024 | Azure VM Dpsv6, Dplsv6, Epsv6 | n/a | 128 Neoverse N2 core | n/a | n/a | LPDDR5 ??? | n/a | n/a | - PCIe gen5 - CXL 1.1 - From project start to silicon in 13 months.
1,600 TOPS | n/a | Microsoft | SoC | Datacenter (AI inference) | Azure Maia | Maia 100 | 1 | n/a | 2024 | Microsoft Copilot | TSMC N5 + CoWoS-S | n/a | n/a | n/a | n/a | 18,000 GB/s ??? | 500.0 | - 32Gb/s PCIe gen5x8 - Design to TDP = 700W - Provision TDP = 500W |
n/a | 15.1 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4060 | n/a | AD107-400 | 2023 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6 | 272 GB/s | 115.0 | - PCIe 4.0 x8 |
n/a | 22.1 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4060 Ti | n/a | AD106-351 | 2023 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6 | 288 GB/s | 160.0 | - PCIe 4.0 x8 |
n/a | 29.1 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4070 | n/a | AD104-250 | 2023 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 504 GB/s | 200.0 | - PCIe 4.0 x16 |
n/a | 35.48 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4070 Super | n/a | AD104-350 | 2024 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 504 GB/s | 220.0 | - PCIe 4.0 x16 |
n/a | 40.1 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4070 Ti | n/a | AD104-400 | 2023 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 504 GB/s | 285.0 | - PCIe 4.0 x16 |
n/a | 44.10 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4070 Ti Super | n/a | AD103-275 | 2024 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 672 GB/s | 285.0 | - PCIe 4.0 x16 |
n/a | 48.7 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4080 | n/a | AD103-300 | 2022 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 717 GB/s | 320.0 | - PCIe 4.0 x16 |
n/a | 52.22 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4080 Super | n/a | AD103-400 | 2024 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 736 GB/s | 320.0 | - PCIe 4.0 x16 |
n/a | 82.6 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4090 | n/a | AD102-300 | 2022 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 1008 GB/s | 450.0 | - PCIe 4.0 x16 |
n/a | 73.5 TFLOPS | NVIDIA | GPU | Desktop | GeForce RTX 40 | GeForce RTX 4090 D | n/a | AD102-250 | 2023 | n/a | TSMC N4 | n/a | n/a | n/a | GDDR6X | 1008 GB/s | 425.0 | - PCIe 4.0 x16 |
n/a | 124.96 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A10 | Ampere | n/a | 2021 | n/a | n/a | n/a | 1× GA102-890-A1 | n/a | GDDR6 | 600 GB/s | n/a | n/a |
624 TOPS | 312.0 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A100 | Ampere | n/a | 2020 | n/a | TSMC N7 | n/a | 1× GA100-883AA-A1 | n/a | HBM2 | 1555 GB/s | 400.0 | n/a |
n/a | 73.728 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A16 | Ampere | n/a | 2021 | n/a | n/a | n/a | 4× GA107 | n/a | GDDR6 | 4x 200 GB/s | n/a | n/a |
n/a | 18.124 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A2 | Ampere | n/a | 2021 | n/a | n/a | n/a | 1× GA107 | n/a | GDDR6 | 200 GB/s | 60.0 | n/a |
n/a | 165.12 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A30 | Ampere | n/a | 2021 | n/a | n/a | n/a | 1× GA100 | n/a | HBM2 | 933.1 GB/s | n/a | n/a |
n/a | 149.68 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | A40 | Ampere | n/a | 2020 | n/a | n/a | n/a | 1× GA102 | n/a | GDDR6 | 695.8 GB/s | n/a | n/a |
3500 TOPS (3.5 POPS) | n/a | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | B100 (SXM6 card) | Blackwell | n/a | 2024 | n/a | TSMC 4NP (custom N4P) | n/a | n/a | n/a | HBM3E | 8000 GB/s | 700.0 | n/a |
4500 TOPS (4.5 POPS) | n/a | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | B200 (SXM6 card) | Blackwell | n/a | 2024 | n/a | TSMC 4NP (custom N4P) | n/a | n/a | n/a | HBM3E | 8000 GB/s | 1000.0 | n/a |
n/a | 756.449 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | H100 (PCIe card) | Hopper | n/a | 2022 | n/a | TSMC 4N (custom N4) | n/a | 1× GH100 | n/a | HBM2E | 2039 GB/s | n/a | n/a |
1980 TOPS (1.98 POPS) | 989.43 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | H100 (SXM5 card) | Hopper | n/a | 2022 | n/a | TSMC 4N (custom N4) | n/a | 1× GH100 | n/a | HBM3 | 3352 GB/s | 700.0 | n/a |
1980 TOPS (1.98 POPS) | 67 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | H200 (SXM5 card) | Hopper | n/a | 2023 | n/a | TSMC 4N (custom N4) | n/a | n/a | n/a | HBM3E | 4800 GB/s | 1000.0 | n/a |
n/a | 121.0 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | L4 | Ada Lovelace | n/a | 2023 | n/a | n/a | n/a | 1x AD104 | n/a | GDDR6 | 1563 GB/s | n/a | n/a |
n/a | 362.066 TFLOPS | NVIDIA | GPU | Datacenter | Nvidia Data Center GPUs (Nvidia Tesla) | L40 | Ada Lovelace | n/a | 2022 | n/a | n/a | n/a | 1× AD102 | n/a | GDDR6 | 2250 GB/s | n/a | n/a |
n/a | 2.774 TFLOPS | Qualcomm | SoC | Mobile | Snapdragon 8 | Snapdragon 8 Gen 3 | 8 | n/a | 2023 | n/a | TSMC N4P | 1× 3.30 GHz Kryo Prime (Cortex-X4) + 3× 3.15 GHz Kryo Gold (Cortex-A720) + 2× 2.96 GHz Kryo Gold (Cortex-A720) + 2× 2.27 GHz Kryo Silver (Cortex-A520) | Adreno 750 @ 903 MHz | n/a | LPDDR5X | 76.8 GB/s | n/a | n/a |
n/a | 1.689 TFLOPS | Qualcomm | SoC | Mobile | Snapdragon 8 | Snapdragon 8s Gen 3 | 8 | n/a | 2024 | n/a | TSMC N4P | 1× 3.0 GHz Kryo Prime (Cortex-X4) + 4× 2.8 GHz Kryo Gold (Cortex-A720) + 3× 2.0 GHz Kryo Silver (Cortex-A520) | Adreno 735 @ 1100 MHz | n/a | LPDDR5X | 76.8 GB/s | n/a | n/a |
45 TOPS | 4.6 TFLOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Elite | X | n/a | 2023 | n/a | TSMC N4 | Oryon | Adreno X1 | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | - Total 75 TOPS (45 TOPS from NPU). |
45 TOPS | 3.8 TFLOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Plus | X | n/a | 2024 | n/a | TSMC N4 | Oryon | Adreno X1-45 1107 MHz (1.7 TFLOPS) Adreno X1-45 (2.1 TFLOPS) Adreno X1-85 1250 MHz (3.8 TFLOPS) | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | n/a |
45 TOPS | n/a | Qualcomm | NPU | n/a | Hexagon | Hexagon | n/a | n/a | n/a | Snapdragon X Plus | n/a | n/a | n/a | n/a | n/a | n/a | n/a | - Hexagon is the brand name for a family of digital signal processor (DSP) and later neural processing unit (NPU) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” |
n/a | 2.1 TFLOPS | Qualcomm | GPU | n/a | Adreno | Adreno X1-45 | X | Adreno 726 | n/a | n/a | TSMC N4 | n/a | n/a | n/a | LPDDR5X-8448 @ 4224 MHz or LPDDR5X-8533 @ 4266.5 MHz | 125.1 GB/s or 136.5 GB/s | n/a | - The Adreno X1-45 is internally called the Adreno 726, suggesting it’s a scaled-up version of the Adreno 725 from the Snapdragon 7+ Gen 2.
n/a | 4.6 TFLOPS | Qualcomm | GPU | n/a | Adreno | Adreno X1-85 | X | Adreno 741 | n/a | Snapdragon X Plus | TSMC N4 | n/a | n/a | n/a | LPDDR5X-8448 @ 4224 MHz or LPDDR5X-8533 @ 4266.5 MHz | 125.1 GB/s or 136.5 GB/s | n/a | - The Adreno X1-85 is internally called the Adreno 741, suggesting it’s a scaled-up version of the Adreno 730 from the Snapdragon 8 Gen 1/8+ Gen 1.