Breaking Down AI TOPS: The Key Metric for AI Chips, with a TOPS Chip Comparison Table

(Caption: How much hard preparation goes into a delicious meal? Taken at Le Bouchon Ogasawara, Shibuya, Tokyo. Photo: Ernest.)



Summary (tl;dr)

  • TOPS (Trillions of Operations Per Second) is the key metric for gauging the compute capability of AI chips and NPUs; it indicates how many trillion operations a processor can execute per second.
  • A "fried egg" analogy makes TOPS intuitive: an ordinary CPU is like a cook who can fry only one egg at a time, while a high-TOPS AI chip is a super cook who can fry an enormous number of eggs at once.
  • TOPS is an important reference for comparing AI chips, but evaluating AI hardware also requires weighing energy efficiency, memory bandwidth, and other factors; and because TOPS usually reflects a theoretical peak, judging whether a chip fits a given application still calls for other metrics.

What Is TOPS (Everyday Version)

TOPS, short for Trillions of Operations Per Second, is a key metric for measuring the compute capability of an artificial intelligence (AI) chip or neural processing unit (NPU). It expresses the maximum number of operations a processor can execute per second, counted in trillions. If compute capability keeps growing, the leading "T" will eventually give way to an even larger unit prefix.

We can use an everyday example to get a more intuitive feel for TOPS.

Imagine AI computation as the process of frying eggs, where the data is the egg being cooked.

An ordinary cook (a regular processor, the CPU) can fry only one egg at a time, while a super cook (an AI chip) might fry a trillion eggs at once! TOPS is the yardstick for this "super cook": it tells us how many "data eggs" it can "fry" per second.

TOPS is one of the important metrics for understanding and comparing AI chip performance, but it is not the only one.

When evaluating AI hardware, AI phones, or AI PCs, remember to consider other factors as well, such as energy efficiency, memory bandwidth, and the software ecosystem. TOPS helps us compare the compute capability of different AI chips and gives us a reference point for choosing hardware that suits a particular application.


What Is TOPS (Deep-Dive Version)

Before digging deeper into TOPS, we first need to understand what an "operation" is:

In digital circuits and computer science, an "operation" usually refers to a basic mathematical or logical computation. For an AI chip or NPU, these operations mainly include:

  1. Floating-point arithmetic: addition, subtraction, multiplication, and division.
  2. Matrix operations: large-scale matrix multiplication is one of the most common operations in deep learning.
  3. Vector operations: including the inner (dot) product and the outer (cross) product.
  4. Activation functions: ReLU, Sigmoid, Tanh, and so on.
  5. Convolutions: used extensively in convolutional neural networks (CNNs).

These operations are usually performed in FP32 (32-bit floating point) or FP16 (16-bit floating point) format. Some AI chips also support lower-precision formats such as INT8 (8-bit integer) to increase throughput and reduce power consumption, typically for inference.
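
To connect these operation types to a TOPS figure, here is a minimal Python sketch that counts the operations in a single matrix multiply and estimates how long a chip would need at its theoretical peak. The matrix sizes and the 45 TOPS rating are illustrative assumptions, not values taken from this article's table.

```python
# Minimal sketch: counting the operations in one matrix multiply.
# Matrix sizes and the 45 TOPS rating below are made-up examples.

def matmul_ops(m: int, k: int, n: int) -> int:
    """Operation count for an (m x k) @ (k x n) matrix multiply.
    Each of the m*n outputs needs k multiply-accumulates, i.e. ~2*k operations."""
    return 2 * m * k * n

ops = matmul_ops(m=1024, k=1024, n=1024)
print(f"{ops:,} operations (~{ops / 1e9:.1f} billion)")

rated_tops = 45  # hypothetical chip rated at 45 TOPS = 45e12 ops/s peak
seconds = ops / (rated_tops * 1e12)
print(f"at least {seconds * 1e6:.1f} microseconds at theoretical peak")
```

Note that the operation count itself does not depend on the number format; running the same matmul in INT8 instead of FP32 simply lets many chips execute more of those operations per cycle.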

The TOPS calculation can be simplified as:

TOPS = (operations per clock cycle) × (clock frequency) ÷ 1 trillion (10^12)

For example, if an AI chip can execute 1,000 operations per clock cycle and runs at a clock frequency of 1 GHz, its theoretical peak performance is 1 TOPS.

1,000 operations/cycle × 1 GHz = 1,000 × 10^9 operations/second = 10^12 operations/second = 1 TOPS
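
The same arithmetic written as a small Python sketch, reproducing the 1 TOPS example above (the helper name theoretical_tops is just for illustration):

```python
# Minimal sketch of the formula: TOPS = ops per cycle x clock frequency / 1e12.

def theoretical_tops(ops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak TOPS from per-cycle throughput and clock frequency."""
    return ops_per_cycle * clock_hz / 1e12

# The example from the text: 1,000 operations per cycle at 1 GHz -> 1.0 TOPS.
print(theoretical_tops(ops_per_cycle=1_000, clock_hz=1e9))
```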

When interpreting TOPS, keep the following in mind:

  1. TOPS usually refers to theoretical peak performance; actual performance can differ because of memory bandwidth, chip architecture, and other factors (see the sketch after this list).
  2. The TOPS figure can differ depending on the data type used (FP32, FP16, INT8, and so on).
  3. A higher TOPS value does not guarantee better results on every AI task, because real-world performance also depends on software optimization and the characteristics of the specific workload.
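To make point 1 concrete, here is a minimal sketch with made-up numbers (no real chip is being measured) that compares an advertised peak with the throughput a workload actually achieves:

```python
# Minimal sketch: theoretical peak vs. achieved throughput.
# All numbers here are illustrative assumptions, not measurements of any real chip.

rated_tops = 48.0       # vendor's theoretical peak from a spec sheet
ops_executed = 2.0e12   # operations actually executed by some workload
elapsed_seconds = 0.10  # measured wall-clock time for that workload

achieved_tops = ops_executed / elapsed_seconds / 1e12
utilization = achieved_tops / rated_tops

print(f"achieved: {achieved_tops:.1f} TOPS "
      f"({utilization:.0%} of the {rated_tops:.0f} TOPS peak)")
# Memory bandwidth, the software stack, and workload shape usually keep
# utilization well below 100% of the advertised peak.
```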

TOPS Comparison Table

(Focus mainly on the "INT8 Ops" column.)

INT8 Ops | Company Name | Type | Target Market | Product Family | Product Name | Product Generation | Code Name | Release Year | First Used On | FP32 FLOPS | Fab Process | CPU | GPU | NPU | Memory Tech | Memory Bandwidth | TDP Base (W) | Remark
73 TOPS | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 365 | n/a | Strix Point | 2024 | n/a | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 880M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | - Total 73 TOPS (50 TOPS from NPU).
80 TOPS | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 HX 370 | n/a | Strix Point | 2024 | n/a | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 890M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | - Total 80 TOPS (50 TOPS from NPU).
50 TOPS | AMD | NPU | n/a | Ryzen | XDNA 2 | n/a | AI | 2024 | Ryzen AI 9 HX 370 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | n/a | Neoverse | Neoverse E1 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | n/a | Neoverse | Neoverse N1 | n/a | Ares | 2019 | Ampere Altra, AWS Graviton2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N2 | n/a | Perseus | 2020 | Microsoft Azure Cobalt 100 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N3 | n/a | Hermes | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V1 | n/a | Zeus | 2020 | AWS Graviton3 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | - first announcements coming out of Arm’s TechCon convention 2018 in San Jose.
n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V2 | n/a | n/a | 2022 | NVIDIA Grace, AWS Graviton4, Google Axion | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V3 | n/a | Poseidon | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
825 TOPS ??? | Alibaba | SoC | Datacenter (AI inference) | Hanguang 含光 | Hanguang 800 | 1 | n/a | 2019 | n/a | n/a | TSMC 12nm | n/a | n/a | n/a | n/a | n/a | 280.0 | - 16x PCIe gen4 - SRAM, No DDR
n/a | Alibaba | SoC | Datacenter (Infra) | Yitian 倚天 | Yitian 710 | 1 | n/a | 2021 | Alibaba ECS g8m | n/a | N5 | 128 Neoverse N2 cores | n/a | n/a | n/a | n/a | n/a | n/a
n/a | Amazon | SoC | Datacenter (Infra) (Scale out) | AWS Graviton | Graviton | 1 | Alpine | 2018 | Amazon EC2 A1 | n/a | TSMC 16nm | Cortex A72 | n/a | n/a | DDR4-1600 | 51.2 GB/s | 95.0 | - 32 lanes of PCIe gen3
n/a | Amazon | SoC | Datacenter (Infra) (General Purpose) | AWS Graviton | Graviton 2 | 2 | Alpine+ | 2019 | Amazon EC2 M6g, M6gd, C6g, C6gd, C6gn, R6g, R6gd, T4g, X2gd, G5g, Im4gn, Is4gen, I4g | n/a | TSMC 7nm | 128 Neoverse N1 cores | n/a | n/a | DDR4-3200 | 204.8 GB/s | 110.0 | - 64 lanes of PCIe gen4
n/a | Amazon | SoC | Datacenter (Infra) (ML, HPC, SIMD) | AWS Graviton | Graviton 3 | 3 | n/a | 2021 | Amazon EC2 C7g, M7g, R7g; with local disk: C7gd, M7gd, R7gd | n/a | TSMC 5nm | 64 Neoverse V1 cores | n/a | n/a | DDR5-4800 | 307.2 GB/s | 100.0 | - 32 lanes of PCIe gen5
n/a | Amazon | SoC | Datacenter (Infra) | AWS Graviton | Graviton 3E | 3 | n/a | 2022 | Amazon EC2 C7gn, HPC7g | n/a | n/a | 64 Neoverse V1 cores | n/a | n/a | n/a | n/a | n/a | n/a
n/a | Amazon | SoC | Datacenter (Infra) (Scale up) | AWS Graviton | Graviton 4 | 4 | n/a | 2023 | Amazon EC2 R8g | n/a | n/a | 96 Neoverse V2 cores | n/a | n/a | DDR5-5600 | 537.6 GB/s | n/a | - 96 lanes of PCIe gen5
63.3 TOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 1 | 1 | n/a | 2018 | Amazon EC2 Inf1 | 0.97 TFLOPS | TSMC 16nm | 16 NeuronCore v1 | n/a | n/a | n/a | 50 GB/s | n/a | n/a
380 TOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 2 | 2 | n/a | 2022 | Amazon EC2 Inf2 | 2.9 TFLOPS | TSMC 5nm | 24 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a
380 TOPS | Amazon | SoC | Datacenter (AI train) | AWS Trainium | Trainium 1 | 1 | n/a | 2020 | Amazon EC2 Trn1 | 2.9 TFLOPS | TSMC 7nm | 32 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a
861 TOPS | Amazon | SoC | Datacenter (AI train) | AWS Trainium | Trainium 2 | 2 | n/a | 2023 | Amazon EC2 Trn2 | 6.57 TFLOPS | TSMC 4nm | 64 NeuronCore v2 | n/a | n/a | n/a | 4,096 GB/s | n/a | n/a
11 TOPS | Apple | SoC | Mobile | A | A14 Bionic | n/a | APL1W01 | 2020 | iPhone 12 | 748.8 GFLOPS | TSMC N5 | Firestorm + Icestorm | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a
15.8 TOPS | Apple | SoC | Mobile | A | A15 Bionic | n/a | APL1W07 | 2021 | iPhone 13 | 1.37 TFLOPS | TSMC N5P | Avalanche + Blizzard | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a
17 TOPS | Apple | SoC | Mobile | A | A16 Bionic | n/a | APL1W10 | 2022 | iPhone 14 | 1.789 TFLOPS | TSMC N4P | Everest + Sawtooth | n/a | n/a | LPDDR5-6400 | 51.2 GB/s | n/a | - 6GB LPDDR5
35 TOPS | Apple | SoC | Mobile | A | A17 Pro | n/a | APL1V02 | 2023 | iPhone 15 Pro, iPhone 15 Pro Max | 2.147 TFLOPS | TSMC N3B | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | LPDDR5-6400 | 51.2 GB/s | n/a | - 8GB LPDDR5
35 TOPS | Apple | SoC | Mobile | A | A18 | n/a | n/a | 2024 | iPhone 16 | n/a | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 5-core | 16-core Neural Engine | n/a | n/a | n/a | n/a
35 TOPS | Apple | SoC | Mobile | A | A18 Pro | n/a | n/a | 2024 | iPhone 16 Pro | n/a | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | n/a | n/a | n/a | n/a
11 TOPS | Apple | SoC | Mobile, PC | M | M1 | n/a | APL1102 | 2020 | n/a | 2.6 TFLOPS | TSMC N5 | high-performance “Firestorm” + energy-efficient “Icestorm” | n/a | n/a | LPDDR4X-4266 | 68.3 GB/s | n/a | n/a
11 TOPS | Apple | SoC | Mobile, PC | M | M1 Max | n/a | APL1105 | 2021 | n/a | 10.4 TFLOPS | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a
11 TOPS | Apple | SoC | Mobile, PC | M | M1 Pro | n/a | APL1103 | 2021 | n/a | n/a | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a
22 TOPS | Apple | SoC | Mobile, PC | M | M1 Ultra | n/a | APL1W06 | 2022 | n/a | 21 TFLOPS | TSMC N5 | The M1 Ultra consists of two M1 Max units connected with UltraFusion Interconnect with a total of 20 CPU cores and 96 MB system level cache (SLC). | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a
15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 | n/a | APL1109 | 2022 | n/a | 2.863 TFLOPS, 3.578 TFLOPS | TSMC N5P | high-performance @3.49 GHz “Avalanche” + energy-efficient @2.42 GHz “Blizzard” | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a
15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 Max | n/a | APL1111 | 2023 | n/a | 10.736 TFLOPS, 13.599 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a
15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 Pro | n/a | APL1113 | 2023 | n/a | 5.726 TFLOPS, 6.799 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a
31.6 TOPS | Apple | SoC | Mobile, PC | M | M2 Ultra | n/a | APL1W12 | 2023 | n/a | 21.473 TFLOPS, 27.199 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a
18 TOPS | Apple | SoC | Mobile, PC | M | M3 | n/a | APL1201 | 2023 | MacBook Pro | 2.826 TFLOPS, 3.533 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a
18 TOPS | Apple | SoC | Mobile, PC | M | M3 Max | n/a | APL1204 | 2023 | n/a | 10.598 TFLOPS, 14.131 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 307.2 GB/s, 409.6 GB/s | n/a | n/a
18 TOPS | Apple | SoC | Mobile, PC | M | M3 Pro | n/a | APL1203 | 2023 | n/a | 4.946 TFLOPS, 6.359 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 153.6 GB/s | n/a | n/a
38 TOPS | Apple | SoC | Mobile, PC | M | M4 | n/a | APL1206 | 2024 | iPad Pro (7th generation) | 3.763 TFLOPS | TSMC N3E | 10 cores (4 performance + 6 efficiency) | Apple-designed 10-core | 16-core Neural Engine | LPDDR5X-7500 | 120 GB/s | n/a | n/a
n/a | Google | SoC | Datacenter (Infra) | GCP CPU | Axion | n/a | Axion | 2024 | GCP Compute Engine ??? | n/a | n/a | ?? Neoverse V2 cores | n/a | n/a | n/a | n/a | n/a | n/a
1.6 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G1 | 1 | Whitechapel | 2021 | Pixel 6, Pixel 6 Pro, Pixel 6a | n/a | Samsung 5 nm LPE | Octa-core: 2.8 GHz Cortex-X1 (2×) 2.25 GHz Cortex-A76 (2×) 1.8 GHz Cortex-A55 (4×) | Mali-G78 MP20 at 848 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a
n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G2 | 2 | Cloudripper | 2022 | Pixel 7, Pixel 7 Pro, Pixel 7a, Pixel Fold, Pixel Tablet | n/a | Samsung 5 nm LPE | Octa-core: 2.85 GHz Cortex-X1 (2×) 2.35 GHz Cortex-A78 (2×) 1.8 GHz Cortex-A55 (4×) | Mali-G710 MP7 at 850 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a
27 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G3 | 3 | Zuma (Dev Board: Ripcurrent) | 2023 | Pixel 8, Pixel 8 Pro, Pixel 8a | n/a | Samsung 4nm LPP | Nona-core: 2.91 GHz Cortex-X3 (1×) 2.37 GHz Cortex-A715 (4×) 1.7 GHz Cortex-A510 (4×) | Mali-G715 MP10 at 890 MHz | Google Edge TPU (Rio) | LPDDR5X | 68.2 GB/s | n/a | n/a
45 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G4 | 4 | Zuma Pro | 2024 | Pixel 9, Pixel 9 Pro | n/a | Samsung 4nm LPP | Octa-core: 3.1 GHz Cortex-X4 (1×) 2.6 GHz Cortex-A720 (3×) 1.92 GHz Cortex-A520 (4×) | Mali-G715 MP10 at 940 MHz | n/a | LPDDR5X | n/a | n/a | - 8Gen3 = 45 TOPS, D9300 = 48 TOPS
n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G5 | 5 | Laguna Beach (Dev Board: Deepspace) | 2025 | Pixel 10, Pixel 10 Pro | n/a | TSMC N3 + InFO-POP packaging | n/a | n/a | n/a | n/a | n/a | n/a | n/a
23 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv1 | 1 | n/a | 2015 | n/a | n/a | 28nm | n/a | n/a | n/a | DDR3-2133 | 34 GB/s | 75.0 | - The core of TPU: Systolic Array - Matrix Multiply Unit (MXU): a big systolic array - PCIe Gen3 x16
45 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv2 | 2 | n/a | 2017 | n/a | n/a | 16nm | n/a | n/a | n/a | n/a | 600 GB/s | 280.0 | - 16GB HBM - BF16
123 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv3 | 3 | n/a | 2018 | n/a | n/a | 16nm | n/a | n/a | n/a | n/a | 900 GB/s | 220.0 | n/a
275 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv4 | 4 | n/a | 2021 | n/a | n/a | 7nm | n/a | n/a | n/a | n/a | 1,200 GB/s | 170.0 | - 32GB HBM2
393 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv5e | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 819 GB/s | n/a | n/a
918 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv5p | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 2,765 GB/s | n/a | n/a
n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv6? Trillium? | 6 | n/a | 2024 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | Intel | SoC | HP Mobile, PC | n/a | n/a | n/a | Arrow Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
120 TOPS | Intel | SoC | LP Mobile | Core Ultra | Core Ultra | Series 2 | Lunar Lake | 2024 | n/a | n/a | TSMC N3B (Compute tile), TSMC N6 (Platform controller tile) | P-core: Lion Cove E-core: Skymont | Xe2 | NPU 4 | n/a | n/a | n/a | - Total 120 TOPS (48 TOPS from NPU 4 + 67 TOPS from GPU + 5 TOPS from CPU).
34 TOPS | Intel | SoC | Mobile | Core Ultra | Core Ultra | Series 1 | Meteor Lake | 2023 | n/a | n/a | Intel 4 (7nm EUV, Compute tile), TSMC N5 (Graphics tile), TSMC N6 (SoC tile, I/O extender tile) | P-core: Redwood Cove E-core: Crestmont | Xe-LPG | NPU 3720 | n/a | n/a | n/a | - Total 34 TOPS (11 TOPS from NPU + 18 TOPS from GPU + 5 TOPS from CPU).
0.5 TOPS | Intel | NPU | n/a | n/a | NPU 1 | 1 | n/a | 2018 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
7 TOPS | Intel | NPU | n/a | n/a | NPU 2 | 2 | n/a | 2021 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
11.5 TOPS | Intel | NPU | n/a | n/a | NPU 3 | 3 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
48 TOPS | Intel | NPU | n/a | n/a | NPU 4 | 4 | n/a | 2024 | Lunar Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
n/a | Microsoft | SoC | Datacenter (Infra) | Azure Cobalt | Cobalt 100 | 1 | n/a | 2024 | Azure VM Dpsv6, Dplsv6, Epsv6 | n/a | n/a | 128 Neoverse V2 cores | n/a | n/a | LPDDR5 ??? | n/a | n/a | - PCIe gen5 - CXL 1.1 - from project start to silicon in 13 months.
1,600 TOPS | Microsoft | SoC | Datacenter (AI inference) | Azure Maia | Maia 100 | 1 | n/a | 2024 | Microsoft Copilot | n/a | TSMC N5 + CoWoS-S | n/a | n/a | n/a | n/a | 18,000 GB/s ??? | 500.0 | - 32Gb/s PCIe gen5x8 - Design to TDP = 700W - Provision TDP = 500W
45 TOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Elite | n/a | n/a | 2023 | n/a | 4.6 TFLOPS | TSMC N4 | Oryon | Adreno X1 | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | - Total 75 TOPS (45 TOPS from NPU).
45 TOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Plus | n/a | n/a | 2024 | n/a | 3.8 TFLOPS | TSMC N4 | Oryon | Adreno X1-45 1107 MHz (1.7 TFLOPS) Adreno X1-45 (2.1 TFLOPS) Adreno X1-85 1250 MHz (3.8 TFLOPS) | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | n/a
45 TOPS | Qualcomm | NPU | n/a | Hexagon | Hexagon | n/a | n/a | n/a | Snapdragon X Plus | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | - Hexagon is the brand name for a family of digital signal processor (DSP) and later neural processing unit (NPU) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.”
n/a | Qualcomm | GPU | n/a | Adreno | Adreno X1 | n/a | n/a | n/a | Snapdragon X Plus | 4.6 TFLOPS | TSMC N4 | n/a | n/a | n/a | LPDDR5X-8448 @ 4224 MHz or LPDDR5X-8533 @ 4266.5 MHz | 125.1 GB/s or 136.5 GB/s | n/a | n/a
