Decoding AI TOPS: Essential Metrics for AI Chips and TOPS Comparison Chart

(Illustration: A lot of hard work has been going on behind the scenes. Le Bouchon Ogasawara, Shibuya, Tokyo. Image source: Ernest)



tl;dr

  • TOPS (Trillions of Operations Per Second) is a key indicator for measuring the computational power of AI chips and NPU chips, reflecting the trillions of operations a processor can execute per second.
  • The “frying eggs” analogy makes TOPS intuitive: a regular CPU is like a chef who can only fry one egg at a time, while a high-TOPS AI chip is like a super chef who can fry an enormous number of eggs simultaneously.
  • TOPS is a useful reference for comparing AI chip performance, but evaluating AI hardware also requires considering factors such as energy efficiency and memory bandwidth. Note too that TOPS values typically reflect theoretical peak performance; actual performance must be judged together with other metrics relevant to the application scenario.

What is TOPS (Simple Life Version)

TOPS, which stands for Trillions of Operations Per Second, is an important indicator for measuring the computational power of Artificial Intelligence (AI) chips or Neural Processing Units (NPUs). TOPS expresses the maximum number of operations a processor can execute per second, counted in trillions. If computational power keeps growing, the initial T may eventually give way to even larger units of measurement.

We can use an example from daily life to explain and more intuitively understand TOPS:

Imagine AI computation as the process of frying eggs, where data is the egg being heated.

A regular chef (ordinary processor, CPU) might only be able to fry one egg at a time, while a super chef (AI chip) might be able to fry 1 trillion eggs simultaneously! TOPS is like a measure of this “super chef’s” ability, telling us how many “data eggs” it can “fry” per second.

TOPS is one of the important indicators for understanding and comparing AI chip performance, but it’s not the only one.

When evaluating AI hardware, AI phones, or AI computers, remember to consider other factors such as energy efficiency, memory bandwidth, software ecosystem, etc. Using TOPS can help us compare the computational power of different AI chips, providing a reference point for choosing AI hardware devices suitable for specific applications.


What is TOPS (In-depth Version, for Those Who Insist)

Before delving deeper into understanding TOPS, we need to first understand what an “operation” is:

In digital circuits and computer science, an “operation” typically refers to a basic mathematical or logical computation. For AI chips or NPUs, these operations mainly include:

  1. Floating-point operations: such as addition, subtraction, multiplication, and division.
  2. Matrix operations: Large-scale matrix multiplication is one of the most common operations in deep learning.
  3. Vector operations: including dot product (scalar product), cross product (vector product), etc.
  4. Activation functions: such as ReLU, Sigmoid, Tanh, etc.
  5. Convolution operations: widely used in Convolutional Neural Networks (CNN).
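
Operation counts for workloads like these can be estimated directly from layer shapes. The short sketch below uses the common convention that one multiply-accumulate (MAC) counts as two operations; the function names and example shapes are illustrative, not tied to any particular chip.

```python
def matmul_ops(m: int, n: int, k: int) -> int:
    """Operations in an (m x k) @ (k x n) matrix multiply: each of the
    m*n outputs needs k multiplies and k - 1 additions, conventionally
    rounded up to 2*k operations (one MAC = 2 ops)."""
    return 2 * m * n * k


def conv2d_ops(out_h: int, out_w: int, c_in: int, c_out: int,
               k_h: int, k_w: int) -> int:
    """Operations in a 2-D convolution layer (bias ignored, stride
    already folded into the output size): every output position does
    a small dot product over the receptive field."""
    macs_per_output = c_in * k_h * k_w
    return 2 * macs_per_output * out_h * out_w * c_out


# A 1000x1000 matrix multiply costs 2 billion operations...
print(matmul_ops(1000, 1000, 1000))       # 2000000000
# ...so a chip rated at 1 TOPS could do about 500 of them per second at peak.
print(conv2d_ops(112, 112, 3, 64, 7, 7))  # 236027904
```

This is why matrix and convolution operations dominate the list above: a single layer can already cost hundreds of millions of operations.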

These operations are typically performed in FP32 (32-bit floating-point) or FP16 (16-bit floating-point) formats. Some AI chips also support lower precision formats like INT8 (8-bit integer) to improve performance and reduce energy consumption, typically used for inference.
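
To see why lower precision helps, here is a minimal sketch of symmetric INT8 quantization in plain Python. This is the generic textbook scheme, not the exact method any particular chip uses: an FP32-style tensor is mapped onto the integer range [-128, 127] with a single scale factor, so the hardware can use cheap 8-bit integer MACs instead of floating-point ones.

```python
def quantize_int8(values):
    """Symmetric linear quantization: the largest magnitude maps to 127
    and everything else scales linearly. Assumes at least one non-zero
    value (otherwise the scale would be zero)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [qi * scale for qi in q]


weights = [0.1, -0.5, 0.9, 1.27]
q, scale = quantize_int8(weights)  # q == [10, -50, 90, 127]
restored = dequantize(q, scale)    # close to weights, within half a step
```

The price is a small rounding error (at most half a quantization step per value), which inference workloads usually tolerate well; this is one reason INT8 TOPS figures are typically quoted for inference rather than training.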

The calculation of TOPS can be simplified as:

TOPS = (operations per clock cycle) × (clock frequency) ÷ 10^12

For example, if an AI chip can perform 1,000 operations per clock cycle at a clock frequency of 1 GHz, then its theoretical peak performance is 1 TOPS:

1,000 operations/cycle × 1 GHz = 1,000 × 10^9 operations/second = 10^12 operations/second = 1 TOPS
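
The same arithmetic in a few lines of Python. The second function uses hypothetical hardware numbers, since spec sheets often quote MAC units rather than raw operations:

```python
def peak_tops(ops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak TOPS: ops/cycle x cycles/second, in trillions."""
    return ops_per_cycle * clock_hz / 1e12


# The worked example above: 1000 ops/cycle at 1 GHz -> 1 TOPS.
print(peak_tops(1000, 1e9))  # 1.0


def peak_tops_from_macs(mac_units: int, clock_hz: float) -> float:
    """Spec sheets often count MAC units; one MAC = 2 operations."""
    return 2 * mac_units * clock_hz / 1e12


# A hypothetical NPU with 4096 MAC units clocked at 1.5 GHz:
print(peak_tops_from_macs(4096, 1.5e9))  # 12.288
```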

When understanding TOPS, please note the following points:

  1. TOPS typically represents theoretical peak performance; actual performance may vary due to factors such as memory bandwidth and chip architecture.
  2. TOPS values may differ for different types of operations (such as FP32, FP16, INT8).
  3. A high TOPS value doesn’t necessarily mean better performance in all AI tasks, as actual performance also depends on software optimization and the characteristics of specific tasks.
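
Point 1 can be made concrete by measuring utilization: the fraction of the advertised peak a real workload actually achieves. The numbers below are hypothetical, purely to show the calculation:

```python
def achieved_tops(total_ops: float, seconds: float) -> float:
    """Delivered throughput: operations actually executed per second,
    expressed in trillions."""
    return total_ops / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of theoretical peak TOPS a workload actually reaches."""
    return achieved / peak


# Hypothetical: a model needing 8e12 INT8 operations finishes in 50 ms
# on a chip advertised at 400 peak TOPS.
measured = achieved_tops(8e12, 0.050)       # ~160 TOPS delivered
print(f"{utilization(measured, 400):.0%}")  # 40%
```

Sustained utilization well below 100% is normal; memory bandwidth, data layout, and software stack quality usually decide where in that range a chip lands.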

TOPS Comparison Table

(Focus mainly on the “INT8 Ops” column. You can swipe left and right to see more comparison data)

| INT8 Ops | Company Name | Type | Target Market | Product Family | Product Name | Product Generation | Code Name | Release Year | First Used On | FP32 FLOPS | Fab Process | CPU | GPU | NPU | Memory Tech | Memory Bandwidth | TDP Base (W) | Remark |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 73 TOPS | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 365 | n/a | Strix Point | 2024 | n/a | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 880M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | Total 73 TOPS (50 TOPS from NPU). |
| 80 TOPS | AMD | SoC | PC | Ryzen AI 300 | Ryzen AI 9 HX 370 | n/a | Strix Point | 2024 | n/a | n/a | TSMC 4nm FinFET | n/a | AMD Radeon™ 890M | n/a | DDR5-5600 or LPDDR5X-7500 | n/a | 28.0 | Total 80 TOPS (50 TOPS from NPU). |
| 50 TOPS | AMD | NPU | n/a | Ryzen | XDNA 2 | n/a | AI | 2024 | Ryzen AI 9 HX 370 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | n/a | Neoverse | Neoverse E1 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | n/a | Neoverse | Neoverse N1 | n/a | Ares | 2019 | Ampere Altra, AWS Graviton2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N2 | n/a | Perseus | 2020 | Microsoft Azure Cobalt 100 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse N3 | n/a | Hermes | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V1 | n/a | Zeus | 2020 | AWS Graviton3 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | First announcements came out of Arm’s TechCon convention 2018 in San Jose. |
| n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V2 | n/a | n/a | 2022 | NVIDIA Grace, AWS Graviton4, Google Axion | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | ARM | IP | Datacenter (Infrastructure Processor) | Neoverse | Neoverse V3 | n/a | Poseidon | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 825 TOPS ??? | Alibaba | SoC | Datacenter (AI inference) | Hanguang 含光 | Hanguang 800 | 1 | n/a | 2019 | n/a | n/a | TSMC 12nm | n/a | n/a | n/a | n/a | n/a | 280.0 | 16x PCIe gen4; SRAM only, no DDR |
| n/a | Alibaba | SoC | Datacenter (Infra) | Yitian 倚天 | Yitian 710 | 1 | n/a | 2021 | Alibaba ECS g8m | n/a | N5 | 128 Neoverse N2 cores | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | Amazon | SoC | Datacenter (Infra) (Scale out) | AWS Graviton | Graviton | 1 | Alpine | 2018 | Amazon EC2 A1 | n/a | TSMC 16nm | Cortex A72 | n/a | n/a | DDR4-1600 | 51.2 GB/s | 95.0 | 32 lanes of PCIe gen3 |
| n/a | Amazon | SoC | Datacenter (Infra) (General Purpose) | AWS Graviton | Graviton 2 | 2 | Alpine+ | 2019 | Amazon EC2 M6g, M6gd, C6g, C6gd, C6gn, R6g, R6gd, T4g, X2gd, G5g, Im4gn, Is4gen, I4g | n/a | TSMC 7nm | 64 Neoverse N1 cores | n/a | n/a | DDR4-3200 | 204.8 GB/s | 110.0 | 64 lanes of PCIe gen4 |
| n/a | Amazon | SoC | Datacenter (Infra) (ML, HPC, SIMD) | AWS Graviton | Graviton 3 | 3 | n/a | 2021 | Amazon EC2 C7g, M7g, R7g; with local disk: C7gd, M7gd, R7gd | n/a | TSMC 5nm | 64 Neoverse V1 cores | n/a | n/a | DDR5-4800 | 307.2 GB/s | 100.0 | 32 lanes of PCIe gen5 |
| n/a | Amazon | SoC | Datacenter (Infra) | AWS Graviton | Graviton 3E | 3 | n/a | 2022 | Amazon EC2 C7gn, HPC7g | n/a | n/a | 64 Neoverse V1 cores | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | Amazon | SoC | Datacenter (Infra) (Scale up) | AWS Graviton | Graviton 4 | 4 | n/a | 2023 | Amazon EC2 R8g | n/a | n/a | 96 Neoverse V2 cores | n/a | n/a | DDR5-5600 | 537.6 GB/s | n/a | 96 lanes of PCIe gen5 |
| 63.3 TOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 1 | 1 | n/a | 2018 | Amazon EC2 Inf1 | 0.97 TFLOPS | TSMC 16nm | 16 NeuronCore v1 | n/a | n/a | n/a | 50 GB/s | n/a | n/a |
| 380 TOPS | Amazon | SoC | Datacenter (AI inference) | AWS Inferentia | Inferentia 2 | 2 | n/a | 2022 | Amazon EC2 Inf2 | 2.9 TFLOPS | TSMC 5nm | 24 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a |
| 380 TOPS | Amazon | SoC | Datacenter (AI training) | AWS Trainium | Trainium 1 | 1 | n/a | 2020 | Amazon EC2 Trn1 | 2.9 TFLOPS | TSMC 7nm | 32 NeuronCore v2 | n/a | n/a | n/a | 820 GB/s | n/a | n/a |
| 861 TOPS | Amazon | SoC | Datacenter (AI training) | AWS Trainium | Trainium 2 | 2 | n/a | 2023 | Amazon EC2 Trn2 | 6.57 TFLOPS | TSMC 4nm | 64 NeuronCore v2 | n/a | n/a | n/a | 4,096 GB/s | n/a | n/a |
| 11 TOPS | Apple | SoC | Mobile | A | A14 Bionic | n/a | APL1W01 | 2020 | iPhone 12 | 748.8 GFLOPS | TSMC N5 | Firestorm + Icestorm | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a |
| 15.8 TOPS | Apple | SoC | Mobile | A | A15 Bionic | n/a | APL1W07 | 2021 | iPhone 13 | 1.37 TFLOPS | TSMC N5P | Avalanche + Blizzard | n/a | n/a | LPDDR4X-4266 | 34.1 GB/s | n/a | n/a |
| 17 TOPS | Apple | SoC | Mobile | A | A16 Bionic | n/a | APL1W10 | 2022 | iPhone 14 | 1.789 TFLOPS | TSMC N4P | Everest + Sawtooth | n/a | n/a | LPDDR5-6400 | 51.2 GB/s | n/a | 6GB LPDDR5 |
| 35 TOPS | Apple | SoC | Mobile | A | A17 Pro | n/a | APL1V02 | 2023 | iPhone 15 Pro, iPhone 15 Pro Max | 2.147 TFLOPS | TSMC N3B | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | LPDDR5-6400 | 51.2 GB/s | n/a | 8GB LPDDR5 |
| 35 TOPS | Apple | SoC | Mobile | A | A18 | n/a | n/a | 2024 | iPhone 16 | n/a | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 5-core | 16-core Neural Engine | n/a | n/a | n/a | n/a |
| 35 TOPS | Apple | SoC | Mobile | A | A18 Pro | n/a | n/a | 2024 | iPhone 16 Pro | n/a | TSMC N3P | 6 cores (2 performance + 4 efficiency) | Apple-designed 6-core | 16-core Neural Engine | n/a | n/a | n/a | n/a |
| 11 TOPS | Apple | SoC | Mobile, PC | M | M1 | n/a | APL1102 | 2020 | n/a | 2.6 TFLOPS | TSMC N5 | high-performance “Firestorm” + energy-efficient “Icestorm” | n/a | n/a | LPDDR4X-4266 | 68.3 GB/s | n/a | n/a |
| 11 TOPS | Apple | SoC | Mobile, PC | M | M1 Max | n/a | APL1105 | 2021 | n/a | 10.4 TFLOPS | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a |
| 11 TOPS | Apple | SoC | Mobile, PC | M | M1 Pro | n/a | APL1103 | 2021 | n/a | n/a | TSMC N5 | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a |
| 22 TOPS | Apple | SoC | Mobile, PC | M | M1 Ultra | n/a | APL1W06 | 2022 | n/a | 21 TFLOPS | TSMC N5 | Two M1 Max dies joined by the UltraFusion interconnect: 20 CPU cores and 96 MB system-level cache (SLC) in total | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a |
| 15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 | n/a | APL1109 | 2022 | n/a | 2.863 TFLOPS, 3.578 TFLOPS | TSMC N5P | high-performance “Avalanche” @ 3.49 GHz + energy-efficient “Blizzard” @ 2.42 GHz | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a |
| 15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 Max | n/a | APL1111 | 2023 | n/a | 10.736 TFLOPS, 13.599 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 409.6 GB/s | n/a | n/a |
| 15.8 TOPS | Apple | SoC | Mobile, PC | M | M2 Pro | n/a | APL1113 | 2023 | n/a | 5.726 TFLOPS, 6.799 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 204.8 GB/s | n/a | n/a |
| 31.6 TOPS | Apple | SoC | Mobile, PC | M | M2 Ultra | n/a | APL1W12 | 2023 | n/a | 21.473 TFLOPS, 27.199 TFLOPS | TSMC N5P | n/a | n/a | n/a | LPDDR5-6400 | 819.2 GB/s | n/a | n/a |
| 18 TOPS | Apple | SoC | Mobile, PC | M | M3 | n/a | APL1201 | 2023 | MacBook Pro | 2.826 TFLOPS, 3.533 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 102.4 GB/s | n/a | n/a |
| 18 TOPS | Apple | SoC | Mobile, PC | M | M3 Max | n/a | APL1204 | 2023 | n/a | 10.598 TFLOPS, 14.131 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 307.2 GB/s, 409.6 GB/s | n/a | n/a |
| 18 TOPS | Apple | SoC | Mobile, PC | M | M3 Pro | n/a | APL1203 | 2023 | n/a | 4.946 TFLOPS, 6.359 TFLOPS | TSMC N3B | n/a | n/a | n/a | LPDDR5-6400 | 153.6 GB/s | n/a | n/a |
| 38 TOPS | Apple | SoC | Mobile, PC | M | M4 | n/a | APL1206 | 2024 | iPad Pro (7th generation) | 3.763 TFLOPS | TSMC N3E | 10 cores (4 performance + 6 efficiency) | Apple-designed 10-core | 16-core Neural Engine | LPDDR5X-7500 | 120 GB/s | n/a | n/a |
| n/a | Google | SoC | Datacenter (Infra) | GCP CPU | Axion | n/a | Axion | 2024 | GCP Compute Engine ??? | n/a | n/a | ?? Neoverse V2 cores | n/a | n/a | n/a | n/a | n/a | n/a |
| 1.6 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G1 | 1 | Whitechapel | 2021 | Pixel 6, Pixel 6 Pro, Pixel 6a | n/a | Samsung 5 nm LPE | Octa-core: 2.8 GHz Cortex-X1 (2×), 2.25 GHz Cortex-A76 (2×), 1.8 GHz Cortex-A55 (4×) | Mali-G78 MP20 at 848 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a |
| n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G2 | 2 | Cloudripper | 2022 | Pixel 7, Pixel 7 Pro, Pixel 7a, Pixel Fold, Pixel Tablet | n/a | Samsung 5 nm LPE | Octa-core: 2.85 GHz Cortex-X1 (2×), 2.35 GHz Cortex-A78 (2×), 1.8 GHz Cortex-A55 (4×) | Mali-G710 MP7 at 850 MHz | Google Edge TPU | LPDDR5 | 51.2 GB/s | n/a | n/a |
| 27 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G3 | 3 | Zuma (Dev Board: Ripcurrent) | 2023 | Pixel 8, Pixel 8 Pro, Pixel 8a | n/a | Samsung 4nm LPP | Nona-core: 2.91 GHz Cortex-X3 (1×), 2.37 GHz Cortex-A715 (4×), 1.7 GHz Cortex-A510 (4×) | Mali-G715 MP10 at 890 MHz | Google Edge TPU (Rio) | LPDDR5X | 68.2 GB/s | n/a | n/a |
| 45 TOPS | Google | SoC | Mobile | Google Tensor (Edge TPU) | G4 | 4 | Zuma Pro | 2024 | Pixel 9, Pixel 9 Pro | n/a | Samsung 4nm LPP | Octa-core: 3.1 GHz Cortex-X4 (1×), 2.6 GHz Cortex-A720 (3×), 1.92 GHz Cortex-A520 (4×) | Mali-G715 MP10 at 940 MHz | n/a | LPDDR5X | n/a | n/a | 8Gen3 = 45 TOPS, D9300 = 48 TOPS |
| n/a | Google | SoC | Mobile | Google Tensor (Edge TPU) | G5 | 5 | Laguna Beach (Dev Board: Deepspace) | 2025 | Pixel 10, Pixel 10 Pro | n/a | TSMC N3 + InFO-POP packaging | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 23 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv1 | 1 | n/a | 2015 | n/a | n/a | 28nm | n/a | n/a | n/a | DDR3-2133 | 34 GB/s | 75.0 | The core of the TPU is the systolic array; the Matrix Multiply Unit (MXU) is one big systolic array; PCIe Gen3 x16 |
| 45 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv2 | 2 | n/a | 2017 | n/a | n/a | 16nm | n/a | n/a | n/a | n/a | 600 GB/s | 280.0 | 16GB HBM; BF16 |
| 123 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv3 | 3 | n/a | 2018 | n/a | n/a | 16nm | n/a | n/a | n/a | n/a | 900 GB/s | 220.0 | n/a |
| 275 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv4 | 4 | n/a | 2021 | n/a | n/a | 7nm | n/a | n/a | n/a | n/a | 1,200 GB/s | 170.0 | 32GB HBM2 |
| 393 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv5e | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 819 GB/s | n/a | n/a |
| 918 TOPS | Google | SoC | Datacenter (AI inference) | TPU | TPUv5p | 5 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 2,765 GB/s | n/a | n/a |
| n/a | Google | SoC | Datacenter (AI inference) | TPU | TPUv6? Trillium? | 6 | n/a | 2024 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | Intel | SoC | HP Mobile, PC | n/a | n/a | n/a | Arrow Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 120 TOPS | Intel | SoC | LP Mobile | Core Ultra | Core Ultra | Series 2 | Lunar Lake | 2024 | n/a | n/a | TSMC N3B (compute tile), TSMC N6 (platform controller tile) | P-core: Lion Cove; E-core: Skymont | Xe2 | NPU 4 | n/a | n/a | n/a | Total 120 TOPS (48 TOPS from NPU 4 + 67 TOPS from GPU + 5 TOPS from CPU). |
| 34 TOPS | Intel | SoC | Mobile | Core Ultra | Core Ultra | Series 1 | Meteor Lake | 2023 | n/a | n/a | Intel 4 (7nm EUV, compute tile), TSMC N5 (graphics tile), TSMC N6 (SoC tile, I/O extender tile) | P-core: Redwood Cove; E-core: Crestmont | Xe-LPG | NPU 3720 | n/a | n/a | n/a | Total 34 TOPS (11 TOPS from NPU + 18 TOPS from GPU + 5 TOPS from CPU). |
| 0.5 TOPS | Intel | NPU | n/a | n/a | NPU 1 | 1 | n/a | 2018 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 7 TOPS | Intel | NPU | n/a | n/a | NPU 2 | 2 | n/a | 2021 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 11.5 TOPS | Intel | NPU | n/a | n/a | NPU 3 | 3 | n/a | 2023 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 48 TOPS | Intel | NPU | n/a | n/a | NPU 4 | 4 | n/a | 2024 | Lunar Lake | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| n/a | Microsoft | SoC | Datacenter (Infra) | Azure Cobalt | Cobalt 100 | 1 | n/a | 2024 | Azure VM Dpsv6, Dplsv6, Epsv6 | n/a | n/a | 128 Neoverse V2 cores | n/a | n/a | LPDDR5 ??? | n/a | n/a | PCIe gen5; CXL 1.1; from project start to silicon in 13 months. |
| 1,600 TOPS | Microsoft | SoC | Datacenter (AI inference) | Azure Maia | Maia 100 | 1 | n/a | 2024 | Microsoft Copilot | n/a | TSMC N5 + CoWoS-S | n/a | n/a | n/a | n/a | 18,000 GB/s ??? | 500.0 | 32Gb/s PCIe gen5 x8; design TDP = 700W; provisioned TDP = 500W |
| 45 TOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Elite | n/a | n/a | 2023 | n/a | 4.6 TFLOPS | TSMC N4 | Oryon | Adreno X1 | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | Total 75 TOPS (45 TOPS from NPU). |
| 45 TOPS | Qualcomm | SoC | PC | Snapdragon X | Snapdragon X Plus | n/a | n/a | 2024 | n/a | 3.8 TFLOPS | TSMC N4 | Oryon | Adreno X1-45 1107 MHz (1.7 TFLOPS), Adreno X1-45 (2.1 TFLOPS), Adreno X1-85 1250 MHz (3.8 TFLOPS) | Hexagon | LPDDR5X-8448 @ 4224 MHz | 135 GB/s | n/a | n/a |
| 45 TOPS | Qualcomm | NPU | n/a | Hexagon | Hexagon | n/a | n/a | n/a | Snapdragon X Plus | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | Hexagon is Qualcomm’s brand name for a family of digital signal processor (DSP) and later neural processing unit (NPU) products; it is also known as QDSP6, short for “sixth-generation digital signal processor.” |
| n/a | Qualcomm | GPU | n/a | Adreno | Adreno X1 | n/a | n/a | n/a | Snapdragon X Plus | 4.6 TFLOPS | TSMC N4 | n/a | n/a | n/a | LPDDR5X-8448 @ 4224 MHz or LPDDR5X-8533 @ 4266.5 MHz | 125.1 GB/s or 136.5 GB/s | n/a | n/a |
