This is Jon from Beyond GTA. You may remember one of my previous blogs, which I wrote about [AMD EPYC]. This time I would like to share an interesting piece written by Kanbara Yuta, a colleague of mine who works at our HQ in Japan. He compared the pros and cons of AMD EPYC and Intel Xeon by benchmarking them.
I will share the whole scenario, how it went, and a summary of the results.
AMD EPYC on GCP Compute Engine
In Google Compute Engine, EPYC is available for multiple machine types, such as:
N2D VM
Tau T2D VM
For benchmarking, he used an N2D VM, which uses AMD EPYC, and an N2 VM, which uses Intel Xeon.
Selecting an Instance
The first step is to select an instance for benchmarking.
The machine type of the EPYC instance is N2D VM, with the specs and configuration shown in the table below. EPYC has several generations; according to the official documentation, the one available for N2D VMs is Rome, the second-generation EPYC. The latest EPYC is the third generation, called Milan.
AMD EPYC Rome
20 GB persistent disk
The opposing Intel Xeon instance is an N2 VM with the specs and configuration shown in the next table. The Xeon generation appears to be either Ice Lake or Cascade Lake, but either way it is a fairly recent generation.
Intel Xeon (Ice Lake or Cascade Lake)
20 GB persistent disk
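As a sketch, the two instances can be created with the gcloud CLI as follows. The instance names, zone, and exact machine-type sizes (n2d-standard-2 / n2-standard-2 for 2 vCPUs) are my assumptions for illustration, not details from the original post:

```shell
# Hypothetical example: create the EPYC (N2D) and Xeon (N2) benchmark instances.
# Instance names, zone, and machine-type sizes are assumptions for illustration.
gcloud compute instances create epyc-bench \
    --machine-type=n2d-standard-2 \
    --image=centos-7-v20211105 --image-project=centos-cloud \
    --boot-disk-size=20GB \
    --zone=asia-northeast1-b

gcloud compute instances create xeon-bench \
    --machine-type=n2-standard-2 \
    --image=centos-7-v20211105 --image-project=centos-cloud \
    --boot-disk-size=20GB \
    --zone=asia-northeast1-b
```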
Confirming the CPU of Each Instance
Once the instance is up and running, you can log in using SSH.
GCP has SSH integrated into the console, so you can easily log in to the server from a browser, which is very convenient. For temporary instances like these, you don't need to worry about preparing a key pair.
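If you prefer a terminal over the browser console, the gcloud CLI can open the same session; the instance name and zone here are hypothetical:

```shell
# Hypothetical example: open an SSH session to the benchmark instance.
# gcloud generates and manages a key pair for you on first use.
gcloud compute ssh epyc-bench --zone=asia-northeast1-b
```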
After logging in via SSH, let's start by running lscpu. First, the EPYC instance.
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 49
Model name:            AMD EPYC 7B12
Stepping:              0
CPU MHz:               2249.998
BogoMIPS:              4499.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              16384K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip
The Model name is AMD EPYC 7B12. You can also see the cache structure.
The architecture is of course x86_64, so it is instruction-set compatible with Intel Xeon. SIMD instructions are supported up to AVX2. However, since SIMD instructions have limited use in typical server applications, this should be sufficient.
Next is the result from the instance running on a Xeon.
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) CPU @ 2.80GHz
Stepping:              7
CPU MHz:               2800.252
BogoMIPS:              5600.50
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear spec_ctrl intel_stibp arch_capabilities
The model name is Intel(R) Xeon(R) CPU @ 2.80GHz; the specific model number seems to be masked. Also, the avx512-series flags appear in Flags, indicating that the CPU supports AVX-512 (512-bit-wide SIMD instructions).
Since the EPYC instance supports only up to AVX2 (256-bit-wide SIMD instructions), the Xeon instance is likely to show higher performance in workloads that can exploit SIMD instructions, such as video encoding and image conversion.
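A quick way to check which SIMD extensions an instance advertises is to look for the avx* entries in the Flags line. A minimal sketch, using an excerpt of the flag list from the EPYC output above (on a live system you would read the full list from lscpu or /proc/cpuinfo):

```shell
# Excerpt of the Flags line from the EPYC instance's lscpu output above
flags="sse sse2 ssse3 fma sse4_1 sse4_2 avx f16c avx2"

# Report whether each SIMD extension of interest is advertised
for ext in avx avx2 avx512f; do
  case " $flags " in
    *" $ext "*) echo "$ext: yes" ;;
    *)          echo "$ext: no" ;;
  esac
done
# prints:
# avx: yes
# avx2: yes
# avx512f: no
```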
Preparing for Benchmarking
We will be using UnixBench as our benchmark software.
For an explanation of UnixBench and its installation procedure, IDCF has published a very easy-to-understand article, which you can check out as well.
Both EPYC and Xeon instances are launched from the same image (centos-7-v20211105), and yum -y update is performed once before running the benchmark.
The kernel, GCC, and UnixBench versions of the execution environment are as follows.
GCC: 4.8.5 20150623 (Red Hat 4.8.5-44)
If you run UnixBench with no arguments, it first benchmarks with a parallelism of 1 and then repeats with a parallelism equal to the number of logical CPUs (two runs in total). Since we specified 2 vCPUs for the instances, the number of logical CPUs is 2, so the second run uses a parallelism of 2. The next section shows the results for both parallelism 1 and parallelism 2.
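As a rough sketch of the setup, assuming the kdlucas/byte-unixbench GitHub repository as the source (the IDCF article may install it differently):

```shell
# Build prerequisites for UnixBench on CentOS 7
sudo yum -y install gcc make perl perl-Time-HiRes git

# Fetch and run UnixBench; with no arguments, ./Run benchmarks at
# parallelism 1 and then at the logical-CPU count (2 on these instances).
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
./Run
```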
The blue bars are the results for the Xeon instance, and the orange bars are the results for the EPYC instance, with higher values indicating better results.
The System Benchmarks Index Score (Overall) at the bottom is the total benchmark score, while the other items are the results of individual tests.
Looking at the results, we can see that the EPYC instance performed better overall. The total score of the Xeon instance is 1096.6, while that of the EPYC instance is 1288.2, a significant difference. Among the individual items, System Call Overhead and File Copy show particularly high performance.
System Call Overhead and File Copy are important indicators for server applications, so it is great to see the high performance in these items.
Next are the results for parallel number 2.
As with parallel number 1, the blue bars are the Xeon instance and the orange bars are the EPYC instance.
Again, the EPYC instance is dominant, performing better than the Xeon instance overall: the EPYC instance has a total score of 2253.1, while the Xeon instance scores 1545.4.
The Xeon instance with parallel number 2 has a score about 1.4 times higher than that with parallel number 1, while the EPYC instance with parallel number 2 has a score about 1.75 times higher than that with parallel number 1. This is just the result of running a single benchmark software called UnixBench, but it seems that EPYC instances tend to be more efficient in parallelization than Xeon instances.
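The scaling figures above come from dividing each instance's parallel-2 total score by its parallel-1 total score:

```shell
# Scaling ratio = (index score at parallelism 2) / (index score at parallelism 1),
# using the UnixBench totals reported above.
awk 'BEGIN { printf "Xeon: %.2fx, EPYC: %.2fx\n", 1545.4/1096.6, 2253.1/1288.2 }'
# prints: Xeon: 1.41x, EPYC: 1.75x
```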
Although not directly related to the benchmark, EPYC instances are less expensive than Xeon instances. With the configuration and specs used in this benchmark, a Xeon instance costs about $74/month, while an EPYC instance costs about $64/month.
We launched an AMD EPYC instance and an Intel Xeon instance on GCP Compute Engine and benchmarked them with UnixBench for comparison.
In this benchmark, we found that the EPYC instance had superior performance. It also has excellent cost performance, and we hope you will consider it when building a server on GCP.
The Intel Xeon instance did not perform well in this benchmark, but other benchmarks may show different results. Intel Xeon supports AVX-512, which EPYC does not, so choose the right instance for your workload on a case-by-case basis.
Credit goes to Kanbara Yuta, the original author of this post on our HQ website.