M5a is now available in the Tokyo region as well.
M5a instances use AMD CPUs, and I was curious how they stack up against the existing Intel-based instances, so I ran a performance comparison.
Instances compared
・M5.large
・M5a.large
Preparation
I'll use the standard tool for this, UnixBench.
1. Install the required packages
yum install -y wget gcc make perl perl-Time-HiRes patch
2. Download and extract the UnixBench package
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/byte-unixbench/UnixBench5.1.3.tgz
tar xfz UnixBench5.1.3.tgz
3. Apply the patch (by default UnixBench cannot measure more than 16 cores; it is not actually needed for this test)
cd UnixBench/
wget http://storage.googleapis.com/google-code-attachments/byte-unixbench/issue-4/comment-1/fix-limitation.patch
patch Run fix-limitation.patch
UnixBench is now installed.
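Since the patch in step 3 only matters on large instances, the decision can be scripted. The following is a minimal sketch, assuming the limit of 16 mentioned in step 3; `needs_patch` and `MAX_COPIES_UNPATCHED` are hypothetical names used for illustration only.

```shell
# Assumption: 16 is the unpatched limit mentioned in step 3.
MAX_COPIES_UNPATCHED=16

# Succeeds (exit status 0) when the given CPU count exceeds the limit.
needs_patch() {
  [ "$1" -gt "$MAX_COPIES_UNPATCHED" ]
}

# Number of online CPUs; falls back to 1 if getconf is unavailable.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

if needs_patch "$cpus"; then
  echo "$cpus CPUs: apply fix-limitation.patch before running"
else
  echo "$cpus CPUs: the patch is not required"
fi
```

On the 2-vCPU instances used here, this takes the "not required" branch.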
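Measurement
Run the benchmark with the following command (-i 5 sets the number of iterations per test to 5; the default is 10)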
./Run -i 5
Comparison
First, the CPU information (each instance has 2 vCPUs, so two entries are printed)
M5.large
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x200005a
cpu MHz : 3110.790
cache size : 33792 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x200005a
cpu MHz : 3112.951
cache size : 33792 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
M5a.large
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7571
stepping : 2
microcode : 0x8001227
cpu MHz : 2664.885
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4399.73
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7571
stepping : 2
microcode : 0x8001227
cpu MHz : 2538.409
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4399.73
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
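The two flags lines are long enough that the differences are hard to spot by eye. Here is a minimal sketch that lists the flags present on one CPU but not the other, assuming the flags lines above have been pasted into shell variables; `flags_only_in_first` is a hypothetical helper name, not part of any tool.

```shell
# Print the flags that appear in the first list but not in the second.
flags_only_in_first() {
  awk -v first="$1" -v second="$2" 'BEGIN {
    n = split(second, s, " ")
    for (i = 1; i <= n; i++) seen[s[i]] = 1
    m = split(first, f, " ")
    out = ""
    for (i = 1; i <= m; i++)
      if (!(f[i] in seen)) out = out (out == "" ? "" : " ") f[i]
    print out
  }'
}

# Flags copied from the /proc/cpuinfo output above.
m5_flags="fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke"
m5a_flags="fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save"

echo "Only on M5:  $(flags_only_in_first "$m5_flags" "$m5a_flags")"
echo "Only on M5a: $(flags_only_in_first "$m5a_flags" "$m5_flags")"
```

Among other things, this shows the AVX-512 family (avx512f, avx512dq, and so on) only on the Intel side and flags such as sha_ni and sse4a only on the AMD side.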
Results
With 1 CPU (1 parallel copy)
Item | M5 instance | M5a instance | Remarks |
---|---|---|---|
Dhrystone 2 using register variables | 3234.1 | 2945.1 | Integer arithmetic |
Double-Precision Whetstone | 852.8 | 763.3 | Floating-point arithmetic |
Execl Throughput | 1211.1 | 1107.7 | execl() call throughput (process image replacement) |
File Copy 1024 bufsize 2000 maxblocks | 2081.6 | 2043.5 | File copy (1024-byte buffer) |
File Copy 256 bufsize 500 maxblocks | 1313.4 | 1324.5 | File copy (256-byte buffer) |
File Copy 4096 bufsize 8000 maxblocks | 4492.7 | 3980.8 | File copy (4096-byte buffer) |
Pipe Throughput | 912.1 | 1165.0 | Pipe throughput |
Pipe-based Context Switching | 196.9 | 122.7 | Context switching through pipes |
Process Creation | 1048.6 | 806.5 | Process creation (fork) |
Shell Scripts (1 concurrent) | 2032.7 | 1779.7 | Single shell script execution |
Shell Scripts (8 concurrent) | 2316.3 | 2281.5 | 8 shell scripts running in parallel |
System Call Overhead | 493.0 | 1106.0 | System call overhead |
Item | M5 instance | M5a instance |
---|---|---|
System Benchmarks Index Score | 1272.0 | 1245.2 |
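For reference, each index in the table is RESULT / BASELINE × 10, and the System Benchmarks Index Score is the geometric mean of the individual indexes. The sketch below reproduces that calculation from the BASELINE and RESULT columns of the raw output pasted at the end of this post; `unixbench_score` is a hypothetical helper name, and the snippet is an illustration rather than UnixBench's own code.

```shell
# Read "BASELINE RESULT" pairs from stdin and print the overall score.
unixbench_score() {
  awk '
    NF == 2 {
      index_value = $2 / $1 * 10     # per-test index
      log_sum += log(index_value)    # accumulate logs for the geometric mean
      n++
    }
    END { if (n > 0) printf "%.1f\n", exp(log_sum / n) }
  '
}

# M5.large, 1 parallel copy (BASELINE and RESULT from the raw output)
unixbench_score <<'EOF'
116700.0 37741819.2
55.0 4690.4
43.0 5207.8
3960.0 824302.2
1655.0 217374.0
5800.0 2605794.9
12440.0 1134627.7
4000.0 78779.6
126.0 13211.8
42.4 8618.6
6.0 1389.8
15000.0 739542.3
EOF
```

The printed value should land at or very close to the reported 1272.0; any small difference would come from the RESULT values being rounded in the report. Because the score is a geometric mean, one very low index (such as Pipe-based Context Switching here) pulls the total down less than it would in an arithmetic mean.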
With 2 CPUs (2 parallel copies)
Item | M5 instance | M5a instance | Remarks |
---|---|---|---|
Dhrystone 2 using register variables | 4246.8 | 3708.9 | Integer arithmetic |
Double-Precision Whetstone | 1472.9 | 1480.1 | Floating-point arithmetic |
Execl Throughput | 1785.0 | 1488.7 | execl() call throughput (process image replacement) |
File Copy 1024 bufsize 2000 maxblocks | 3064.9 | 2701.4 | File copy (1024-byte buffer) |
File Copy 256 bufsize 500 maxblocks | 1914.2 | 1800.7 | File copy (256-byte buffer) |
File Copy 4096 bufsize 8000 maxblocks | 6617.9 | 5511.4 | File copy (4096-byte buffer) |
Pipe Throughput | 1347.6 | 1549.6 | Pipe throughput |
Pipe-based Context Switching | 973.1 | 835.0 | Context switching through pipes |
Process Creation | 1736.1 | 1795.3 | Process creation (fork) |
Shell Scripts (1 concurrent) | 2398.3 | 2293.5 | Single shell script execution |
Shell Scripts (8 concurrent) | 2346.1 | 2310.5 | 8 shell scripts running in parallel |
System Call Overhead | 794.2 | 1380.8 | System call overhead |
Item | M5 instance | M5a instance |
---|---|---|
System Benchmarks Index Score | 2012.4 | 1984.2 |
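To see at a glance which instance wins each test, the rows of the result tables can be fed through a small filter. This is a minimal sketch, assuming rows in the same pipe-separated layout as the tables above (item, M5 value, M5a value); `compare_rows` is a hypothetical helper name.

```shell
# For each data row, print the M5a/M5 ratio and which instance scored higher.
# Header and separator rows are skipped because their second field is not a number.
compare_rows() {
  awk -F'|' '
    NF >= 3 && $2 + 0 > 0 {
      name = $1
      gsub(/^ +| +$/, "", name)          # trim surrounding spaces
      m5 = $2 + 0; m5a = $3 + 0
      winner = (m5a > m5 ? "M5a" : (m5a < m5 ? "M5" : "tie"))
      printf "%s: %.2f (%s)\n", name, m5a / m5, winner
    }'
}

# A few rows from the 1-CPU table
compare_rows <<'EOF'
Pipe Throughput | 912.1 | 1165 |
Pipe-based Context Switching | 196.9 | 122.7 |
System Call Overhead | 493 | 1106 |
EOF
```

A ratio above 1.00 means M5a scored higher on that test, and below 1.00 means M5 did.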
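Summary
In the end, the higher the "System Benchmarks Index Score", the better.
The difference is small, but the M5 instance comes out slightly ahead: by about 2% with 1 CPU (1272.0 vs. 1245.2) and about 1.4% with 2 CPUs (2012.4 vs. 1984.2). M5a did score higher on individual tests such as Pipe Throughput and System Call Overhead.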
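The gap between the overall scores can be expressed as a percentage with a one-line helper. This is a minimal sketch using the scores reported above; `pct_diff` is a hypothetical name, and it prints the second value relative to the first.

```shell
# Print how much the second value differs from the first, in percent.
pct_diff() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%+.1f%%\n", (other - base) / base * 100 }'
}

pct_diff 1272.0 1245.2   # M5a relative to M5, 1 parallel copy
pct_diff 2012.4 1984.2   # M5a relative to M5, 2 parallel copies
```

A negative result means M5a scored lower than M5 by that percentage.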
Bonus
Raw output from the runs
M5 instance
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-168-3-51.ap-northeast-1.compute.internal: GNU/Linux
   OS: GNU/Linux -- 4.14.104-95.84.amzn2.x86_64 -- #1 SMP Sat Mar 2 00:40:20 UTC 2019
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (5000.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (5000.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   07:01:33 up 4 min, 1 user, load average: 0.02, 0.03, 0.00; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:01:33 - 07:18:23
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables     37741819.2 lps    (10.0 s, 4 samples)
Double-Precision Whetstone                   4690.4 MWIPS  (9.9 s, 4 samples)
Execl Throughput                             5207.8 lps    (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      824302.2 KBps   (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        217374.0 KBps   (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks     2605794.9 KBps   (30.0 s, 2 samples)
Pipe Throughput                           1134627.7 lps    (10.0 s, 4 samples)
Pipe-based Context Switching                78779.6 lps    (10.0 s, 4 samples)
Process Creation                            13211.8 lps    (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 8618.6 lpm    (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                 1389.8 lpm    (60.0 s, 2 samples)
System Call Overhead                       739542.3 lps    (10.0 s, 4 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0   37741819.2   3234.1
Double-Precision Whetstone                     55.0       4690.4    852.8
Execl Throughput                               43.0       5207.8   1211.1
File Copy 1024 bufsize 2000 maxblocks        3960.0     824302.2   2081.6
File Copy 256 bufsize 500 maxblocks          1655.0     217374.0   1313.4
File Copy 4096 bufsize 8000 maxblocks        5800.0    2605794.9   4492.7
Pipe Throughput                             12440.0    1134627.7    912.1
Pipe-based Context Switching                 4000.0      78779.6    196.9
Process Creation                              126.0      13211.8   1048.6
Shell Scripts (1 concurrent)                   42.4       8618.6   2032.7
Shell Scripts (8 concurrent)                    6.0       1389.8   2316.3
System Call Overhead                        15000.0     739542.3    493.0
                                                                 ========
System Benchmarks Index Score                                      1272.0

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:18:23 - 07:35:14
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables     49559728.2 lps    (10.0 s, 4 samples)
Double-Precision Whetstone                   8101.2 MWIPS  (10.0 s, 4 samples)
Execl Throughput                             7675.6 lps    (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks     1213697.0 KBps   (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        316806.9 KBps   (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks     3838399.1 KBps   (30.0 s, 2 samples)
Pipe Throughput                           1676385.3 lps    (10.0 s, 4 samples)
Pipe-based Context Switching               389244.2 lps    (10.0 s, 4 samples)
Process Creation                            21874.5 lps    (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                10168.6 lpm    (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                 1407.6 lpm    (60.0 s, 2 samples)
System Call Overhead                      1191333.6 lps    (10.0 s, 4 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0   49559728.2   4246.8
Double-Precision Whetstone                     55.0       8101.2   1472.9
Execl Throughput                               43.0       7675.6   1785.0
File Copy 1024 bufsize 2000 maxblocks        3960.0    1213697.0   3064.9
File Copy 256 bufsize 500 maxblocks          1655.0     316806.9   1914.2
File Copy 4096 bufsize 8000 maxblocks        5800.0    3838399.1   6617.9
Pipe Throughput                             12440.0    1676385.3   1347.6
Pipe-based Context Switching                 4000.0     389244.2    973.1
Process Creation                              126.0      21874.5   1736.1
Shell Scripts (1 concurrent)                   42.4      10168.6   2398.3
Shell Scripts (8 concurrent)                    6.0       1407.6   2346.1
System Call Overhead                        15000.0    1191333.6    794.2
                                                                 ========
System Benchmarks Index Score                                      2012.4
M5a instance
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-168-3-208.ap-northeast-1.compute.internal: GNU/Linux
   OS: GNU/Linux -- 4.14.104-95.84.amzn2.x86_64 -- #1 SMP Sat Mar 2 00:40:20 UTC 2019
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: AMD EPYC 7571 (4399.7 bogomips)
          Hyper-Threading, x86-64, MMX, AMD MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: AMD EPYC 7571 (4399.7 bogomips)
          Hyper-Threading, x86-64, MMX, AMD MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   07:01:27 up 3 min, 1 user, load average: 0.07, 0.08, 0.03; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:01:27 - 07:18:25
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables     34369205.7 lps    (10.0 s, 4 samples)
Double-Precision Whetstone                   4198.4 MWIPS  (11.7 s, 4 samples)
Execl Throughput                             4762.9 lps    (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      809240.5 KBps   (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        219205.0 KBps   (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks     2308868.3 KBps   (30.0 s, 2 samples)
Pipe Throughput                           1449217.0 lps    (10.0 s, 4 samples)
Pipe-based Context Switching                49074.0 lps    (10.0 s, 4 samples)
Process Creation                            10161.9 lps    (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 7545.8 lpm    (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                 1368.9 lpm    (60.0 s, 2 samples)
System Call Overhead                      1659072.6 lps    (10.0 s, 4 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0   34369205.7   2945.1
Double-Precision Whetstone                     55.0       4198.4    763.3
Execl Throughput                               43.0       4762.9   1107.7
File Copy 1024 bufsize 2000 maxblocks        3960.0     809240.5   2043.5
File Copy 256 bufsize 500 maxblocks          1655.0     219205.0   1324.5
File Copy 4096 bufsize 8000 maxblocks        5800.0    2308868.3   3980.8
Pipe Throughput                             12440.0    1449217.0   1165.0
Pipe-based Context Switching                 4000.0      49074.0    122.7
Process Creation                              126.0      10161.9    806.5
Shell Scripts (1 concurrent)                   42.4       7545.8   1779.7
Shell Scripts (8 concurrent)                    6.0       1368.9   2281.5
System Call Overhead                        15000.0    1659072.6   1106.0
                                                                 ========
System Benchmarks Index Score                                      1245.2

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:18:25 - 07:35:16
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables     43282979.7 lps    (10.0 s, 4 samples)
Double-Precision Whetstone                   8140.8 MWIPS  (9.9 s, 4 samples)
Execl Throughput                             6401.3 lps    (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks     1069771.9 KBps   (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        298022.9 KBps   (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks     3196602.2 KBps   (30.0 s, 2 samples)
Pipe Throughput                           1927648.8 lps    (10.0 s, 4 samples)
Pipe-based Context Switching               333996.5 lps    (10.0 s, 4 samples)
Process Creation                            22620.8 lps    (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 9724.3 lpm    (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                 1386.3 lpm    (60.0 s, 2 samples)
System Call Overhead                      2071274.1 lps    (10.0 s, 4 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0   43282979.7   3708.9
Double-Precision Whetstone                     55.0       8140.8   1480.1
Execl Throughput                               43.0       6401.3   1488.7
File Copy 1024 bufsize 2000 maxblocks        3960.0    1069771.9   2701.4
File Copy 256 bufsize 500 maxblocks          1655.0     298022.9   1800.7
File Copy 4096 bufsize 8000 maxblocks        5800.0    3196602.2   5511.4
Pipe Throughput                             12440.0    1927648.8   1549.6
Pipe-based Context Switching                 4000.0     333996.5    835.0
Process Creation                              126.0      22620.8   1795.3
Shell Scripts (1 concurrent)                   42.4       9724.3   2293.5
Shell Scripts (8 concurrent)                    6.0       1386.3   2310.5
System Call Overhead                        15000.0    2071274.1   1380.8
                                                                 ========
System Benchmarks Index Score                                      1984.2