2&>1

AWS, GCP, Golang, and more

Comparing the performance of M5 and M5a instances

M5a instances are now available in the Tokyo region too.

M5a instances run on AMD CPUs. I was curious how they stack up against the existing Intel-based instances, so I benchmarked both.

Instances compared

- m5.large

- m5a.large

Both instances run Amazon Linux 2.

Setup

I'm using UnixBench, the standard choice for this kind of test.

1. Install the required packages

yum install -y wget gcc make perl perl-Time-HiRes patch

2. Download and extract the UnixBench package

wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/byte-unixbench/UnixBench5.1.3.tgz
tar xfz UnixBench5.1.3.tgz

3. Apply the patch. By default UnixBench cannot use more than 16 cores, so this is needed only on larger machines and not for this test.

cd UnixBench/
wget http://storage.googleapis.com/google-code-attachments/byte-unixbench/issue-4/comment-1/fix-limitation.patch
patch Run fix-limitation.patch

UnixBench is now installed.
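The three steps above can be combined into one function. This is a minimal sketch. It assumes the function runs as root on Amazon Linux 2 and that both archive URLs are still reachable. The function name `setup_unixbench` is hypothetical. Skipping the patch on machines with 16 or fewer CPUs is an assumption of this sketch, based on the note in step 3.

```bash
#!/usr/bin/env bash
# Hypothetical helper bundling setup steps 1-3; run it as root on Amazon Linux 2.
UB_URL="https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/byte-unixbench/UnixBench5.1.3.tgz"
PATCH_URL="http://storage.googleapis.com/google-code-attachments/byte-unixbench/issue-4/comment-1/fix-limitation.patch"

# The body runs in a subshell ( ... ), so `set -e` and `cd` do not leak into the caller.
setup_unixbench() (
  set -euo pipefail
  # 1. Build tools and Perl modules UnixBench needs
  yum install -y wget gcc make perl perl-Time-HiRes patch
  # 2. Download and extract the source
  wget "$UB_URL"
  tar xfz UnixBench5.1.3.tgz
  cd UnixBench/
  # 3. Assumption: the core limit only matters above 16 CPUs, so patch only then
  if [ "$(nproc)" -gt 16 ]; then
    wget "$PATCH_URL"
    patch Run fix-limitation.patch
  fi
)
```

Call `setup_unixbench` from the directory where UnixBench should be unpacked.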

Running the benchmark

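Run the benchmark with the following command. The -i 5 option sets the number of iterations per test to 5 instead of the default 10.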

./Run -i 5
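The report goes to the terminal, so it can be handy to keep a copy. This is a minimal sketch, assuming it is run from inside the `UnixBench/` directory. The function name `run_unixbench` and the log-file naming are hypothetical. The raw output at the end of this post shows that on a 2-vCPU instance, `Run` makes two passes: one with 1 copy of each test and one with 2 parallel copies.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around ./Run that also saves the report to a log file.
# Usage: run_unixbench [iterations]   (defaults to 5, as above)
run_unixbench() {
  local iterations="${1:-5}"
  local log
  log="unixbench-$(hostname)-$(date +%Y%m%d-%H%M%S).log"
  # 2>&1 merges stderr into stdout so tee captures everything
  ./Run -i "$iterations" 2>&1 | tee "$log"
}
```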

Comparison

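First, the CPU information. Each instance has 2 vCPUs, so two processor entries are listed. Both entries report core id 0, siblings: 2 and cpu cores: 1. In other words, the 2 vCPUs are two hardware threads of a single physical core, not 2 separate cores.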

M5.large

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping        : 4
microcode       : 0x200005a
cpu MHz         : 3110.790
cache size      : 33792 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping        : 4
microcode       : 0x200005a
cpu MHz         : 3112.951
cache size      : 33792 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

M5a.large

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD EPYC 7571
stepping        : 2
microcode       : 0x8001227
cpu MHz         : 2664.885
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 4399.73
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD EPYC 7571
stepping        : 2
microcode       : 0x8001227
cpu MHz         : 2538.409
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 4399.73
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:
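The dumps are long, so it helps to pull out only the fields that matter and to diff the feature flags. This is a minimal sketch. It assumes each instance's `/proc/cpuinfo` has been saved to a file, for example `cat /proc/cpuinfo > m5.txt`; the file names and function names are hypothetical. Run on the dumps above, the flag diff would show the AVX-512 flags (`avx512f`, `avx512dq`, ...) only on M5 and `sha_ni` only on M5a, among other differences.

```bash
#!/usr/bin/env bash
# cpu_summary FILE: show only the key fields, once per vCPU
cpu_summary() {
  grep -E '^(processor|model name|cpu MHz|cache size|siblings|cpu cores|bogomips)[[:space:]]*:' "$1"
}

# cpu_flags FILE: the first "flags" line as one sorted flag per line
cpu_flags() {
  grep -m1 '^flags' "$1" | cut -d: -f2 | tr ' ' '\n' | grep -v '^$' | sort -u
}

# flag_diff A B: flags present only in A, then flags present only in B
flag_diff() {
  echo "only in $1:"
  comm -23 <(cpu_flags "$1") <(cpu_flags "$2")
  echo "only in $2:"
  comm -13 <(cpu_flags "$1") <(cpu_flags "$2")
}
```

Usage: `flag_diff m5.txt m5a.txt`.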

Results

1 parallel copy (each test runs as a single process on the 2-vCPU instance)

Test                                         M5      M5a   Notes
Dhrystone 2 using register variables     3234.1   2945.1   Integer arithmetic
Double-Precision Whetstone                852.8    763.3   Floating-point arithmetic
Execl Throughput                         1211.1   1107.7   execl() calls (replacing the process image)
File Copy 1024 bufsize 2000 maxblocks    2081.6   2043.5   File copy (1024-byte buffer)
File Copy 256 bufsize 500 maxblocks      1313.4   1324.5   File copy (256-byte buffer)
File Copy 4096 bufsize 8000 maxblocks    4492.7   3980.8   File copy (4096-byte buffer)
Pipe Throughput                           912.1   1165.0   Pipe throughput
Pipe-based Context Switching              196.9    122.7   Context switching between two processes over pipes
Process Creation                         1048.6    806.5   Process creation (fork)
Shell Scripts (1 concurrent)             2032.7   1779.7   One shell script at a time
Shell Scripts (8 concurrent)             2316.3   2281.5   8 shell scripts in parallel
System Call Overhead                      493.0   1106.0   System call overhead
System Benchmarks Index Score            1272.0   1245.2   Overall score (higher is better)
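Each value in the table is an index, not a raw measurement. UnixBench divides the raw result by a fixed baseline and multiplies by 10. The System Benchmarks Index Score is the geometric mean of the individual indices. The baselines and raw results appear in the raw output at the end of this post. Below is a small awk-based sketch that reproduces these numbers; the function names are hypothetical. Because it works from the rounded indices, its score may differ slightly from UnixBench's own figure.

```bash
#!/usr/bin/env bash
# index_value RESULT BASELINE: UnixBench index = result / baseline * 10
index_value() {
  awk -v r="$1" -v b="$2" 'BEGIN { printf "%.1f\n", r / b * 10 }'
}

# geo_mean V1 V2 ...: geometric mean, computed as exp(mean(log(v)))
geo_mean() {
  printf '%s\n' "$@" | awk '{ s += log($1); n++ } END { printf "%.1f\n", exp(s / n) }'
}
```

For example, `index_value 37741819.2 116700.0` (Dhrystone on M5, 1 copy) gives 3234.1. Passing the twelve M5 indices from the table above to `geo_mean` gives about 1272, matching the reported score.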

2 parallel copies (one per vCPU)

Test                                         M5      M5a   Notes
Dhrystone 2 using register variables     4246.8   3708.9   Integer arithmetic
Double-Precision Whetstone               1472.9   1480.1   Floating-point arithmetic
Execl Throughput                         1785.0   1488.7   execl() calls (replacing the process image)
File Copy 1024 bufsize 2000 maxblocks    3064.9   2701.4   File copy (1024-byte buffer)
File Copy 256 bufsize 500 maxblocks      1914.2   1800.7   File copy (256-byte buffer)
File Copy 4096 bufsize 8000 maxblocks    6617.9   5511.4   File copy (4096-byte buffer)
Pipe Throughput                          1347.6   1549.6   Pipe throughput
Pipe-based Context Switching              973.1    835.0   Context switching between two processes over pipes
Process Creation                         1736.1   1795.3   Process creation (fork)
Shell Scripts (1 concurrent)             2398.3   2293.5   One shell script at a time
Shell Scripts (8 concurrent)             2346.1   2310.5   8 shell scripts in parallel
System Call Overhead                      794.2   1380.8   System call overhead
System Benchmarks Index Score            2012.4   1984.2   Overall score (higher is better)
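To see where each instance wins, it helps to look at the M5a/M5 ratio per test rather than the raw indices. This is a minimal sketch, assuming the table rows are fed in as tab-separated `test<TAB>M5<TAB>M5a` lines. The function name `ratio_table` is hypothetical. A ratio above 1.00 means M5a scored higher.

```bash
#!/usr/bin/env bash
# ratio_table: reads "test<TAB>M5 index<TAB>M5a index" lines on stdin
# and prints the M5a/M5 ratio for each test (> 1.00 means M5a scored higher).
ratio_table() {
  awk -F'\t' 'NF == 3 && $2 > 0 { printf "%-38s %5.2f\n", $1, $3 / $2 }'
}
```

For the 2-copy run, Dhrystone comes out at 0.87 (3708.9 / 4246.8) and System Call Overhead at 1.74 (1380.8 / 794.2).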

Summary

What ultimately matters is the final System Benchmarks Index Score, the geometric mean of the individual index values. Higher is better.

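The difference is small, but M5 comes out slightly ahead. M5a scores about 2.1% lower with 1 copy (1245.2 vs 1272.0) and about 1.4% lower with 2 copies (1984.2 vs 2012.4). M5a does win some individual tests, most notably System Call Overhead and Pipe Throughput in both runs.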

Bonus

Raw output

M5 instance

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-168-3-51.ap-northeast-1.compute.internal: GNU/Linux
   OS: GNU/Linux -- 4.14.104-95.84.amzn2.x86_64 -- #1 SMP Sat Mar 2 00:40:20 UTC 2019
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (5000.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (5000.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   07:01:33 up 4 min,  1 user,  load average: 0.02, 0.03, 0.00; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:01:33 - 07:18:23
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       37741819.2 lps   (10.0 s, 4 samples)
Double-Precision Whetstone                     4690.4 MWIPS (9.9 s, 4 samples)
Execl Throughput                               5207.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        824302.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          217374.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2605794.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1134627.7 lps   (10.0 s, 4 samples)
Pipe-based Context Switching                  78779.6 lps   (10.0 s, 4 samples)
Process Creation                              13211.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   8618.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1389.8 lpm   (60.0 s, 2 samples)
System Call Overhead                         739542.3 lps   (10.0 s, 4 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   37741819.2   3234.1
Double-Precision Whetstone                       55.0       4690.4    852.8
Execl Throughput                                 43.0       5207.8   1211.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     824302.2   2081.6
File Copy 256 bufsize 500 maxblocks            1655.0     217374.0   1313.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2605794.9   4492.7
Pipe Throughput                               12440.0    1134627.7    912.1
Pipe-based Context Switching                   4000.0      78779.6    196.9
Process Creation                                126.0      13211.8   1048.6
Shell Scripts (1 concurrent)                     42.4       8618.6   2032.7
Shell Scripts (8 concurrent)                      6.0       1389.8   2316.3
System Call Overhead                          15000.0     739542.3    493.0
                                                                   ========
System Benchmarks Index Score                                        1272.0

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:18:23 - 07:35:14
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       49559728.2 lps   (10.0 s, 4 samples)
Double-Precision Whetstone                     8101.2 MWIPS (10.0 s, 4 samples)
Execl Throughput                               7675.6 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1213697.0 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          316806.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3838399.1 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1676385.3 lps   (10.0 s, 4 samples)
Pipe-based Context Switching                 389244.2 lps   (10.0 s, 4 samples)
Process Creation                              21874.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  10168.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1407.6 lpm   (60.0 s, 2 samples)
System Call Overhead                        1191333.6 lps   (10.0 s, 4 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   49559728.2   4246.8
Double-Precision Whetstone                       55.0       8101.2   1472.9
Execl Throughput                                 43.0       7675.6   1785.0
File Copy 1024 bufsize 2000 maxblocks          3960.0    1213697.0   3064.9
File Copy 256 bufsize 500 maxblocks            1655.0     316806.9   1914.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    3838399.1   6617.9
Pipe Throughput                               12440.0    1676385.3   1347.6
Pipe-based Context Switching                   4000.0     389244.2    973.1
Process Creation                                126.0      21874.5   1736.1
Shell Scripts (1 concurrent)                     42.4      10168.6   2398.3
Shell Scripts (8 concurrent)                      6.0       1407.6   2346.1
System Call Overhead                          15000.0    1191333.6    794.2
                                                                   ========
System Benchmarks Index Score                                        2012.4

M5a instance

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-168-3-208.ap-northeast-1.compute.internal: GNU/Linux
   OS: GNU/Linux -- 4.14.104-95.84.amzn2.x86_64 -- #1 SMP Sat Mar 2 00:40:20 UTC 2019
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: AMD EPYC 7571 (4399.7 bogomips)
          Hyper-Threading, x86-64, MMX, AMD MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: AMD EPYC 7571 (4399.7 bogomips)
          Hyper-Threading, x86-64, MMX, AMD MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   07:01:27 up 3 min,  1 user,  load average: 0.07, 0.08, 0.03; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:01:27 - 07:18:25
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       34369205.7 lps   (10.0 s, 4 samples)
Double-Precision Whetstone                     4198.4 MWIPS (11.7 s, 4 samples)
Execl Throughput                               4762.9 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        809240.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          219205.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2308868.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1449217.0 lps   (10.0 s, 4 samples)
Pipe-based Context Switching                  49074.0 lps   (10.0 s, 4 samples)
Process Creation                              10161.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7545.8 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1368.9 lpm   (60.0 s, 2 samples)
System Call Overhead                        1659072.6 lps   (10.0 s, 4 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   34369205.7   2945.1
Double-Precision Whetstone                       55.0       4198.4    763.3
Execl Throughput                                 43.0       4762.9   1107.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     809240.5   2043.5
File Copy 256 bufsize 500 maxblocks            1655.0     219205.0   1324.5
File Copy 4096 bufsize 8000 maxblocks          5800.0    2308868.3   3980.8
Pipe Throughput                               12440.0    1449217.0   1165.0
Pipe-based Context Switching                   4000.0      49074.0    122.7
Process Creation                                126.0      10161.9    806.5
Shell Scripts (1 concurrent)                     42.4       7545.8   1779.7
Shell Scripts (8 concurrent)                      6.0       1368.9   2281.5
System Call Overhead                          15000.0    1659072.6   1106.0
                                                                   ========
System Benchmarks Index Score                                        1245.2

------------------------------------------------------------------------
Benchmark Run: Tue Mar 19 2019 07:18:25 - 07:35:16
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       43282979.7 lps   (10.0 s, 4 samples)
Double-Precision Whetstone                     8140.8 MWIPS (9.9 s, 4 samples)
Execl Throughput                               6401.3 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1069771.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          298022.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3196602.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1927648.8 lps   (10.0 s, 4 samples)
Pipe-based Context Switching                 333996.5 lps   (10.0 s, 4 samples)
Process Creation                              22620.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   9724.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1386.3 lpm   (60.0 s, 2 samples)
System Call Overhead                        2071274.1 lps   (10.0 s, 4 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   43282979.7   3708.9
Double-Precision Whetstone                       55.0       8140.8   1480.1
Execl Throughput                                 43.0       6401.3   1488.7
File Copy 1024 bufsize 2000 maxblocks          3960.0    1069771.9   2701.4
File Copy 256 bufsize 500 maxblocks            1655.0     298022.9   1800.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    3196602.2   5511.4
Pipe Throughput                               12440.0    1927648.8   1549.6
Pipe-based Context Switching                   4000.0     333996.5    835.0
Process Creation                                126.0      22620.8   1795.3
Shell Scripts (1 concurrent)                     42.4       9724.3   2293.5
Shell Scripts (8 concurrent)                      6.0       1386.3   2310.5
System Call Overhead                          15000.0    2071274.1   1380.8
                                                                   ========
System Benchmarks Index Score                                        1984.2