Hadoopのベンチマーク計測サンプル(TeraSort, TestDFSIO)を動かしてみる

構築したHadoopクラスタの性能がどの程度なのかを知る上で、共通で標準的な計測手順があると便利だと思い調べてみたところ、Hadoopに標準でベンチマークを計測するスクリプトが用意されているようだったので、これを使ってみる。
サンプルはたくさんあるようだが、ベンチマークとして一般的に使えそうなものにTeraSortとTestDFSIOという2つがあった。
TeraSortはその名の通り、大量データのソート速度を図るもの。ソートなので主にCPUやメモリをベンチマークの対象としている。
もう一つのTestDFSIOは分散ファイルシステムのIOということで、HDFSへの書き込み/読み込みを図る。つまりディスクをベンチマークを対象としている。
今回はこのふたつのサンプルを実行させてみることにする。

なお、Hadoop環境はCDH 5.5がインストールされたHadoop環境で実施した。

TeraSort

TeraSortを実行するためには3ステップの手順を実行する。

TeraGen: ソート用の入力データ生成
TeraSort: ソート処理実行
TeraValidate: ソートされた結果の検証

この3ステップのプログラムはhadoop-examples.jarというHadoop標準のjarとして用意されているので、まずはこれをhadoop jarで呼び出してみる。

$ hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort ★
terasort: Run the terasort ★
teravalidate: Checking results of terasort ★
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

hadoop-examplesは今回のTeraSort以外にもWordCountやモンテカルロによる円周率算出(pi)などさまざまなプログラムが用意されていることが分かる。

TeraGen

TeraGenはTeraSortのための入力データを生成する。HDFS上の作業ディレクトリを/user/mapred/benchmarks/terasortとし、これが作成済みであるとする。
このとき、以下のコマンドでTeraGenが実行できる。

$ hadoop jar/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 10000/user/mapred/benchmarks/terasort/input

第1引数はteragenと指定する。
第2引数は生成データ行を数値で指定する。1行あたり100バイトとなるので、10000を指定した場合は1,000,000バイト、つまり1MBのファイルを生成する。
第3引数はデータの出力先のパスを指定する。

このコマンドを実行するとデータ生成のMapReduceジョブが実行される。
完了すると、指定したパスに以下のようにデータが生成される。

$ hadoop fs -ls /user/mapred/benchmarks/terasort/input
Found 3 items
-rw-r--r-- 1 mapredsupergroup     0 2016-07-19 15:51 /user/mapred/benchmarks/terasort/input/_SUCCESS
-rw-r--r-- 1 mapred supergroup  500000 2016-07-19 15:51 /user/mapred/benchmarks/terasort/input/part-m-00000
-rw-r--r-- 1 mapred supergroup  500000 2016-07-19 15:51 /user/mapred/benchmarks/terasort/input/part-m-00001

TeraSort

続いて生成したデータを実際にソートする。
以下のコマンドでTeraSortを実行する。

$ hadoop jar/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /user/mapred/benchmarks/terasort/input/user/mapred/benchmarks/terasort/output

第1引数はterasortと指定する。
第2引数はソート対象の入力データパス、つまりTeraGenの出力先を指定する。
第3引数はソート結果を出力するパスを指定する。

コマンドを実行するとソートを行うMapReduceジョブが実行される。
TeraSortはベンチマークスクリプトだが、計測はやってくれないので、このときに処理時間やCPU、メモリ等のリソース使用状況は自分で取得する必要がある。
リソースの取り方はdstat等のOSコマンドで各ノードのリソース使用状況を確認したり、Cloudera Managerがあるならリソース画面で確認してもよい。
ジョブが完了すると、以下のように結果が出力されている。

$ hadoop fs -ls /user/mapred/benchmarks/terasort/output
Found 3 items
-rw-r--r-- 1 mapred supergroup     0 2016-07-19 15:58 /user/mapred/benchmarks/terasort/output/_SUCCESS
-rw-r--r-- 10 mapred supergroup     0 2016-07-19 15:57 /user/mapred/benchmarks/terasort/output/_partition.lst
-rw-r--r-- 1 mapred supergroup  1000000 2016-07-19 15:58 /user/mapred/benchmarks/terasort/output/part-r-00000

TeraValidate

TeraSortによるソート処理が正しく行われたかを判定する処理。
TeraSortの出力結果に対し、以下のようなコマンドを実行することで検証が行える。

$ hadoop jar/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teravalidate/user/mapred/benchmarks/terasort/output /user/mapred/benchmarks/terasort/validate

第1引数はteravalidateと指定する。
第2引数は検証対象の入力データパス、つまりTeraSortのソート結果出力先を指定する。
第3引数は検証結果を出力するパスを指定する。

これを実行すると、以下のように結果が出力される。

$ hadoop fs -ls /user/mapred/benchmarks/terasort/validate
Found 2 items
-rw-r--r-- 1 mapred supergroup     0 2016-07-19 16:01 /user/mapred/benchmarks/terasort/validate/_SUCCESS
-rw-r--r-- 1 mapred supergroup    22 2016-07-19 16:01 /user/mapred/benchmarks/terasort/validate/part-r-00000

出力されたファイルの中身を確認する。

$ hadoop fs -cat /user/mapred/benchmarks/terasort/validate/part-r-00000
checksum    139abefd74b2

このようにchecksumのみが出力され、errorが出力されていなければソートは正しく行われたと判断できる。
ソートが正しくない場合は、以下のようにerrorが結果に含まれる。

checksum    365ed3f3e1
error misorder in part-m-00000 between 6a 97 43 59 ea ab 3a 59 4d 99 and 63 b6 04 4b 8e 78 91 14 83 73
error misorder in part-m-00000 between 88 2a 02 c3 15 36 2b 60 76 5f and 5c 90 ab 38 ae 52 89 62 15 d7
（略）
error misorder in part-m-00001 between 6e 45 fb 3d 1c 2c fd d1 cd 57 and 3c b4 46 e3 07 0c 3d 50 d0 19
error bad key partitioning:
 file part-m-00000:begin key 4a 69 6d 47 72 61 79 52 49 50
 file part-m-00000:end key 2c b7 a1 bb 93 62 af 20 a4 d9

TestDFSIO

TestDFSIOはHDFS上にデータを書き込み、読み込みしてIOを計測するためのベンチマークスクリプト。書き込みと読み込みでそれぞれ独立した処理になっている。
実行ファイルは以下で、TeraSortで使ったjarとは異なるので注意。

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark. ★
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

TestDFSIOを引数なしで実行するとUsageが見れる。

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO
16/07/19 17:46:10 INFO fs.TestDFSIO: TestDFSIO.1.7
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes]

書き込み

1MBファイルを10個、つまり10MBの書き込みを実行するコマンドは以下のように実行する。

$ hadoop jar/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO \
-D test.build.data=/user/mapred/benchmarks/TestDFSIO \
-write -nrFiles 10 -fileSize 1MB

-Dオプションでtest.build.dataプロパティに計測用のデータ書き込み先パスを指定する。
-writeオプションで書き込みであることを指定し、-nrFilesでファイル数、-fileSizeでファイルごとのサイズを指定する。

これを実行するとMapReduceジョブが開始されるが、TeraSortと違い、ジョブの最後にベンチマーク値が標準出力される。

(・・・略)
16/07/19 16:16:37 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
16/07/19 16:16:37 INFO fs.TestDFSIO:      Date & time: Tue Jul 19 16:16:37 JST 2016
16/07/19 16:16:37 INFO fs.TestDFSIO:    Number of files: 10
16/07/19 16:16:37 INFO fs.TestDFSIO: Total MBytes processed: 10.0
16/07/19 16:16:37 INFO fs.TestDFSIO:   Throughput mb/sec: 1.196888090963495
16/07/19 16:16:37 INFO fs.TestDFSIO: Average IO rate mb/sec: 1.5878732204437256
16/07/19 16:16:37 INFO fs.TestDFSIO: IO rate std deviation: 1.2619301662658684
16/07/19 16:16:37 INFO fs.TestDFSIO:  Test exec time sec: 51.268
16/07/19 16:16:37 INFO fs.TestDFSIO:

スループットやI/O、処理時間などが分かる。

読み込み

書き込みと同じように、ただオプションを-writeから-readに変えるだけでよい。

$ hadoop jar/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO \
-D test.build.data=/user/mapred/benchmarks/TestDFSIO \
-read -nrFiles 10 -fileSize 1MB

こちらも同じように結果が標準出力される。

(・・・略)
16/07/19 16:20:28 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
16/07/19 16:20:28 INFO fs.TestDFSIO:      Date & time: Tue Jul 19 16:20:28 JST 2016
16/07/19 16:20:28 INFO fs.TestDFSIO:    Number of files: 10
16/07/19 16:20:28 INFO fs.TestDFSIO: Total MBytes processed: 10.0
16/07/19 16:20:28 INFO fs.TestDFSIO:   Throughput mb/sec: 74.6268656716418
16/07/19 16:20:28 INFO fs.TestDFSIO: Average IO rate mb/sec: 164.8916778564453
16/07/19 16:20:28 INFO fs.TestDFSIO: IO rate std deviation: 148.3374932963216
16/07/19 16:20:28 INFO fs.TestDFSIO:  Test exec time sec: 48.181
16/07/19 16:20:28 INFO fs.TestDFSIO:

クリーン

書き込みや読み込みで生成したデータは削除されないので、計測が終わった場合ややり直したい場合はクリーンコマンドを実行するとよい。
パラメータは以下のように実行する。

$ hadoop jar/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO \
-D test.build.data=/user/mapred/benchmarks/TestDFSIO \
-clean

これで以下のように出力される。

(・・・略)
16/07/19 16:23:49 INFO fs.TestDFSIO: TestDFSIO.1.7
16/07/19 16:23:49 INFO fs.TestDFSIO: nrFiles = 1
16/07/19 16:23:49 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
16/07/19 16:23:49 INFO fs.TestDFSIO: bufferSize = 1000000
16/07/19 16:23:49 INFO fs.TestDFSIO: baseDir = /user/mapred/benchmarks/TestDFSIO
16/07/19 16:23:50 INFO fs.TestDFSIO: Cleaning up test files

クリーンはtest.build.dataオプションで指定したディレクトリが削除される。

$ hadoop fs -ls /user/mapred/benchmarks/TestDFSIO
ls: `/user/mapred/benchmarks/TestDFSIO': No such file or directory

ベンチマーク値の見方の注意

読み込み・書き込みのベンチマーク値の見方に注意がある。
まず、計測コマンドを実行したあとの結果はHDFS上の以下に出力されている。

$ hadoop fs -cat /user/mapred/benchmarks/TestDFSIO/io_read/part-00000
f:rate 1648916.8
f:sqrate    4.91932768E8
l:size 10485760
l:tasks 10
l:time 134

ここではpart-*という名前のファイルがたくさんあるが、これはそのMapタスクで処理したファイルのサイズや処理時間の合計を表している。
一方、標準出力される方はこれらの集計である。

※参考: 以下のソースおよびメーリングリスト
https://github.com/facebookarchive/hadoop-20/blob/master/src/test/org/apache/hadoop/fs/TestDFSIO.java
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200901.mbox/%3C496EACE2.2090007@yahoo-inc.com%3E

16/07/19 16:20:28 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
16/07/19 16:20:28 INFO fs.TestDFSIO:      Date & time: Tue Jul 19 16:20:28 JST 2016
16/07/19 16:20:28 INFO fs.TestDFSIO:    Number of files: 10
16/07/19 16:20:28 INFO fs.TestDFSIO: Total MBytes processed: 10.0
16/07/19 16:20:28 INFO fs.TestDFSIO:   Throughput mb/sec: 74.6268656716418
16/07/19 16:20:28 INFO fs.TestDFSIO: Average IO rate mb/sec: 164.8916778564453
16/07/19 16:20:28 INFO fs.TestDFSIO: IO rate std deviation: 148.3374932963216
16/07/19 16:20:28 INFO fs.TestDFSIO:  Test exec time sec: 48.181

最後のTest exec time secはhadoop jarコマンド全体の実行時間となるため、並列性が考慮され、それによるオーバーヘッドが含まれる。
しかし結果ファイルのpart-*の方はMapタスクごとの時間の合計として算出されている。
標準出力のThroughput mb/secやAverage IO rate mb/secはこれらのpart-*ファイルの値の合計や平均を算出しているため、あくまでMapの平均であり、並列性は考慮されず、Reducerやオーバーヘッドが含まれない結果となるので注意。
要するに、ここでのThroughputは各MapのThroughputの平均値をとっているため、クラスタのスループットとならないことに注意が必要。
そのためクラスタスループットは、Total MBytes processed /Test exec time sec で近似するのがよさそう。
出力ファイルサイズを増やしていくと、Throughput mb/secの値はあるサイズからどんどん値が小さくなっていくが、これはMap数の平均スループットとなるため、頭打ちするのではなくどんどん値が小さくなっていく。
一方、Total MBytes processed /Test exec time sec で近似した値はそのサイズでほぼ頭打ちする数値となる。

まとめ

Hadoopのベンチマーク取得のためのTeraSortとTestDFSIOというふたつの例の動かし方を一通り見た。
ベンチマークの取り方を知っておくと、異なるHadoopクラスタを性能という観点で比較する場合や構築時の動作確認、もしくはノード追加やメモリ増設やクラスタパラメータ変更の効果を測定する場合に便利。

なお性能を見るときはCloudera Managerのリソース画面を見るのが一番見やすいと思う。機会があればこちらも記事にしたい。

（※2016/7/20追記）
Cloudera Managerのリソース画面について記事を書きました。
totech.hateblo.jp

以上！