Big Data Joe Brain Dump: HDFS Testing and Benchmarking
I am putting together a test and benchmarking plan for Hadoop clusters, since this kind of information seems surprisingly hard to find on the interwebs.
Test the HDFS filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar testfilesystem -files 10 -megaBytes 10
This command generates 10 files of 10 MB each for testing. The test can be customized by changing the -files and -megaBytes options.
Benchmark distributed write performance on the distributed filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 10 -fileSize 50
This command writes 10 files (controlled by the -nrFiles option) of 50 MB each (controlled by the -fileSize option) with random content to HDFS. It appends its results to a file named TestDFSIO_results.log.
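Once the run finishes, the interesting number is usually the throughput line in the results log. Here is a minimal sketch of pulling it out with grep and awk; the sample log lines below are illustrative only, since the exact wording and spacing can vary between Hadoop versions.

```shell
# Illustrative sample of the results log format (an assumption, not real output):
cat > TestDFSIO_sample.log <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Mon Jan 01 00:00:00 UTC 2024
       Number of files: 10
Total MBytes processed: 500
     Throughput mb/sec: 4.98
EOF

# Extract just the throughput figure:
grep 'Throughput' TestDFSIO_sample.log | awk -F: '{gsub(/ /, "", $2); print $2}'
```

On a real cluster you would point the same pipeline at TestDFSIO_results.log instead of the sample file.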
Benchmark distributed read performance on the distributed filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read -nrFiles 10 -fileSize 50
This command reads the 10 files of 50 MB written by the previous run and appends its results to TestDFSIO_results.log. When you are done benchmarking, TestDFSIO -clean removes the generated benchmark files from HDFS.
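Because the write and read runs append to the same results log, you can compare the two throughputs from one file. A small awk sketch of that comparison is below; the sample log lines are illustrative only and may not match your Hadoop version word for word.

```shell
# Illustrative sample with one write section and one read section (an assumption):
cat > TestDFSIO_sample.log <<'EOF'
----- TestDFSIO ----- : write
     Throughput mb/sec: 4.98
----- TestDFSIO ----- : read
     Throughput mb/sec: 11.42
EOF

# Remember the mode from each section header, then print it next to the throughput:
awk '/TestDFSIO/ {mode=$NF} /Throughput/ {gsub(/ /,"",$NF); print mode, $NF}' TestDFSIO_sample.log
```

Run against the real TestDFSIO_results.log, this gives a quick side-by-side of write versus read throughput for the cluster.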
I will be posting these other tests soon:
MapReduce Testing and Benchmarking
MapReduce Sort Testing
- Big Data Joe