Big Data Joe Brain Dump: HDFS Testing and Benchmarking
I am putting together a test and benchmarking plan for Hadoop clusters, since this kind of information seems surprisingly hard to find on the interwebs.
Test the HDFS filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar testfilesystem -files 10 -megaBytes 10
This command generates 10 files of 10 MB each for testing. The test can be customized by changing the -files and -megaBytes options.
Benchmark distributed write performance on the distributed filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 10 -fileSize 50
This command writes 10 files (controlled by the -nrFiles option) of 50 MB each (controlled by the -fileSize option) with random content to HDFS. It appends its results to a file named TestDFSIO_results.log.
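Once the run finishes, the interesting number is usually the throughput line in the results log. Here is a minimal sketch of pulling it out with grep and awk; the sample log lines below are illustrative only, since the exact wording and spacing can vary between Hadoop versions.

```shell
# Illustrative sample of the results log format (an assumption, not real output):
cat > TestDFSIO_sample.log <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Mon Jan 01 00:00:00 UTC 2024
       Number of files: 10
Total MBytes processed: 500
     Throughput mb/sec: 4.98
EOF

# Extract just the throughput figure:
grep 'Throughput' TestDFSIO_sample.log | awk -F: '{gsub(/ /, "", $2); print $2}'
```

On a real cluster you would point the same pipeline at TestDFSIO_results.log instead of the sample file.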
Benchmark distributed read performance on the distributed filesystem:
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read -nrFiles 10 -fileSize 50
This command reads the 10 files of 50 MB written by the previous run and appends its results to TestDFSIO_results.log. When you are done benchmarking, TestDFSIO -clean removes the generated benchmark files from HDFS.
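Because the write and read runs append to the same results log, you can compare the two throughputs from one file. A small awk sketch of that comparison is below; the sample log lines are illustrative only and may not match your Hadoop version word for word.

```shell
# Illustrative sample with one write section and one read section (an assumption):
cat > TestDFSIO_sample.log <<'EOF'
----- TestDFSIO ----- : write
     Throughput mb/sec: 4.98
----- TestDFSIO ----- : read
     Throughput mb/sec: 11.42
EOF

# Remember the mode from each section header, then print it next to the throughput:
awk '/TestDFSIO/ {mode=$NF} /Throughput/ {gsub(/ /,"",$NF); print mode, $NF}' TestDFSIO_sample.log
```

Run against the real TestDFSIO_results.log, this gives a quick side-by-side of write versus read throughput for the cluster.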
I will be posting these other tests soon:
MapReduce Testing and Benchmarking
MapReduce Sort Testing
- Big Data Joe