EMC Isilon with HDFS ... Good technology, but not for everyone.
I had some time to review the technology in more depth. Here are my notes on what I discovered, along with some questions I had before reviewing and the answers I found.
Questions
Q. What version / flavor of Hadoop are you based on? Or is it just a mimicked environment that understands Hadoop RPC?
A. It just mimics HDFS.
Q. Is this a supported configuration with the major players in the Hadoop space, like Cloudera and Hortonworks?
A. It will work with them, but the Isilon nodes won’t show up in the management layers of those solutions (Cloudera Manager / Hortonworks Ambari).
Q. Are you doing NameNode HA (NNHA) and NameNode Federation?
A. Isilon removes the need for both.
Misc. Notes
Isilon takes on the responsibilities of both the name node and the data node.
isi_hdfs_d is the daemon running on the Isilon that handles HDFS.
OneFS 6.5.5 or greater has HDFS support.
They send back 3 datanode addresses, which mimics replicas.
They vary the addresses, so it spreads the load.
It doesn’t really care about rack awareness or locality.
They do have some rack awareness, but it isn’t used a lot.
It constructs the metadata on the fly.
It’s tricking the client: not really telling it where the data lives, just telling it where to get it.
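
The address-juggling described in the notes above can be pictured with a small sketch. This is not Isilon's code: the class name `MimickedNameNode`, the method `get_block_locations`, and the round-robin rotation are all assumptions made for illustration. The notes only say that 3 addresses come back and that they vary.

```python
class MimickedNameNode:
    """Hypothetical stand-in for the behaviour described in the notes."""

    def __init__(self, node_addresses, replicas=3):
        if len(node_addresses) < replicas:
            raise ValueError("need at least as many nodes as advertised replicas")
        self.nodes = list(node_addresses)
        self.replicas = replicas
        self._start = 0  # rotating offset; round-robin is an assumption

    def get_block_locations(self, path):
        """Return `replicas` addresses, built on the fly rather than looked up.

        `path` is ignored in this sketch: the answer says where to fetch the
        data from, not where it physically lives.
        """
        n = len(self.nodes)
        picked = [self.nodes[(self._start + i) % n] for i in range(self.replicas)]
        self._start = (self._start + 1) % n  # vary the answer on the next call
        return picked
```

With four hypothetical nodes, repeated calls hand back a sliding window of three addresses, so successive clients are pointed at different nodes even though no real replica placement exists.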
Caveats
Hadoop is supposed to be cheap or free; Isilon adds cost.
Performance?
The DataNodes (DNs) are remote from the TaskTrackers (TTs), so there is no data locality.
X + Y = time
X = Ingestion (Isilon wins)
Y = Processing Time (Hadoop wins)
Isilon is one third of the speed of DAS (direct-attached storage).
The file structure is more optimized in native HDFS than on Isilon.
PivotalHD doesn’t really work with Isilon yet, either.
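
The X + Y = time caveat above can be turned into a rough back-of-the-envelope model. This is only a sketch under stated assumptions: the function names are made up, `ingest_speedup` is a hypothetical parameter (the notes say Isilon wins on ingestion but not by how much), and reading the one-third figure as a 3x processing slowdown is my interpretation.

```python
def total_time(ingest_hours, processing_hours):
    """X + Y = time, as in the notes."""
    return ingest_hours + processing_hours


def isilon_estimate(das_ingest_hours, das_processing_hours,
                    ingest_speedup, processing_slowdown=3.0):
    """Estimate total time on Isilon from a DAS baseline.

    ingest_speedup      -- assumed factor by which Isilon ingests faster (hypothetical)
    processing_slowdown -- assumed from "one third of the speed of DAS"
    """
    x = das_ingest_hours / ingest_speedup
    y = das_processing_hours * processing_slowdown
    return total_time(x, y)
```

For an ingest-heavy workload, say 10 hours of ingestion and 2 of processing on DAS with an assumed 5x ingest speedup, the estimate favours Isilon; flip the proportions and native Hadoop on DAS comes out ahead. Which side of that line a given workload falls on is the real question.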