Wednesday, January 6, 2016

Hadoop : Rack Awareness

https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml

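Embedding the rack number in the hostnames could be one option: the rack id can then be derived from the name itself instead of being kept in a separate lookup table.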

Properties to check out in core-default.xml:
net.topology.node.switch.mapping.impl
net.topology.script.file.name
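
As a rough sketch of the hostname idea: the script named by net.topology.script.file.name is called with one or more hosts as command-line arguments and is expected to print one rack id per argument. The naming scheme below (r<rack number>-<node name>, e.g. r12-node034), the regex and the function name are assumptions made up for illustration, not anything Hadoop prescribes. Depending on the setup, the script may be handed IP addresses rather than names; in this sketch those simply fall through to the default rack.

```python
#!/usr/bin/env python3
"""Illustrative topology script: derive the rack id from the hostname."""
import re
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical naming scheme: r<rack number>-<anything>, e.g. r12-node034
RACK_IN_HOSTNAME = re.compile(r"^r(\d+)-")


def rack_from_hostname(host):
    """Return a rack id such as /rack12, or the default rack if no match."""
    match = RACK_IN_HOSTNAME.match(host.lower())
    if match is None:
        # IP addresses and names outside the scheme end up here.
        return DEFAULT_RACK
    return "/rack%d" % int(match.group(1))


if __name__ == "__main__":
    # One rack id per argument, in the same order as the arguments.
    print(" ".join(rack_from_hostname(h) for h in sys.argv[1:]))
```

The fallback to /default-rack keeps unknown hosts resolvable instead of making the script fail.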

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html
Hadoop Rack Awareness
The HDFS and the Map/Reduce components are rack-aware.

The NameNode and the JobTracker obtain the rack id of the slaves in the cluster by invoking an API, resolve, in an administrator-configured module. The API resolves the slave's DNS name (or IP address) to a rack id.

Which module to use is set with the configuration item topology.node.switch.mapping.impl. The default implementation runs a script/command configured using topology.script.file.name. If topology.script.file.name is not set, the rack id /default-rack is returned for any passed IP address. (These are the Hadoop 1.x names; in Hadoop 2.x the same properties carry a net. prefix.)

The additional configuration on the Map/Reduce side is mapred.cache.task.levels, which determines the number of levels (in the network topology) of caches. For example, with the default value of 2, two levels of caches are constructed - one for hosts (host -> task mapping) and another for racks (rack -> task mapping).
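
The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop's code (the real module is a Java class implementing org.apache.hadoop.net.DNSToSwitchMapping). The table contents, the IP addresses, the task names and the helper build_task_caches are all hypothetical.

```python
from collections import defaultdict

DEFAULT_RACK = "/default-rack"

# Hypothetical host -> rack table an administrator might maintain.
RACK_TABLE = {
    "10.1.1.11": "/rack1",
    "10.1.1.12": "/rack1",
    "10.1.2.21": "/rack2",
}


def resolve(names, table=None):
    """Return one rack id per name, in input order.

    With no table configured, every host maps to the default rack,
    mirroring the case where no topology script is set.
    """
    if not table:
        return [DEFAULT_RACK for _ in names]
    return [table.get(name, DEFAULT_RACK) for name in names]


def build_task_caches(task_hosts, table):
    """Build the two cache levels: host -> tasks and rack -> tasks."""
    host_cache = defaultdict(list)
    rack_cache = defaultdict(list)
    for task, hosts in task_hosts.items():
        racks = set()
        for host, rack in zip(hosts, resolve(hosts, table)):
            host_cache[host].append(task)
            racks.add(rack)
        # Record each task once per rack, even with several hosts there.
        for rack in racks:
            rack_cache[rack].append(task)
    return host_cache, rack_cache
```

For instance, build_task_caches({"t1": ["10.1.1.11", "10.1.1.12"], "t2": ["10.1.2.21"]}, RACK_TABLE) lists t1 under both of its hosts but only once under /rack1. That is the point of the second cache level: a scheduler can fall back from a host-local to a rack-local choice.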
