MAMBA DSBA Hadoop User Notes

DSBA Hadoop runs Cloudera’s Distribution of Hadoop (CDH) 5.14 on Red Hat Enterprise Linux 7.2. These notes cover a few details of our CDH 5.14 implementation.

ACCESS TO MAMBA DSBA HADOOP

Currently, access to the MAMBA Education environment is granted only to classes. If you are unsure whether your class has access to MAMBA, please contact your TA or instructor.

Users must have already registered for DUO authentication. You can follow these instructions to do so: https://spaces.uncc.edu/pages/viewpage.action?pageId=35651686

dsba-hadoop.uncc.edu is the MAMBA EDGE node (interactive host) and HUE server for the cluster. You may use SSH to log in to dsba-hadoop, and SCP or SFTP to transfer files to and from this host. Please log in with your NINERNET USERNAME and PASSWORD.
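As a rough illustration of the login and file-transfer steps, the sketch below wraps ssh and scp in three small shell functions. The function names (mamba_login, mamba_upload, mamba_download), the NINERNET_USER variable, and the your_username placeholder are hypothetical and not part of the cluster setup; only the host name comes from these notes. Paths on the remote side are taken relative to your home directory on the edge node.

```shell
# Assumption: replace the placeholder with your own NinerNET username.
NINERNET_USER="${NINERNET_USER:-your_username}"
EDGE_HOST="dsba-hadoop.uncc.edu"

# Open an interactive shell on the edge node.
mamba_login() {
  ssh "${NINERNET_USER}@${EDGE_HOST}"
}

# Copy a local file ($1) into your home directory on the edge node.
mamba_upload() {
  scp "$1" "${NINERNET_USER}@${EDGE_HOST}:"
}

# Copy a file ($1, relative to your edge-node home directory) to a local
# destination ($2, defaulting to the current directory).
mamba_download() {
  scp "${NINERNET_USER}@${EDGE_HOST}:$1" "${2:-.}"
}
```

With these defined, "mamba_upload data.csv" would send data.csv to the edge node and "mamba_download results.csv" would bring results.csv back. The file names are examples only.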

HDFS

The HDFS volume for DSBA Hadoop is 130 TB and is accessible through the "hadoop fs" (or "hdfs dfs") commands. Your HDFS home directory is /user/<username>, with a quota limit of 2 TB. HDFS is NOT BACKED UP. It is intended for computation on the Hadoop cluster, not for long-term storage, so please do not store anything in HDFS that you have not backed up to your NFS storage or somewhere else.
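The following is a minimal sketch of common "hadoop fs" operations against your HDFS home directory, meant to be run on the edge node. The function names and the HDFS_USER variable are hypothetical. The sketch also assumes that your HDFS username matches your login name ($USER), which these notes do not state explicitly.

```shell
# Assumption: your HDFS home is /user/<login name>; override with HDFS_USER if not.
hdfs_home() {
  printf '/user/%s\n' "${HDFS_USER:-$USER}"
}

# Copy a local file ($1) into your HDFS home directory.
hdfs_upload() {
  hadoop fs -put "$1" "$(hdfs_home)/"
}

# Copy a file ($1, relative to your HDFS home) back to local or NFS storage
# ($2, defaulting to the current directory). HDFS is not backed up, so do
# this for anything you want to keep.
hdfs_download() {
  hadoop fs -get "$(hdfs_home)/$1" "${2:-.}"
}

# Show the total space used under your HDFS home, then the quota counters.
hdfs_usage() {
  hadoop fs -du -s -h "$(hdfs_home)"
  hadoop fs -count -q -h "$(hdfs_home)"
}
```

For example, "hdfs_upload data.csv" followed by "hdfs_usage" would load a file and then report how much of the 2 TB quota is in use. "hadoop fs -ls" on the same path lists the directory contents.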

HADOOP URLS

The Hadoop URLs (below) are NOT directly accessible from the campus network. They are only accessible from the DSBA Hadoop cluster's internal network. In order to navigate to the following URLs, you must run Firefox from the MAMBA interactive (EDGE) node, using X11 forwarding:

Namenode Info         http://mba-hm1.uncc.edu:50070
                      http://mba-hm2.uncc.edu:50070
Yarn/MapRed Info      http://mba-hm1.uncc.edu:8088
                      http://mba-hm2.uncc.edu:8088
MapRed Job History    http://mba-hm4.uncc.edu:19888/jobhistory
Spark Job History     http://mba-hm4.uncc.edu:18088
Spark2 Job History    http://mba-hm4.uncc.edu:18089
HBase Info            http://mba-hm3.uncc.edu:60010
Hue                   http://mba-i2.uncc.edu:8889
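
One possible way to open these pages is sketched below. It uses ssh's -X option to forward X11 and starts Firefox on the edge node. The function name open_hadoop_ui, the NINERNET_USER variable, and the your_username placeholder are hypothetical. The sketch assumes that the browser command on the edge node is "firefox" and that an X server is already running on your local machine. Some setups may need -Y in place of -X.

```shell
# Assumption: replace the placeholder with your own NinerNET username.
NINERNET_USER="${NINERNET_USER:-your_username}"
EDGE_HOST="dsba-hadoop.uncc.edu"

# Run Firefox on the edge node with X11 forwarding (-X) so that its window
# is displayed on your local screen, and open the given URL ($1).
open_hadoop_ui() {
  ssh -X "${NINERNET_USER}@${EDGE_HOST}" firefox "$1"
}
```

For example, "open_hadoop_ui http://mba-hm1.uncc.edu:8088" would open the Yarn/MapRed page listed above.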