Tuesday, September 30, 2014

Linux : Multipath

Server A:

# multipath -ll
3600a098044316b37365d436b476e564f dm-2 NETAPP,LUN C-Mode
size=15G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:2 sdd 8:48   active ready running
| |- 1:0:1:2 sdg 8:96   active ready running
| |- 2:0:0:2 sdp 8:240  active ready running
| `- 2:0:1:2 sds 65:32  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:2:2 sdj 8:144  active ready running
  |- 1:0:3:2 sdm 8:192  active ready running
  |- 2:0:2:2 sdv 65:80  active ready running
  `- 2:0:3:2 sdy 65:128 active ready running
3600a098044316b37305d44353075674e dm-1 NETAPP,LUN C-Mode
size=500G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:2:1 sdi 8:128  active ready running
| |- 1:0:3:1 sdl 8:176  active ready running
| |- 2:0:2:1 sdu 65:64  active ready running
| `- 2:0:3:1 sdx 65:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:1 sdc 8:32   active ready running
  |- 1:0:1:1 sdf 8:80   active ready running
  |- 2:0:0:1 sdo 8:224  active ready running
  `- 2:0:1:1 sdr 65:16  active ready running
3600a098044316b37305d44353075674d dm-0 NETAPP,LUN C-Mode
size=500G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:2:0 sdh 8:112  active ready running
| |- 1:0:3:0 sdk 8:160  active ready running
| |- 2:0:2:0 sdt 65:48  active ready running
| `- 2:0:3:0 sdw 65:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:0 sdb 8:16   active ready running
  |- 1:0:1:0 sde 8:64   active ready running
  |- 2:0:0:0 sdn 8:208  active ready running
  `- 2:0:1:0 sdq 65:0   active ready running

Server B:
# multipath -ll
3600a098044316b37365d436b476e564f dm-2 NETAPP,LUN C-Mode
size=15G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:2 sdd 8:48   active ready running
| |- 1:0:1:2 sdg 8:96   active ready running
| |- 2:0:0:2 sdp 8:240  active ready running
| `- 2:0:1:2 sds 65:32  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:2:2 sdj 8:144  active ready running
  |- 1:0:3:2 sdm 8:192  active ready running
  |- 2:0:2:2 sdv 65:80  active ready running
  `- 2:0:3:2 sdy 65:128 active ready running
3600a098044316b37305d44353075674e dm-1 NETAPP,LUN C-Mode
size=500G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:2:1 sdi 8:128  active ready running
| |- 1:0:3:1 sdl 8:176  active ready running
| |- 2:0:2:1 sdu 65:64  active ready running
| `- 2:0:3:1 sdx 65:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:1 sdc 8:32   active ready running
  |- 1:0:1:1 sdf 8:80   active ready running
  |- 2:0:0:1 sdo 8:224  active ready running
  `- 2:0:1:1 sdr 65:16  active ready running
3600a098044316b37305d44353075674d dm-0 NETAPP,LUN C-Mode
size=500G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:2:0 sdh 8:112  active ready running
| |- 1:0:3:0 sdk 8:160  active ready running
| |- 2:0:2:0 sdt 65:48  active ready running
| `- 2:0:3:0 sdw 65:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:0 sdb 8:16   active ready running
  |- 1:0:1:0 sde 8:64   active ready running
  |- 2:0:0:0 sdn 8:208  active ready running
  `- 2:0:1:0 sdq 65:0   active ready running
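
Both servers report the same three WWIDs, which confirms the LUNs are shared between them. A single map can also be inspected by passing its WWID to multipath; a minimal sketch, using the 15G LUN above:

# multipath -ll 3600a098044316b37365d436b476e564f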


Linux : Bash Code Injection Vulnerability via Specially Crafted Environment Variables (CVE-2014-6271, CVE-2014-7169)

https://access.redhat.com/articles/1200223

Diagnostic Steps:

If your system is not vulnerable, you will see output similar to:

$ env 'x=() { :;}; echo vulnerable' 'BASH_FUNC_x()=() { :;}; echo vulnerable' bash -c "echo test"
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `BASH_FUNC_x'
test

If your system is vulnerable, you will see output similar to:

$ env 'x=() { :;}; echo vulnerable' 'BASH_FUNC_x()=() { :;}; echo vulnerable' bash -c "echo test"
vulnerable
bash: BASH_FUNC_x(): line 0: syntax error near unexpected token `)'
bash: BASH_FUNC_x(): line 0: `BASH_FUNC_x() () { :;}; echo vulnerable'
bash: error importing function definition for `BASH_FUNC_x'
test

If the output of the above command contains a line with only the word "vulnerable", you are running a vulnerable version of Bash. The patch for this issue ensures that no code is allowed after the end of a Bash function definition.

If your system is not vulnerable, you will see output similar to:

$ cd /tmp; rm -f /tmp/echo; env 'x=() { (a)=>\' bash -c "echo date"; cat /tmp/echo
date
cat: /tmp/echo: No such file or directory

If your system is vulnerable, the time and date information will be output on the screen and a file called /tmp/echo will be created.

$ cd /tmp; rm -f /tmp/echo; env 'x=() { (a)=>\' bash -c "echo date"; cat /tmp/echo
bash: x: line 1: syntax error near unexpected token `='
bash: x: line 1: `'
bash: error importing function definition for `x'
Tue Sep 30 09:57:39 EDT 2014
$ ls -ld /tmp/echo
-rw-rw-r-- 1 abcd abcd 29 Sep 30 09:57 /tmp/echo


Solution :
If your system is vulnerable, you can fix these issues by updating to the most recent version of the Bash package by running the following command:

# yum update bash 
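
The same test can be pushed to a fleet of machines; a minimal sketch, assuming working key-based ssh and hypothetical hostnames:

for h in server1 server2 server3; do
  echo "== $h =="
  # run the CVE-2014-6271 test remotely; a line containing only "vulnerable" means the host still needs the update
  ssh "$h" "env 'x=() { :;}; echo vulnerable' bash -c 'echo test'"
done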

Monday, September 29, 2014

Oracle VM Manager : OVMAPI_2005E Summary: Server Cluster Failure, Description: Failed to destroy cluster

Oracle VM Manager version : 3.3.1

Problem : When trying to destroy a cluster, got the following error :

com.oracle.odof.exception.ObjectException: Caught during invoke method: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_2005E "[ServerDbImpl] 32:33:35:36:30:30:53:55:45:34:33:30:33:4c:30:35 (xxxxxxxxx)" contains a component "32:33:35:36:30:30:53:55:45:34:33:30:33:4c:30:35" in error. Error event: server.cluster.failure., Summary: Server Cluster Failure, Description: Failed to destroy cluster

OVMEVT_003503D_000 Server reboot is required.. [Fri Sep 26 14:17:58 EDT 2014]


Solution : connect to the Oracle VM Manager CLI, locate the orphaned file system, and delete it.

# ssh -l admin -p 10000 loopback



OVM> list filesystem
Command: list filesystem
Status: Success
Time: 2014-09-26 11:29:14,457 EDT
Data:
  :
  id:0004fb00000500005f792a3625f957de  name:
  :

OVM> delete filesystem id=0004fb00000500005f792a3625f957de
Command: delete filesystem id=0004fb00000500005f792a3625f957de
Status: Success
Time: 2014-09-26 11:40:12,672 EDT
JobId: 1411746009613

After deleting the orphaned file system, the server pool can be destroyed.



Machine Learning reading list

https://github.com/soulmachine/machine-learning-cheat-sheet



Wednesday, September 24, 2014

Linux : ssh not working due to wrong key file permission

Problem: ssh not working due to wrong key file permission

In /var/log/messages

Sep 15 05:21:51 localhost sshd[21043]: error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Sep 15 05:21:51 localhost sshd[21043]: error: @         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
Sep 15 05:21:51 localhost sshd[21043]: error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Sep 15 05:21:51 localhost sshd[21043]: error: Permissions 0777 for '/etc/ssh/ssh_host_rsa_key' are too open.
Sep 15 05:21:51 localhost sshd[21043]: error: It is required that your private key files are NOT accessible by others.
Sep 15 05:21:51 localhost sshd[21043]: error: This private key will be ignored.
Sep 15 05:21:51 localhost sshd[21043]: error: bad permissions: ignore key: /etc/ssh/ssh_host_rsa_key
Sep 15 05:21:51 localhost sshd[21043]: error: Could not load host key: /etc/ssh/ssh_host_rsa_key


Solution : change the key file permissions back so the file is readable by root only.
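
A minimal fix sketch for the host key flagged in the log, assuming a RHEL-style init:

# chmod 600 /etc/ssh/ssh_host_rsa_key
# service sshd restart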

Tuesday, September 16, 2014

SDN, Network Virtualization, And NFV In A Nutshell

http://www.networkcomputing.com/networking/sdn-network-virtualization-and-nfv-in-a-nutshell/a/d-id/1315755

The networking industry is awash in new terminology. Here is a quick guide to three of the hottest concepts in networking today.
Over the past several years, there's been an explosion of new networking concepts and terminology resulting from trends in data center technologies and virtualization. Terms like software-defined networking (SDN), network virtualization, and network functions virtualization (NFV) are used frequently in technical talks, vendor marketing material, and blogs.
Many networking professionals have a tenuous grasp on just what those terms mean and how they relate to one another. In this post, I will provide a basic working definition for each.
Software-defined networking
SDN is probably the most heavily used (and overused) term of the three. It generally means separating a data network's control functions from its packet forwarding functions. Why separate these functions? There are three main reasons being pushed by different solution sets in the networking industry right now.
First, the separation of hardware and software can allow vendors that specialize in each component to focus on bringing successful products to market in an independent, interoperable way. This, in turn, allows end users to select a combination of hardware and software that best suits their needs. This aspect of SDN is often called the "white-box" movement, harkening back to early white box personal computers, which were themselves decoupled from the operating systems that ran on them and sold largely as a collection of commodity components at a lower price than a fully integrated solution, such as an IBM PC or a Macintosh.
Not all SDN use cases necessitate (or even support) purchasing hardware and software developed independently, but the trend is growing. The result is value being driven into the networking software while the hardware vendors focus on reducing the cost of the commodity physical components.
Second, the decoupling of networking hardware and software allows for centralization of the control portion (called the control plane) while keeping the actual packet forwarding function (the forwarding plane) distributed across many physical network switches. This provides a means to configure, monitor, troubleshoot, and automate a large network built of many discrete hardware components as a single network "fabric."
The centralized control plane can then enable new or different forwarding behaviors and broader, more precise control of traffic flow. Many products that encompass data center fabrics and flow control methods such as OpenFlow leverage this facet of SDN.
Finally, the term SDN often goes hand in hand with the idea of network programmability: using homegrown or commercial tools that can interact closely with the software-based control plane to affect the network's configuration and behavior. By providing application programming interfaces (APIs) into the centralized SDN network control function and the information that supports the forwarding function, network management applications, provisioning tools, and homegrown scripts have a single point of interaction with the network that can greatly improve their effectiveness.
Network virtualization
Network virtualization refers to the virtualization of network resources or pathways to achieve application or tenant isolation. This isolation is often desirable for a variety of reasons, including scalability, fault isolation, security, and network abstraction. Isolation is sometimes accomplished with technologies that create virtual instances of a physical device, such as load balancers or firewall appliances that support being split into multiple virtual devices for different purposes.
Routers and Layer 3 switches can be virtualized using technologies such as virtual routing and forwarding instances (VRFs) to virtualize and isolate IP routing tables and routing functions. Ethernet switches support VLANs to provide Layer 2 path isolation and virtually carve up the broadcast domain of a single physical switch into multiple logical ones.
These techniques are often used in combination to provide a completely separate network environment for an application, business unit, or data center tenant. Path isolation and network virtualization can also be achieved using newer techniques like overlay network technologies such as VXLAN and NVGRE. This method provides tenant separation, containerization, and isolation as well as scalability. Another means for path isolation is flow manipulation using SDN technologies like OpenFlow.
There are various benefits and drawbacks to each of these network virtualization techniques, and there are situations in which they complement or conflict with one another. Detailed exploration of these pros and cons is beyond the scope of this article.
Network functions virtualization
NFV describes the concept of taking a function that traditionally runs on a dedicated network appliance -- usually a large appliance in the center of the network, shared by many tenants or applications -- and running those functions as virtual machines on the virtual server infrastructure (or sometimes dedicated virtualization resources).
The drawbacks of the traditional approach of monster firewalls or load balancers sitting in the middle of the network are numerous: They represent a large, shared fault domain and are typically very expensive because they must be sized for peak capacity (and thus are usually chronically underutilized). They also make it difficult to provide customers or users with configuration and monitoring access, or to perform maintenance without impacting multiple applications or tenants.
Major advances in the power of x86 microprocessors and compute virtualization technology have driven the success of NFV. Specialized hardware is increasingly unnecessary for many functions with virtual server hosts containing such powerful compute nodes. Once virtualized, those functions can be placed closer to where they are needed, containerized with an application or tenant, and replicated easily for building new, duplicate, or backup environments.
Fault domains are reduced to the specific container in which the function exists, and maintenance activities become easier, because multiple application owners don't need to agree on a common maintenance window for a software upgrade or other changes. NFV is usually used for upper-layer networking devices like firewalls, load balancers, NATs, and VPN appliances.
Virtualized network functions may rely on path isolation and containerization to ensure they are used by the intended application, such as ensuring a firewall is the default gateway for a containerized, isolated application. NFV may also rely on SDN flow programming techniques to force traffic through one or more virtualized network functions -- a process called service chaining.
NFV, SDN, and network virtualization are related when considering ways to design and implement a modern, scalable, secure, and highly available data center environment for multiple applications or tenants. Each topic has enough depth to warrant many volumes of material, but the goal of this post was to define the basics of each term and the basic means in which they are interdependent in modern data center implementation.

Monday, September 15, 2014

Linux : 7Zip

Download 7Zip from http://sourceforge.net/projects/p7zip/

After downloading, build from source:

cd p7zip_9.20.1
make all

# p7zip_9.20.1/bin/7za x cloudera-quickstart-vm-5.1.0-1-kvm.7z

7-Zip (A) [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: cloudera-quickstart-vm-5.1.0-1-kvm.7z

Extracting  cloudera-quickstart-vm-5.1.0-1-kvm/cloudera-quickstart-vm-5.1.0-1-kvm.qcow2
Extracting  cloudera-quickstart-vm-5.1.0-1-kvm

Everything is Ok

Folders: 1
Files: 1
Size:       32828686336
Compressed: 3112562928
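
The same binary can also create archives with the a (add) function letter; a minimal sketch with a hypothetical archive name and source directory:

# p7zip_9.20.1/bin/7za a backup.7z /path/to/dir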

Friday, September 12, 2014

Solaris 11 : Show FCP device


# cfgadm -al -o show_FCP_dev
Ap_Id                          Type         Receptacle   Occupant     Condition
c5                             fc-fabric    connected    configured   unknown
c5::201b00a0983f416c,0         disk         connected    configured   unknown
c5::201c00a0983f416c,0         disk         connected    configured   unknown
c5::201d00a0983f416c,0         disk         connected    configured   unknown
c5::201e00a0983f416c,0         disk         connected    configured   unknown
c6                             fc-fabric    connected    configured   unknown
c6::201b00a0983f416c,0         disk         connected    configured   unknown
c6::201c00a0983f416c,0         disk         connected    configured   unknown
c6::201d00a0983f416c,0         disk         connected    configured   unknown
c6::201e00a0983f416c,0         disk         connected    configured   unknown

# cfgadm -al -o show_FCP_dev c5 c6
Ap_Id                          Type         Receptacle   Occupant     Condition
c5                             fc-fabric    connected    configured   unknown
c5::201b00a0983f416c,0         disk         connected    configured   unknown
c5::201c00a0983f416c,0         disk         connected    configured   unknown
c5::201d00a0983f416c,0         disk         connected    configured   unknown
c5::201e00a0983f416c,0         disk         connected    configured   unknown
c6                             fc-fabric    connected    configured   unknown
c6::201b00a0983f416c,0         disk         connected    configured   unknown
c6::201c00a0983f416c,0         disk         connected    configured   unknown
c6::201d00a0983f416c,0         disk         connected    configured   unknown
c6::201e00a0983f416c,0         disk         connected    configured   unknown

# cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
c2                             scsi-sas     connected    configured   unknown
c3                             scsi-sas     connected    configured   unknown
c4                             scsi-sas     connected    unconfigured unknown
c5                             fc-fabric    connected    configured   unknown
c6                             fc-fabric    connected    configured   unknown
c7                             scsi-sas     connected    configured   unknown
c8                             scsi-sas     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok
usb0/5                         usb-hub      connected    configured   ok
usb0/5.1                       unknown      empty        unconfigured ok
usb0/5.2                       usb-communi  connected    configured   ok
usb0/5.3                       unknown      empty        unconfigured ok
usb0/5.4                       unknown      empty        unconfigured ok
usb0/6                         unknown      empty        unconfigured ok
usb0/7                         unknown      empty        unconfigured ok
usb0/8                         usb-hub      connected    configured   ok
usb0/8.1                       unknown      empty        unconfigured ok
usb0/8.2                       unknown      empty        unconfigured ok
usb0/8.3                       unknown      empty        unconfigured ok
usb0/8.4                       unknown      empty        unconfigured ok
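
If a newly presented LUN shows up as unconfigured, it can be brought online through the same attachment points; a hedged sketch against controller c5 from the listing above:

# cfgadm -c configure c5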

Linux : SAN : How to find out the shared SAN LUN

Problem : We have LUNs presented to two Red Hat servers; how can we verify which device files refer to the shared device?

Solution : find the devices that report the same unit serial number by running sg_inq; multipath maps with matching serial numbers are the same underlying LUN (a comparison sketch follows at the end of this section).

In Server A :

# fdisk -l

Disk /dev/mapper/3600a098044316b37305d44353075674d: 537.0 GB, 536952700928 bytes
255 heads, 63 sectors/track, 65280 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

# sg_inq /dev/mapper/3600a098044316b37305d44353075674d
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=1  3PC=1  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN C-Mode
 Product revision level: 8200
 Unit serial number: D1k70]D50ugM

In Server B :

# fdisk -l

Disk /dev/mapper/3600a098044316b37305d44353075674e: 537.0 GB, 536952700928 bytes
255 heads, 63 sectors/track, 65280 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000



# sg_inq /dev/mapper/3600a098044316b37305d44353075674e
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=1  3PC=1  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN C-Mode
 Product revision level: 8200
 Unit serial number: D1k70]D50ugN
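
A minimal comparison sketch, assuming the NETAPP multipath maps shown earlier (and that sg3_utils is installed); run it on both servers and match the serial numbers to pair up the shared LUNs:

for d in /dev/mapper/3600a*; do
  printf '%s :' "$d"
  # print the unit serial number the array reports for this map
  sg_inq "$d" | grep 'Unit serial number'
done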

Thursday, September 11, 2014

Solaris 11 : cannot ssh into the Solaris 11 server

Problem : cannot ssh into the Solaris 11 server

$ ssh id@$ip_solaris11
ssh_exchange_identification: Connection closed by remote host

Solution : edit /etc/hosts.allow, add the client's subnet or IP to the sshd ALLOW section, and then restart inetd.

sshd:   x.x.x.x/255.255.255.0 \
        10.100.0.0/255.255.0.0 \
        x.x.x.x/255.255.255.0                       : ALLOW

Make sure the sshd service is online:
# svcs -a | grep ssh
online         Sep_02   svc:/network/ssh:default

Now restart inetd
# svcadm restart inetd

Big Data Talks

http://www.kdnuggets.com/2014/09/most-viewed-big-data-talks-videolectures.html

http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation 

Mining of massive datasets

http://www.mmds.org/

https://www.linkedin.com/pulse/article/20140918162814-111366377-let-s-get-nerdy-data-analytics-for-business-leaders-explained

http://simplystatistics.org/2014/05/22/10-things-statistics-taught-us-about-big-data-analysis/

http://www.datasciencecentral.com/profiles/blogs/data-science-cheat-sheet

www.coreservlets.com/hadoop-tutorial/

http://www.stratapps.net/intro-hive.php

Wednesday, September 10, 2014

Hadoop : Sqoop

http://www.slideshare.net/huguk/140410-huguksqoopunlockhadoop

https://www.youtube.com/watch?v=puQMQxkDNtw

Solaris : How To Find Fiber Host Bus Adapter (Fiber HBA) WWN

# luxadm -e port
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,emlxs@0/fp@0,0:devctl      CONNECTED
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,emlxs@0,1/fp@0,0:devctl    CONNECTED

# fcinfo hba-port
HBA Port WWN: 10000090fa72ac46
        Port Mode: Initiator
        Port ID: 210760
        OS Device Name: /dev/cfg/c5
        Manufacturer: Emulex
        Model: LPe12002-S
        Firmware Version: LPe12002-S 1.00a12
        FCode/BIOS Version: Boot:5.03a0 Fcode:3.01a1
        Serial Number: 4925382+1422000085
        Driver Name: emlxs
        Driver Version: 2.90.15.0 (2014.01.22.14.50)
        Type: N-port
        State: online
        Supported Speeds: 2Gb 4Gb 8Gb
        Current Speed: 8Gb
        Node WWN: 20000090fa72ac46
        NPIV Not Supported
HBA Port WWN: 10000090fa72ac47
        Port Mode: Initiator
        Port ID: 210680
        OS Device Name: /dev/cfg/c6
        Manufacturer: Emulex
        Model: LPe12002-S
        Firmware Version: LPe12002-S 1.00a12
        FCode/BIOS Version: Boot:5.03a0 Fcode:3.01a1
        Serial Number: 4925382+1422000085
        Driver Name: emlxs
        Driver Version: 2.90.15.0 (2014.01.22.14.50)
        Type: N-port
        State: online
        Supported Speeds: 2Gb 4Gb 8Gb
        Current Speed: 8Gb
        Node WWN: 20000090fa72ac47
        NPIV Not Supported
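
To pull out just the port WWNs (for example, when requesting switch zoning), the fcinfo output can be filtered; a minimal sketch:

# fcinfo hba-port | grep 'HBA Port WWN'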

Tuesday, September 9, 2014

Hadoop : MapReduce : port 8020 failed on connection exception

Error when running MapReduce :

[hadoop@hadoopmaster1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar teragen 100 tmp/teragenout
:
:
:

Caused by: java.net.ConnectException: Call From hadoopmaster1/192.168.1.187 to hadoopmaster1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused


Solution : Restart the HDFS NameNode. The jps output shows only the DataNode running; the NameNode process, which listens on port 8020, is missing:

[hdfs@hadoopmaster1 ~]$ jps
32192 DataNode


su hdfs

stop:
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop secondarynamenode

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop datanode

start:
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode


[hdfs@hadoopmaster1 ~]$ jps
32192 DataNode
31851 NameNode

[root@hadoopmaster1 conf]# netstat -lent | grep :8020
tcp        0      0 192.168.1.187:8020          0.0.0.0:*                   LISTEN      493        548347

[hadoop@hadoopmaster1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar teragen 10 tmp/teragenout
java.io.IOException: Output directory tmp/teragenout already exists.

A previous attempt left tmp/teragenout behind, so rerun with a fresh output directory:

[hadoop@hadoopmaster1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar teragen 10 tmp/teragenout2
14/09/09 21:25:26 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster1/192.168.1.187:8050
14/09/09 21:25:29 INFO terasort.TeraSort: Generating 10 using 2
14/09/09 21:25:29 INFO mapreduce.JobSubmitter: number of splits:2
14/09/09 21:25:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410306715808_0002
14/09/09 21:25:32 INFO impl.YarnClientImpl: Submitted application application_1410306715808_0002
14/09/09 21:25:33 INFO mapreduce.Job: The url to track the job: http://hadoopmaster1:8088/proxy/application_1410306715808_0002/
14/09/09 21:25:33 INFO mapreduce.Job: Running job: job_1410306715808_0002
14/09/09 21:26:04 INFO mapreduce.Job: Job job_1410306715808_0002 running in uber mode : false
14/09/09 21:26:05 INFO mapreduce.Job:  map 0% reduce 0%
14/09/09 21:26:49 INFO mapreduce.Job:  map 100% reduce 0%
14/09/09 21:26:56 INFO mapreduce.Job: Job job_1410306715808_0002 completed successfully
14/09/09 21:26:57 INFO mapreduce.Job: Counters: 31
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=192632
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=158
                HDFS: Number of bytes written=1000
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Job Counters
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=264387
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=88129
                Total vcore-seconds taken by all map tasks=88129
                Total megabyte-seconds taken by all map tasks=135366144
        Map-Reduce Framework
                Map input records=10
                Map output records=10
                Input split bytes=158
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=399
                CPU time spent (ms)=3170
                Physical memory (bytes) snapshot=188051456
                Virtual memory (bytes) snapshot=2521628672
                Total committed heap usage (bytes)=31719424
        org.apache.hadoop.examples.terasort.TeraGen$Counters
                CHECKSUM=24474895181
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=1000

Hadoop : Fix ERROR check containerexecutor


Where is this error coming from? Check the ResourceManager log:

[root@hadoopmaster1 yarn]# pwd
/var/log/hadoop-yarn/yarn
[root@hadoopmaster1 yarn]# grep ERROR yarn-yarn-resourcemanager-hadoopmaster1.log
2014-09-08 23:47:58,974 INFO  rmnode.RMNodeImpl (RMNodeImpl.java:transition(621)) - Node hadoopmaster1:45454 reported UNHEALTHY with details: ERROR check containerexecutor, OK: disks ok,

In /etc/hadoop/conf/yarn-site.xml

    <name>yarn.nodemanager.health-checker.script.path</name>
    <value>/etc/hadoop/conf/health_check</value>

[root@hadoopmaster1 conf]# pwd
/etc/hadoop/conf
[root@hadoopmaster1 conf]# ./health_check
ERROR check containerexecutor, OK: disks ok,

In /etc/hadoop/conf/health_check:

function check_containerexecutor {
  perm=`stat -c %a:%U:%G /usr/lib/hadoop-yarn/bin/container-executor 2>/dev/null`
  if [ $? -eq 0 ] && [ "$perm" == "6050:root:hadoop" ] ; then
    echo "containerexecutor ok"
  else
    echo 'check containerexecutor' ; exit 1
  fi
}


Solution : restore the expected ownership and the setuid/setgid permissions. Mode 6050 is setuid + setgid with no world access, which matches the "6050:root:hadoop" value the health-check script tests for:
chown root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor


[root@hadoopmaster1 yarn]# chown root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
[root@hadoopmaster1 yarn]# ls -al /usr/lib/hadoop-yarn/bin/container-executor
----r-x---. 1 root hadoop 40096 Aug 27 23:15 /usr/lib/hadoop-yarn/bin/container-executor
[root@hadoopmaster1 yarn]# chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor
[root@hadoopmaster1 yarn]# ls -al /usr/lib/hadoop-yarn/bin/container-executor
---Sr-s---. 1 root hadoop 40096 Aug 27 23:15 /usr/lib/hadoop-yarn/bin/container-executor

[root@hadoopmaster1 conf]# ./health_check
OK: disks ok,containerexecutor ok,

Now restart YARN

stop:
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager

start:
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
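
After the restart, confirm that both YARN daemons are back; a minimal check:

$ jps | egrep 'ResourceManager|NodeManager'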