Skip to content

How to use kerberos authentication in streamsx.hdfs toolkit

Ahmad Nouri edited this page May 14, 2019 · 11 revisions

How to use kerberos authentication in streamsx.hdfs toolkit

The streamsx.hdfs toolkit supports kerberos authentication.

This document describes a step by step procedure to:

  • Installation and configuration of Ambari

  • Installation of HDFS and HBASE start the services

  • Enabling kerberos

  • Using of principal and *keytab" in stremasx.hdfs toolkit.

A keytab is a file containing pairs of Kerberos principals and encrypted keys that are derived from the Kerberos password.

Ambari installation

Here is a short installation and configuration guide for Ambari. For detail information about the installation of Ambari check this link: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-installation/content/install-ambari-server.html

login as root:

cd /etc/yum.repos.d/
wget  http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.1.5/ambari.repo

for redhat 6

wget  http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.5/ambari.repo

Now install ambari-server and ambari-agent

yum install ambari-server
yum install ambari-agent
ambari-server setup -s -v
# accept all defaults and go to the next:
ambari-server start
ambari-server status

edit /etc/ambari-agent/conf/ambari-agent.ini file and adding the following configuration property to the security section:

[security]
force_https_protocol=PROTOCOL_TLSv1_2

Install and start ambari-agent in every hosts of your HBASE cluster.

ambari-agent start
ambari-agent status

If everything goes well you can see the following status:

Found ambari-agent PID: 1429
ambari-agent running.
Agent PID at: /run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log

Now check the status of ambari-server

ambari-server status
Using python  /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 16227 at: /var/run/ambari-server/ambari-server.pid

Create cluster with ambari

Start the ambari web configuration GUI. open this link in your browse.

 http:/<your-hdp-server>:8080

default port is 8080 default user: admin default password: admin It is possible to change all these defaults. More details in:

https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/ambari-chap2-2a.html

Cat

Ambari registration of hosts

Cat

Install HBASE and HBASE.

Launch the Ambari Install Wizard and follow the installation and configuration. Registration of hosts.

Customizing of services: For HBASE and HBASE test it is recommended to install only the following services: HBASE, MapReduce2, HBase, ZooKepper.

Install and start the services:

It takes about 30 minutes (or more dependence to the number of your services) to install and start the services . Started Services

Cat

Now you can login as root on your HDP server and copy the configuration files to your client: Hadoop configuration file: /usr/hdp/current/haddop-client/conf/core-seite.xml

HBASE configuration file: /usr/hdp/current/hbase-client/conf/hbase-site.xml

Installation and configuration of kerberos

Login as root in your hadoop server and install kerberos.

yum install krb5* -y

Kerberos configuration

In the following example the realm name is HDP2 and the server is hdp21.hdp.ibm.com. You can change it with your realm name and your server name.

Edit Kerberos configuration file

vi /etc/krb5.conf
[libdefaults]
default_realm = HDP2.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
forwardable = true
udp_preference_limit = 1000000
default_tkt_enctypes = des-cbc-md5 des-cbc-crc des3-cbc-sha1
default_tgs_enctypes = des-cbc-md5 des-cbc-crc des3-cbc-sha1
permitted_enctypes = des-cbc-md5 des-cbc-crc des3-cbc-sha1

[realms]
HDP2.COM = {
    kdc = hdp21.hdp.ibm.com:88
    admin_server = hdp21.hdp.ibm.com:749
    default_domain = hdp.ibm.com
}

[domain_realm]
.hdp.ibm.com = HDP2.COM
 hdp.ibm.com = HDP2.COM

[logging]
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmin.log
default = FILE:/var/log/krb5lib.log

edit /var/kerberos/krb5kdc/kdc.conf

default_realm = HDP2.COM

[kdcdefaults]
v4_mode = nopreauth
kdc_ports = 0

[realms]
HDP2.COM = {
    kdc_ports = 88
    admin_keytab = /etc/kadm5.keytab
    database_name = /var/kerberos/krb5kdc/principal
    acl_file = /var/kerberos/krb5kdc/kadm5.acl
    key_stash_file = /var/kerberos/krb5kdc/stash
    max_life = 10h 0m 0s
    max_renewable_life = 7d 0h 0m 0s
    master_key_type = des3-hmac-sha1
    supported_enctypes = arcfour-hmac:normal des3-hmac-sha1:normal des-cbc-crc:normal des:normal des:v4 des:norealm des:onlyrealm des:afs3
    default_principal_flags = +preauth
}

Edit /var/kerberos/krb5kdc/kadm5.acl

3- Creating the KDC database

kdb5_util create -r HDP2.COM -s

4- Add a principal for root user

/usr/sbin/kadmin.local -q "addprinc root/admin"

5- Restart krb5kdc and kadmin services

/sbin/service krb5kdc stop
/sbin/service kadmin stop

/sbin/service krb5kdc start
/sbin/service kadmin start

Check the status of servers: if everything goes well you can see like this status:

/sbin/service krb5kdc status
Redirecting to /bin/systemctl status krb5kdc.service
krb5kdc.service - Kerberos 5 KDC
Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2018-07-30 08:13:40 PDT; 20h ago
Main PID: 2030 (krb5kdc)
CGroup: /system.slice/krb5kdc.service
       └─2030 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid

Jul 30 08:13:40 hdp21.hdp.ibm.com systemd[1]: Starting Kerberos 5 KDC...
Jul 30 08:13:40 hdp21.hdp.ibm.com systemd[1]: Started Kerberos 5 KDC.


/sbin/service kadmin status
Redirecting to /bin/systemctl status kadmin.service
kadmin.service - Kerberos 5 Password-changing and Administration
Loaded: loaded (/usr/lib/systemd/system/kadmin.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2018-07-30 08:13:48 PDT; 20h ago
Main PID: 2045 (kadmind)
CGroup: /system.slice/kadmin.service
       └─2045 /usr/sbin/kadmind -P /var/run/kadmind.pid

Enabling of kerberos in Ambari

Now you can open your ambari GUI and enable Kerberos.

Cat

It restarts all service and create principals and keytab files for every user.

Cat

It is also possible to create your principals and keytab manually via kadmin tool.

It creates keytab files and saves them in /etc directory

/etc/security/keytabs

ls   /etc/security/keytabs/
activity-analyzer.headless.keytab
activity-explorer.headless.keytab
ambari.server.keytab
ams.collector.keytab
ams-hbase.master.keytab
ams-hbase.regionserver.keytab
ams-zk.service.keytab
dn.service.keytab
hbase.headless.keytab
hbase.service.keytab
hbase.headless.keytab
jhs.service.keytab
nfs.service.keytab
nm.service.keytab
nn.service.keytab
rm.service.keytab
smokeuser.headless.keytab
spnego.service.keytab
yarn.service.keytab
zk.service.keytab

The csv file with list of principals configured can be download using following Ambari URL

https://<ambari_hostname:port>/api/v1/clusters/<Cluster_name>/kerberos_identities?fields=*&format=csv

Where the placeholders <ambari_hostname:port> , and <Cluster_name> has to be replaced with your server corresponding values.

If you have another hadoop server, you can enable Kerberos authentication on your Hadoop cluster.

The following links shows "how to enable the kerberos authentication in Hortenworks, Cloudera and IOP".

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_security/content/_enabling_kerberos_security_in_ambari.html

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s4_kerb_wizard.html

https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_iop_kerberos.html

Create a streamsadmin user on Hadoop server

Login as root in your hadoop server

useradd streamsadmin
su - hdfs
hadoop fs -mkdir /user/streamsadmin
hadoop fs -mkdir /user/streamsadmin/testDirectory
hadoop fs -chown -R streamsadmin:streamsadmin /user/streamsadmin

In the above sample the name of streamsuser is streamsadmin.

If your streamsuser has another username, you can create a HDFS user with your user name.

HDFS Connection via kerberos authentication

Kerberos authentication is a network protocol to provide strong authentication for client/server applications.

The streamsx.hdfs toolkit support kerberos authentication. All operators have 2 additional parameters for kerberos authentication:

The authKeytab parameter specifies the kerberos keytab file that is created for the principal. The authPrincipal parameter specifies the Kerberos principal, which is typically the principal that is created for the HDFS server.

  • 1- Copy core-site.xml file from Hadoop server on your Streams server in etc directory of your SPL application.

  • 2- Copy the keytab file from Hadoop server on your Streams server in etc directory of your SPL application.

  • 3- Test the keytab For example:

      kinit -k -t hdfs.headless.keytab <your-hdfs-pricipal>
    

If your configuration is correct, it creates a file in /tmp directory like this:

      /tmp/crb5_xxxx

xxxx is your user id for example 1005

  • 4- Add the hostname and the IP address of your HDFS server (cluster) in your /etc/hosts file on stream server.

  • 5- If you have any problem to access to the realm, copy the crb5.conf file from your HDFS server into a directory and add the following vmArg parameter to all your HDFS2 operators in your SPL files: In this case the path of krb5.conf file is an absolute path For example:

    vmArg : "-Djava.security.krb5.conf=/etc/krb5.conf";
    

    vmArgs is a set of arguments to be passed to the Java VM within which this operator will be run.

  • 5- Create a test file in data directory and add some lines into the test file data/LineInput.txt

The following SPL application demonstrates how to connect to an HDFS via kerberos authentication.

It reads the lines from the file data/LineInput.txt via FileSource operator.

Then the HDFS2FileSink operator writes all incoming lines into a file in your HDFS server in your user directory.

For example:

/user/streamsadmin/HDFS-LineInput.txt

In the next step the HDFS2FileSource reads lines from the HDFS file /user/streamsadmin/HDFS-LineInput.txt.

In the last step the FileSink operator writes all incoming lines from HDFS file in your local system into data/result.txt

All project files are in :

https://github.com/IBMStreams/streamsx.hdfs/tree/gh-pages/samples/HdfsKerberos

namespace application ;

use com.ibm.streamsx.hdfs::HDFS2FileSink ;
use com.ibm.streamsx.hdfs::HDFS2FileSource ;

composite HdfsKerberos
{
    param
            expression<rstring> $authKeytab : getSubmissionTimeValue("authKeytab", "etc/hdfs.headless.keytab") ;
            expression<rstring> $authPrincipal : getSubmissionTimeValue("authPrincipal", "[email protected]") ;
            expression<rstring> $configPath : getSubmissionTimeValue("configPath", "etc") ;

    graph
            stream<rstring lines> LineIn = FileSource()
            {
                    param
                            file : "LineInput.txt" ;
            }

            () as lineSink1 = HDFS2FileSink(LineIn)
            {
                    param
                            authKeytab       : $authKeytab ;
                            authPrincipal    : $authPrincipal ;
                            configPath       : $configPath ;
                            file             : "HDFS-LineInput.txt" ;
            }

            stream<rstring lines> LineStream = HDFS2FileSource()
            {
                    param
                            authKeytab       : $authKeytab;
                            authPrincipal    : $authPrincipal ;
                            configPath       : $configPath ;
                            file             : "HDFS-LineInput.txt" ;
                            initDelay        : 10.0l ;
            }

            () as MySink = FileSink(LineStream)
            {
                    param
                            file             : "result.txt" ;
                            flush            : 1u ;
   
            }
}