Accessing a Kerberized Hadoop cluster using Ranger security policies and native APIs

Security has always played an important role in every information system. Before actually implementing a particular solution, we first need to carefully design its protection layers by means of techniques like authentication, authorization, and encryption. But sometimes, at the early start, developers don't pay much attention to this topic and concentrate their efforts on the functional aspects of their applications. This is what happened to Hadoop.

In my previous article about security in the world of Big Data I gave a high-level overview of this model. Now I want to share some experience of working with Hadoop services at a low level, straight from source code. We will create a new principal in the Hadoop environment, grant it permissions in Ranger, and use it from a Java application to access the services remotely.

Understanding your realm

In this article I assume that your cluster is already kerberized. It means that:

  1. You have a Kerberos server up and running
  2. Every service on every node in Hadoop has a corresponding principal in the Kerberos server database
  3. Every service in Hadoop is configured to authenticate to other applications using the corresponding principal from the Kerberos server

If your cluster is not kerberized, you can follow this guide to apply the authentication model to the environment.

Kerberos is represented by a service running under admin user permissions. By default it is located at /usr/sbin/krb5kdc. The configuration lives in a separate file, /etc/krb5.conf. It describes the Kerberos environment and settings such as log locations, machines in the realm, trusted domains, and ticket characteristics. Users interact with the service through a special utility called kadmin (or kadmin.local when run on the Kerberos server machine itself). In order to list all principals in the Kerberos database you can use the list_principals command:

[root@node1 ~]# kadmin.local

kadmin.local: list_principals
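For reference, a trimmed /etc/krb5.conf for the realm used in this article might look like the sketch below. The host name and lifetimes are assumptions to match the examples; your values will differ:

[libdefaults]
    default_realm = MYDOMAIN.COM
    ticket_lifetime = 24h
    renew_lifetime = 7d

[logging]
    kdc = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log

[realms]
    MYDOMAIN.COM = {
        kdc = node1
        admin_server = node1
    }

[domain_realm]
    .mydomain.com = MYDOMAIN.COM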



Every principal can represent either a service on a particular node or a standalone user. Users authenticate by entering the principal name and a password. Services also have principal names, but instead of passwords they use special files called keytabs. Consider a keytab an equivalent of a user's password, so you should be really careful about granting read permissions on these files.

Adding a new principal in Kerberos

In my example I want to create an application that accesses the Hadoop file system in a secured cluster. Assuming it will be a standalone program, like a data provider service, I will create a service principal for this application so that Hadoop can recognize it properly, using the kadmin.local tool:

[root@node1 ~]# kadmin.local

kadmin.local: addprinc -randkey alexservice@MYDOMAIN.COM

Now let's generate a keytab for this principal:

kadmin.local: ktadd -k /tmp/alexservice.keytab alexservice@MYDOMAIN.COM

By default we are working in a Linux environment under a local operating system user account. In order to decorate our session with Kerberos credentials we need to perform authentication. Let's do it using our new principal and its keytab file:

[root@node1 ~]# kdestroy
[root@node1 ~]# klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)

[root@node1 ~]# kinit -kt /tmp/alexservice.keytab alexservice@MYDOMAIN.COM
[root@node1 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: alexservice@MYDOMAIN.COM

Valid starting       Expires              Service principal
06/14/17 11:59:37    06/15/17 11:59:37    krbtgt/MYDOMAIN.COM@MYDOMAIN.COM
        renew until 06/21/17 11:59:37

Now every interaction from the shell with any kerberized service will be recognized as initiated by the alexservice@MYDOMAIN.COM principal. For example, when a new request comes to HDFS, it will be able to recognize the original sender. The Kerberos service does all the magic behind the scenes.

Granting permissions in Ranger

If you perform the kdestroy command again, you will clear the authenticated principal from your session. If you then try to access any Hadoop service, you will get a GSS error:

[root@node1 ~]# hdfs dfs -ls /users

17/06/14 12:23:05 WARN ipc.Client: Exception encountered while connecting to the server : GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Perform authentication again and repeat command:

[root@node1 ~]# kinit -kt /tmp/alexservice.keytab alexservice@MYDOMAIN.COM

[root@node1 ~]# hdfs dfs -ls /users

org.apache.hadoop.ipc.RemoteException(Permission denied: user=alexservice, access=READ_EXECUTE, inode="/tmp":hdfs:hdfs:drwxr-x---

Our session is now recognized as alexservice's session, but HDFS does not know what this principal is allowed to do. It checks its Access Control List and does not find any rule related to our new principal. At this stage I want to point out two very important things you should grasp in order to understand the security model:

  • Our principal is alexservice@MYDOMAIN.COM, but in the error we see alexservice without the domain part. The reason is that for authorization Hadoop does not use the actual principal name but the user account mapped to this principal. To check what account our principal is mapped to, we can run the HadoopKerberosName tool:

[root@node1 ~]# hadoop org.apache.hadoop.security.HadoopKerberosName alexservice@MYDOMAIN.COM
Name: alexservice@MYDOMAIN.COM to alexservice

Hadoop uses a special set of rules responsible for this mapping; they live in the hadoop.security.auth_to_local property of the core-site.xml configuration. Other services can override this setting in their own config files. You can specify custom mapping logic there, and I would recommend you get acquainted with a nice article by Robert Levas in order to learn how to do this.

  • Our principal is mapped to the alexservice user account, but who created this account and where is it stored? The answer depends on the type of authorization policy your cluster is configured with. Option one is that you are using flat Linux accounts, which means you need to create the user in your operating system and put it into the corresponding group using the useradd command. Even though you are working as the root user, kerberized services will recognize you as the user mapped to your principal, and every service will apply the authorization policy related to this user. If the user does not exist, you will get an error. Option two is that you are using Ranger security policies, which allow you to systemize all authorization rules for different services in a single place. Ranger has its own database of users, fed by a special Sync Service or manually by an administrator. In this case Hadoop will map your principal to these user accounts.
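To illustrate the first point, the mapping rules in core-site.xml follow the RULE:[n:string](regexp)s/pattern/replacement/ format. A minimal sketch, using the realm and principal names from this article (the exact rules on your cluster will differ):

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](alexservice@MYDOMAIN.COM)s/.*/alexservice/
    RULE:[2:$1/$2@$0](nn/.*@MYDOMAIN.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>

The first rule maps our one-component service principal to the alexservice account, the second maps NameNode principals like nn/node1@MYDOMAIN.COM to hdfs, and DEFAULT simply strips the realm for local-realm principals.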

My cluster is configured to use Ranger. Ranger contains a set of authorization services for different Hadoop applications; currently it supports HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka and Atlas. Ranger provides a web-based UI for managing these services. Each service plays the role of an interceptor which handles every request and passes it through a set of policies defining the access rules to internal resources.


Figure 1 – Ranger UI applications and services

The Ranger UI also allows you to manage the database of users and groups (Settings => Users/Groups). Here you can see all existing accounts available for use when configuring service policies. Let's create a new user matching the result of the mapping rule we validated earlier with the HadoopKerberosName utility:


Figure 2 – Create new user in Ranger

Now we can start using this user in our security policies. Let's say we want to allow this user to access a specific directory in HDFS. Go to the HDFS service and create a new policy for that:


Figure 3 – Create HDFS access policy for alexservice account

Once we add this policy to the service, Ranger will start applying it to every request. If we try to access this directory again, we will see no exceptions this time:

[root@node1 ~]# hdfs dfs -ls /user
Found 39 items
drwxrwxrwx   - oleksii.yermolenko hdfs    0 2017-05-18 09:01 /user/Oleksii.Yermolenko
-rwxrw-rw-   3 hbase              hdfs 1433 2017-06-12 10:26 /user/hive
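Instead of clicking through the UI, the same policy can also be created programmatically through Ranger's public REST API by POSTing JSON to /service/public/v2/api/policy. A hedged sketch of such a payload, where the service name cluster_hadoop and the policy name are assumptions for this article's setup:

{
  "service": "cluster_hadoop",
  "name": "alexservice_user_dir",
  "resources": {
    "path": { "values": ["/user"], "isRecursive": true }
  },
  "policyItems": [
    {
      "users": ["alexservice"],
      "accesses": [
        { "type": "read",    "isAllowed": true },
        { "type": "write",   "isAllowed": true },
        { "type": "execute", "isAllowed": true }
      ]
    }
  ]
}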


One important clarification here: if Ranger fails to find a suitable policy matching the owner of an incoming HDFS request, the request is then validated against the file system's own access permissions. So, for example, if your Ranger policy has a read-only rule for a certain resource, but in the HDFS file system the same user account or group has read-write permissions on that resource, in the end you will be able to perform read-write operations.

Accessing Hadoop through client APIs

At this point we have a service principal which can properly authenticate to Hadoop services, this principal is mapped to a Hadoop user account created through Ranger, and Ranger has an enabled policy allowing our user to access the /user directory in HDFS. When we start working outside the cluster, we need to take into consideration that our environment should be fully prepared by the system engineering team to recognize the Kerberos Hadoop realm. That means our machine should trust this server or, if we are working from another domain, that domain should have a trust relationship with the Hadoop realm. There is a nice post about this on one of the Cloudera resources which could be quite useful for system administrators. If your Windows domain is not configured for this, you can update your registry with settings which allow your system to trust the KDC located in the cluster, by launching these two commands from the command line:

ksetup /addkdc MYDOMAIN.COM <kdc-hostname>

ksetup /addhosttorealmmap <hadoop-hostname> MYDOMAIN.COM

HDFS provides a couple of ways to access the data within the file system. First of all, you can test your access to the service from your browser through the WebHDFS RESTful API. But remember that we are using a service principal, which means we will not be able to type a password into a prompt window. That is why we will use the Java client API to access the data. Hadoop provides a set of libraries which allow us to work with a secured environment very easily. In order to get them, you first need to add the dependency to your pom.xml file (I use version 2.5 of the client here):
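The dependency looks roughly like this (the exact patch version may differ in your repository):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0</version>
</dependency>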


This package contains a class called UserGroupInformation which encapsulates the logic around the JAAS Subject class. If you are not familiar with the JAAS specification, I highly advise you to visit the Oracle tutorial and implement a simple authenticator example. UserGroupInformation is the key component which allows you to manage your context principal. Its loginUserFromKeytab method lets us authenticate using the keytab we created earlier. Here is an example of how you can create a file at the location we decorated earlier with the necessary permissions by means of the Ranger policy:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class Jaas {
  public static void main(String[] args) throws IOException {
    // Tell the Hadoop client that the cluster expects Kerberos authentication
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("fs.defaultFS", "hdfs://node1:8020");
    conf.set("dfs.namenode.kerberos.principal.pattern", "nn/node1@MYDOMAIN.COM");

    // Authenticate using the keytab instead of a password prompt
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab("alexservice@MYDOMAIN.COM", "c:\\alexservice.keytab");

    FileSystem fs = FileSystem.get(conf);
    fs.createNewFile(new Path("/user/newfile"));
    FileStatus[] status = fs.listStatus(new Path("/user/newfile"));
    for (int i = 0; i < status.length; i++) {
      System.out.println(status[i].getPath());
    }
    fs.close();
  }
}
If you look at the output, you will see that after authentication our context has changed from the current user to alexservice@MYDOMAIN.COM. Once we start using the FileSystem object after that, it automatically acquires all required information about the owner of the context from the UserGroupInformation class and uses it for every request. We can check the result directly in HDFS:

[root@node1 ~]# hdfs dfs -ls /user
-rw-r--r--   3 alexservice hdfs 0 2017-06-15 11:06 /user/newfile

Advice for C#: RESTful APIs are your best friends

Generally, the Hadoop ecosystem has been created with languages and packages outside the scope of the Microsoft stack. Things become even more complicated when you start trying to access the kerberized services of the cluster from C# code. The easiest option for implementing such interactions is over the HTTP protocol. Unfortunately, not every Hadoop service provides a web API for accessing its content. For those cases you have to either operate at the low socket level with .NET classes like NegotiateStream, or turn to third-party tools like the MIT Kerberos client in combination with specific data provider applications like ODBC connectors. In the web-based approach you can rely on the existing packages for working with web resources, like System.Net and the HttpWebRequest class. This class allows you to attach credentials and use them to impersonate your requests. Behind the scenes it will automatically establish a trusted context with the server side using the Negotiate mechanism and transfer the proper token to it. I will not concentrate on examples of how to create REST web clients, as you can find lots of them on the internet. I just want to point out that you will have to use user principals for this purpose, as these classes do not support the keytab-based functionality described in the Java example.



