Accessing Kerberized Hadoop cluster using Ranger security policies and native APIs

Security has always played an important role in every information system. Before implementing any particular solution, we first need to carefully design its protection layers using techniques such as authentication, authorization, and encryption. But sometimes, early on, developers don’t pay much attention to this topic and concentrate their efforts on the functional aspects of their applications. This is what happened to Hadoop.

In my previous article about security in the world of Big Data, I gave a high-level overview of this model. Now I want to share some experience of working with Hadoop services at a low level, straight from the source code. We will create a new principal in the Hadoop environment, grant it permissions in Ranger, and then use it from a Java application to access the services remotely.

Hadoop security overview

If you ask me what the most complicated part of Hadoop configuration is, I will say that it is security. From the early days of development of this product, the main efforts were focused on making a stable distributed framework, and security was not a priority at that time. The base assumption was that the system would work as part of a trusted network environment, so a simple security model would be sufficient to cover the requirements of that period. But over time Hadoop evolved, and more complicated security challenges started to play a more and more important role. This became an especially pressing question once Big Data started moving toward cloud computing. The integration of the Kerberos protocol became the first serious step in this direction. With authentication in place, the community logically moved on to the problems related to authorization. Under the basic security model, most services worked with their own custom Access Control Lists (ACLs), and the general idea was to centralize their management in a single place. Cloudera created the Sentry product, and Hortonworks proposed an alternative in the form of the Ranger application. Later, the security components were extended with other features such as encryption support, protection of RESTful endpoints, integration with Active Directory, and others. In this article I want to give a general overview of the primary parts of the Hadoop security model.
