Choosing a Kerberos approach for a Hadoop cluster in an enterprise environment

Factors to consider before choosing an approach for Kerberos implementation within an enterprise.

Choosing an approach for Kerberos implementation on a Hadoop cluster is critical from a long-term maintenance point of view. Enterprises have their own security policies and guidelines, and a successful Kerberos implementation needs to adhere to the enterprise security architecture. There are multiple guides available on how to implement Kerberos, but I couldn't find information on which approach to choose and the pros and cons associated with each.
In a Hortonworks Hadoop cluster, there are three different ways of generating keytabs and principals and managing them.
a. Use an MIT KDC specific to the Hadoop cluster - automated keytab management using Ambari
A KDC specific to the Hadoop cluster can be installed and maintained on one of the Hadoop nodes. All users/keytabs required for the Kerberos implementation are automatically managed using Ambari.
Pros:
  • Enterprise security teams are not involved with the KDC setup; Hadoop administrators have complete control of the KDC installation.
  • Automated keytab management using Ambari: no need to manually manage keytabs during cluster configuration or topology changes.
  • Non-expiring keytabs can be generated and distributed to Hadoop developers, so each developer can hold a copy of the keytab attached to their own ID (see the sketch after the cons list below).
  • A one-way trust can be set up so the enterprise Active Directory recognizes Hadoop users.
Cons:
  • May be against enterprise security policies.
  • Hadoop administrators take on the additional responsibility of managing the KDC; any security vulnerabilities are the responsibility of Hadoop administrators.
  • Ensuring the KDC is set up for high availability and disaster recovery is the responsibility of Hadoop administrators.
  • Requires manual keytab generation for developers: for any new developer, a new keytab needs to be generated and distributed by Hadoop administrators.
  • Procedures need to be set up for lost or compromised keytabs.
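Regardless of where the KDC runs, developers typically authenticate from code with the keytab they were issued. Below is a minimal sketch using Hadoop's UserGroupInformation API; the principal, realm, and keytab path are hypothetical placeholders, not values from this article.

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLogin {
        public static void main(String[] args) throws Exception {
            // Tell the Hadoop client libraries that the cluster is Kerberized.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Hypothetical principal and keytab issued by the cluster KDC.
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "devuser@HADOOP.EXAMPLE.COM", "/home/devuser/devuser.keytab");

            // Run an HDFS operation as the authenticated principal.
            ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
                FileSystem fs = FileSystem.get(conf);
                for (FileStatus s : fs.listStatus(new Path("/user/devuser"))) {
                    System.out.println(s.getPath());
                }
                return null;
            });
        }
    }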
b. Use an existing Enterprise Active Directory - Manual setup
An alternative to having a local KDC for the Hadoop cluster is to generate the list of usernames and principals required for Kerberos using Ambari and then create these users manually in the corporate AD.
Pros:
  • Meets enterprise security standards by leveraging existing corporate AD infrastructure.
  • Developers are already part of the existing AD, so no keytab generation is required for them.
Cons:
  • Manually managing keytabs in a large cluster becomes tedious and difficult to maintain with continuous changes to the cluster structure.
  • Any change in the Hadoop cluster structure (adding/deleting a node, adding/deleting a service on a node) requires new keytabs to be generated and distributed; a quick sanity check for distributed keytabs is sketched after this list.
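When keytabs are created and copied around by hand, it is easy to end up with a keytab that no longer matches its host's principal. Below is a minimal sanity check using the standard javax.security.auth.kerberos API; the principal name and keytab path are hypothetical examples.

    import java.io.File;
    import javax.security.auth.kerberos.KerberosKey;
    import javax.security.auth.kerberos.KerberosPrincipal;
    import javax.security.auth.kerberos.KeyTab;

    public class KeytabCheck {
        public static void main(String[] args) {
            // Hypothetical service principal and keytab location; substitute
            // the values for the host being (re)provisioned.
            KerberosPrincipal principal =
                    new KerberosPrincipal("nn/namenode01.example.com@HADOOP.EXAMPLE.COM");
            KeyTab keytab = KeyTab.getInstance(
                    new File("/etc/security/keytabs/nn.service.keytab"));

            // getKeys returns an empty array when the keytab has no entry for
            // this principal, e.g. after a node was re-added to the cluster
            // without regenerating its keytab.
            KerberosKey[] keys = keytab.getKeys(principal);
            if (keys.length == 0) {
                System.err.println("No keys for " + principal + " - regenerate the keytab");
            } else {
                for (KerberosKey k : keys) {
                    System.out.println("etype=" + k.getKeyType()
                            + " kvno=" + k.getVersionNumber());
                }
            }
        }
    }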
c. Use an existing Enterprise AD with automated management using Ambari
In this approach a new OU is created in the enterprise AD, and an AD account is created with full administrative privileges on that OU. The account and OU are then supplied during the automated setup in Ambari, which allows Ambari to automatically manage all keytab/principal generation and keytab distribution. The OU holds all keytabs and principals for the Hadoop internal users required for Kerberos functionality.
Pros:
  • Satisfies corporate security policies, since complete auditing of user creation/maintenance is available within AD.
  • All developers and users are part of the enterprise AD and already receive Kerberos tickets from it; those existing tickets are used for any communication with the Kerberized cluster.
  • Backup, high availability, and other administrative tasks for the KDC are taken care of by the enterprise teams managing AD.
  • A separate OU within AD ensures Hadoop internal users are not mixed with other users in AD.
  • Existing Active Directory groups are available in Ranger to implement security policies.
  • Automated keytab generation/distribution for all Hadoop internal users.
  • Changes to cluster topology or configuration are handled by Ambari.
Cons:
  • Any manual service users (with non-expiring passwords) for the Hadoop cluster need to be added to Active Directory manually, with their keytabs distributed manually. (This may require service requests to other enterprise groups to generate new IDs and keytabs.)
  • Developers do not have access to keytabs associated with their own IDs: a keytab tied to a developer ID is invalidated by password-change policy rules (password expiration after a certain number of days). Developers instead use the ticket issued for their ID by Active Directory.
  • Some Java applications/tools require a copy of the keytab file, and it may be difficult to find a workaround that lets them use cached tickets; one possible workaround is sketched after this list.
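Where the application code can be changed, one workaround is to log in from the developer's existing ticket cache (populated by kinit or by the AD login) instead of a keytab. A minimal sketch with Hadoop's UserGroupInformation follows; the cache path and principal are hypothetical, and the default cache location is typically /tmp/krb5cc_<uid> on Linux.

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class TicketCacheLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Reuse the ticket obtained with kinit instead of a keytab.
            // Cache path and principal are placeholders.
            UserGroupInformation ugi = UserGroupInformation.getUGIFromTicketCache(
                    "/tmp/krb5cc_1000", "devuser@HADOOP.EXAMPLE.COM");

            ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
                FileSystem fs = FileSystem.get(conf);
                System.out.println(fs.exists(new Path("/user/devuser")));
                return null;
            });
        }
    }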
This is a preliminary guide based on my experience implementing Kerberos. Any other suggestions/ideas are welcome.
