Monday, February 6, 2017

Securing Apache Atlas using Apache Ranger

Apache Atlas, currently in the Apache Incubator, is a data governance and metadata framework for Apache Hadoop. It allows you to import data from a backend such as Apache Hive or Apache Falcon, and to classify and tag the data according to a set of business rules. In this tutorial we will show how to use Apache Ranger to create authorization policies to secure access to Apache Atlas.

1) Set up Apache Atlas

First let's look at setting up Apache Atlas. Download the latest released version (0.7.1-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.7.1-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Before starting Atlas, for testing purposes let's add a new user called 'alice' in the group 'DATA_SCIENTIST' with password 'password'. Edit 'conf/users-credentials.properties' and add:
  • alice=DATA_SCIENTIST::5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
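The long hexadecimal string in this entry is the SHA-256 digest of the password. As a minimal sketch, an entry for a different user or password could be generated as follows, assuming GNU coreutils' 'sha256sum' is on the path (on macOS, 'shasum -a 256' is the usual equivalent). The helper name 'atlas_user_entry' is purely illustrative and not part of Atlas:

```shell
# Print a users-credentials.properties line of the form
#   user=GROUP::sha256(password)
# atlas_user_entry is a hypothetical helper name, not something shipped with Atlas.
atlas_user_entry() {
  user=$1 group=$2 password=$3
  # printf avoids the trailing newline that echo would add to the hashed input
  hash=$(printf '%s' "$password" | sha256sum | cut -d' ' -f1)
  printf '%s=%s::%s\n' "$user" "$group" "$hash"
}

atlas_user_entry alice DATA_SCIENTIST password
```

For the values used in this tutorial, the output matches the line added to 'conf/users-credentials.properties' above.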
Now let's start Apache Atlas with 'bin/atlas_start.py'. The Apache Atlas web service can be explored via 'http://localhost:21000/'. To populate some sample data in Apache Atlas, run the command 'bin/quick_start.py' (using credentials admin/admin). To see all traits/tags that have been created, use curl as follows (the URL is quoted so that the shell does not treat the '?' as a wildcard):
  • curl -u alice:password 'http://localhost:21000/api/atlas/types?type=TRAIT'
2) Install the Apache Ranger Atlas plugin

To use Apache Ranger to secure Apache Atlas, the next step is to configure and install the Apache Ranger Atlas plugin. Follow the steps in an earlier tutorial to build Apache Ranger and to set up and start the Apache Ranger Admin service. I recommend using the latest SNAPSHOT of Ranger (0.7.0-SNAPSHOT at this time), as a number of bugs relating to Atlas support have been fixed since the 0.6.x release. Once this is done, go back to the Apache Ranger distribution that you have built and extract the Atlas plugin:
  • tar zxvf target/ranger-0.7.0-SNAPSHOT-atlas-plugin.tar.gz
Edit 'install.properties' with the following changes:
  • POLICY_MGR_URL=http://localhost:6080
  • Specify location for SQL_CONNECTOR_JAR
  • Specify REPOSITORY_NAME (AtlasTest)
  • COMPONENT_INSTALL_DIR_NAME pointing to your Atlas install
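If you prefer to script these changes rather than edit the file by hand, the following is a minimal sketch. The 'set_prop' helper is hypothetical (it is not shipped with Ranger), and it assumes that values do not contain the characters '|' or '&', which 'sed' would interpret specially:

```shell
# set_prop FILE KEY VALUE: replace the "KEY=..." line in FILE, or append it if absent.
# Hypothetical helper for illustration only; values must not contain '|' or '&'.
set_prop() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    # Rewrite through a temporary file so this works with any POSIX sed
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Apply the two values given in this tutorial, but only when run from the
# extracted plugin directory that contains install.properties.
if [ -f install.properties ]; then
  set_prop install.properties POLICY_MGR_URL http://localhost:6080
  set_prop install.properties REPOSITORY_NAME AtlasTest
fi
```

SQL_CONNECTOR_JAR and COMPONENT_INSTALL_DIR_NAME can be set the same way, using the paths that apply to your own environment.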
Now install the plugin via 'sudo ./enable-atlas-plugin.sh'. If you see an error about "libext", create a new empty directory called "libext" in the Atlas distribution and try again. Note that by default the Ranger plugin will try to store policies in "/etc/ranger/AtlasTest/policycache". As we installed the plugin as "root", make sure that this directory is accessible to the user that is running Apache Atlas. Now restart Apache Atlas to enable the Ranger plugin.
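A quick way to check the policy cache directory is sketched below. It should be run as the user that starts Atlas, and it assumes that the plugin needs to write its cached policies into the directory as well as read them. The helper name 'can_use_dir' is illustrative:

```shell
# can_use_dir DIR: succeed if the current user can list, enter and write to DIR.
# Hypothetical helper; run it as the user that starts Atlas.
can_use_dir() {
  [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ] && [ -w "$1" ]
}

cache_dir=/etc/ranger/AtlasTest/policycache
if can_use_dir "$cache_dir"; then
  echo "policy cache OK: $cache_dir"
else
  echo "policy cache not accessible: $cache_dir (adjust ownership or permissions)"
fi
```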

3) Create authorization policies for Atlas in the Ranger Admin Service

Now that we have set up Apache Atlas to use Apache Ranger for authorization, what remains is to start the Apache Ranger Admin Service and to create some authorization policies. Start Apache Ranger ('sudo ranger-admin start'). Log in to 'http://localhost:6080/' (credentials admin/admin). Click on the "+" button for Atlas, and specify the following fields:
  • Service Name: AtlasTest
  • Username: admin
  • Password: admin
  • atlas.rest.address: http://localhost:21000
Click on "Test Connection" to make sure that we can communicate successfully with Apache Atlas and then click "Add". Click on the new link for "AtlasTest". Let's see if our new user "alice" is authorized to read the tags in Atlas. Execute the curl command defined above (allowing 30 seconds for the Ranger plugin to pull the policies from the Ranger Admin Service). You should see a 403 Forbidden message from Atlas.

Now let's update the authorization policies to allow "alice" to read the tags. Back in Apache Ranger, click on "Settings" and then "Users/Groups" and "Groups". Click on "Add new group" and enter "DATA_SCIENTIST" for the name. Now go back into "AtlasTest", and edit the policy called "all - type". Create a new "Allow Condition" for the group "DATA_SCIENTIST" with permission "read" and click "Save". After waiting some time for the policies to sync, run the curl command again and it should now succeed.
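Rather than guessing how long the policy sync takes, you can poll Atlas until the response code changes. The following is a minimal sketch built on the curl request used earlier. The function names, the number of attempts and the delay are assumptions chosen for illustration:

```shell
# atlas_status USER PASSWORD: print the HTTP status code of the TRAIT listing request.
atlas_status() {
  curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" \
    'http://localhost:21000/api/atlas/types?type=TRAIT'
}

# wait_for_status EXPECTED USER PASSWORD [ATTEMPTS] [DELAY_SECONDS]
# Poll until Atlas answers with EXPECTED; give up after ATTEMPTS tries.
# The defaults (12 attempts, 5 seconds apart) are arbitrary illustrative values.
wait_for_status() {
  expected=$1 user=$2 password=$3 attempts=${4:-12} delay=${5:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Do not abort on a curl failure (for example, when Atlas is not up yet)
    status=$(atlas_status "$user" "$password") || true
    if [ "$status" = "$expected" ]; then
      echo "got $status"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still $status after $attempts attempts" >&2
  return 1
}
```

Before the policy change, 'wait_for_status 403 alice password' should return once the plugin has pulled the initial policies. After saving the new "Allow Condition", 'wait_for_status 200 alice password' should return once the updated policy has synced.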

