Friday, September 15, 2017

Securing Apache Hive - part V

This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

1) Build the Apache Sentry distribution

First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:
  • tar zxvf apache-sentry-1.8.0-src.tar.gz
  • cd apache-sentry-1.8.0-src
  • mvn clean install -DskipTests
  • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file locally to the Hive Server. Therefore we don't need to do any further configuration to the distribution at this point.

2) Install and configure Apache Hive

Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.

3) Integrate Apache Sentry with Apache Hive

Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

3.a) Configure Apache Hive to use authorization

Create a file called 'conf/hiveserver2-site.xml' with the content:
Here we are enabling authorization and adding the Sentry authorization plugin.

3.b) Add Sentry plugin configuration

Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:
This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

3.c) Add the authorization privileges for our test-case

Next, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content:
Here we are granting the user "alice" a role which allows her to perform a "select" on the table "words".

3.d) Add Sentry libraries to Hive

Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib  to ${hive.home}/lib:
  • sentry-binding-hive-common-1.8.0.jar
  • sentry-core-model-db-1.8.0.jar
  • sentry*provider*.jar
  • sentry-core-common-1.8.0.jar
  • shiro-core-1.2.3.jar
  • sentry-policy*.jar
  • sentry-service-*.jar
In addition we need the "sentry-binding-hive-v2-1.8.0.jar" which is not bundled with the Apache Sentry distribution. This can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar" instead.

4) Test authorization with Apache Hive

Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)

1 comment:

  1. very informative blog and useful article thank you for sharing with us , keep postingBig data hadoop online Course

    ReplyDelete