Wednesday, February 14, 2018

The Apache Sentry security service - part II

This is the second in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The next logical question is how can we can define the authorization privileges held in the Sentry security service. In this post we will briefly cover what those privileges look like, and how we can query them using two different tools that ship with the Apache Sentry distribution.

1) Apache Sentry privileges

The Apache Sentry docker image we covered in the previous tutorial ships with a 'sentry.ini' configuration file (see here) that is used to retrieve the groups associated with a given user. A user must be a member of the "admin" group to invoke on the Apache Sentry security service, as configured in 'sentry-site.xml' (see here).  To avoid confusion, 'sentry.ini' also contains "[groups]" and "[roles]" sections, but these are not used by the Sentry security service.

In Apache Sentry, a user is associated with one or more groups, which in turn are associated with one or more roles, which in turn are associated with one or more privileges. Privileges are made up of a number of different components that vary slightly depending on what service the privilege is associated with (e.g. Hive, Kafka, etc.). For example:
  • Host=*->Topic=test->action=ALL - This Kafka privilege grants all actions on the "test" topic on all hosts.
  • Collection=logs->action=* - This Solr privilege grants all actions on the "logs" collection.
  • Server=sqoopServer1->Connector=c1->action=* - This Sqoop privilege grants all actions on the "c1" connector on the "sqoopServer1" server.
  • Server=server1->Db=default->Table=words->Column=count->action=select - This Hive privilege grants the "select" action on the "count" column of the "words" table in the "default" database on the "server1" server.
For more information on the Apache sentry privilege model please consult the official wiki.

2) Querying the Apache Sentry security service using 'sentryShell'

Follow the steps outlined in the previous tutorial to get the Apache Sentry security service up and running using either the docker image or by setting it up manually. The Apache Sentry distribution ships with a "sentryShell" command line tool that we can use to query that Apache Sentry security service. So depending on which approach you followed to install Sentry, either go to the distribution or else log into the docker container.

We can query the roles, groups and privileges via:
  • bin/sentryShell -conf sentry-site.xml -lr
  • bin/sentryShell -conf sentry-site.xml -lg
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role
We can create a "admin_role" role and add it to the "admin" group via:
  • bin/sentryShell -conf sentry-site.xml -cr -r admin_role
  • bin/sentryShell -conf sentry-site.xml -arg -g admin -r admin_role
We can grant a (Hive) privilege to the "admin_role" role as follows:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Server=*->action=ALL"
If we are adding a privilege for anything other than Apache Hive, we need to explicitly specify the "type", e.g.:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Host=*->Cluster=kafka-cluster->action=ALL" -t kafka
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role -t kafka
3) Querying the Apache Sentry security service using 'sentryCli'

A rather more user-friendly alternative to the 'sentryShell' is available in Apache Sentry 2.0.0. The 'sentryCli' can be started with 'bin/sentryCli'. Typing ?l lists the available commands:

The Apache Sentry security service can be queried using any of these commands.

Monday, February 12, 2018

The Apache Sentry security service - part I

Apache Sentry is a role-based authorization solution for a number of big-data projects. I have previously blogged about how to install the authorization plugin to secure various deployments, e.g.:
For all of these tutorials, the authorization privileges were stored in a configuration file local to the deployment. However this is just a "test configuration" to get simple examples up and running quickly. For production scenarios, Apache Sentry offers a central security service, which stores the user roles and privileges in a database, and provides an RPC service that the Sentry authorization plugins can invoke on. In this article, we will show how to set up the Apache Sentry security service in a couple of different ways.

1) Installing the Apache Sentry security service manually

Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}. In addition, download a compatible version of Apache Hadoop (2.7.5 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Hadoop distribution.

First we need to specify two configuration files, "sentry-site.xml" which contains the Sentry configuration, and "sentry.ini" which defines the user/group information for the user who will be invoking on the Sentry security service. You can download sample configuration files here. Copy these files to the root directory of "${sentry.home}". Edit the 'sentry.ini' file and replace 'user' with the user who will be invoking on the security service (such as "kafka" or "solr"). The other entries will be ignored - 'sentry-site.xml' defines that a user must belong to the "admin" group to invoke on the security service successfully.

Finally configure the database and start the Apache Sentry security service via:
  • bin/sentry --command schema-tool --conffile sentry-site.xml --dbType derby --initSchema
  • bin/sentry --command service -c sentry-site.xml
2) Installing the Apache Sentry security service via docker

Instead of having to download and configure Apache Sentry and Hadoop, a simpler way to get started is to download a pre-made docker image that I created. The DockerFile is available here and the docker image is available here. Note that this docker image is only for testing use, as the security service is not secured with kerberos and it uses the default credentials. Download and run the docker image with:
  • docker pull coheigea/sentry
  • docker run -p 8038:8038 coheigea/sentry
Once the container has started, we need to update the 'sentry.ini' file with the username that we are going to use to invoke on the Apache Sentry security service. Get the "id" of the running container via "docker ps" and then run "docker exec -it <id> bash". Edit 'sentry.ini' and change 'user' to the username you are using.

In the next tutorial we will look at how to manually invoke on the security service.

Thursday, February 8, 2018

Securing Apache Sqoop - part III

This is the third and final post about securing Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. The second post showed how to secure Apache Sqoop with Apache Ranger. In this post we will look at an alternative way of implementing authorization in Apache Sqoop, namely using Apache Sentry.

1) Install the Apache Sentry Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}.

a) Configure sqoop.properties

We need to configure Apache Sqoop to use Apache Sentry for authorization. Edit 'conf/sqoop.properties' and add the following properties:
  • org.apache.sqoop.security.authentication.type=SIMPLE
  • org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
  • org.apache.sqoop.security.authorization.handler=org.apache.sentry.sqoop.authz.SentryAuthorizationHandler
  • org.apache.sqoop.security.authorization.access_controller=org.apache.sentry.sqoop.authz.SentryAccessController
  • org.apache.sqoop.security.authorization.validator=org.apache.sentry.sqoop.authz.SentryAuthorizationValidator
  • org.apache.sqoop.security.authorization.server_name=SqoopServer1
  • sentry.sqoop.site.url=file:./conf/sentry-site.xml
In addition, we need to add some of the Sentry jars to the Sqoop classpath. Add the following property to 'conf/sqoop.properties', substituting the value for "${sentry.home}":
  • org.apache.sqoop.classpath.extra=${sentry.home}/lib/sentry-binding-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-core-common-2.0.0.jar:${sentry.home}/lib/sentry-core-model-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-provider-file-2.0.0.jar:${sentry.home}/lib/sentry-provider-common-2.0.0.jar:${sentry.home}/lib/sentry-provider-db-2.0.0.jar:${sentry.home}/lib/shiro-core-1.4.0.jar:${sentry.home}/lib/sentry-policy-engine-2.0.0.jar:${sentry.home}/lib/sentry-policy-common-2.0.0.jar

b) Add Apache Sentry configuration files

Next we will configure the Apache Sentry authorization plugin. Create a new file in the Sqoop "conf" directory called "sentry-site.xml" with the following content (substituting the correct directory for "sentry.sqoop.provider.resource"):

It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. Finally, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content, substituting "colm" for the name of the user running the Sqoop shell:

2) Test authorization 

Now start Apache Sqoop ("bin/sqoop2-server start") and start the shell ("bin/sqoop2-shell"). "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. To test that authorization is correctly disabling access for unauthorized users, change the "ALL" permission in 'conf/sentry.ini' to "WRITE", and restart the server and shell. This time access is not granted and a blank list should be returned for "show connector".