1) Download Talend Open Studio for Big Data and create a job
Download Talend Open Studio for Big Data (version 6.4.1 was used for this tutorial). Unzip the downloaded file and start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and name the job "HiveKerberosRead". In the search bar under "Palette" on the right-hand side, enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow".
3) Configure the components
Now let's configure the individual components. Double-click on "tHiveConnection" and select the following configuration options:
- Distribution: Hortonworks
- Version: HDP V2.5.0
- Host: localhost
- Database: default
- Select "Use Kerberos Authentication"
- Hive Principal: email@example.com
- Namenode Principal: firstname.lastname@example.org
- Resource Manager Principal: email@example.com
- Select "Use a keytab to authenticate"
- Principal: alice
- Keytab: Path to "alice.keytab" in the Kerby test project.
- Unselect "Set Resource Manager"
- Set Namenode URI: "hdfs://localhost:9000"
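For readers who want to relate these settings to a plain JDBC client, the sketch below assembles a HiveServer2 JDBC URL of the form "jdbc:hive2://host:port/database;principal=...", which is how a Kerberos service principal is normally passed to the Hive JDBC driver. This is only an illustration, not the code that Talend generates. The class name "HiveKerberosUrl" is hypothetical, port 10000 is assumed (it is the usual HiveServer2 default and is not stated above), and the principal is a placeholder for whatever was entered as "Hive Principal".

```java
import java.util.Objects;

/** Sketch: assembles a HiveServer2 JDBC URL for a Kerberos-secured server. */
final class HiveKerberosUrl {

    private HiveKerberosUrl() {
    }

    /**
     * @param host          HiveServer2 host ("localhost" in this tutorial)
     * @param port          HiveServer2 port (assumed to be 10000 below)
     * @param database      Hive database ("default" in this tutorial)
     * @param hivePrincipal the service principal entered as "Hive Principal"
     */
    static String build(String host, int port, String database, String hivePrincipal) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(database, "database");
        Objects.requireNonNull(hivePrincipal, "hivePrincipal");
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // The ";principal=" session variable names the server's Kerberos principal.
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";principal=" + hivePrincipal;
    }

    public static void main(String[] args) {
        // Placeholder principal: substitute the value used for "Hive Principal".
        System.out.println(build("localhost", 10000, "default", "<hive-principal>"));
    }
}
```

The URL only identifies the server. The client still needs its own Kerberos credentials, which is what the "alice" principal and keytab provide in the Talend configuration.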
Now click on "tHiveInput" and select the following configuration options:
- Select "Use an existing Connection"
- Choose the tHiveConnection name from the resulting "Component List".
- Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int.
- Table name: words
- Query: "select * from words where word == 'Dare'"
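The "Query" field in Talend is a Java String expression, which is why the query is wrapped in double quotes. As a rough sketch of what that allows, the hypothetical helper below builds the same query for an arbitrary word. The class and method names are made up for illustration, and instead of attempting to escape special characters it simply rejects words containing quotes or backslashes.

```java
/** Sketch: builds the tutorial's Hive query for a given word. */
final class WordQuery {

    private WordQuery() {
    }

    /** Returns the query used in this tutorial, parameterised on the word. */
    static String forWord(String word) {
        if (word == null || word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        // Refuse characters that could break out of the string literal.
        if (word.indexOf('\'') >= 0 || word.indexOf('"') >= 0 || word.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("unsupported character in word: " + word);
        }
        return "select * from words where word == '" + word + "'";
    }

    public static void main(String[] args) {
        System.out.println(forWord("Dare"));
    }
}
```

Calling forWord("Dare") yields exactly the query string entered above. The rows it returns carry the two columns defined in the schema, "word" and "count".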
Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and then "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf".
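The "-D" argument sets the standard "java.security.krb5.conf" system property, which the JVM's Kerberos implementation reads to locate its configuration. If the job fails to authenticate, a small check along the following lines can confirm that the property actually reached the JVM and points at a readable file. This is a minimal sketch. The class name "Krb5ConfCheck" is hypothetical and it is not part of Talend or Kerby.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: checks that java.security.krb5.conf is set and readable. */
final class Krb5ConfCheck {

    static final String PROPERTY = "java.security.krb5.conf";

    private Krb5ConfCheck() {
    }

    /** Returns true if the property is set and names a readable regular file. */
    static boolean isConfigured() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.isEmpty()) {
            return false;
        }
        Path path = Paths.get(value);
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
        System.out.println("usable: " + isConfigured());
    }
}
```

Run it with the same "-Djava.security.krb5.conf=..." argument that was added to the Talend preferences. If it reports that the file is not usable, the path in the JVM argument is the first thing to check.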