Hive connector security configuration#
Overview#
The Hive connector supports both authentication and authorization.
Trino can impersonate the end user who is running a query. In the case of a user running a query from the command line interface, the end user is the username associated with the Trino CLI process or argument to the optional `--user` option.
Authentication can be configured with or without user impersonation on Kerberized Hadoop clusters.
Requirements#
End user authentication is limited to Kerberized Hadoop clusters. User impersonation is available for both Kerberized and non-Kerberized clusters.
You must ensure that you meet the Kerberos, user impersonation and keytab requirements described in this section that apply to your configuration.
Kerberos#
In order to use the Hive connector with a Hadoop cluster that uses Kerberos authentication, you must configure the connector to work with two services on the Hadoop cluster:
The Hive metastore Thrift service
The Hadoop Distributed File System (HDFS)
Access to these services by the Hive connector is configured in the properties file that contains the general Hive connector configuration.
Kerberos authentication by ticket cache is not yet supported.
Note
If your `krb5.conf` location is different from `/etc/krb5.conf`, you must set it explicitly using the `java.security.krb5.conf` JVM property in the `jvm.config` file. Example: `-Djava.security.krb5.conf=/example/path/krb5.conf`.
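The check described in this note can be scripted. The following is a minimal sketch, not part of Trino: the helper name `krb5_jvm_option` is hypothetical, and it assumes `jvm.config` holds one JVM option per line.

```python
from typing import Optional

DEFAULT_KRB5_CONF = "/etc/krb5.conf"
KRB5_PROPERTY = "java.security.krb5.conf"


def krb5_jvm_option(krb5_conf_path: str, jvm_config_text: str) -> Optional[str]:
    """Return the line to add to jvm.config, or None if nothing is needed.

    Hypothetical helper: it only looks for an exact matching line and does
    not detect an existing option that points at a different path.
    """
    if krb5_conf_path == DEFAULT_KRB5_CONF:
        return None  # default location, no JVM property required
    option = f"-D{KRB5_PROPERTY}={krb5_conf_path}"
    existing = [line.strip() for line in jvm_config_text.splitlines()]
    if option in existing:
        return None  # already configured
    return option
```

A return value of `None` means `jvm.config` can stay as it is; otherwise the returned string is the line to append.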
Warning
Access to the Trino coordinator must be secured, for example with Kerberos or password authentication, when using Kerberos authentication to Hadoop services. Failure to secure access to the Trino coordinator could result in unauthorized access to sensitive data on the Hadoop cluster. Refer to Security for further information.
See Kerberos authentication for information on setting up Kerberos authentication.
Keytab files#
Keytab files contain encryption keys that are used to authenticate principals to the Kerberos KDC. These encryption keys must be stored securely; you must take the same precautions to protect them that you take to protect SSH private keys.
In particular, access to keytab files must be limited to only the accounts that must use them to authenticate. In practice, this is the user that the Trino process runs as. The ownership and permissions on keytab files must be set to prevent other users from reading or modifying the files.
Keytab files must be distributed to every node running Trino. Under common deployment situations, the Hive connector configuration is the same on all nodes. This means that the keytab needs to be in the same location on every node.
You must ensure that the keytab files have the correct permissions on every node after distributing them.
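The ownership and permission requirements above can be verified on each node with a short script. This is a sketch under stated assumptions: the function name `keytab_problems` is hypothetical, the node is a POSIX system, and "correct permissions" is read as "no access bits for group or others".

```python
import os
import stat
from typing import List


def keytab_problems(path: str, expected_uid: int) -> List[str]:
    """Return a list of problems with a keytab's ownership and mode.

    Hypothetical helper; an empty list means no problem was found.
    """
    info = os.stat(path)
    problems = []
    if info.st_uid != expected_uid:
        problems.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    # Any group/other permission bit lets another account read or modify the file.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        mode = stat.S_IMODE(info.st_mode)
        problems.append(f"mode {mode:o} grants access to group or others")
    return problems
```

Run it after distributing the keytab, passing the uid of the account that the Trino process runs as.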
Impersonation in Hadoop#
In order to use impersonation, the Hadoop cluster must be configured to allow the user or principal that Trino is running as to impersonate the users who log in to Trino. Impersonation in Hadoop is configured in the file `core-site.xml`. A complete description of the configuration options can be found in the Hadoop documentation.
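As an illustration, the sketch below renders the Hadoop proxy-user entries `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` for `core-site.xml`. The user name `trino`, the host name, and the group name are assumptions chosen for the example; consult the Hadoop documentation for the values appropriate to your cluster.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: str, groups: str) -> str:
    """Render hadoop.proxyuser.* entries for core-site.xml as an XML string."""
    configuration = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
    ET.indent(configuration)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical values: Trino runs as "trino" on one coordinator host and
# may impersonate members of the "analysts" group.
print(proxyuser_properties("trino", "trino-coordinator.example.com", "analysts"))
```

The generated `<property>` elements are meant to be merged into the existing `<configuration>` element of `core-site.xml`, not to replace the file.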
Authentication#
The default security configuration of the Hive connector does not use authentication when connecting to a Hadoop cluster. All queries are executed as the user who runs the Trino process, regardless of which user submits the query.
The Hive connector provides additional security options to support Hadoop clusters that have been configured to use Kerberos.
When accessing HDFS, Trino can impersonate the end user who is running the query. This can be used with HDFS permissions and ACLs to provide additional security for data.
Hive metastore Thrift service authentication#
In a Kerberized Hadoop cluster, Trino connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos. Kerberos authentication for the metastore is configured in the connector’s properties file using the following optional properties:
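Property name | Description | Default |
---|---|---|
`hive.metastore.authentication.type` | Hive metastore authentication type. One of `NONE` or `KERBEROS`. When set to `KERBEROS`, the Hive connector connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos. | `NONE` |
`hive.metastore.thrift.impersonation.enabled` | Enable Hive metastore end user impersonation. See KERBEROS authentication with impersonation for more information. | `false` |
`hive.metastore.service.principal` | The Kerberos principal of the Hive metastore service. The coordinator uses this to authenticate the Hive metastore. The `_HOST` placeholder can be used in this property value; when connecting to the Hive metastore, the Hive connector substitutes the hostname of the metastore server it is connecting to. Example: `hive/hive-metastore-host.example.com@EXAMPLE.COM` | |
`hive.metastore.client.principal` | The Kerberos principal that Trino uses when connecting to the Hive metastore service. Example: `trino@EXAMPLE.COM`. The `_HOST` placeholder can be used in this property value; the Hive connector substitutes the hostname of the worker node Trino is running on. Unless KERBEROS authentication with impersonation is enabled, the principal specified by `hive.metastore.client.principal` must have sufficient privileges to remove files and directories in the Hive warehouse directory. Warning: If the principal does not have sufficient permissions, only the metadata is removed, and the data continues to consume disk space. This occurs because the Hive metastore is responsible for deleting the internal table data. When the metastore is configured to use Kerberos authentication, all of the HDFS operations performed by the metastore are impersonated. Errors deleting data are silently ignored. | |
`hive.metastore.client.keytab` | The path to the keytab file that contains a key for the principal specified by `hive.metastore.client.principal`. | |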
Configuration examples#
The following sections describe the configuration properties and values required by each authentication configuration for using the Hive metastore Thrift service with the Hive connector.
Default NONE authentication without impersonation#
hive.metastore.authentication.type=NONE
The default authentication type for the Hive metastore is `NONE`. When the authentication type is `NONE`, Trino connects to an unsecured Hive metastore. Kerberos is not used.
KERBEROS authentication with impersonation#
hive.metastore.authentication.type=KERBEROS
hive.metastore.thrift.impersonation.enabled=true
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=trino@EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/hive.keytab
When the authentication type for the Hive metastore Thrift service is `KERBEROS`, Trino connects as the Kerberos principal specified by the property `hive.metastore.client.principal`. Trino authenticates this principal using the keytab specified by the `hive.metastore.client.keytab` property, and verifies that the identity of the metastore matches `hive.metastore.service.principal`.
When using `KERBEROS` metastore authentication with impersonation, the principal specified by the `hive.metastore.client.principal` property must be allowed to impersonate the current Trino user, as discussed in the section Impersonation in Hadoop.
Keytab files must be distributed to every node in the cluster that runs Trino.
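A catalog properties file can be checked for an incomplete Kerberos setup before Trino is started. The sketch below is illustrative only: the helper names are hypothetical, the parser handles only plain `key=value` lines, and it assumes that a `KERBEROS` metastore configuration needs the service principal, client principal, and client keytab, as in the example above.

```python
from typing import Dict, List

# Assumption: these three properties accompany KERBEROS metastore authentication.
KERBEROS_REQUIRED = (
    "hive.metastore.service.principal",
    "hive.metastore.client.principal",
    "hive.metastore.client.keytab",
)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_metastore_properties(props: Dict[str, str]) -> List[str]:
    """List Kerberos properties that are unset when the type is KERBEROS."""
    auth_type = props.get("hive.metastore.authentication.type", "NONE")
    if auth_type != "KERBEROS":
        return []  # NONE: no further properties are needed
    return [name for name in KERBEROS_REQUIRED if not props.get(name)]
```

An empty result means the file is either using the default `NONE` type or sets all three Kerberos properties.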
HDFS authentication#
In a Kerberized Hadoop cluster, Trino authenticates to HDFS using Kerberos. Kerberos authentication for HDFS is configured in the connector’s properties file using the following optional properties:
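Property name | Description | Default |
---|---|---|
`hive.hdfs.authentication.type` | HDFS authentication type; one of `NONE` or `KERBEROS`. When set to `KERBEROS`, Trino authenticates to HDFS using Kerberos. | `NONE` |
`hive.hdfs.impersonation.enabled` | Enable HDFS end-user impersonation. Impersonating the end user can provide additional security when accessing HDFS if HDFS permissions or ACLs are used. HDFS permissions and ACLs are explained in the HDFS Permissions Guide. | `false` |
`hive.hdfs.trino.principal` | The Kerberos principal Trino uses when connecting to HDFS. Example: `trino@EXAMPLE.COM`. The `_HOST` placeholder can be used in this property value; the Hive connector substitutes the hostname of the worker node Trino is running on. | |
`hive.hdfs.trino.keytab` | The path to the keytab file that contains a key for the principal specified by `hive.hdfs.trino.principal`. | |
`hive.hdfs.wire-encryption.enabled` | Enable HDFS wire encryption. In a Kerberized Hadoop cluster that uses HDFS wire encryption, this must be set to `true`. | |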
Configuration examples#
The following sections describe the configuration properties and values needed for the various authentication configurations with HDFS and the Hive connector.
Default NONE authentication without impersonation#
hive.hdfs.authentication.type=NONE
The default authentication type for HDFS is `NONE`. When the authentication type is `NONE`, Trino connects to HDFS using Hadoop’s simple authentication mechanism. Kerberos is not used.
NONE authentication with impersonation#
hive.hdfs.authentication.type=NONE
hive.hdfs.impersonation.enabled=true
When using `NONE` authentication with impersonation, Trino impersonates the user who is running the query when accessing HDFS. The user Trino is running as must be allowed to impersonate this user, as discussed in the section Impersonation in Hadoop. Kerberos is not used.
KERBEROS authentication without impersonation#
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.trino.principal=hdfs@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/trino/hdfs.keytab
When the authentication type is `KERBEROS`, Trino accesses HDFS as the principal specified by the `hive.hdfs.trino.principal` property. Trino authenticates this principal using the keytab specified by the `hive.hdfs.trino.keytab` property.
Keytab files must be distributed to every node in the cluster that runs Trino.
KERBEROS authentication with impersonation#
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.trino.principal=trino@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/trino/hdfs.keytab
When using `KERBEROS` authentication with impersonation, Trino impersonates the user who is running the query when accessing HDFS. The principal specified by the `hive.hdfs.trino.principal` property must be allowed to impersonate the current Trino user, as discussed in the section Impersonation in Hadoop. Trino authenticates `hive.hdfs.trino.principal` using the keytab specified by `hive.hdfs.trino.keytab`.
Keytab files must be distributed to every node in the cluster that runs Trino.
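The four HDFS configurations above differ only in which identity connects to HDFS and whether the end user is impersonated. The following sketch summarizes that decision; the function `hdfs_identity` and its return format are hypothetical and exist only to make the combinations explicit. It assumes impersonation is disabled when the property is absent.

```python
from typing import Dict


def hdfs_identity(props: Dict[str, str], query_user: str, process_user: str) -> str:
    """Describe which identity HDFS sees for a query under the four modes."""
    kerberos = props.get("hive.hdfs.authentication.type", "NONE") == "KERBEROS"
    impersonate = (
        props.get("hive.hdfs.impersonation.enabled", "false").lower() == "true"
    )
    # Without Kerberos, Trino connects as the OS user running the Trino process;
    # with Kerberos, it connects as the configured principal.
    connecting = props.get("hive.hdfs.trino.principal", "") if kerberos else process_user
    if impersonate:
        return f"{query_user} (impersonated by {connecting})"
    return connecting
```

For example, with `hive.hdfs.authentication.type=KERBEROS`, `hive.hdfs.impersonation.enabled=true`, and `hive.hdfs.trino.principal=trino@EXAMPLE.COM`, a query submitted by a hypothetical user `alice` is reported as `alice (impersonated by trino@EXAMPLE.COM)`.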