Commit d797c18

maltesander, dervoeti, and lfrancke authored
feat: Enable opa authorization (#652)
* remove obsolete errors
* compiling - without tls
* oparized smoke test working
* add hive opa example
* add opa testing to smoke
* adapted changelog
* enable tls
* add opa-use-tls dimension
* remove left over trino references
* started docs
* formatting
* pre commit
* add opa operator to test suite
* Update docs/modules/hive/pages/usage-guide/security.adoc
  Co-authored-by: Lukas Krug <lukas.krug@stackable.tech>
* add missing docs link
* review feedback
* fix broken crd docs url
* regenerate charts
* add 4.2.0 to tests and supported versions
* Use "org.apache.derby.iapi.jdbc.AutoloadedDriver" for derby in 4.2.0
* fix pre commit
* add document start
* Apply suggestions from code review
  Co-authored-by: Lars Francke <jujutsu@lars-francke.de>
* fix precommit

---------

Co-authored-by: Lukas Krug <lukas.krug@stackable.tech>
Co-authored-by: Lars Francke <jujutsu@lars-francke.de>
1 parent b3901b6 commit d797c18

22 files changed: +740 -78 lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- Add OPA authorization using the operator-rs `OpaConfig` ([#652]).
+
+[#652]: https://github.com/stackabletech/hive-operator/pull/652
+
 ## [25.11.0] - 2025-11-07
 
 ## [25.11.0-rc1] - 2025-11-06

deploy/config-spec/properties.yaml

Lines changed: 5 additions & 1 deletion
@@ -1,3 +1,4 @@
+---
 version: 0.1.0
 spec:
   units:
@@ -84,7 +85,10 @@ properties:
         unit: *unitClassName
       roles:
         - name: "metastore"
-          required: true
+          # This property *is* required even though it is set to `false` here.
+          # The required value changed in Derby in version 4.2.0.
+          # We now set this value depending on the Hive version (and its corresponding Derby version) and add it to the ConfigMap programmatically.
+          required: false
       asOfVersion: "0.0.0"
   - property:
       propertyNames:

deploy/helm/hive-operator/configs/properties.yaml

Lines changed: 5 additions & 1 deletion
@@ -1,3 +1,4 @@
+---
 version: 0.1.0
 spec:
   units:
@@ -84,7 +85,10 @@ properties:
         unit: *unitClassName
       roles:
         - name: "metastore"
-          required: true
+          # This property *is* required even though it is set to `false` here.
+          # The required value changed in Derby in version 4.2.0.
+          # We now set this value depending on the Hive version (and its corresponding Derby version) and add it to the ConfigMap programmatically.
+          required: false
       asOfVersion: "0.0.0"
   - property:
       propertyNames:
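
The comment in the hunks above says the value is now chosen per Hive version and written to the ConfigMap programmatically. A minimal Rust sketch of such a version switch follows. It assumes the property in question is the JDBC driver class, since the commit message mentions `org.apache.derby.iapi.jdbc.AutoloadedDriver` for Derby in 4.2.0. The function names and the pre-4.2.0 value `org.apache.derby.jdbc.EmbeddedDriver` are assumptions for illustration and are not taken from this commit.

```rust
/// Parse a version such as "4.2.0" into (major, minor, patch).
/// Returns `None` unless the input is exactly three dot-separated integers.
fn parse_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Pick the Derby driver class for a Hive version (hypothetical helper).
/// The 4.2.0 value comes from the commit message; the older value is an assumption.
fn derby_driver_for(hive_version: &str) -> Option<&'static str> {
    let version = parse_version(hive_version)?;
    // Tuples compare lexicographically, so this is a plain version comparison.
    if version >= (4, 2, 0) {
        Some("org.apache.derby.iapi.jdbc.AutoloadedDriver")
    } else {
        Some("org.apache.derby.jdbc.EmbeddedDriver")
    }
}
```

Keeping the switch in code rather than in `properties.yaml` is consistent with `required: false` in the spec: the static file no longer carries a single value that would be wrong for one of the supported versions.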

deploy/helm/hive-operator/crds/crds.yaml

Lines changed: 28 additions & 1 deletion
@@ -34,7 +34,7 @@ spec:
                   The settings in the `clusterConfig` are cluster wide settings that do not need to be configurable at role or role group level.
                 properties:
                   authentication:
-                    description: Settings related to user [authentication](https://docs.stackable.tech/home/nightly/usage-guide/security).
+                    description: Settings related to user [authentication](https://docs.stackable.tech/home/nightly/hive/usage-guide/security).
                     nullable: true
                     properties:
                       kerberos:
@@ -49,6 +49,33 @@ spec:
                     required:
                     - kerberos
                     type: object
+                  authorization:
+                    description: |-
+                      Authorization options for Hive.
+                      Learn more in the [Hive authorization usage guide](https://docs.stackable.tech/home/nightly/hive/usage-guide/security#authorization).
+                    nullable: true
+                    properties:
+                      opa:
+                        description: |-
+                          Configure the OPA stacklet [discovery ConfigMap](https://docs.stackable.tech/home/nightly/concepts/service_discovery)
+                          and the name of the Rego package containing your authorization rules.
+                          Consult the [OPA authorization documentation](https://docs.stackable.tech/home/nightly/concepts/opa)
+                          to learn how to deploy Rego authorization rules with OPA.
+                        nullable: true
+                        properties:
+                          configMapName:
+                            description: |-
+                              The [discovery ConfigMap](https://docs.stackable.tech/home/nightly/concepts/service_discovery)
+                              for the OPA stacklet that should be used for authorization requests.
+                            type: string
+                          package:
+                            description: The name of the Rego package containing the Rego rules for the product.
+                            nullable: true
+                            type: string
+                        required:
+                        - configMapName
+                        type: object
+                    type: object
                   database:
                     description: Database connection specification for the metadata database.
                     properties:
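
The `authorization.opa` schema above has one required field (`configMapName`) and one nullable field (`package`). As a rough illustration, the shape can be mirrored in plain Rust as shown below. The type `OpaSettings` and the helper `resolve_package` are hypothetical names; the real type is the operator-rs `OpaConfig`, whose definition is not part of this diff. The fallback to the stacklet name follows the usage guide text in this commit.

```rust
/// Hypothetical mirror of the `authorization.opa` schema in the CRD hunk.
#[derive(Debug, Clone, PartialEq)]
struct OpaSettings {
    /// Required: name of the OPA discovery ConfigMap.
    config_map_name: String,
    /// Optional: name of the Rego package holding the rules.
    package: Option<String>,
}

/// Resolve the Rego package, falling back to the Hive stacklet name
/// when `package` is not set.
fn resolve_package<'a>(opa: &'a OpaSettings, stacklet_name: &'a str) -> &'a str {
    opa.package.as_deref().unwrap_or(stacklet_name)
}
```

With `package: hms` the resolved package is `hms`; with the field omitted and a stacklet named `hive`, it would be `hive`.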

docs/modules/hive/pages/usage-guide/security.adoc

Lines changed: 116 additions & 0 deletions
@@ -1,5 +1,6 @@
 = Security
 :description: Secure Apache Hive with Kerberos authentication in Kubernetes. Configure Kerberos server, SecretClass, and access Hive securely with provided guides.
+:opa-rego-docs: https://www.openpolicyagent.org/docs/latest/#rego
 
 == Authentication
 Currently, the only supported authentication mechanism is Kerberos, which is disabled by default.
@@ -45,3 +46,118 @@ The `kerberos.secretClass` is used to give Hive the possibility to request keyta
 === 5. Access Hive
 In case you want to access Hive it is recommended to start up a client Pod that connects to Hive, rather than shelling into the master.
 We have an https://github.com/stackabletech/hive-operator/blob/main/tests/templates/kuttl/kerberos/70-install-access-hive.yaml.j2[integration test] for this exact purpose, where you can see how to connect and get a valid keytab.
+
+
+== Authorization
+The Stackable Operator for Apache Hive supports the following authorization methods.
+
+=== Open Policy Agent (OPA)
+The Apache Hive metastore can be configured to delegate authorization decisions to an Open Policy Agent (OPA) instance.
+More information on the setup and configuration of OPA can be found in the xref:opa:index.adoc[OPA Operator documentation].
+A Hive cluster can be configured using OPA authorization by adding this section to the configuration:
+
+[source,yaml]
+----
+spec:
+  clusterConfig:
+    authorization:
+      opa:
+        configMapName: opa # <1>
+        package: hms # <2>
+----
+<1> The name of your OPA Stacklet (`opa` in this case)
+<2> The Rego rule package to use for policy decisions.
+This is optional and defaults to the name of the Hive Stacklet.
+
+==== Defining rego rules
+For a general explanation of how rules are written, please refer to the {opa-rego-docs}[OPA documentation].
+Authorization with OPA is done using the https://github.com/boschglobal/hive-metastore-opa-authorizer[hive-metastore-opa-authorizer] plugin.
+
+===== OPA Inputs
+The payload that Hive sends to OPA with each request, and that is accessible within the Rego rules, has the following structure:
+
+[source,json]
+----
+{
+  "identity": {
+    "username": "<user>",
+    "groups": ["<group1>", "<group2>"]
+  },
+  "resources": {
+    "database": null,
+    "table": null,
+    "partition": null,
+    "columns": ["col1", "col2"]
+  },
+  "privileges": {
+    "readRequiredPriv": [],
+    "writeRequiredPriv": [],
+    "inputs": null,
+    "outputs": null
+  }
+}
+----
+* `identity`: Contains user information.
+** `username`: The name of the user.
+** `groups`: A list of groups the user belongs to.
+* `resources`: Specifies the resources involved in the request.
+** `database`: The database object.
+** `table`: The table object.
+** `partition`: The partition object.
+** `columns`: A list of column names involved in the request.
+* `privileges`: Details the privileges required for the request.
+** `readRequiredPriv`: A list of required read privileges.
+** `writeRequiredPriv`: A list of required write privileges.
+** `inputs`: Input tables for the request.
+** `outputs`: Output tables for the request.
+
+===== Example OPA Rego Rule
+Below is a basic Rego rule that demonstrates how to handle the input dictionary sent from the Hive authorizer to OPA:
+
+[source,rego]
+----
+package hms
+
+default database_allow = false
+default table_allow = false
+default column_allow = false
+default partition_allow = false
+default user_allow = false
+
+database_allow if {
+  input.identity.username == "stackable"
+  input.resources.database.name == "test_db"
+}
+
+table_allow if {
+  input.identity.username == "stackable"
+  input.resources.table.dbName == "test_db"
+  input.resources.table.tableName == "test_table"
+  input.privileges.readRequiredPriv[0].priv == "SELECT"
+}
+
+table_allow if {
+  input.identity.username == "stackable"
+  input.resources.table.dbName == "test_db"
+  input.privileges.writeRequiredPriv[0].priv == "CREATE"
+}
+----
+* `database_allow` grants access if the user is `stackable` and the database is `test_db`.
+* `table_allow` grants access if the user is `stackable`, the database is `test_db` and:
+** the table is `test_table` and the required read privilege is `SELECT`.
+** the required write privilege is `CREATE` without any table restriction.
+
+==== Configuring policy URLs
+
+The `database_allow`, `table_allow`, `column_allow`, `partition_allow`, and `user_allow` policy URLs can be xref:usage-guide/overrides.adoc#_configuration_properties[config overridden] using the properties in `hive-site.xml`:
+
+* `com.bosch.bdps.opa.authorization.policy.url.database`
+* `com.bosch.bdps.opa.authorization.policy.url.table`
+* `com.bosch.bdps.opa.authorization.policy.url.column`
+* `com.bosch.bdps.opa.authorization.policy.url.partition`
+* `com.bosch.bdps.opa.authorization.policy.url.user`
+
+==== TLS secured OPA cluster
+
+Stackable OPA clusters secured via TLS are supported and no further configuration is required.
+The Stackable Hive operator automatically adds the certificate from the SecretClass used to secure the OPA cluster to its trust.
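
The five policy URL properties documented above each point at one Rego rule. A minimal Rust sketch of how such URLs could be derived follows. The property names and rule names are taken from the documentation in this diff, and the `/v1/data/<package path>/<rule>` shape follows OPA's REST Data API. Whether the operator builds the URLs exactly this way is an assumption, and `policy_url`, `policy_url_properties`, and the base URL used in any call are hypothetical.

```rust
/// Resource kinds that have a `<kind>_allow` rule and a matching property.
const RESOURCES: [&str; 5] = ["database", "table", "column", "partition", "user"];

/// Build the OPA Data API URL for one rule, e.g. `.../v1/data/hms/table_allow`.
/// Dots in the package name become path segments, as in the OPA Data API.
fn policy_url(opa_base_url: &str, package: &str, resource: &str) -> String {
    let base = opa_base_url.trim_end_matches('/');
    let package_path = package.replace('.', "/");
    format!("{base}/v1/data/{package_path}/{resource}_allow")
}

/// Pair every `hive-site.xml` property name with its policy URL.
fn policy_url_properties(opa_base_url: &str, package: &str) -> Vec<(String, String)> {
    RESOURCES
        .iter()
        .map(|resource| {
            (
                format!("com.bosch.bdps.opa.authorization.policy.url.{resource}"),
                policy_url(opa_base_url, package, resource),
            )
        })
        .collect()
}
```

An override in `hive-site.xml` would then replace one of these values, for example to point only the `table` check at a different package.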

docs/modules/hive/partials/supported-versions.adoc

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 // This is a separate file, since it is used by both the direct Hive-Operator documentation, and the overarching
 // Stackable Platform documentation.
 
+- 4.2.0 (experimental)
 - 4.1.0 (experimental)
 - 4.0.1 (LTS)
 - 4.0.0 (deprecated)

examples/hive-opa-cluster.yaml

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# helm install postgresql oci://registry-1.docker.io/bitnamicharts/postgresql \
+# --version 16.5.0 \
+# --namespace default \
+# --set image.repository=bitnamilegacy/postgresql \
+# --set volumePermissions.image.repository=bitnamilegacy/os-shell \
+# --set metrics.image.repository=bitnamilegacy/postgres-exporter \
+# --set global.security.allowInsecureImages=true \
+# --set auth.username=hive \
+# --set auth.password=hive \
+# --set auth.database=hive \
+# --set primary.extendedConfiguration="password_encryption=md5" \
+# --wait
+---
+apiVersion: hive.stackable.tech/v1alpha1
+kind: HiveCluster
+metadata:
+  name: hive
+spec:
+  image:
+    productVersion: 4.1.0
+    pullPolicy: IfNotPresent
+  clusterConfig:
+    authorization:
+      opa:
+        configMapName: opa
+        package: hms
+    database:
+      connString: jdbc:postgresql://postgresql:5432/hive
+      credentialsSecret: hive-postgresql-credentials
+      dbType: postgres
+  metastore:
+    roleGroups:
+      default:
+        replicas: 1
+        config:
+          resources:
+            cpu:
+              min: 300m
+              max: "2"
+            memory:
+              limit: 5Gi
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: hive-postgresql-credentials
+type: Opaque
+stringData:
+  username: hive
+  password: hive
+---
+apiVersion: opa.stackable.tech/v1alpha1
+kind: OpaCluster
+metadata:
+  name: opa
+spec:
+  image:
+    productVersion: 1.8.0
+  servers:
+    config:
+      logging:
+        enableVectorAgent: false
+        containers:
+          opa:
+            console:
+              level: INFO
+            file:
+              level: INFO
+            loggers:
+              decision:
+                level: INFO
+    roleGroups:
+      default: {}
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: hive-opa-bundle
+  labels:
+    opa.stackable.tech/bundle: "hms"
+data:
+  hive.rego: |
+    package hms
+
+    database_allow = true
+    table_allow = true
+    column_allow = true
+    partition_allow = true
+    user_allow = true

rust/operator-binary/src/command.rs

Lines changed: 17 additions & 5 deletions
@@ -1,16 +1,20 @@
 use stackable_operator::crd::s3;
 
-use crate::crd::{
-    DB_PASSWORD_ENV, DB_PASSWORD_PLACEHOLDER, DB_USERNAME_ENV, DB_USERNAME_PLACEHOLDER,
-    HIVE_METASTORE_LOG4J2_PROPERTIES, HIVE_SITE_XML, STACKABLE_CONFIG_DIR,
-    STACKABLE_CONFIG_MOUNT_DIR, STACKABLE_LOG_CONFIG_MOUNT_DIR, STACKABLE_TRUST_STORE,
-    STACKABLE_TRUST_STORE_PASSWORD, v1alpha1,
+use crate::{
+    config::opa::HiveOpaConfig,
+    crd::{
+        DB_PASSWORD_ENV, DB_PASSWORD_PLACEHOLDER, DB_USERNAME_ENV, DB_USERNAME_PLACEHOLDER,
+        HIVE_METASTORE_LOG4J2_PROPERTIES, HIVE_SITE_XML, STACKABLE_CONFIG_DIR,
+        STACKABLE_CONFIG_MOUNT_DIR, STACKABLE_LOG_CONFIG_MOUNT_DIR, STACKABLE_TRUST_STORE,
+        STACKABLE_TRUST_STORE_PASSWORD, v1alpha1,
+    },
 };
 
 pub fn build_container_command_args(
     hive: &v1alpha1::HiveCluster,
     start_command: String,
     s3_connection_spec: Option<&s3::v1alpha1::ConnectionSpec>,
+    hive_opa_config: Option<&HiveOpaConfig>,
 ) -> Vec<String> {
     let mut args = vec![
         // copy config files to a writeable empty folder in order to set s3 access and secret keys
@@ -51,6 +55,14 @@ pub fn build_container_command_args(
         }
     }
 
+    if let Some(opa) = hive_opa_config {
+        if let Some(ca_cert_dir) = opa.tls_ca_cert_mount_path() {
+            args.push(format!(
+                "cert-tools generate-pkcs12-truststore --pkcs12 {STACKABLE_TRUST_STORE}:{STACKABLE_TRUST_STORE_PASSWORD} --pem {ca_cert_dir}/ca.crt --out {STACKABLE_TRUST_STORE} --out-password {STACKABLE_TRUST_STORE_PASSWORD}"
+            ));
+        }
+    }
+
     // db credentials
     args.extend([
         format!("echo replacing {DB_USERNAME_PLACEHOLDER} and {DB_PASSWORD_PLACEHOLDER} with secret values."),
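
The second hunk above adds one `cert-tools` invocation only when an OPA config is present and it exposes a CA certificate mount path. A self-contained sketch of that conditional is shown below, with the nested `if let` collapsed via `Option::and_then`. The `HiveOpaConfig` struct body, the return type of `tls_ca_cert_mount_path`, the mount path, and both constant values are placeholders assumed for illustration; the real definitions live in `crate::config::opa` and `crate::crd` and are not shown in this diff.

```rust
// Placeholder values; the real constants are defined in `crate::crd`.
const STACKABLE_TRUST_STORE: &str = "/stackable/truststore.p12";
const STACKABLE_TRUST_STORE_PASSWORD: &str = "changeit";

/// Hypothetical stand-in for `config::opa::HiveOpaConfig`.
struct HiveOpaConfig {
    tls_secret_class: Option<String>,
}

impl HiveOpaConfig {
    /// Returns a mount path only when the OPA cluster is TLS secured.
    /// The path itself is an assumption for this sketch.
    fn tls_ca_cert_mount_path(&self) -> Option<String> {
        self.tls_secret_class
            .as_ref()
            .map(|_| "/stackable/secrets/opa-tls".to_string())
    }
}

/// Build the extra startup arguments that add the OPA CA to the truststore.
/// Yields no arguments when OPA is not configured or does not use TLS.
fn opa_truststore_args(hive_opa_config: Option<&HiveOpaConfig>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(ca_cert_dir) = hive_opa_config.and_then(|opa| opa.tls_ca_cert_mount_path()) {
        args.push(format!(
            "cert-tools generate-pkcs12-truststore --pkcs12 {STACKABLE_TRUST_STORE}:{STACKABLE_TRUST_STORE_PASSWORD} --pem {ca_cert_dir}/ca.crt --out {STACKABLE_TRUST_STORE} --out-password {STACKABLE_TRUST_STORE_PASSWORD}"
        ));
    }
    args
}
```

The command reads the existing truststore and writes the merged result back to the same path, which matches the usage guide's statement that no further configuration is needed for a TLS secured OPA cluster.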
Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 pub mod jvm;
+pub mod opa;
