PostgreSQL as metadata storage, and replacing ZooKeeper with Kubernetes Extensions

Let’s continue the series of posts about Apache Druid. In the first part, we looked at Druid itself, its architecture and monitoring; in the second, we deployed a PostgreSQL cluster and configured its monitoring.

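The next tasks are to switch Druid to PostgreSQL as its metadata storage and to replace ZooKeeper with Kubernetes Extensions.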

Let’s start with PostgreSQL.

See PostgreSQL Metadata Store and Metadata Storage.

PostgreSQL users

Let’s return to the file manifests/minimal-master-replica-svcmonitor.yaml, from which the PostgreSQL cluster was created, and add a user druid and a database druid:

...
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
    druid:       
    - createdb
  databases:
    foo: zalando  # dbname: owner
    druid: druid
...

We update the cluster:

kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml

Get the password of the druid user:

kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d

Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
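Rather than copying the password by hand, it can be kept in a shell variable for the following steps. This is only a minimal sketch, assuming the same Secret name and namespace as above. The function names decode_b64 and druid_pg_password are made up for this example; PGPASSWORD is the standard environment variable that psql reads.

```shell
#!/bin/sh
# Decode a base64 string such as the value of .data.password in a Secret.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Hypothetical helper: fetch and decode the druid user's password.
# Needs kubectl access to the test-pg namespace.
druid_pg_password() {
  decode_b64 "$(kubectl -n test-pg get secret \
    druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do \
    -o 'jsonpath={.data.password}')"
}

# Usage (not executed here):
#   export PGPASSWORD="$(druid_pg_password)"
#   psql -U druid -h localhost -p 6432
```

With PGPASSWORD exported, psql should no longer prompt for the password in the same shell session.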

We open the port to the PostgreSQL master:

kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432

Forwarding from 127.0.0.1:6432 -> 5432

Forwarding from [::1]:6432 -> 5432

We connect:

psql -U druid -h localhost -p 6432

Password for user druid:

psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))

SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)

Type "help" for help.

druid=>

Check the contents of the database; it is empty so far:

druid-> \dt

Did not find any relations.

Apache Druid metadata.storage config

We use the same file, druid-operator/examples/tiny-cluster.yaml, from which the Apache Druid cluster was deployed (see Running a Druid Cluster).

It currently contains the config for Derby, which stores its data on the local disk:

...
    druid.metadata.storage.type=derby
    druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
    druid.metadata.storage.connector.host=localhost
    druid.metadata.storage.connector.port=1527
    druid.metadata.storage.connector.createTables=true
...

For PostgreSQL, we also need to specify connectURI, so we look up its Kubernetes Service:

kubectl -n test-pg get svc

NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
acid-minimal-cluster                       ClusterIP   10.97.188.225   <none>        5432/TCP   14h
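Inside the cluster, a Service is reachable at service.namespace.svc.cluster-domain, so the JDBC URI can be assembled from the Service name and namespace above. A small sketch: the jdbc_uri function is illustrative, and cluster.local is assumed as the cluster domain (the Kubernetes default, but configurable).

```shell
#!/bin/sh
# Build a PostgreSQL JDBC URI from a Service name, namespace and database.
# Assumes the cluster domain "cluster.local" and omits the port, so the
# JDBC driver falls back to its default, 5432.
jdbc_uri() {
  svc="$1"; ns="$2"; db="$3"
  printf 'jdbc:postgresql://%s.%s.svc.cluster.local/%s\n' "$svc" "$ns" "$db"
}

jdbc_uri acid-minimal-cluster test-pg druid
# prints jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
```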

Edit the manifest: delete or comment out the Derby lines, make sure postgresql-metadata-storage is in the extensions loadList, and add the new config:

...
    # Extensions
    #
    druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
    # Metadata Store
    #druid.metadata.storage.type=derby
    #druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
    #druid.metadata.storage.connector.host=localhost
    #druid.metadata.storage.connector.port=1527
    #druid.metadata.storage.connector.createTables=true

    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
    druid.metadata.storage.connector.createTables=true
...

We update the Druid cluster:

kubectl -n druid apply -f examples/tiny-cluster.yaml

Let’s check the data in the PostgreSQL database:

druid-> \dt
                 List of relations
 Schema |         Name          | Type  | Owner
--------+-----------------------+-------+-------
 public | druid_audit           | table | druid
 public | druid_config          | table | druid
 public | druid_datasource      | table | druid
 public | druid_pendingsegments | table | druid
 public | druid_rules           | table | druid
 public | druid_segments        | table | druid
 public | druid_supervisors     | table | druid

Nice!
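The same check can be scripted. This is a sketch, assuming the port-forward from earlier is still running and the password is available to psql (for example via PGPASSWORD). The missing_tables helper is hypothetical, and the expected list is simply the seven tables from the listing above.

```shell
#!/bin/sh
# Tables Druid created in the listing above.
EXPECTED="druid_audit druid_config druid_datasource druid_pendingsegments druid_rules druid_segments druid_supervisors"

# Read table names from stdin (one per line) and print every expected
# table that is not among them. Prints nothing when all are present.
missing_tables() {
  found="$(cat)"
  for t in $EXPECTED; do
    printf '%s\n' "$found" | grep -qx "$t" || printf '%s\n' "$t"
  done
}

# Usage (not executed here):
#   psql -U druid -h localhost -p 6432 -d druid -Atc \
#     "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" | missing_tables
```

Empty output means every expected table exists; any name printed is a table Druid has not created yet.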

If you need to migrate data from Derby to PostgreSQL, see Metadata Migration.

Next, we will get rid of the need for ZooKeeper.

See the druid-kubernetes-extensions module documentation.

Let’s return to druid-operator/examples/tiny-cluster.yaml and update the config: disable ZooKeeper, add the new extension druid-kubernetes-extensions, and set the additional parameters:

...
    druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
    ...
    druid.zk.service.enabled=false
    druid.serverview.type=http
    druid.coordinator.loadqueuepeon.type=http
    druid.indexer.runner.type=httpRemote
    druid.discovery.type=k8s

    # Zookeeper
    #druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
    #druid.zk.paths.base=/druid
    #druid.zk.service.compress=false
...

We update:

kubectl -n druid apply -f examples/tiny-cluster.yaml

Druid RBAC Role

Add an RBAC Role and RoleBinding; otherwise we get authorization errors like this:

ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Create a manifest from the documentation:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io

Create the new resources in the Druid namespace:

kubectl -n druid apply -f druid-serviceaccout.yaml

role.rbac.authorization.k8s.io/druid-cluster created

rolebinding.rbac.authorization.k8s.io/druid-cluster created
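Before waiting on the logs, the new permissions can be checked with kubectl auth can-i, impersonating the ServiceAccount named in the error message. A sketch only: sa_subject and can_watch_pods are names invented here, and the --as flag works only if your own user is allowed to impersonate.

```shell
#!/bin/sh
# Build the user name Kubernetes uses for a ServiceAccount.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: ask the API server whether the ServiceAccount
# may watch pods in its own namespace. kubectl prints "yes" or "no".
can_watch_pods() {
  kubectl -n "$1" auth can-i watch pods --as="$(sa_subject "$1" "$2")"
}

# Usage (not executed here):
#   can_watch_pods druid default
```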

A minute later, check the logs:

2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].

Done.
