
EFK Stack Setup (Elasticsearch, Fluent Bit and Kibana) for Kubernetes Log Management

 MAY 21, 2020   by iamabhishek

The EFK stack (Elasticsearch, Fluent Bit and the Kibana UI) is gaining popularity for Kubernetes log aggregation and management. The 'F' in the EFK stack can also be Fluentd, which is like the big brother of Fluent Bit. Fluent Bit, being a lightweight service, is the right choice for a basic log management use case.

EFK stack setup for logging in Kubernetes

So in this tutorial we will be deploying Elasticsearch, Fluent Bit and Kibana on Kubernetes. Before taking the EFK stack setup to Kubernetes, if you want to test it on a local server, you can check this post: How to setup Elasticsearch, Fluent bit and Kibana for Log aggregation and Visualization. We have also written posts explaining how to set up Fluent Bit on a Linux machine and, in general, what Fluent Bit is; you should check those too.

Why do we need the EFK Stack?

Well, if you know Kubernetes, then you must be thinking that you can use the kubectl logs command to easily check the logs of any running Kubernetes pod. But what if there are 100 pods or even more? In that case it becomes very difficult. On top of this, the Kibana dashboard UI can be configured however you want to continuously monitor logs at runtime, which makes it easy for someone with no experience of running Linux commands to check logs and monitor the Kubernetes cluster and the applications running on it.
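For instance, checking logs by hand means running something like the commands below for every pod of interest (the pod name, label and namespace here are placeholders, not part of this setup):

kubectl logs my-app-pod-1 -n my-namespace
kubectl logs -l app=my-app -n my-namespace --tail=20

The first command prints the logs of a single pod; the second tails the last 20 lines from every pod matching a label. With hundreds of pods, even this quickly becomes unmanageable.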

If you are on AWS, then you can configure Elasticsearch to archive logs to an S3 bucket (which can be configured without the EFK stack too, but just saying), to have historical logs persisted.

If you have a large application with 100 pods running, along with logs coming in from the Kubernetes system, Docker containers, etc., and you do not have a centralised log aggregation and management system, you will sooner or later regret it big time. Hence the EFK stack is a good choice.

Also, using Fluent Bit we can parse logs from various input sources, filter them to add more information or remove unwanted information, and then store the data in Elasticsearch.

How does it Work?

Well, to understand the setup, here is a picture:

EFK stack setup in Kubernetes

Here we have a Kubernetes cluster with 3 nodes; on these 3 nodes, pods will be created to run various services like your applications and, in this case, the EFK stack.

Fluent Bit runs as a DaemonSet, which means each node in the cluster will have one Fluent Bit pod, and it will read logs from the /var/log/containers directory, where a log file is created for each container running on the node.
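If you list that directory on a node, you will see one symlinked log file per container. The file names below are made-up examples, but the pod-name_namespace_container-name naming pattern is the actual convention:

ls /var/log/containers/
my-app-7d9c5b8f6-abcde_default_my-app-3f2a91c4...log
kibana-598vgt546f5-7b9wx_kube-logging_kibana-81c07fa2...log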

The Elasticsearch service runs in one pod while Kibana runs in a separate pod. They can be on the same cluster node too, depending upon resource availability, but usually both of them demand high CPU and memory, so their pods get scheduled on different cluster nodes.

Then there will be some pods running your applications, which are shown as App1, App2 in the picture above.

The Fluent Bit service will read the logs from these apps and push the data, as JSON documents, into Elasticsearch, and from there Kibana will read the data to show in the UI.

So let's start with the setup.

Step 1: Create a Namespace

It's good practice to create a separate namespace for every functional unit in Kubernetes, as this makes managing the pods running within a particular namespace easy. To see the existing namespaces, you can use the following command:

kubectl get namespaces

and you will see the list of existing namespaces:


NAME          STATUS   AGE
default       Active   5m
kube-system   Active   5m
kube-public   Active   5m

We will be creating a new namespace with the name kube-logging. To do so, create a new file and name it kube-logging.yaml using your favorite editor, like vim:

vi kube-logging.yaml

Press i to enter INSERT mode and then paste the following text into it.

kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging

Then press ESC followed by :wq! and hit ENTER.

To create the namespace using the YAML file created above, run the following command:

kubectl create -f kube-logging.yaml

You will see the following output:


namespace/kube-logging created

You can further confirm the namespace creation by running the kubectl get namespaces command.
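By the way, for something this small you can also skip the YAML file entirely and create the namespace with a single command:

kubectl create namespace kube-logging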

Step 2: Setup Elasticsearch

For Elasticsearch we will set up a headless service and a StatefulSet which will get attached to this service. A headless service does not perform load balancing and does not have a static cluster IP. We are making the Elasticsearch service headless because we will set up a 3-node Elasticsearch cluster, and each pod needs its own stable network identity so the nodes can discover and talk to each other directly, without a load balancer in between. We will have 3 Elasticsearch pods running once we are done with everything, which will ensure high availability.

Creating Elasticsearch Service:

Create a new file and name it elastic-service.yaml using your favorite editor like vim:

vi elastic-service.yaml

Press i to enter INSERT mode and then paste the following text into it.

kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node

Then press ESC followed by :wq! and hit ENTER.

In the YAML file, we have defined a Service called elasticsearch in the kube-logging namespace and given it the label app: elasticsearch, which will be used when we define the StatefulSet for Elasticsearch. Also, we have set clusterIP to None, as this is required for making it a headless service.

And we have specified the ports 9200 and 9300, for REST API access and inter-node communication respectively.
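Because the service is headless, each pod of the StatefulSet we create next will get its own stable DNS record of the form pod-name.service-name.namespace.svc.cluster.local, for example es-cluster-0.elasticsearch.kube-logging.svc.cluster.local. Once those pods are running, you can verify this from a throwaway pod (a quick sketch; busybox:1.28 is just a known-good image for nslookup):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -n kube-logging -- nslookup es-cluster-0.elasticsearch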

To create the service using the YAML file created above, run the following command:

kubectl create -f elastic-service.yaml

You should see the following output:


service/elasticsearch created

To double check, we can run the following command to see all the services running in the kube-logging namespace that we created:

kubectl get services -n kube-logging

You will see output similar to this:


NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   26s

Creating the StatefulSet

Now let's define the YAML for creating the StatefulSet for the Elasticsearch service. In a StatefulSet we provide a lot of information: the cluster information, which includes the cluster name, the number of replicas and the template for replica creation; the Elasticsearch version to be installed; and the resources, like CPU and memory, which are also specified in the StatefulSet.

Create a new file and name it elastic-statefulset.yaml using your favorite editor, like vim:

vi elastic-statefulset.yaml

Press i to enter INSERT mode and then paste the following text into it.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 10Gi

Then press ESC followed by :wq! and hit ENTER.

In the above YAML file, we have defined the following:

  • The Elasticsearch cluster information, like the cluster name, which is es-cluster; the namespace for it, which will be kube-logging; the name of the service which we defined in the section above; the number of replicas, which is 3; and the label for matching those replicas, which is app: elasticsearch.

  • The container information, like the version of Elasticsearch to be set up, which is 7.2.0 in this case, and then the resource allocation for CPU and memory: the limits section defines the maximum a container may use, and the requests section defines how much is guaranteed to it.

  • The port information, defining the port numbers for the REST API and inter-node communication.

  • Then we have the environment variables, followed by the init containers, which run some pre-setup commands before the Elasticsearch app starts, and at last we have defined the storage to be allocated for Elasticsearch data, which we have kept at 10 GB, but you can increase it as per your requirements.

To create the StatefulSet using the YAML file created above, run the following command:

kubectl create -f elastic-statefulset.yaml

You should see the following output:


statefulset.apps/es-cluster created

To double check, we can run the following command to see all the pods running in the kube-logging namespace that we created:

kubectl get pod -n kube-logging

You should see something like this in the output:


NAME           READY   STATUS    RESTARTS   AGE
es-cluster-0   1/1     Running   0          3m7s
es-cluster-1   1/1     Running   0          3m7s
es-cluster-2   0/1     Pending   0          3m7s
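While the pods come up, two standard commands are handy: one to watch the StatefulSet rollout and one to confirm that the 10Gi volume claims from volumeClaimTemplates were provisioned:

kubectl rollout status sts/es-cluster -n kube-logging
kubectl get pvc -n kube-logging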

We can also make a curl request to the REST API, but for that we need the IP address of a pod. To get it, run the following command:

kubectl get pod -n kube-logging -o wide

The output for this command will be:


NAME           READY   STATUS    RESTARTS   AGE     IP               NODE             NOMINATED NODE   READINESS GATES
es-cluster-0   1/1     Running   0          3m12s   XX.XXX.XXX.XXX   YOUR_NODE_NAME   <none>           <none>

Now you can run the following curl command to hit the Elasticsearch service:

curl http://XX.XXX.XXX.XXX:9200/

which will give output like this:


{
  "name" : "es-cluster-0",
  "cluster_name" : "es-cluster",
  "cluster_uuid" : "UfWUnhaIJUyPLu4_DkW7ew",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
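To check how many nodes have actually joined the cluster, you can also query the standard cluster health API on the same pod IP. The status should be green and number_of_nodes should be 3 once all pods are Running (it will be lower while a pod is still Pending, as in the output further above):

curl http://XX.XXX.XXX.XXX:9200/_cluster/health?pretty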

Step 3: Setup Kibana

Now you know some kubectl commands and how we create the YAML files and apply them, so I will skip that part. For Kibana, we will have a Kibana service and a deployment to launch one pod.

We will be creating two YAML files, one for the Kibana service and the other for the Kibana deployment.

Here is the kibana-service.yaml file (use the vim editor to create the file and save the content in it, just like we did above):

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  selector:
    app: kibana

In the service YAML we specified the service name, the namespace, the port on which the service will be accessible, and the label for the service, which is app: kibana.

Now, let's create the deployment YAML file with the name kibana-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.2.0
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
          requests:
            cpu: 700m
            memory: 1Gi
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch:9200
        ports:
        - containerPort: 5601

In the deployment file we have specified the image version for Kibana, which is 7.2.0 (the versions of Elasticsearch and Kibana should be the same), the port information, the resource information like CPU and memory, etc. The ELASTICSEARCH_URL environment variable points to http://elasticsearch:9200, the service we created in Step 2.

To create the service and the deployment using the YAML files created above, run the following commands:

kubectl create -f kibana-service.yaml
kubectl create -f kibana-deployment.yaml

You should see the following output:


service/kibana created
deployment.apps/kibana created

To check the pod status run the following command:

kubectl get pods -n kube-logging


NAME                       READY   STATUS    RESTARTS   AGE
es-cluster-0               1/1     Running   0          2h
es-cluster-1               1/1     Running   0          2h
es-cluster-2               0/1     Pending   0          2h
kibana-598vgt546f5-7b9wx   1/1     Running   0          2h

To access the Kibana UI from the browser, we can run the command below to forward the pod's port to our local machine:

kubectl port-forward kibana-598vgt546f5-7b9wx 5601:5601 --namespace=kube-logging

Then you can access the Kibana UI at the following URL: http://localhost:5601/
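The hash in the pod name is generated, so it will differ on your cluster. To avoid copying it, you can also port-forward through the deployment instead of the pod:

kubectl port-forward deployment/kibana 5601:5601 --namespace=kube-logging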

Step 4: Fluent Bit Service

For Fluent Bit we will create 5 YAML files and apply them using the kubectl command, like we did in the sections above. The YAML files are listed below, along with what each one is for:

  • fluent-bit-service-account.yaml: Creates a ServiceAccount named fluent-bit in the kube-logging namespace, which the Fluent Bit pods will use to access the Kubernetes API.
  • fluent-bit-role.yaml: Creates a ClusterRole which grants the get, list and watch permissions on Kubernetes resources like pods and namespaces to the fluent-bit ServiceAccount.
  • fluent-bit-role-binding.yaml: Binds the ServiceAccount to the ClusterRole created above.
  • fluent-bit-configmap.yaml: The main file, in which we specify the configuration for the Fluent Bit service: the Input plugin, the Parser, the Filter, the Output plugin, etc. We have already covered the Fluent Bit service and its configuration.
  • fluent-bit-ds.yaml: Defines the DaemonSet for Fluent Bit, along with the Elasticsearch connection settings and other basic configuration.

Below we have the content of all the files. Please create these files and then we will apply them all.

fluent-bit-service-account.yaml File:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-logging
  labels:
    app: fluent-bit

fluent-bit-role.yaml File:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
  labels:
    app: fluent-bit
rules:
- apiGroups: [""]
  resources:
  - pods
  - namespaces
  verbs: ["get", "list", "watch"]

fluent-bit-role-binding.yaml File:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluent-bit
roleRef:
  kind: ClusterRole
  name: fluent-bit
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-logging
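Once these RBAC objects are applied (with the create commands further below), you can sanity-check the permissions with kubectl's built-in authorization checker; it should print yes:

kubectl auth can-i list pods --as=system:serviceaccount:kube-logging:fluent-bit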

fluent-bit-configmap.yaml File:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

fluent-bit-ds.yaml File:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.3.11
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

Once you have created these files, we will run the kubectl create command to start the Fluent Bit service. Run the following commands:

kubectl create -f fluent-bit-service-account.yaml
kubectl create -f fluent-bit-role.yaml
kubectl create -f fluent-bit-role-binding.yaml
kubectl create -f fluent-bit-configmap.yaml
kubectl create -f fluent-bit-ds.yaml

Run the following command to see whether the DaemonSet is created or not:

kubectl get ds -n kube-logging
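On the 3-node cluster from our picture, the output should report one Fluent Bit pod per node, something like this illustrative sample:

NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluent-bit   3         3         3       3            3           <none>          1m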

And that's it, our work here is done. You can use the kubectl get pod and kubectl get services commands from the sections above to see the pod information and the services running.
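To verify that logs are actually flowing into Elasticsearch, you can list the indices through the standard _cat API, using a pod IP as in Step 2. With Logstash_Format On in the output configuration, Fluent Bit writes one logstash-YYYY.MM.DD index per day:

curl http://XX.XXX.XXX.XXX:9200/_cat/indices?v

Then, in the Kibana UI, create an index pattern for logstash-* and your logs will show up in the Discover tab.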

Conclusion:

So in this long tutorial, we successfully set up the EFK stack for logging in Kubernetes. The EFK stack here refers to Elasticsearch, Fluent Bit and Kibana. If you want to set up Fluentd instead, the same set of YAML files provides the overall structure, but you will need the Fluentd image and a Fluentd-specific ConfigMap in place of the fluent-bit ones.

If you face any issue in the setup, share it with us and we will definitely help you out.

