
EFK Stack Setup (Elasticsearch, Fluent-bit and Kibana) for Kubernetes Log Management

Posted in Programming   LAST UPDATED: JUNE 15, 2023

    EFK stack is Elasticsearch, Fluent Bit, and the Kibana UI, a combination that is gaining popularity for Kubernetes log aggregation and management. The 'F' in the EFK stack can also be Fluentd, which is like the big brother of Fluent Bit. Being a lightweight service, Fluent Bit is the right choice for basic log management use cases.

    EFK stack setup for logging in kubernetes

    So in this tutorial, we will be deploying Elasticsearch, Fluent Bit, and Kibana on Kubernetes. Before taking the EFK stack setup to Kubernetes, if you want to test it on a local server, you can check this post: How to set up Elasticsearch, Fluent bit, and Kibana for Log aggregation and Visualization. We have also written more posts explaining the setup of Fluent Bit on Linux machines and, in general, what Fluent Bit is; you should check those too.

    Why do we need EFK Stack?

    Well, if you know Kubernetes, then you must be thinking that you can use the kubectl logs command to easily check the logs of any running pod. But what if there are 100 pods or even more? In that case, checking logs pod by pod becomes very difficult. On top of this, the Kibana dashboard UI can be configured however you want to continuously monitor logs at runtime, which makes it easier for someone with no experience of running Linux commands to check logs and monitor the Kubernetes cluster and the applications running on it.
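
    For reference, checking a single pod's logs manually looks like this (the pod and namespace names are placeholders):

    kubectl logs <pod-name> -n <namespace>
    # follow the logs of one container in a multi-container pod
    kubectl logs -f <pod-name> -c <container-name> -n <namespace>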

    If you are on AWS, you can configure Elasticsearch to archive logs to an S3 bucket (which can be done without the EFK stack too, but just saying) so that historical logs are persisted.
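
    If you go down that route, archiving is typically done through Elasticsearch's snapshot API. Here is a minimal sketch, assuming the repository-s3 plugin is installed on the Elasticsearch nodes and using a placeholder host and bucket name:

    curl -X PUT "http://<elasticsearch-host>:9200/_snapshot/s3_log_archive" \
      -H 'Content-Type: application/json' \
      -d '{ "type": "s3", "settings": { "bucket": "my-log-archive-bucket" } }'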

    If you have a large application with 100 pods running, plus logs coming in from the Kubernetes system, the container runtime, and so on, and you do not have a centralized log aggregation and management system, you will sooner or later regret it big time; hence the EFK stack is a good choice.

    Also, using Fluent Bit we can collect logs from various input sources, parse them, filter them to add more information or remove unwanted information, and then store the data in Elasticsearch.
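
    For instance, a small filter like the one below (the field name and value are purely illustrative) tags every record with extra metadata before it is shipped to Elasticsearch; the full configuration used in this tutorial comes later in the fluent-bit-configmap.yaml file:

    [FILTER]
        Name   record_modifier
        Match  *
        Record cluster_name my-k8s-cluster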

    How does it Work?

    Well, to understand the setup, here is a picture:

    EFK stack setup in Kubernetes

    Here we have a Kubernetes cluster with 3 nodes. On these 3 nodes, pods will be created to run various services like your applications and, in this case, the EFK stack.

    Fluent Bit runs as a DaemonSet, which means each node in the cluster will have one Fluent Bit pod, and it will read logs from the /var/log/containers directory, where the container runtime writes a log file for each container running on that node.

    The Elasticsearch service runs in one pod while Kibana runs in a separate pod. They can be on the same cluster node too, depending upon resource availability, but usually both of them demand high CPU and memory, so their pods get scheduled on different cluster nodes.

    There will be some pods running your applications, which are shown as App1, App2, in the above picture.

    The Fluent Bit service will read logs from these apps and push the data as JSON documents to Elasticsearch, and from there Kibana will query the data to show it in the UI.

    So let's start with the setup.

    How To Setup EFK Stack On Kubernetes

    Step 1: Create a Namespace

    It's good practice to create a separate namespace for every functional unit in Kubernetes, as this makes it easier to manage the pods running within a particular namespace. To see the existing namespaces, you can use the following command:

    kubectl get namespaces

    and you will see the list of existing namespaces:


    NAME STATUS AGE
    default Active 5m
    kube-system Active 5m
    kube-public Active 5m

    We will be creating a new namespace with the name kube-logging. To do so, create a new file and name it kube-logging.yaml using your favorite editor like vim:

    vi kube-logging.yaml

    Press i to enter INSERT mode and then paste the following text into it.

    kind: Namespace
    apiVersion: v1
    metadata:
      name: kube-logging

    Then press ESC followed by :wq! and hit ENTER.

    To create the namespace using the YAML file created above, run the following command:

    kubectl create -f kube-logging.yaml

    you will see the following output:


    namespace/kube-logging created

    You can further confirm the namespace creation by running the kubectl get namespaces command.
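
    The new namespace should now appear in the list; the output will look roughly like this (the ages will differ):

    NAME STATUS AGE
    default Active 10m
    kube-logging Active 1m
    kube-public Active 10m
    kube-system Active 10m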

    Step 2: Setup Elasticsearch

    For Elasticsearch, we will set up a headless service and a StatefulSet that attaches to it. A headless service does not perform load balancing and does not get a cluster IP. We are making the Elasticsearch service headless because we will set up a 3-node Elasticsearch cluster, and the nodes need to address each other directly through stable, per-pod DNS names rather than through a load balancer. We will end up with 3 Elasticsearch pods running once everything is done, which ensures high availability.

    Creating Elasticsearch Service:

    Create a new file and name it elastic-service.yaml using your favorite editor like vim:

    vi elastic-service.yaml

    Press i to enter INSERT mode and then paste the following text into it.

    kind: Service
    apiVersion: v1
    metadata:
      name: elasticsearch
      namespace: kube-logging
      labels:
        app: elasticsearch
    spec:
      selector:
        app: elasticsearch
      clusterIP: None
      ports:
        - port: 9200
          name: rest
        - port: 9300
          name: inter-node

    Then press ESC followed by :wq! and hit ENTER.

    In the YAML file, we have defined a Service called elasticsearch in the kube-logging namespace and given it the label app: elasticsearch, which will be used when we define the StatefulSet for Elasticsearch. Also, we have set clusterIP to None, as this is required to make it a headless service.

    We have specified the ports 9200 and 9300 for REST API access and inter-node communication respectively.

    To create the service using the YAML file created above, run the following command:

    kubectl create -f elastic-service.yaml

    You should see the following output:


    service/elasticsearch created

    To double check, we can run the following command to see all the services running in the kube-logging namespace that we created:

    kubectl get services -n kube-logging

    You will see output similar to this:


    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 26s
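
    Because the service is headless, it does not get a cluster IP. Instead, once the StatefulSet pods from the next step are running, a DNS lookup of the service name returns one record per pod. A quick way to verify this later, assuming a throwaway busybox pod, is:

    kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -n kube-logging -- nslookup elasticsearch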

    Creating the StatefulSet

    Now let's define the YAML for creating the StatefulSet for the Elasticsearch service. In the StatefulSet we provide the cluster information, which includes the cluster name, the number of replicas, and the template for replica creation. Along with the cluster information, we specify which Elasticsearch version to install, and we also set the resources, like CPU and memory, in the StatefulSet.

    Create a new file and name it elastic-statefulset.yaml using your favorite editor like vim:

    vi elastic-statefulset.yaml

    Press i to enter INSERT mode and then paste the following text into it.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: es-cluster
      namespace: kube-logging
    spec:
      serviceName: elasticsearch
      replicas: 3
      selector:
        matchLabels:
          app: elasticsearch
      template:
        metadata:
          labels:
            app: elasticsearch
        spec:
          containers:
          - name: elasticsearch
            image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
            resources:
                limits:
                  cpu: 1000m
                  memory: 2Gi
                requests:
                  cpu: 500m
                  memory: 1Gi
            ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
            volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
            env:
              - name: cluster.name
                value: k8s-logs
              - name: node.name
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: discovery.seed_hosts
                value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
              - name: cluster.initial_master_nodes
                value: "es-cluster-0,es-cluster-1,es-cluster-2"
              - name: ES_JAVA_OPTS
                value: "-Xms512m -Xmx512m"
          initContainers:
          - name: fix-permissions
            image: busybox
            command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
            securityContext:
              privileged: true
            volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          - name: increase-vm-max-map
            image: busybox
            command: ["sysctl", "-w", "vm.max_map_count=262144"]
            securityContext:
              privileged: true
          - name: increase-fd-ulimit
            image: busybox
            command: ["sh", "-c", "ulimit -n 65536"]
            securityContext:
              privileged: true
      volumeClaimTemplates:
      - metadata:
          name: data
          labels:
            app: elasticsearch
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: do-block-storage
          resources:
            requests:
              storage: 10Gi

    Then press ESC followed by :wq! and hit ENTER.

    In the above YAML file, we have defined the following:

    • The Elasticsearch cluster information: the cluster name, which is es-cluster, the namespace for it, which is kube-logging, the name of the service we defined in the section above, the number of replicas (3), and the template for those replicas, which carries the label app: elasticsearch.

    • The container information, like the Elasticsearch version to set up, which is 7.2.0 in this case, and the resource allocation for CPU and memory: the limits section defines the maximum the container may use, and the requests section defines how much is reserved for it.

    • The Port information to define the port numbers for REST API and inter-node communication.

    • Then we have the environment variables, followed by the init containers, which run some pre-setup commands before the Elasticsearch app starts (fixing data-directory permissions, raising vm.max_map_count, and raising the open-file limit). Finally, we have defined the storage to be allocated for Elasticsearch data, which we have kept at 10 GB, but you can increase it as per your requirements.
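
    One thing to watch out for: storageClassName: do-block-storage is DigitalOcean's block-storage class. If you are on a different provider, list the storage classes available in your cluster and substitute the appropriate one (for example, gp2 on EKS or standard on GKE) in the volumeClaimTemplates section:

    kubectl get storageclass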

    To create the service using the YAML file created above, run the following command:

    kubectl create -f elastic-statefulset.yaml

    You should see the following output:


    statefulset.apps/es-cluster created

    To double check, we can run the following command to see all the pods running in the kube-logging namespace that we created:

    kubectl get pod -n kube-logging

    You should see something like this in the output:


    es-cluster-0 1/1 Running 0 3m07s
    es-cluster-1 1/1 Running 0 3m07s
    es-cluster-2 0/1 Pending 0 3m07s
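
    Since the init containers only run before Elasticsearch starts, you can sanity-check one of their effects on a running pod (using es-cluster-0 from the output above); it should print 262144 if the sysctl init container ran successfully:

    kubectl exec es-cluster-0 -n kube-logging -- cat /proc/sys/vm/max_map_count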

    We can also make a curl request to the REST API, but for that we need the IP address of the pod. To get it, run the following command:

    kubectl get pod -n kube-logging -o wide

    The output for this command will be:


    NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    es-cluster-0 1/1 Running 0 3m12s XX.XXX.XXX.XXX YOUR_NODE_NAME <none> <none>

    Now you can run the following curl command to hit the Elasticsearch service:

    curl http://XX.XXX.XXX.XXX:9200/

    which will give output like this:


    {
    "name" : "es-cluster-0",
    "cluster_name" : "es-cluster",
    "cluster_uuid" : "UfWUnhaIJUyPLu4_DkW7ew",
    "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
    },
    "tagline" : "You Know, for Search"
    }
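
    If the pod IP is not reachable from the machine where you are running curl, an alternative is to port-forward the Elasticsearch REST port (in a separate terminal) and query it locally, for example to check the cluster health:

    kubectl port-forward es-cluster-0 9200:9200 -n kube-logging
    curl "http://localhost:9200/_cluster/health?pretty"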

    Step 3: Setup Kibana

    By now you know the kubectl commands and how we create the YAML files and apply them, so I will skip those details. For Kibana, we will have a Kibana Service and a Deployment that launches one pod.

    We will be creating two YAML files, one for Kibana service and the other for Kibana deployment.

    Here is the kibana-service.yaml file (Use the vim editor to create a file and save the content in it, just like we did above):

    apiVersion: v1
    kind: Service
    metadata:
      name: kibana
      namespace: kube-logging
      labels:
        app: kibana
    spec:
      ports:
      - port: 5601
      selector:
        app: kibana

    In the Service YAML, we specified the service name, the namespace, the port on which the service will be accessible, and the label app: kibana for the service.

    Now, let's create the deployment YAML file with the name kibana-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kibana
      namespace: kube-logging
      labels:
        app: kibana
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kibana
      template:
        metadata:
          labels:
            app: kibana
        spec:
          containers:
          - name: kibana
            image: docker.elastic.co/kibana/kibana:7.2.0
            resources:
              limits:
                cpu: 1000m
                memory: 1Gi
              requests:
                cpu: 700m
                memory: 1Gi
            env:
              - name: ELASTICSEARCH_URL
                value: http://elasticsearch:9200
            ports:
            - containerPort: 5601

    In the deployment file, we have specified the image version for Kibana, which is 7.2.0 (the Elasticsearch and Kibana versions should be the same), the port information, and the resource information like CPU and memory.

    To create the service and deployment using the YAML file created above, run the following command:

    kubectl create -f kibana-service.yaml
    kubectl create -f kibana-deployment.yaml

    You should see the following output:


    service/kibana created
    deployment.apps/kibana created

    To check the pod status run the following command:

    kubectl get pods -n kube-logging


    NAME READY STATUS RESTARTS AGE
    es-cluster-0 1/1 Running 0 2h
    es-cluster-1 1/1 Running 0 2h
    es-cluster-2 0/1 Pending 0 2h
    kibana-598vgt546f5-7b9wx 1/1 Running 0 2h

    To access the Kibana UI from the browser, we can forward the Kibana port to the local machine with the command below (use the pod name from the output above):

    kubectl port-forward kibana-598vgt546f5-7b9wx 5601:5601 --namespace=kube-logging

    Then you can access the Kibana UI using the following URL http://localhost:5601/
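
    Note that the pod name changes every time the Deployment recreates the pod, so if you prefer not to look it up each time, you can forward to the Service instead:

    kubectl port-forward service/kibana 5601:5601 --namespace=kube-logging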

    Step 4: Fluent Bit Service

    For Fluent Bit, we will create 5 YAML files and apply them using kubectl, as we did in the sections above. The YAML files are:

    • fluent-bit-service-account.yaml: creates a ServiceAccount named fluent-bit in the kube-logging namespace, which the Fluent Bit pods will use to access the Kubernetes API.
    • fluent-bit-role.yaml: creates a ClusterRole that grants the get, list, and watch permissions on Kubernetes resources like pods and namespaces to the fluent-bit ServiceAccount.
    • fluent-bit-role-binding.yaml: binds the ServiceAccount to the ClusterRole created above.
    • fluent-bit-configmap.yaml: the main file, in which we specify the configuration for the Fluent Bit service: the Input plugin, Parsers, Filters, Output plugin, etc. We have already covered the Fluent Bit service and its configuration.
    • fluent-bit-ds.yaml: defines the DaemonSet for Fluent Bit, along with the Elasticsearch connection settings and other basic configuration.

    Below is the content of all the files. Create these files first, and then we will apply them all.

    fluent-bit-service-account.yaml File:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: fluent-bit
      namespace: kube-logging
      labels:
        app: fluent-bit

    fluent-bit-role.yaml File:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: fluent-bit
      labels:
        app: fluent-bit
    rules:
    - apiGroups: [""]
      resources:
      - pods
      - namespaces
      verbs: ["get", "list", "watch"]

    fluent-bit-role-binding.yaml File:

    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: fluent-bit
    roleRef:
      kind: ClusterRole
      name: fluent-bit
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: ServiceAccount
      name: fluent-bit
      namespace: kube-logging

    fluent-bit-configmap.yaml File:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluent-bit-config
      namespace: kube-logging
      labels:
        k8s-app: fluent-bit
    data:
      # Configuration files: server, input, filters and output
      # ======================================================
      fluent-bit.conf: |
        [SERVICE]
            Flush         1
            Log_Level     info
            Daemon        off
            Parsers_File  parsers.conf
            HTTP_Server   On
            HTTP_Listen   0.0.0.0
            HTTP_Port     2020
    
        @INCLUDE input-kubernetes.conf
        @INCLUDE filter-kubernetes.conf
        @INCLUDE output-elasticsearch.conf
    
      input-kubernetes.conf: |
        [INPUT]
            Name              tail
            Tag               kube.*
            Path              /var/log/containers/*.log
            Parser            docker
            DB                /var/log/flb_kube.db
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Refresh_Interval  10
    
      filter-kubernetes.conf: |
        [FILTER]
            Name                kubernetes
            Match               kube.*
            Kube_URL            https://kubernetes.default.svc:443
            Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
            Kube_Tag_Prefix     kube.var.log.containers.
            Merge_Log           On
            Merge_Log_Key       log_processed
            K8S-Logging.Parser  On
            K8S-Logging.Exclude Off
    
      output-elasticsearch.conf: |
        [OUTPUT]
            Name            es
            Match           *
            Host            ${FLUENT_ELASTICSEARCH_HOST}
            Port            ${FLUENT_ELASTICSEARCH_PORT}
            Logstash_Format On
            Replace_Dots    On
            Retry_Limit     False
    
      parsers.conf: |
        [PARSER]
            Name   apache
            Format regex
            Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
            Time_Key time
            Time_Format %d/%b/%Y:%H:%M:%S %z
    
        [PARSER]
            Name   apache2
            Format regex
            Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
            Time_Key time
            Time_Format %d/%b/%Y:%H:%M:%S %z
    
        [PARSER]
            Name   apache_error
            Format regex
            Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
    
        [PARSER]
            Name   nginx
            Format regex
            Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
            Time_Key time
            Time_Format %d/%b/%Y:%H:%M:%S %z
    
        [PARSER]
            Name   json
            Format json
            Time_Key time
            Time_Format %d/%b/%Y:%H:%M:%S %z
    
        [PARSER]
            Name        docker
            Format      json
            Time_Key    time
            Time_Format %Y-%m-%dT%H:%M:%S.%L
            Time_Keep   On
    
        [PARSER]
            Name        syslog
            Format      regex
            Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
            Time_Key    time
            Time_Format %b %d %H:%M:%S
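
    One detail worth noting: because K8S-Logging.Parser is set to On in the Kubernetes filter above, an individual pod can ask Fluent Bit to apply a specific parser to its logs through a pod annotation. A minimal, purely illustrative example for an nginx pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-nginx
      annotations:
        fluentbit.io/parser: nginx
    spec:
      containers:
      - name: nginx
        image: nginx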

    fluent-bit-ds.yaml File:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluent-bit
      namespace: kube-logging
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      selector:
        matchLabels:
          k8s-app: fluent-bit-logging
      template:
        metadata:
          labels:
            k8s-app: fluent-bit-logging
            version: v1
            kubernetes.io/cluster-service: "true"
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "2020"
            prometheus.io/path: /api/v1/metrics/prometheus
        spec:
          containers:
          - name: fluent-bit
            image: fluent/fluent-bit:1.3.11
            imagePullPolicy: Always
            ports:
              - containerPort: 2020
            env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
          terminationGracePeriodSeconds: 10
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: fluent-bit-config
            configMap:
              name: fluent-bit-config
          serviceAccountName: fluent-bit
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          - operator: "Exists"
            effect: "NoExecute"
          - operator: "Exists"
            effect: "NoSchedule"

    Once you have created these files, run the kubectl create commands below to deploy the Fluent Bit service:

    kubectl create -f fluent-bit-service-account.yaml
    kubectl create -f fluent-bit-role.yaml
    kubectl create -f fluent-bit-role-binding.yaml
    kubectl create -f fluent-bit-configmap.yaml
    kubectl create -f fluent-bit-ds.yaml

    Run the following command to check whether the DaemonSet got created:

    kubectl get ds -n kube-logging
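
    Once the DaemonSet pods are up, you can do a rough end-to-end check: look at the Fluent Bit pod logs for errors, and confirm that Elasticsearch is receiving the daily logstash-* indices produced by the Logstash_Format On output setting (the port-forward from the Elasticsearch step can be reused for the curl):

    kubectl logs -l k8s-app=fluent-bit-logging -n kube-logging --tail=20
    curl "http://localhost:9200/_cat/indices?v"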

    And that's it. Our work here is done. You can use the kubectl get pod and kubectl get services commands used in the above section to see the pod information and the services running.

    Conclusion:

    So in this long tutorial, we successfully set up the EFK stack for logging in Kubernetes. The EFK stack here refers to Elasticsearch, Fluent Bit, and Kibana. If you want to use Fluentd instead, the overall approach is the same, but Fluentd ships as its own image with its own configuration format, so its ConfigMap and DaemonSet differ from the Fluent Bit ones used here.

    If you face any issue in the setup, share it with us and we will definitely help you out.

    Frequently Asked Questions (FAQs)

    1. What is the EFK stack?

    The EFK stack consists of Elasticsearch, Fluent-bit, and Kibana. It is a popular combination of open-source tools used for log management in Kubernetes environments. Elasticsearch is a distributed search and analytics engine, Fluent-bit is a lightweight log collector, and Kibana is a data visualization platform.

    2. Why use the EFK stack for Kubernetes log management?

    The EFK stack offers several advantages for Kubernetes log management. It provides centralized log storage, efficient log collection and parsing, and powerful data visualization capabilities. Using the EFK stack allows you to effectively monitor and analyze logs, troubleshoot issues, and gain valuable insights into your Kubernetes environment.

    3. How do I set up the EFK stack for Kubernetes log management?

    Setting up the EFK stack involves deploying and configuring Elasticsearch, Fluent-bit, and Kibana in your Kubernetes cluster. You can use YAML manifests or Helm charts to deploy these components. Configuration involves defining log sources, setting up log parsing and filtering, and configuring visualization dashboards in Kibana.

    4. Can I scale the EFK stack to handle large log volumes?

    Yes, the EFK stack is designed to handle large log volumes. Elasticsearch, the core component of the stack, is built for horizontal scalability and can be configured as a cluster to handle increased log ingestion and storage requirements. By properly configuring and scaling the EFK stack components, you can handle logs at scale.

    5. Are there alternatives to the EFK stack for Kubernetes log management?

    Yes, there are alternatives to the EFK stack for Kubernetes log management. Some popular alternatives include the ELK stack (Elasticsearch, Logstash, and Kibana) and the Prometheus-Grafana stack. Each stack has its own set of features and capabilities, so it's essential to evaluate your requirements and choose the stack that best aligns with your needs.
