Resolving DNS resolution issues with CoreDNS


Problem statement

DNS resolution timeouts have increased in a Kubernetes cluster (EKS on AWS).


Suspected Problem

CoreDNS is being overwhelmed with DNS resolution requests. A single lookup often takes multiple hops and search-domain attempts, which multiplies the query volume and reduces CoreDNS's capacity to handle requests.

Solutions Implemented


1. Added CoreDNS monitoring using Prometheus and CloudWatch.
2. Updated the CoreDNS ConfigMap to increase the DNS cache time.
3. Added the lameduck configuration so that CoreDNS pods shut down gracefully during scale-in and scale-out.
    Corefile: |
      .:53 {
          log
          errors
          health {
              lameduck 5s
          }
          ready
          kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
          }
          prometheus :9153
          forward . /etc/resolv.conf
          cache 100
          loop
          reload
          loadbalance
      }
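4. Updated the ndots setting in the pod spec from the default of 5 to 2 to reduce the number of search-domain lookups sent to CoreDNS. With ndots set to 2, any name containing at least two dots is tried as an absolute name first instead of being expanded through every search domain. Refer to this article for a better understanding of the ndots configuration.
To update the ndots setting for a microservice, update the podExtraSpecs section of the deployment template as below: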
    podExtraSpecs:
      dnsConfig:
        options:
          - name: ndots
            value: "2"
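
To illustrate why lowering ndots reduces load on CoreDNS, the following is a minimal Python sketch of the order in which a glibc-style stub resolver tries names. It is a simplified model, not the resolver's actual code: the function names are hypothetical, the search list is an assumed example for a pod in the default namespace (the last entry stands in for a node-level domain and will differ per cluster), and other resolver implementations may behave differently.

```python
# Assumed search list for a pod in the "default" namespace.
# The last entry is a placeholder node-level domain; real values vary.
SEARCH_DOMAINS = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
]


def absolute_name(name):
    """Return the name with a trailing dot, marking it fully qualified."""
    return name if name.endswith(".") else name + "."


def candidate_queries(name, search_domains, ndots):
    """Return the names a glibc-style stub resolver would try, in order."""
    # A trailing dot means the name is already fully qualified,
    # so the search list is not applied at all.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    # Names with at least `ndots` dots are tried as-is first;
    # names with fewer dots go through the search list first.
    if name.count(".") >= ndots:
        return [absolute_name(name)] + expanded
    return expanded + [absolute_name(name)]


def attempts_until_absolute(name, search_domains, ndots):
    """Count attempts up to and including the absolute name."""
    candidates = candidate_queries(name, search_domains, ndots)
    return candidates.index(absolute_name(name)) + 1


if __name__ == "__main__":
    for ndots in (5, 2):
        n = attempts_until_absolute("api.example.com", SEARCH_DOMAINS, ndots)
        print(f"ndots={ndots}: api.example.com. is attempt number {n}")
```

Under these assumptions, an external name such as api.example.com is only reached on the fifth attempt with ndots set to 5, and on the first attempt with ndots set to 2. Each attempt is commonly issued for both A and AAAA records, so the saving in queries can be larger than the attempt count suggests. Short in-cluster names such as my-svc still go through the search list and continue to resolve as before.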