Resolving DNS resolution issues with CoreDNS
Problem statement
An increase in DNS resolution timeouts in a Kubernetes cluster (EKS on AWS).
Suspected problem
CoreDNS is overwhelmed with DNS resolution requests. Each lookup takes multiple hops and search-domain attempts, which reduces the number of requests CoreDNS can handle.
Changes made
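1. Added CoreDNS monitoring using Prometheus and CloudWatch.
2. Updated the CoreDNS ConfigMap to increase the DNS cache time (the cache 100 directive in the Corefile below caches responses for up to 100 seconds).
3. Added a lameduck setting to the health plugin so that CoreDNS pods are handled more gracefully during scale-in and scale-out: lameduck 5s delays shutdown by 5 seconds.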
Corefile: |
    .:53 {
        log
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 100
        loop
        reload
        loadbalance
    }
4. Updated the ndots option in the pod specs from the default of 5 to 2 to reduce the number of internal DNS lookups. Refer to
this article for a better understanding of the ndots setting.
To update the ndots setting for a microservice, update the podExtraSpecs section
of the deployment template as below:
podExtraSpecs:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
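
To see why lowering ndots reduces the load on CoreDNS, the following minimal Python sketch approximates how a glibc-style stub resolver expands a name using the search list. It is an illustration, not the resolver's actual code. The function name candidate_queries, the example hostname api.example.com, and the search list (a pod in the default namespace plus an assumed ec2.internal node domain) are all hypothetical; check /etc/resolv.conf inside a pod for the real values.

```python
def candidate_queries(name, search_domains, ndots):
    """Return the names a glibc-style stub resolver would try, in order.

    Simplified model: a name with fewer dots than ndots goes through the
    search list first and is tried as-is last; otherwise it is tried as-is
    first. A name ending in "." is fully qualified and never expanded.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        return [absolute] + searched
    return searched + [absolute]


# Assumed search list for a pod in the "default" namespace (hypothetical).
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
]

if __name__ == "__main__":
    for ndots in (5, 2):
        tried = candidate_queries("api.example.com", SEARCH, ndots)
        # Position of the query that an external name actually resolves on.
        position = tried.index("api.example.com.") + 1
        print(f"ndots={ndots}: real name is query #{position} of {len(tried)}")
```

Under these assumptions, with ndots set to 5 the external name is only tried as-is after every search domain has failed, and each failed attempt is a request CoreDNS must answer (typically once for A and once for AAAA). With ndots set to 2 the same name is tried as-is first. Short in-cluster names such as a bare service name still go through the search list in both cases.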