Resolving DNS resolution issues with CoreDNS

Problem statement

DNS resolution timeouts increased in a Kubernetes cluster (EKS on AWS).

Suspected Problem

CoreDNS was being overwhelmed by DNS resolution requests. Each lookup went through multiple hops and search-domain attempts before resolving, which multiplied the query volume and reduced CoreDNS's capacity to handle requests.

Solutions Implemented

1. Added CoreDNS monitoring using Prometheus and CloudWatch.

2. Updated the CoreDNS ConfigMap to increase the DNS cache time (the cache directive in the Corefile below).

3. Added the lameduck configuration so that CoreDNS pods shut down gracefully during scale-in and scale-out.

Corefile: |
  .:53 {
      log
      errors
      health {
        lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf
      cache 100
      loop
      reload
      loadbalance
  }

4. Updated the ndots setting in the pod specs from the default of 5 to 2 to reduce the number of internal DNS lookups. Refer to this article for a better understanding of the ndots configuration.
To update ndots for a microservice, update the podExtraSpecs section of the deployment template as below:

podExtraSpecs:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
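
To illustrate why lowering ndots reduces load on CoreDNS, the following is a minimal sketch of how a glibc-style resolver expands a name using the search list. It is a simplified model, not the resolver's actual code: the function names, the search list, and the example hostname are assumptions for illustration, and the real search list comes from the pod's /etc/resolv.conf (EKS typically appends a VPC-specific domain as well). Other resolver implementations, such as musl, behave differently.

```python
# Hypothetical search list, modelled on a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_queries(name, search_domains, ndots):
    """Return the names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks a fully qualified name: no search list is used.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


def lookups_until_answer(name, search_domains, ndots, existing):
    """Count how many names are tried before one in `existing` is found."""
    for attempt, fqdn in enumerate(candidate_queries(name, search_domains, ndots), 1):
        if fqdn in existing:
            return attempt
    return len(candidate_queries(name, search_domains, ndots))


# Assume only the external name itself exists (illustrative data).
existing = {"api.example.com."}
print(lookups_until_answer("api.example.com", SEARCH, 5, existing))  # 4
print(lookups_until_answer("api.example.com", SEARCH, 2, existing))  # 1
```

In this model, an external name with two dots costs four lookups at ndots 5 and one at ndots 2, and each lookup is usually issued for both A and AAAA records. The trade-off is that a partially qualified internal name with two or more dots is now tried as an absolute name first, so it incurs one extra failed lookup before the search list is used.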
