Lots of DNS errors while developing a Jira Plugin

ChrisWininger · October 27, 2023, 1:49pm

I am working on a plugin to be deployed to the Atlassian Marketplace for Jira Cloud. I am leveraging Forge, UI Kit, and the various APIs.

While testing against my developer cloud environment, chriswininger.atlasian.com, I get intermittent but frequent DNS lookup failures from the Jira API, such as this:

getaddrinfo EAI_AGAIN api.atlassian.com

An example code snippet that results in this would be:

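// Runs inside an async Forge function; assumes `import api, { route } from '@forge/api'`
// and that `resp` and `body` are declared earlier in that function.
resp = await api.asApp().requestJira(route`/rest/api/3/issue`, {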
  method: 'POST',
  headers: {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(body)
});

It is making development and testing extremely difficult and I’m concerned about how this might perform in production. Is there any insight into why I am seeing this you can offer and/or any advice on how to prevent it?

PS: I posted first over here and it was suggested I repost here: Lots of DNS errors while developing a Jira Plugin
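For the dev loop, one possible workaround is to retry a call when it fails with a transient DNS error. The sketch below is only an illustration and is not part of Forge: `withDnsRetry` and `isTransientDnsError` are hypothetical helper names, and the retry count and delays are arbitrary assumptions. Since `getaddrinfo` fails before any connection is made, retrying on these codes should not send the same POST twice.

```javascript
// Error codes treated as transient DNS failures. Assumption: only the two
// codes reported in this thread are worth retrying.
const TRANSIENT_DNS_CODES = new Set(['EAI_AGAIN', 'ENOTFOUND']);

// Check the error and a few levels of err.cause, since fetch wrappers often
// nest the original error. Falls back to searching the message text.
function isTransientDnsError(err) {
  for (let e = err, depth = 0; e && depth < 5; e = e.cause, depth++) {
    if (TRANSIENT_DNS_CODES.has(e.code)) return true;
    const message = String(e.message || '');
    if ([...TRANSIENT_DNS_CODES].some((code) => message.includes(code))) return true;
  }
  return false;
}

// Hypothetical helper: run `fn`, retrying with exponential backoff only when
// the failure looks like a DNS lookup error. Other errors are rethrown as-is.
async function withDnsRetry(fn, { retries = 3, baseDelayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isTransientDnsError(err)) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage with the snippet above (inside an async Forge function):
//   resp = await withDnsRetry(() =>
//     api.asApp().requestJira(route`/rest/api/3/issue`, { /* same options */ }));
```

This only hides the symptom while tunneling; it does not explain why the lookups fail.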

ChrisWininger · October 27, 2023, 6:42pm

Hmm, new data point: I’ve been seeing this when running forge tunnel from my MacBook, which uses Rancher Desktop to provide Docker support.

I assumed all the plugin code was running server side on Jira servers and that the tunnel logs were just being forwarded along, but…

When I run the project using forge tunnel on my Linux workstation, I do not see the issue. Any thoughts?

ibuchanan · October 27, 2023, 7:30pm

Welcome to the Atlassian developer community @ChrisWininger,

Since you discovered that forge tunnel is effectively running the code from your laptop, you’re well on your way to diagnosing the problem. The error indicates some kind of delay or interruption. Maybe dig api.atlassian.com would reveal something about what’s in between?

There might be something about how Rancher handles networking that isn’t 100% compatible with Forge’s Docker assumptions. You might look at “For the Forge Tunnel, Podman is not Docker”, where I ruled out Podman for seemingly trivial incompatibilities. In the case of Rancher, it sounds like the problem could be in the networking assumptions rather than the CLI.

tied · October 30, 2023, 7:20am

On Friday and also today I had similar issues when running Forge commands like:

forge deploy

Error: request to https://api.atlassian.com/graphql failed, reason: getaddrinfo ENOTFOUND api.atlassian.com

Running it again solved the problem, but it could be a DNS issue, or a small number of nodes in a cluster of servers may be unhealthy.

ibuchanan · October 30, 2023, 10:51am

@tied,

To be clear, the error does not indicate anything wrong with Atlassian infrastructure. DNS is itself a distributed system, so DNS failures can happen outside of Atlassian infrastructure. In both your case and for @ChrisWininger, the problem is with networks that sit between you and Atlassian, not on Atlassian servers. In neither case do the error messages occur when the app is running inside Atlassian infrastructure. And nothing in this thread indicates potential for problems at runtime.

That said, DNS errors are still a dev loop problem, worthy of solving. If you, or anyone else, see these problems, please provide more diagnostic information than the error message. We can only help diagnose if we know more about the route your computer is taking to resolve DNS (see my advice about dig above).

ChrisWininger · November 1, 2023, 5:18pm

I will see what I can find. I’m definitely suspicious of Rancher, especially given that the problem goes away when you run it under pure Linux. Unfortunately we do not have licenses for Docker Desktop. I also ruled out using Podman as a replacement, so on my official work Mac, Rancher is the only way I’ve been able to figure out how to run containers.

ChrisWininger · November 1, 2023, 5:25pm

I installed and ran dig from inside of the container created by forge tunnel, to make sure the context was all the same, here is what I get:

root@43efc8e3dfc8:/app# dig api.atlassian.com

; <<>> DiG 9.11.5-P4-5.1+deb10u9-Debian <<>> api.atlassian.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44430
;; flags: qr rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;api.atlassian.com.             IN      A

;; ANSWER SECTION:
api.atlassian.com.      5       IN      A       104.192.142.12
api.atlassian.com.      5       IN      A       104.192.142.14
api.atlassian.com.      5       IN      A       104.192.142.13

;; Query time: 25 msec
;; SERVER: 192.168.5.3#53(192.168.5.3)
;; WHEN: Wed Nov 01 17:22:44 UTC 2023
;; MSG SIZE  rcvd: 134

I can’t say networking is my specialty. Does anything look odd? Nothing jumps out at me. I’ll run it a few times and see if it’s consistent.

ChrisWininger · November 1, 2023, 5:26pm

ok third try:

root@43efc8e3dfc8:/app# dig api.atlassian.com

; <<>> DiG 9.11.5-P4-5.1+deb10u9-Debian <<>> api.atlassian.com
;; global options: +cmd
;; connection timed out; no servers could be reached

ChrisWininger · November 1, 2023, 5:34pm

Tried from the host, and also from inside an arbitrary container using the ubuntu image. I haven’t reproduced the timeout from either of these contexts, though I do see that some DNS lookups take a bit longer than others.
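Since the failure is intermittent, a single dig run can easily miss it. A minimal sketch of a repeated probe in Node.js follows; it uses `dns.promises.lookup`, which goes through `getaddrinfo`, the same call named in the original error. The function name `probeDns`, the attempt count, and the pause between attempts are assumptions made for illustration, and running it in each context (host, arbitrary container, forge tunnel container) is only a suggestion.

```javascript
const dns = require('node:dns');

// Resolve `host` repeatedly and summarize failures and latency.
// `lookup` is injectable so the probe can be exercised without a network;
// by default it is dns.promises.lookup, which uses getaddrinfo.
async function probeDns(host, { attempts = 20, pauseMs = 500, lookup = dns.promises.lookup } = {}) {
  const failures = {};   // error code -> number of failed lookups
  const timingsMs = [];  // duration of each successful lookup
  for (let i = 0; i < attempts; i++) {
    const started = Date.now();
    try {
      await lookup(host);
      timingsMs.push(Date.now() - started);
    } catch (err) {
      const code = err.code || 'UNKNOWN';
      failures[code] = (failures[code] || 0) + 1;
    }
    // Pause between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return {
    host,
    attempts,
    ok: timingsMs.length,
    failures,
    slowestMs: timingsMs.length ? Math.max(...timingsMs) : null,
  };
}

// Usage:
//   probeDns('api.atlassian.com').then((summary) => console.log(summary));
```

Comparing the `failures` counts across the three contexts would show whether the timeouts are specific to the forge tunnel container.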

tied · November 1, 2023, 5:46pm

Hi @ibuchanan,

Everything you say makes sense. The problem is totally random. It fails once; run it again and it works. So verbose output doesn’t really help, because when I add verbose it works.

Is the Forge CLI internally using a OneTrust service? I don’t know, maybe some kind of analytics.
Looking at my Pi-hole logs I see requests to geolocation.onetrust.com of type A, AAAA, HTTPS that are blocked.

I’ve added it to the whitelist and will see whether it improves the situation.

ibuchanan · November 1, 2023, 5:59pm

Thanks for pressing ahead @ChrisWininger,

I’m especially glad you caught a local DNS failure. I Googled a bit for DNS errors with Rancher, which leads into the wide world of k8s via kube-dns. The answers are all over the place, both in terms of URLs to link, and the solutions themselves.

But there is one root cause we might rule out first, because it is a common cause of intermittent network protocol failures: time drift. As a quick fix, you might check the macOS date and time settings. If that doesn’t help, we may have to check the Rancher NTP settings.

Although I completely understand your constraints, Forge was designed to work with Docker specifically, not Rancher, so I wouldn’t be able to help troubleshoot any deeper. So that might leave us with some choices to make about “work around”.

Maybe ignore the error? As I explained earlier, the deployed app isn’t going to be running in an environment where DNS is frequently failing.

Or, maybe try out the Forge native Node.js runtime. It’s a preview feature so there are caveats. However, the runtime does not require Docker for tunneling so you wouldn’t need to solve for DNS in Rancher.

ibuchanan · November 1, 2023, 6:07pm

@tied,

True. The diagnostics we’d need are going to be at a different layer, below Node. Are you also running Rancher or is this happening for you in Docker? @ChrisWininger’s comparison of environments was really helpful to isolate the problem.

Also, when are you hitting these DNS errors? @ChrisWininger described hitting them inside the runtime when his code is trying to make API requests to Jira. Is yours in the same area, or during CLI commands? Both call api.atlassian.com but in very different ways.

ChrisWininger · November 1, 2023, 6:24pm

thx for all the help. Oddly after a system reboot it actually seems to be working again. I think I’ve seen this before but then it degraded again over time.

I checked the mac settings it is set to get time automatically.

I’ll give the native node runtime a try and if nothing else maybe next time this starts happening I’ll just go for tried and true “turn it on and off again”

PS

I did test in docker-desktop just to see if it resolved the issue and I did not see the issue under docker-desktop, but… that was also after a reboot so hard to say if docker-desktop really fixed it or rebooting fixed it

ibuchanan · November 1, 2023, 6:27pm

The classic infrastructure fix.

ChrisWininger · November 1, 2023, 7:56pm

Sadly, it did not take long for the issue to return. I have since been using the native Node.js runtime, and I have not seen the issue with it.