It is making development and testing extremely difficult, and I’m concerned about how this might perform in production. Can you offer any insight into why I am seeing this, and/or any advice on how to prevent it?
Since you discovered that forge tunnel is effectively running the code from your laptop, you’re well on your way to diagnosing the problem. The error indicates some kind of delay or interruption. Maybe running dig api.atlassian.com would reveal something about what sits in between?
There might be something about how Rancher handles networking that isn’t 100% compatible with Forge’s Docker assumptions. You might look at "For the Forge Tunnel, Podman is not Docker", where I ruled out Podman for seemingly trivial incompatibilities. In the case of Rancher, it sounds like the problem could be in the networking assumptions rather than in the CLI.
To be clear, the error does not indicate anything wrong with Atlassian infrastructure. DNS is itself a distributed system, so DNS failures can happen outside of Atlassian infrastructure. In both your case and for @ChrisWininger, the problem is with networks that sit between you and Atlassian, not on Atlassian servers. In neither case do the error messages occur when the app is running inside Atlassian infrastructure. And nothing in this thread indicates potential for problems at runtime.
That said, DNS errors are still a dev loop problem, worthy of solving. If you, or anyone else, see these problems, please provide more diagnostic information than the error message. We can only help diagnose if we know more about the route your computer is taking to resolve DNS (see my advice about dig above).
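Because the failures are intermittent, a single dig may not catch one. As a rough sketch of one way to gather more data, the Python script below resolves a hostname repeatedly through the operating system resolver and records how long each attempt takes and whether it fails. The function names (time_lookups, summarize, default_resolve) and the attempt counts are my own illustration, not part of Forge or its CLI, and the OS may cache answers, so this can understate how often the upstream resolver misbehaves.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def default_resolve(host: str) -> object:
    # Uses the operating system resolver via getaddrinfo.
    return socket.getaddrinfo(host, 443)


def time_lookups(
    host: str,
    attempts: int = 20,
    resolve: Callable[[str], object] = default_resolve,
    pause_s: float = 0.0,
) -> List[Tuple[float, str]]:
    """Resolve `host` repeatedly; return (elapsed_seconds, outcome) per attempt."""
    results: List[Tuple[float, str]] = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolve(host)
            outcome = "ok"
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            outcome = f"error: {exc}"
        results.append((time.perf_counter() - start, outcome))
        if pause_s:
            time.sleep(pause_s)
    return results


def summarize(results: List[Tuple[float, str]]) -> Dict[str, float]:
    """Count failures and report the slowest attempt, in seconds."""
    failures = sum(1 for _, outcome in results if outcome != "ok")
    slowest = max((elapsed for elapsed, _ in results), default=0.0)
    return {"attempts": len(results), "failures": failures, "slowest_s": slowest}


# Example (needs network access; hostname taken from this thread):
# results = time_lookups("api.atlassian.com", attempts=50, pause_s=0.5)
# print(summarize(results))
```

Running it once from the host and once from inside a container on the same machine would show whether failures or slow lookups cluster in one environment.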
I will see what I can find. I’m definitely suspicious of Rancher, especially given that the problem goes away when you run it under pure Linux. Unfortunately we do not have licenses for docker-desktop. I also ruled out using Podman as a replacement, so on my official work Mac, Rancher is the only way I’ve been able to figure out how to run containers.
I tried dig from the host, and also from inside an arbitrary container using the ubuntu image. I haven’t reproduced the timeout from either of these contexts, though I do see that some DNS lookups take a bit longer than others.
Everything you say makes sense. The problem is totally random. It fails once. Run it again and it works. So verbose doesn’t really help, because when I add verbose it works.
Is the Forge CLI internally using a OneTrust service? I don’t know, maybe some kind of analytics?
Looking at my Pi-hole logs I see requests to geolocation.onetrust.com of type A, AAAA, HTTPS that are blocked.
I’ve added it to the whitelist and will see whether it improves the situation.
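For anyone else checking whether a local blocker is answering for a hostname, here is a small Python sketch. It assumes the blocker replies with the unspecified address (0.0.0.0 or ::), which is Pi-hole’s default blocking mode as far as I know; other modes, such as NXDOMAIN, would surface as a lookup error instead. The helper names are hypothetical.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolved_addresses(host: str) -> List[str]:
    """Return the distinct addresses the OS resolver gives for `host`."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def looks_blocked(addresses: Iterable[str]) -> bool:
    """True when every answer is the unspecified address (0.0.0.0 or ::).

    Assumption: the local blocker sinkholes names this way.
    """
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(a.is_unspecified for a in parsed)


# Example (needs network access; hostname taken from the Pi-hole log above):
# print(looks_blocked(resolved_addresses("geolocation.onetrust.com")))
```

Checking before and after the whitelist change would confirm that the entry took effect on the machine running the Forge CLI.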
I’m especially glad you caught a local DNS failure. I Googled a bit for DNS errors with Rancher, which leads into the wide world of k8s via kube-dns. The answers are all over the place, both in terms of URLs to link, and the solutions themselves.
But there is one root cause we might rule out first, because it is a common cause of intermittent network protocol failures: time drift. As a quick check, you might look at the macOS date and time settings. If that doesn’t help, maybe we have to check the Rancher NTP settings.
Although I completely understand your constraints, Forge was designed to work with Docker specifically, not Rancher, so I wouldn’t be able to help troubleshoot any deeper. So that might leave us with some choices to make about a workaround.
Maybe ignore the error? As I explained earlier, the deployed app isn’t going to be running in an environment where DNS is frequently failing.
True. The diagnostics we’d need are going to be at a different layer, below Node. Are you also running Rancher or is this happening for you in Docker? @ChrisWininger’s comparison of environments was really helpful to isolate the problem.
Also, when are you hitting these DNS errors? @ChrisWininger described hitting them inside the runtime when his code is trying to make API requests to Jira. Is yours in the same area, or during CLI commands? Both call api.atlassian.com but in very different ways.
Thanks for all the help. Oddly, after a system reboot it actually seems to be working again. I think I’ve seen this before, but then it degraded again over time.
I checked the Mac settings; it is set to get the time automatically.
I’ll give the native Node runtime a try, and if nothing else, maybe next time this starts happening I’ll just go for the tried and true “turn it off and on again”.
I did test in docker-desktop just to see if it resolved the issue, and I did not see the issue under docker-desktop. But that was also after a reboot, so it’s hard to say whether docker-desktop really fixed it or rebooting fixed it.