Connect SpringBoot app accumulates sockets as open file descriptors

I have this very peculiar problem that I believe might be a bug in the Connect framework.

Background: My app is an Atlassian Connect SpringBoot (2.2.3) Jira app running on AWS Fargate in a Docker container running an Alpine Linux base image.

As soon as the container boots up, it starts accumulating open sockets (as File Descriptors). The max allowed file descriptor count on my Linux config is 4096 at the moment. It takes about 8-10 days for my service to reach that level and then the container starts getting the error below because it can’t create new FDs. Fargate health checks start to fail. Then Fargate kills the container and spawns a new one. The cycle starts all over again.
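For anyone who wants to watch this kind of growth from inside the JVM rather than from the container shell, here is a minimal sketch. It assumes a HotSpot/OpenJDK runtime on a Unix-like OS, where the platform MXBean implements `com.sun.management.UnixOperatingSystemMXBean`; the class name `FdUsage` and its methods are made up for illustration and are not part of the Connect framework.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;

// Hypothetical helper: reports how many file descriptors this JVM holds
// compared with the per-process limit (4096 in the setup described above).
final class FdUsage {

    /** Returns {open, max}, or null when the JVM does not expose Unix FD counters. */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // e.g. a non-Unix platform or a JVM without this MXBean
    }

    /** Formats one log line; Locale.ROOT keeps the decimal separator stable. */
    static String describe(long open, long max) {
        return String.format(Locale.ROOT,
                "open file descriptors: %d of %d (%.1f%%)",
                open, max, 100.0 * open / max);
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println(s == null
                ? "FD counters not available on this JVM"
                : describe(s[0], s[1]));
    }
}
```

Logging this line on a schedule would show how fast the count climbs toward the limit, without needing shell access to the Fargate task.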

2022-02-18T00:40:24.536+03:00 2022-02-17 21:40:24.535 ERROR 1 --- [o-8080-Acceptor] org.apache.tomcat.util.net.Acceptor : Socket accept failed
2022-02-18T00:40:24.536+03:00 java.io.IOException: No file descriptors available
2022-02-18T00:40:24.536+03:00 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.8.0_275]

Below are my findings:

  • An investigation with “lsof” command shows that it is not files that are accumulating but network sockets. Something opens network sockets and doesn’t close them.
  • Deeper investigation with “ss -epr” shows that the open sockets are ONLY to these specific URLs below:
    *** ec2-18-246-31-137.us-west-2.compute.amazonaws.com:https
    *** ec2-18-246-31-138.us-west-2.compute.amazonaws.com:https
    *** ec2-18-246-31-139.us-west-2.compute.amazonaws.com:https
  • The sockets stay open, but their received and sent counts do not change over time: received 32, sent 0.
  • Trying to navigate to these URLs revealed that these are 3 decommissioned Jira Cloud instances. I don’t know who they used to belong to because I get the usual error for all of them “Your Atlassian Cloud site is currently unavailable.”
  • HTTP logs show no requests coming from these IPs.
  • I can’t reproduce the problem in our test environment. (Believe me, we tried hard.)
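The first finding (sockets, not regular files) can also be checked from inside the process. The sketch below assumes Linux, where each entry in `/proc/self/fd` is a symlink whose target looks like `socket:[12345]`, `pipe:[678]`, `anon_inode:[eventpoll]`, or a file path. `FdBreakdown` is a hypothetical name, and on systems without `/proc` the method simply returns an empty map.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups this process's open file descriptors by kind.
final class FdBreakdown {

    /** Classifies a /proc/self/fd symlink target. */
    static String kindOf(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        return "file";
    }

    /** Counts descriptors per kind; returns an empty map where /proc is absent. */
    static Map<String, Integer> count() {
        Map<String, Integer> counts = new TreeMap<>();
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return counts;
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String kind;
                try {
                    kind = kindOf(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and readlink.
                    kind = "unreadable";
                }
                counts.merge(kind, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count());
    }
}
```

If the "socket" bucket is the only one that keeps growing between two calls, that matches what lsof reported. It does not say which code path opened the sockets.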

Since (or should I say if) these are decommissioned instances, they can’t be sending me requests. Right? These connections must have been initiated from our side. But why? And why only these three instances? When I spawn a new container, it is a new container with a clean OS and clean file system. Only the DB is old so it must be data-dependent. Where does it get the idea of connecting to these instances? What is it sending?

I tend to think that this is a Connect framework bug. These were probably old customer instances. For some reason, my service is (probably) sending these instances some requests periodically and leaving open sockets.

But why? I have no way to debug this further. Any help or direction will be greatly appreciated.

Update on this issue:
We built an HTTP Request Interceptor and logged each HTTP request sent out from our containers. Didn’t help.
The open File Descriptors of the OS keep piling up but there is nothing unexpected in the logged data.

This is pretty much a dead-end for us. We have no more leads to follow and we don’t have the means to investigate further.

@emre.toptanci I could only think of one single place where we load files, and indeed, we are reading from an InputStream without closing it :man_facepalming:
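To illustrate the general pattern (this is not the actual ACSPRING-151 patch, which is not shown in this thread), a stream opened for reading can be wrapped in try-with-resources so that it is closed on every path, including when an exception is thrown. `StreamClosing` and `readFully` are hypothetical names for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example of reading a stream without leaking its descriptor.
final class StreamClosing {

    /** Reads the whole stream as UTF-8 text and always closes it. */
    static String readFully(InputStream source) {
        // try-with-resources calls in.close() whether the body returns or throws.
        try (InputStream in = source) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(in));
    }
}
```

Without the try-with-resources block, each call on a file-backed stream would keep one descriptor open until the stream object is garbage collected, if ever.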

I have raised [ACSPRING-151] - Ecosystem Jira for this. A fix is on the way.

Regarding the issue you suspect with decommissioned Jira instances, you may want to raise that in the Developer Support Service Desk.

@emre.toptanci the above-mentioned fix was included in the 2.3.1 release today.

Hi team,

This is now logged as

Please watch the ticket for updates.

James.

@epehrson

Hello again,

I added a new comment on the ACSPRING-152 ticket. I am mentioning it here as well in case you missed it, since the ticket was already closed.

Omer.

For anyone here at CDAC, the Connect fix mentioned above did not solve the problem. We are still looking for a solution. Any suggestions are welcome.