API_TOKEN_DB_LIMIT_EXCEEDED has multiple projects in ruins

IainDooley · May 29, 2022, 10:37pm

Hi there, on Friday multiple accounts across different clients started seeing API_TOKEN_DB_LIMIT_EXCEEDED errors.

These applications have been running for years without ever seeing this. At first it appeared to correlate with boards that had lots of archived cards, but that seems less likely the more it pops up.

I’ve never seen this error before, but it has brought the systems for multiple (paid) Trello subscribers to a halt and rendered their businesses inoperable. I can’t seem to get past it, and the suggestion in the documentation is to “back off” queries. That would entail some sort of ground-up rewrite, because the only rate limit we have ever had to worry about in the past is the number of queries within a given time period.
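For readers wondering what "backing off" could look like without a full rewrite, here is a minimal sketch of a retry wrapper with exponential backoff and jitter. It assumes the limit surfaces as an HTTP 429 or as an `error` field in a JSON body carrying one of the codes mentioned in this thread; `call_with_backoff`, `send`, and the delay values are hypothetical names and numbers chosen for illustration, not anything prescribed by Trello.

```python
import random
import time
from typing import Callable, Tuple

# Error codes quoted in this thread; treating them as retryable is an assumption.
DB_LIMIT_ERRORS = {"API_TOKEN_DB_LIMIT_EXCEEDED", "API_KEY_DB_LIMIT_EXCEEDED"}


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it is no longer rate limited or attempts run out.

    send is any zero-argument callable returning (status_code, json_body),
    so the wrapper can sit around an existing request function.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        limited = status == 429 or body.get("error") in DB_LIMIT_ERRORS
        if not limited:
            break
        # Exponential backoff with full jitter: wait a random time up to
        # base_delay * 2^(attempt - 1), capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        status, body = send()
    return status, body
```

Because the wrapper only needs a callable, an existing request can be passed in as a lambda or a `functools.partial`, and the last response is returned unchanged if every attempt is rate limited.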

Are we the only ones experiencing this?? I can’t see any other reports of similar problems …

remie · May 30, 2022, 1:14am

I would suggest reposting this as a critical cloud incident topic

Mattlassian · May 31, 2022, 3:28pm

Chiming in to say same here. It started Thursday morning for us. Headers are showing NaN for X-Rate-Limit-Db-Query-Time-Remaining, e.g. ‘X-Rate-Limit-Db-Query-Time-Remaining’: ‘NaN’
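If client code reads this header to decide how long to wait, a `NaN` value can silently poison the arithmetic. The sketch below shows one defensive way to parse it, assuming the header normally carries a plain number; `parse_remaining` is a hypothetical helper, and the units of the value are not stated in this thread.

```python
import math
from typing import Mapping, Optional

HEADER = "X-Rate-Limit-Db-Query-Time-Remaining"


def parse_remaining(headers: Mapping[str, str], name: str = HEADER) -> Optional[float]:
    """Return the header as a finite float, or None if missing or unusable."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(name.lower())
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    # float("NaN") parses successfully, so non-finite values are rejected here.
    return value if math.isfinite(value) else None
```

With this helper, `parse_remaining({"X-Rate-Limit-Db-Query-Time-Remaining": "NaN"})` returns `None`, which the caller can treat as "unknown" and fall back to a fixed delay.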

MattS · June 1, 2022, 3:11pm

Contacted Atlassian multiple ways. Have heard no acknowledgement of a problem. Multiple customers are blocked. Where is Trello staff?? @bentley

bentley · June 1, 2022, 5:00pm

Acknowledging that we’re looking into this. Will share more ASAP.

Mattlassian · June 1, 2022, 5:52pm

Thank you for the update @bentley !

bentley · June 1, 2022, 6:24pm

We believe that we have a fix going out. I will reply again when it is live and we can confirm that we’re seeing a drop in API_TOKEN_DB_LIMIT_EXCEEDED errors.

Mattlassian · June 1, 2022, 6:57pm

Thank you again for the update. Happy to hear a fix is on the way.

bentley · June 1, 2022, 7:02pm

The fix has been deployed and we’re seeing a return to normal for the errors.

bentley · June 1, 2022, 7:39pm

A bit more info on what happened: We shipped an update to the way we track API keys’ DB usage. The change was expected to improve the accuracy of our tracking, but resulted in us improperly tracking and overcounting usage. That overcount resulted in API keys hitting the DB limit sooner than they normally would have.

remie · June 1, 2022, 7:44pm

@Bentley thanks for fixing the issue and providing the background information.

It seems that this incident started last Thursday for some of the vendors. It took until Monday for people to post on CDAC about it, and until today before anyone from Atlassian noticed / responded.

What would have been the appropriate way for the Trello developer community to get a quicker response from Atlassian? Should they also use the Critical Cloud Incident topic, and if so, has this been communicated?

IainDooley · June 1, 2022, 11:04pm

Hi @bentley thanks for the update, how long can we expect before things return to normal? I’m still seeing this error in all my retry queues.

@remie this started for me last Friday morning Sydney time. I emailed Trello support and heard back relatively quickly from someone in APAC, but they had to send it to the US. Obviously no one looked at it on Friday (their time), and it was a public holiday on Monday (their time).

I posted here after I didn’t hear back from support via email.

IainDooley · June 2, 2022, 5:05am

@bentley still seeing API_TOKEN_DB_LIMIT_EXCEEDED on multiple accounts, is there anything I need to do in order to “reset” this or something? Is this just a matter of the code change propagating through your servers or something?

bentley · June 2, 2022, 1:15pm

Can you DM me the API key you’re using?

bentley · June 2, 2022, 1:23pm

Although we’ve been making some progress in trying to bring Trello’s developer-related programs/processes in line with the broader Atlassian ecosystem’s, we’re not all the way there yet. Notably, we haven’t yet looped them into the full Marketplace programs, which include access to the Critical Cloud Incident category.

We do have Jira Service Management, which is where I would start. Failing to get a response there, posting here and DMs to @bentley are usually a good approach.

bentley · June 2, 2022, 1:24pm

No, this is completely rolled out and we immediately saw the error rates return to normal.

What are you getting back in the headers regarding rate limits?
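For anyone answering this question from their own logs, a small sketch like the one below can pull only the rate-limit headers out of a response for pasting into a report. It assumes the relevant headers share the `X-Rate-Limit-` prefix seen earlier in this thread; `rate_limit_headers` is a hypothetical helper name.

```python
from typing import Dict, Mapping


def rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the headers whose names start with 'X-Rate-Limit-'."""
    return {
        name: value
        for name, value in sorted(headers.items())
        # Header names are case-insensitive, so match on the lower-cased name.
        if name.lower().startswith("x-rate-limit-")
    }
```

Logging the result of this function alongside the timestamp and status code of each failed request makes it easier to see when a value such as `NaN` first appeared.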

Mattlassian · June 2, 2022, 1:35pm

FWIW, some of our tasks recovered right away, but overall it took 4 or 5 hours for all our retrying tasks to recover. We were still seeing ‘X-Rate-Limit-Db-Query-Time-Remaining’: ‘NaN’ in headers.

bentley · June 2, 2022, 1:44pm

Do you know when you started seeing those? This was a totally unrelated change to the one that caused the initial spike in errors, but it is a high priority item being investigated now. And it is unrelated to how we actually calculate limits.

Mattlassian · June 2, 2022, 2:00pm

I don’t know when it started - I only noticed it when I started looking into the rate limiting problem and printing out headers. (Last Thursday)

Mattlassian · June 2, 2022, 2:36pm

Hey @bentley - I’m still seeing some rate-limited API requests (API_KEY_DB_LIMIT_EXCEEDED) that I wasn’t seeing before last week. Is this just due to the new way of tracking DB usage? Trying to gauge whether or not we need to dive in and start debugging/changing our code.