Todays Confluence Cloud search incident

Hi, this morning we saw requests to Confluence search failing - the incident seemed to be resolved pretty quickly :clap:
We have captured a response with body, which looked like this:

400 Bad Request
{
  "statusCode" : 400,
  "data" : {
    "authorized" : false,
    "valid" : true,
    "errors" : [ ],
    "successful" : false
  },
  "message" : "com.atlassian.confluence.api.service.exceptions.BadRequestException: 
CQL was parsed but the search manager was unable to execute the search. CQL: '/*redacted :) */'.
Error message: java.lang.RuntimeException: Hystrix circuit short-circuited and is OPEN"
}

I was wondering why we get a 400 here, which means we did something wrong. As the message exposes thereโ€™s a circuit breaker in open state, so I would hope that situation gets resolved pretty quick - ideally even automatically.
So getting a 429 or 503 might enable API users to retry the requests. For a background/batch task that runs for hours it can make all the difference if it can wait a few minutes (using exponential backoff etc. - see Rate limiting) instead of failing and having to start over.
Any comments or plans to change that?

1 Like