Slow Lucene issue search using SearchProvider and custom Issue Collector

Hello,

In our app we perform issue search using the Lucene index, but it takes 40-60 seconds to handle 10-15K issues. Quite often it takes even longer, so the whole search fails later with a timeout exception in the REST calls that trigger it.

Any help or suggestions on what we are doing wrong are very much appreciated.

The IssueCollector code is:

import java.io.IOException;
import java.util.Set;
import java.util.function.ObjLongConsumer;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.atlassian.jira.issue.index.DocumentConstants;
import com.google.common.collect.ImmutableSet;

public class IssueIdHitCollector extends SimpleCollector {
    private static final Logger logger = LoggerFactory.getLogger(IssueIdHitCollector.class);
    private final ObjLongConsumer<Document> issueCollector;
    private final Set<String> fields;
    private LeafReader leafReader;

    public IssueIdHitCollector(ObjLongConsumer<Document> issueCollector, String... fields) {
        this.issueCollector = issueCollector;
        // Always load the issue id; add any extra requested stored fields.
        this.fields = fields.length == 0
                ? ImmutableSet.of(DocumentConstants.ISSUE_ID)
                : ImmutableSet.<String>builder().add(DocumentConstants.ISSUE_ID).add(fields).build();
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        this.leafReader = context.reader();
    }

    @Override
    public boolean needsScores() {
        return false;
    }

    @Override
    public void collect(int doc) {
        try {
            // Load only the requested stored fields for this document.
            Document document = this.leafReader.document(doc, this.fields);
            if (document != null) {
                String v = document.get(DocumentConstants.ISSUE_ID);
                try {
                    collectIssue(document, Long.parseLong(v));
                } catch (NumberFormatException e) {
                    logger.warn("error reading issue document " + doc + " id [" + v + "]", e);
                }
            }
        } catch (IOException e) {
            logger.warn("error reading issue document " + doc, e);
        }
    }

    private void collectIssue(Document document, long id) {
        this.issueCollector.accept(document, id);
    }
}

The IssueConsumer code is:

import java.util.ArrayList;
import java.util.List;
import java.util.function.ObjLongConsumer;

import org.apache.lucene.document.Document;

import com.atlassian.jira.issue.index.DocumentConstants;

public class IntervalIssuesConsumer implements ObjLongConsumer<Document> {

    private final List<IssueObject> issues;
    private final Estimation estimation;

    public IntervalIssuesConsumer(Estimation estimation) {
        this.issues = new ArrayList<>();
        this.estimation = estimation;
    }

    public List<IssueObject> getIssues() {
        return issues;
    }

    @Override
    public void accept(Document d, long value) {
        issues.add(createIssueObject(d));
    }

    private IssueObject createIssueObject(Document d) {
        String estimate = "0";
        switch (estimation.getType()) {
            case EstimationType.FIELD: {
                estimate = d.get(estimation.getField().getFieldId());
                if (estimate == null) {
                    estimate = "0";
                } else if (EstimationField.isTimeEstimateField(estimation.getField().getFieldId())) {
                    // Time estimates are stored base-36 in the index; decode to a decimal string.
                    estimate = String.valueOf(Long.parseLong(estimate, 36));
                }
                break;
            }
            case EstimationType.ISSUE_COUNT:
                estimate = "1";
                break;
        }

        return new IssueObject(
                d.get(DocumentConstants.ISSUE_ID),
                d.get(DocumentConstants.ISSUE_KEY),
                d.get(DocumentConstants.ISSUE_TYPE),
                d.get(DocumentConstants.ISSUE_CREATED),
                d.get(DocumentConstants.ISSUE_STATUS),
                d.get(DocumentConstants.ISSUE_SUMMARY),
                null,
                estimate
        );
    }
}
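As a side note on the Long.parseLong(estimate, 36) call above: it decodes the base-36 string stored in the index into a decimal duration. A minimal stand-alone sketch of just that decoding step (the sample values are illustrative, not taken from a real index):

```java
public class Base36DurationDemo {
    // Decode a base-36 string the way the consumer above does for
    // time-estimate fields: digits are 0-9 followed by a-z.
    static long decode(String raw) {
        return Long.parseLong(raw, 36);
    }

    public static void main(String[] args) {
        // "1o" in base 36 = 1 * 36 + 24 = 60
        System.out.println(decode("1o"));   // prints 60
        // "10" in base 36 = 36
        System.out.println(decode("10"));   // prints 36
    }
}
```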

Hi @vasiliy.krokha ,

While I may not be able to point out why it is slow, we are currently using com.atlassian.jira.bc.issue.search.SearchService for our custom issue collector, as the regular one is too slow. I just worked on a customer ticket involving around 10k issues, and after some changes we currently sit at around 200ms per 1k issues (depending on the number of collected fields) on our DC test system. However, I have yet to find out why the customer system does not even come close to these numbers and runs into our self-implemented timeout while paging through the issues. Either the expected number of issues is incorrect, or the JQL search is somehow much slower.

Not all of these points may apply to your example, but here is what I took away from my work:

  • If you search for something like “key in (PROJ-1, PROJ-2, PROJ-3)”, each issue key will cause a separate database query. “id in (10001, 10002, 10003)” seems to be more efficient.
  • There are cases where a seemingly simple method on the issue object causes a huge delay or unexpected database queries. A Manager used to look up IssueTypes / CustomFields may not keep an internal mapping and may iterate over every item every time you call it.
  • Data Center environments keep the database on a separate system, so a high number of queries should be avoided. Activate database logging in a local environment to check for unexpected queries; if there are many, use the debugger and something like tail on the log file to find where they are caused. You can enable the query trace in an H2 database like this:
SET TRACE_LEVEL_FILE=2;
SET TRACE_LEVEL_SYSTEM_OUT=2;

As for your code, I would take a look at all method calls within your methods, check whether they are more expensive than you think, and try to optimize them. For example, estimation.getField().getFieldId() is called very often within createIssueObject, while it does not seem to change over the whole process.
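To illustrate that suggestion with a stand-alone sketch (the record types here are simplified stand-ins for your Estimation/Field classes, not real Jira API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HoistedFieldIdDemo {
    // Simplified stand-ins for the Estimation -> Field -> fieldId chain.
    record Field(String fieldId) {}
    record Estimation(Field field) {}

    static List<String> collectEstimates(Estimation estimation, List<Map<String, String>> docs) {
        // Hoisted: the field id is resolved once, not once per document.
        final String fieldId = estimation.field().fieldId();
        List<String> estimates = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            estimates.add(doc.getOrDefault(fieldId, "0"));
        }
        return estimates;
    }

    public static void main(String[] args) {
        Estimation est = new Estimation(new Field("customfield_10001"));
        System.out.println(collectEstimates(est, List.of(
                Map.of("customfield_10001", "5"),
                Map.of())));   // prints [5, 0]
    }
}
```

In your real consumer the same idea would mean resolving the field id (and the isTimeEstimateField check) once in the constructor instead of per document.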
You could also swap in a very simple Collector that only collects the issue IDs and returns them; maybe the slowness is not related to the collector itself. I started with this one:
https://bitbucket.org/atlassianlabs/jira-examples/src/master/src/main/java/com/atlassian/jira/examples/collector/IssueCountPerProjectCollector.java
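A minimal id-only collector along those lines might look like the sketch below. It is written against the Lucene SimpleCollector API that Jira DC ships (matching your original code), so it only compiles inside a Jira plugin, and the class name is mine, not from any library:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

// Collects only document ids (rebased to the top-level reader) without
// loading any stored fields, so you can time the raw search in isolation.
public class DocIdOnlyCollector extends SimpleCollector {
    private final Set<Integer> docIds = new HashSet<>();
    private int docBase;

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        this.docBase = context.docBase;
    }

    @Override
    public boolean needsScores() {
        return false;
    }

    @Override
    public void collect(int doc) {
        docIds.add(docBase + doc);
    }

    public Set<Integer> getDocIds() {
        return docIds;
    }
}
```

If a search with this collector is still slow, the cost is in the query or the index, not in loading stored fields.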


Hi @m.herrmann

Thank you very much for your answer.

In our implementation we used SearchProvider, avoided sorting, and loaded only a minimal set of system fields (id, key, status, type, summary) from the Lucene documents, thus avoiding database queries entirely. There is a perfect answer from Atlassian here - How to avoid Out Of Memory error using SearchProvider - #3 by kcichy

Where we stopped in our investigation: a pure Lucene search loading minimal issue details takes 1-2 minutes for 10-15k issues in a customer environment, while in our test environment it takes less than 1 second to find 40k issues.

I want to switch to that variant too, but I’m struggling a bit with how to get the CustomField values (negative numbers are not working as expected).
Maybe we are running into the same or a similar issue here. As I haven’t solved the customer issue yet, my remaining guess is that it’s something Lucene or system related.
It may be an issue with disk I/O, or a higher network latency between the node and the volume than in our test environments.

I had a look at your code and, assuming EstimationField.isTimeEstimateField does not go to the DB, it all looks good to me.

I’m wondering if this slowness comes from your collector, or maybe from the query? You could check that by running the same query with a primitive collector, e.g. com.atlassian.jira.jql.query.IssueIdCollector from jira-core. If that search is also slow, you can rule out your collector from the suspect list.

In case you have a support file from the customer, you could check the slow-queries log. In our case, a simple query like the following over 7k issues took 10 seconds, while it takes a few hundred ms on our test environment.

"Epic Link" in (xxxx, xxxx, xxxx, xxxx, xxxx)

I still have to check what may cause this on the customer side.