Safeguards in Jira DC index

We’re introducing a limit on how many issue-related entities Jira indexes.

What is changing?

Jira will now index only the topN (a configurable limit such as 500, 1000, or 10_000) of the following issue-related entities: comments, changehistory, and worklog. Any operation that triggers a full issue reindex, including a full foreground reindex and indexer#reindex(issue, all), will reindex at most topN of these related entities.

With this feature enabled, we guarantee that the newest N entities will be reindexed. We don’t deindex entities that fall out of the pool of the newest N, so the actual number of indexed entities can be larger than the configured limit, but those older entities won’t be reindexed in the future.
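The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not Jira's implementation: the `Entity` type, the `select_for_reindex` function, and the use of a creation timestamp to decide what counts as "newest" are all assumptions made for the example. The value -1 is the "disabled" setting mentioned later in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DISABLED = -1  # per this announcement, -1 turns the limit off


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a comment, change history item, or worklog."""
    entity_id: int
    created: datetime


def select_for_reindex(entities, top_n):
    """Return the newest top_n entities, or all of them when the limit is disabled.

    Assumption: "newest" is decided by creation time.
    """
    if top_n == DISABLED:
        return list(entities)
    newest_first = sorted(entities, key=lambda e: e.created, reverse=True)
    return newest_first[:top_n]


# An issue with 1500 comments and a limit of 1000: only the newest 1000 are
# picked up by a full issue reindex. The 500 older ones are skipped, which is
# different from being removed from the index.
start = datetime(2022, 1, 1)
comments = [Entity(i, start + timedelta(minutes=i)) for i in range(1500)]
selected = select_for_reindex(comments, 1000)
print(len(selected), min(e.entity_id for e in selected))
```

Entities that were indexed earlier and later fall outside the newest N simply stay in the index as they were, which is why the number of indexed entities can exceed the limit.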

Why is it changing?

Jira reindexing quality and index consistency improved significantly between Jira 6.x and Jira 8.x. Many things were fixed, and we can safely assume that the quality of reindexing is good. Although reindexing is now more reliable, it can still be a very expensive operation. When triggered multiple times, it can overload the indexing queue and lead to significant performance problems (high CPU usage and timeouts for end users), sometimes leading to instance failure.

You can find more details in the following ticket: JSDSERVER-10886, “SLA configuration changes create indexing pressure on the instance”.

What do I need to do?

You don’t need to take any action. The feature is enabled by default with a limit of 1000 related entities (comments, changehistory, worklog). However, depending on their use case, Jira admins can change the default topN value for each of these entities by setting a system property to the desired limit. The system property names for issue entities are:

  • For comments: jira.safeguards.indexing.issue.comments
  • For change history items: jira.safeguards.indexing.issue.changelogs
  • For worklogs: jira.safeguards.indexing.issue.worklogs

It’s also possible to disable the feature by setting the corresponding system property to -1.
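As a rough illustration, the helper below renders these properties as standard JVM `-D` flags. The property names come from this post, and `-Dname=value` is the usual way to set a Java system property. The helper itself (`jvm_args`) and its validation rule (accept -1 or a positive number) are assumptions made for the example. How the flags are passed to the JVM depends on your installation.

```python
# Property names as listed in this announcement.
PROPERTIES = {
    "comments": "jira.safeguards.indexing.issue.comments",
    "changelogs": "jira.safeguards.indexing.issue.changelogs",
    "worklogs": "jira.safeguards.indexing.issue.worklogs",
}


def jvm_args(limits):
    """Render {entity: limit} as a string of -D flags.

    Assumption: a limit is either -1 (feature disabled) or a positive integer.
    """
    args = []
    for entity, limit in limits.items():
        if entity not in PROPERTIES:
            raise KeyError(f"unknown entity: {entity}")
        if limit != -1 and limit <= 0:
            raise ValueError(f"limit for {entity} must be -1 or positive")
        args.append(f"-D{PROPERTIES[entity]}={limit}")
    return " ".join(args)


# Raise the comment limit, and disable the limit for worklogs.
print(jvm_args({"comments": 10_000, "worklogs": -1}))
```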

When is it changing?

This upgrade is included in Jira 8.22.2. Following the 8.22.x release, the upgrade will be available in Jira 9.0.1 and 9.1.0.

References

As part of the feature, we’ve added stats to help you gather information about the indexing of issue-related entities. You can find more details in the KB article “Jira indexing-limits stats”.

I wonder if this indexing optimization will have (potential) impact on ScriptRunner scripted fields which are indexing sensitive. It’d be good to hear from Adaptavist on this matter.

Kamran

Index Safeguards extension: Filtering out items with unsupported fields

We’re introducing a new limit for the indexing of change items.

What is changing?

We’re extending Safeguards in Jira index with a change to the default behavior of change item indexing.

Now, change items with unsupported fields will be filtered out before they’re collected into change groups. As a result, a change group that contains only change items with unsupported fields won’t be indexed, and no document will be created for it.

The feature is enabled by default and allows for indexing only six fields in change items:

* Assignee
* Fix Version
* Priority
* Reporter
* Resolution
* Status
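The filtering step can be sketched as follows. This is a simplified model, not Jira's code: change items are represented as plain dictionaries, the lower-cased field names are placeholders for whatever identifiers Jira uses internally, and `filter_change_groups` is a hypothetical name.

```python
# The six supported fields from the list above. The exact identifiers Jira
# uses internally may differ; these strings are assumptions.
SUPPORTED_FIELDS = {
    "assignee", "fix version", "priority", "reporter", "resolution", "status",
}


def filter_change_groups(groups):
    """Drop change items with unsupported fields, then drop empty groups."""
    indexed = []
    for group in groups:
        kept = [item for item in group if item["field"].lower() in SUPPORTED_FIELDS]
        if kept:  # a group left with no items produces no index document
            indexed.append(kept)
    return indexed


groups = [
    [{"field": "Status", "to": "Done"}, {"field": "Description", "to": "..."}],
    [{"field": "Labels", "to": "backend"}],  # unsupported fields only
    [{"field": "Assignee", "to": "alice"}],
]
result = filter_change_groups(groups)
print(len(result), [item["field"] for group in result for item in group])
```

In this example the second group disappears entirely, and the first group keeps only its Status item.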

Why is it changing?

With Jira 8.22.2, we’ve introduced Safeguards to alleviate pressure on the instance caused by the overload of indexing queues. According to our analysis, the new limit on the number of fields will remove about 94% of redundant Jira indexes for change items. This is vital to the health of the instance and will only impact large and long-running issues.

What do I need to do?

You don’t need to take any action. The feature is enabled by default with the limit of six fields in the change items index:

* Assignee
* Fix Version
* Priority
* Reporter
* Resolution
* Status

But in specific use cases, Jira administrators can change the default behavior.

For example, when there’s a need to use the UpdatedBy JQL function, a Jira administrator can configure the following system property: jira.safeguards.indexing.issue.changelogs.do.not.filter.out.unsupported.fields.

  • If this system property isn’t set up, change items with unsupported fields will be filtered out. This is the default behavior.
  • If this system property is set to true, unsupported fields won’t be filtered out.
  • The number of indexed change groups will be limited to DEFAULT_INDEXING_LIMIT.
  • To enable the old search functionality for historical changes, use the mentioned system property along with jira.safeguards.indexing.issue.changelogs (it defines how many groups should be indexed).
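The interaction of the two properties can be summarized with a small sketch. The property names are taken from this post. The `changelog_settings` function, the case-insensitive parsing of "true", and passing the default limit in as a parameter (instead of hard-coding a value for DEFAULT_INDEXING_LIMIT) are assumptions made for illustration.

```python
FILTER_PROP = "jira.safeguards.indexing.issue.changelogs.do.not.filter.out.unsupported.fields"
LIMIT_PROP = "jira.safeguards.indexing.issue.changelogs"


def changelog_settings(props, default_limit):
    """Resolve the effective change history indexing behavior.

    props: dict of system property name -> string value.
    default_limit: stands in for DEFAULT_INDEXING_LIMIT.
    Assumption: only the value "true" (any letter case) disables filtering.
    """
    keep_unsupported = props.get(FILTER_PROP, "").strip().lower() == "true"
    group_limit = int(props.get(LIMIT_PROP, default_limit))
    return {
        "filter_unsupported": not keep_unsupported,
        "group_limit": group_limit,
    }


# Default behavior: unsupported fields are filtered out.
print(changelog_settings({}, default_limit=1000))

# Old search behavior for historical changes: keep all fields and lift the
# limit on the number of indexed change groups.
print(changelog_settings({FILTER_PROP: "true", LIMIT_PROP: "-1"}, default_limit=1000))
```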

Note that if unsupported fields aren’t filtered out, this may cause performance degradation for large issues. To identify issues with large change items and change groups, use a dedicated database query. Learn more about it in the document “Retrieve issue change history from database in Jira server”.

The set of supported fields can be changed with IndexedChangeHistoryFieldManager. However, such a change should be raised as a feature request. Learn how to raise a request in Getting help.

When is it changing?

The feature is introduced in Jira 8.22.3.

Learn more about how to modify the default behavior here: Introducing Safeguards to Jira indexation KB

The article “What’s changed in Jira after the implementation of indexing limits” answers some common questions related to Jira indexing limits and explains differences in the way Jira works since the update.

Index Safeguards adjustment: New values for default limits

We’re adjusting the default limits of the topN issue-related entities.

What is changing?

Following performance tests, we decided to adjust the default Index Safeguards limits. The new limits are as follows:

comments: 500
worklogs: 100
changehistory: 100

Why is it changing?

We performed extensive performance tests. We set up an 8-node cluster and used 30 threads to constantly send requests to each node (240 threads in total). The requests consisted of both reads and writes, simulating heavy user traffic. We then used Jira Indexing Queue stats to observe how the indexing performance changed with different limit values.

Choosing the right limit values is a balancing act between functionality and usability. At one extreme, a user can search in all 50000 comments of an issue, but it takes ages to index anything. At the other extreme, the system is very fast, but searching for comments doesn’t work. With this in mind, we chose the new default limits.

Jira will index the latest 500 comments. Comment search is more popular than change history/worklog search, so we allowed a higher number here to lean towards the functionality. Also, comments are indexed incrementally (adding a new comment causes only this comment to be indexed), so we are safe with a more relaxed limit here.

Jira will index the latest 100 change history items and worklogs. These are less often searched for, so we can lean towards indexing performance here. Moreover, all change items of an issue are always indexed at once, making it much more expensive, further incentivising a tougher limit.

When is it changing?

The new default limits are introduced in Jira 8.22.4.

The impact

We looked at Jira stats from a production instance of one of our large clients to assess the impact the new limits would have on them. We used a sample of 1.3M indexing operations. We observed that the number of comments exceeded the limit in only 0.3% of cases, meaning not all comments were indexed. For worklogs, this was 0%, and for changehistory, 0.1%. For exact numbers, head down to the bottom of this article.

We believe this trade-off is fair taking into account the increased Jira indexing stability.

Jira stats deep dive

This section is only for those who love numbers. 🤓

We ran our performance tests to observe how the Jira indexer copes with different limits.

No limit (-1)

Without any limits, the cluster quickly became unstable. Change history items waited in the queue for 743ms on average before being indexed (many waited for over 10s!), and the index update itself often took over 1s, meaning that fewer than one update per second was completed.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 743,
        "distributionCounter":
        {
            "0": 2773,
            "1": 90,
            "10": 38,
            "100": 214,
            "1000": 573,
            "10000": 727,
            "20000": 44,
            "30000": 0,
            "9223372036854775807": 1
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 158,
        "distributionCounter":
        {
            "0": 3749,
            "1": 27,
            "10": 161,
            "50": 59,
            "100": 36,
            "500": 113,
            "1000": 147,
            "9223372036854775807": 167
        }
    },
    "totalTimeMillis":
    {
        "avg": 796,
        "distributionCounter":
        {
            "0": 1215,
            "1": 1023,
            "10": 393,
            "100": 313,
            "1000": 641,
            "10000": 820,
            "20000": 53,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 76,
    },
}

Limit 1000

With the previous default limit of 1000, the situation was much better, but when we reached the maximum load, the indexer still couldn’t keep up with the amount of work. Time to update the index still exceeded 1s in some cases.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 515,
        "distributionCounter":
        {
            "0": 3190,
            "1": 249,
            "10": 239,
            "100": 622,
            "1000": 2683,
            "10000": 1041,
            "20000": 10,
            "30000": 7
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 93,
        "distributionCounter":
        {
            "0": 6665,
            "1": 196,
            "10": 194,
            "50": 77,
            "100": 27,
            "500": 424,
            "1000": 303,
            "9223372036854775807": 155
        }
    },
    "totalTimeMillis":
    {
        "avg": 609,
        "distributionCounter":
        {
            "0": 1329,
            "1": 294,
            "10": 1460,
            "100": 863,
            "1000": 2812,
            "10000": 1264,
            "20000": 9,
            "30000": 9
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 1,
    },
}

Limit 500

The limit of 500 brought another improvement. The average time to update the index fell to 27ms, allowing for ~37 issue updates per second. Unfortunately, there were still 21 updates that spent over a second on the critical path. There were no more timeouts, though.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 79,
        "distributionCounter":
        {
            "0": 5616,
            "1": 170,
            "10": 219,
            "100": 2019,
            "1000": 2774,
            "10000": 80,
            "20000": 0,
            "30000": 0
        }
    }, 
    "timeToUpdateIndexMillis":
    {
        "avg": 27,
        "distributionCounter":
        {
            "0": 9300,
            "1": 44,
            "10": 101,
            "50": 100,
            "100": 81,
            "500": 1211,
            "1000": 20,
            "9223372036854775807": 21
        }
    },
    "totalTimeMillis":
    {
        "avg": 108,
        "distributionCounter":
        {
            "0": 2611,
            "1": 482,
            "10": 2054,
            "100": 2028,
            "1000": 3595,
            "10000": 108,
            "20000": 0,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 0,
    },
}
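The ~37 updates per second figure follows from the average update time. Here is the arithmetic as a short sketch, using the "avg" values of timeToUpdateIndexMillis from the four test runs in this section. It assumes index updates are applied one after another on a single writing thread, so it is a rough ceiling and not a measured throughput.

```python
def max_updates_per_second(avg_update_millis):
    """Upper bound on serial index updates per second for a given average cost."""
    if avg_update_millis <= 0:
        raise ValueError("average update time must be positive")
    return 1000.0 / avg_update_millis


# "avg" of timeToUpdateIndexMillis for each tested limit, copied from the logs.
AVG_UPDATE_MILLIS = {"no limit": 158, "1000": 93, "500": 27, "100": 3}

for limit, avg in AVG_UPDATE_MILLIS.items():
    print(f"limit {limit}: avg {avg} ms, at most {max_updates_per_second(avg):.1f} updates/s")
```

Averages hide the slow tail, so the distributionCounter buckets in the logs are still the better place to look for updates that took over a second.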

Limit 100

The limit of 100 brought what we were aiming for: a cluster that is stable under heavy load. There were no timeouts, the time spent in the indexer queue and on the writing thread was short (3ms on average for both), and only a single update took over a second.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 3,
        "distributionCounter":
        {
            "0": 9440,
            "1": 201,
            "10": 530,
            "100": 1120,
            "1000": 18,
            "10000": 0,
            "20000": 0,
            "30000": 0
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 3,
        "distributionCounter":
        {
            "0": 9875,
            "1": 23,
            "10": 125,
            "50": 1203,
            "100": 75,
            "500": 6,
            "1000": 1,
            "9223372036854775807": 1
        }
    },
    "totalTimeMillis":
    {
        "avg": 8,
        "distributionCounter":
        {
            "0": 3795,
            "1": 2879,
            "10": 2036,
            "100": 2548,
            "1000": 50,
            "10000": 1,
            "20000": 0,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 0,
    },
}

Real production instance indexing-limits stats

The data from one of our large clients suggests that the new limits will have minimal usability impact. Only very few issues would have their comments or changehistory items not fully indexed.

[JIRA-STATS] [INDEXING-LIMITS] total stats: ... data={
 ...
     "numberOfComments":{
          "count":1334984,
          "min":0,
          "max":2632,
          "sum":6588589,
          "avg":4,
          "distributionCounter":{
             "0":296176,
             "1":525645,
             "10":434660,
             "100":69058,
             "1000":9064,
             "10000":381,
             "20000":0,
             "50000":0
          }
       },
       "numberOfWorklogs":{
          "count":1334964,
          "min":0,
          "max":9,
          "sum":163,
          "avg":0,
          "distributionCounter":{
             "0":1334861,
             "1":78,
             "10":25,
             "100":0,
             "1000":0,
             "10000":0,
             "20000":0,
             "50000":0
          }
       },
       "numberOfChangeHistory":{
          "count":1334964,
          "min":1,
          "max":2455,
          "sum":10138251,
          "avg":7,
          "distributionCounter":{
             "0":0,
             "1":90395,
             "10":1111568,
             "100":131177,
             "1000":1469,
             "10000":355,
             "20000":0,
             "50000":0
          }
       },
...
}
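The percentages quoted in "The impact" can be roughly cross-checked against these counters. The sketch below assumes that each distributionCounter key is an upper bound, so a sample is counted in the first bucket whose key is at least as large as the value. That reading matches the way the indexing queue logs above are interpreted in this post, but it is an assumption. The function name `share_above` is made up for the example.

```python
def share_above(distribution, total, threshold):
    """Share of samples that fall in buckets whose upper bound exceeds threshold."""
    above = sum(count for bound, count in distribution.items() if int(bound) > threshold)
    return above / total


# Copied from the log excerpt above.
comments = {"0": 296176, "1": 525645, "10": 434660, "100": 69058,
            "1000": 9064, "10000": 381, "20000": 0, "50000": 0}
change_history = {"0": 0, "1": 90395, "10": 1111568, "100": 131177,
                  "1000": 1469, "10000": 355, "20000": 0, "50000": 0}

# Change history: the limit of 100 is itself a bucket boundary.
ch_share = share_above(change_history, 1334964, 100)

# Comments: the limit of 500 falls inside the 101..1000 bucket, so the
# counters only give a range for the share of issues above the limit.
comments_low = share_above(comments, 1334984, 1000)
comments_high = share_above(comments, 1334984, 100)

print(f"change history above 100: {ch_share:.2%}")
print(f"comments above 500: between {comments_low:.2%} and {comments_high:.2%}")
```

Under this reading, the change history share is consistent with the 0.1% quoted earlier, and the 0.3% quoted for comments lies inside the range the buckets allow.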

Jira Stats

Jira Stats proved to be very useful for our performance assessment. You can use them too to evaluate the performance of your test and production environments! Find out more in this knowledge base article.
