Changes to Custom Fields' implementation requiring action from you

DamianKedzierski · June 1, 2020, 5:25pm

A high number of custom fields can negatively impact performance, index size and indexing time in Jira. A big chunk of this impact is caused by the time it takes to index certain custom fields. This problem is especially painful for our biggest Data Center customers.

To minimise this impact, we’re introducing:

a view that surfaces the custom fields that take longest to index,
optimizations that reduce the number of called field indexers,
optimizations that reduce the amount of data stored in the Lucene index.

A full reindex is required to benefit from the changes.

This post describes what these changes are about and how they can impact your apps.

Change 1: View the custom fields that take longest to index

When a custom field takes a long time to index, it can cause sudden spikes in indexing time. Re-indexing time is usually not evenly distributed, and a few fields may take up most of it. Now, instead of checking the logs for the stats, you can view them in Jira Data Center. Click Actions > Custom field indexing for a specific node to view the data (see EAP release notes for more details). If you see a custom field that is introduced by your app, you can change its configuration to improve the overall performance.

Note: you can use these stats to validate how your app will impact reindexing time.

Change 2: Reducing the number of field indexer calls when indexing an issue

Whenever an issue is being indexed, Jira retrieves all registered custom field indexers and calls their addIndex() method regardless of whether the custom field is applicable for the issue or not. This generates additional overhead and affects indexing time.

To reduce this overhead we’re introducing two improvements:

Optimization 1: We’re calling indexers only when their custom fields have values assigned

An experimental API lets you mark your custom field type so that its indexer is called only when a value exists. See more details on how to use it here.

Optimization 2: We’re calling indexers only when their custom fields are applicable for the issue (they are visible and have a context assigned)

This optimization is enabled for every custom field and indexer, and cannot be disabled selectively.
As a result, only the indexers for the custom fields that are applicable for the issue will be called.
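
To make the combined effect of the two optimizations concrete, here is a minimal sketch in plain Java. It does not use the Jira API: FieldState, IndexerSelection and every field name in it are hypothetical stand-ins, and the real selection logic inside Jira may well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for what the indexing pipeline knows about one
// custom field on one issue. None of these names come from the Jira API.
final class FieldState {
    final String name;
    final boolean hasValue;     // a value is stored for this issue
    final boolean inContext;    // the field's context covers this issue
    final boolean hidden;       // hidden in the field configuration
    final boolean valueDriven;  // the field type opted in to optimization 1

    FieldState(String name, boolean hasValue, boolean inContext,
               boolean hidden, boolean valueDriven) {
        this.name = name;
        this.hasValue = hasValue;
        this.inContext = inContext;
        this.hidden = hidden;
        this.valueDriven = valueDriven;
    }
}

final class IndexerSelection {
    // Returns the names of the fields whose indexers would still be called.
    static List<String> indexersToCall(List<FieldState> fields) {
        List<String> selected = new ArrayList<>();
        for (FieldState f : fields) {
            // Optimization 2: applies to every field type.
            if (!f.inContext || f.hidden) {
                continue;
            }
            // Optimization 1: applies only to field types that opted in.
            if (f.valueDriven && !f.hasValue) {
                continue;
            }
            selected.add(f.name);
        }
        return selected;
    }
}
```

In this sketch, a field that has not opted in to optimization 1 is still passed to its indexer when it has no value, as long as it is visible and in context.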

Benefits

After reducing the number of called indexers, our tests showed a significant reduction in reindexing time (up to 70% improvement for Jira custom fields).

The optimizations will also decrease the response time for the actions that involve changing the issue (create / edit issue) and that trigger reindexing (adding or updating comments).

How does it impact my apps?

For a custom field type to benefit from optimization 1, it has to explicitly implement the new API. We’ve done this for the standard Jira custom fields; as an app vendor, you need to opt in to the new API to get the benefits for your own custom field types. Because the API is opt-in, your app should not be affected unless you implement it.

We want to hear your feedback on how well the new APIs fulfil your use cases before we mark them as stable. Feel free to share it under this blog post.

Optimized built-in Jira custom fields:

Field Name	FieldType	Searcher	Indexer
Checkboxes	MultiSelectCFType	MultiSelectSearcher	SelectCustomFieldIndexer
Date Picker	DateCFType	DateRangeSearcher	LocalDateIndexer
Date Time Picker	DateTimeCFType	DateTimeRangeSearcher	DateCustomFieldIndexer
Number Field	NumberCFType	ExactNumberSearcher	NumberCustomFieldIndexer
Project Picker (single project)	ProjectCFType	ProjectSearcher	ProjectCustomFieldIndexer
Radio Buttons	SelectCFType	MultiSelectSearcher	SelectCustomFieldIndexer
Select List (cascading)	CascadingSelectCFType	CascadingSelectSearcher	CascadingSelectCustomFieldIndexer
Select List (multiple choices)	MultiSelectCFType	MultiSelectSearcher	SelectCustomFieldIndexer
Text Field (multiple line)	TextAreaCFType	TextSearcher	SortableTextCustomFieldIndexer
Select List (single choice)	SelectCFType	MultiSelectSearcher	SelectCustomFieldIndexer
Text Field (read only)	ReadOnlyCFType	TextSearcher	SortableTextCustomFieldIndexer
Text Field (single line)	RenderableTextCFType	TextSearcher	SortableTextCustomFieldIndexer
URL Field	URLCFType	ExactTextSearcher	ExactTextCustomFieldIndexer
Version Picker (multiple versions)	VersionCFType	VersionPickerSearcher	VersionCustomFieldIndexer
Version Picker (single version)	VersionCFType	VersionPickerSearcher	VersionCustomFieldIndexer

Optimization 2 can possibly affect your app since we stop calling indexers for ALL field types (also the custom types you can create) for the fields which are not visible or are not in the context of a specific issue.

In other words, if your app relies on writing something to the index (or executing some other code) for fields that are not visible or out of context, then this functionality will stop working. The end goal is to write to an index only when needed.

Also, if your indexer inherits from AbstractCustomFieldIndexer, check what it does in addDocumentFieldsNotSearchable(), as this method will not be called any more. If your indexer implements FieldIndexer directly, check for any logic that runs when isFieldVisibleAndInScope returns false.

Please let us know if you notice this change breaking your app.

How do I disable the optimizations?

You can use the system property jira.cfv.driven.indexing.disabled=true to call indexers regardless of whether the field has a value assigned, and jira.local.context.indexing.disabled=true to call indexers regardless of the field’s visibility and context.

Change 3: Removing redundant data from Lucene index

JQL supports sorting by the custom field values with the ‘ORDER BY’ clause:

project = MyProject ORDER BY MyCustomField

For this to work, Jira built-in custom fields store sorted_cf_name in the Lucene document. Whenever there is a custom field value assigned to an issue, it is stored in the index. However, when no value exists for a custom field, the sorting marker is stored instead (Double.MAX_VALUE, Long.MAX_VALUE or \ufffd depending on the custom field’s type).

However, the comparators used by Jira built-in custom fields are capable of sorting null values correctly. For this reason, we decided not to store markers for null values in the Lucene index any more and to rely on the comparators to sort null values.
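
As an illustration of the general idea, a null-aware comparator can order missing values without any sentinel being stored. The sketch below is plain Java and not Jira’s actual Lucene comparator; NullAwareSort is a hypothetical name, and it assumes that empty values should sort last in ascending order, which is where a MAX_VALUE marker would have placed them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NullAwareSort {
    // Sorts ascending and puts missing (null) values last, without
    // replacing them with a sentinel such as Double.MAX_VALUE.
    static List<Double> sortNullsLast(List<Double> values) {
        List<Double> copy = new ArrayList<>(values);
        copy.sort(Comparator.nullsLast(Comparator.<Double>naturalOrder()));
        return copy;
    }
}
```

With a comparator like this, an issue without a value needs no entry at all for the field, which is where the index size saving comes from.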

Benefits

We expect a reduction in index size and reindexing time, depending on the number of custom fields and non-assigned values. On our internal instance (600k issues, 400 CFs, 2M CF values) we observed a 15% reduction in the index size.

How does it impact my apps?

You should not expect any functional changes if you use Jira built-in custom fields or define custom field types inheriting from the Jira ones. However, check that your app does not rely on any side effects of storing values in sort_ Lucene fields and does not access Lucene fields outside of the Jira API.

Custom fields that may be affected by not storing null values:

Field Name	FieldType	Searcher	Indexer
Date Picker	DateCFType	DateRangeSearcher	LocalDateIndexer
Date Time Picker	DateTimeCFType	DateTimeRangeSearcher	DateCustomFieldIndexer
Import Id	ImportIdLinkCFType	ExactNumberSearcher	NumberCustomFieldIndexer
Hidden Job Switch	HiddenJobSwitchCFType	JobSearcher	SortableTextCustomFieldIndexer
Job CheckBox	JobCheckboxCFType	JobSearcher	SortableTextCustomFieldIndexer
Number Field	NumberCFType	ExactNumberSearcher	NumberCustomFieldIndexer
Text Field (multiple line)	TextAreaCFType	TextSearcher	SortableTextCustomFieldIndexer
Text Field (read only)	ReadOnlyCFType	TextSearcher	SortableTextCustomFieldIndexer
Text Field (single line)	RenderableTextCFType	TextSearcher	SortableTextCustomFieldIndexer
URL Field	URLCFType	ExactTextSearcher	ExactTextCustomFieldIndexer

How can my app benefit?

If your app uses custom indexers, we encourage you to review how your app handles non-existing values. Feel free to contact us to discuss your use case if needed.

How do I disable this feature?

You can use the system property jira.skip.indexing.null.disabled=true to disable it in case any problems arise or to compare with previous results.
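
System properties like this one, and the two listed under Change 2, are normally passed to the JVM as -D arguments, for example -Djira.skip.indexing.null.disabled=true. As a minimal sketch of how an on/off flag of this kind can be read in Java (IndexingFlags and its method are hypothetical, and this is not a description of how Jira reads the properties internally):

```java
final class IndexingFlags {
    // Property names as given in this post.
    static final String CFV_DRIVEN_DISABLED = "jira.cfv.driven.indexing.disabled";
    static final String LOCAL_CONTEXT_DISABLED = "jira.local.context.indexing.disabled";
    static final String SKIP_NULL_DISABLED = "jira.skip.indexing.null.disabled";

    // An optimization counts as enabled unless its "disabled" property
    // is set to "true". Boolean.getBoolean reads the JVM system property
    // and returns false when the property is absent.
    static boolean optimizationEnabled(String disabledProperty) {
        return !Boolean.getBoolean(disabledProperty);
    }
}
```

Comparing index size and reindexing time with a flag set and unset is one way to see what each change contributes on your own data.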

Availability and download

These changes will be introduced in Jira Data Center 8.10. However, you can already test them in the EAP version. To start benefiting from the above optimisations a full reindex is required. You can download an EAP version here. Read more about the 8.10 EAP.

yvesriel · June 1, 2020, 6:17pm

Hi @DamianKedzierski

Just to confirm: if a customer has a global context for their custom field (all projects, all issue types) but the field is not associated with any screens for an issue, the field will not be indexed for that particular issue? In our experience, many customers had very large index files because fields had global contexts even though they were used only on a few projects. That will definitely help.

And how long do we have before you actually release 8.10? We want to have an idea of the timeline to support and test the new API … and please don’t tell me it’s in a few days

Thanks,

Yves

DamianKedzierski · June 2, 2020, 8:46am

Hi @yvesriel

Thanks for your questions.

Fields with a global context will still be indexed for every issue. What we want to improve is the performance of local contexts, so that customers use them for each field that does not need to be global. With this change, Jira will execute field indexers only when the field is visible and in scope. Previously, indexers were also called for fields that were out of scope or hidden.

There is also another change, which reduces the number of executions for fields without any value assigned, and it applies to global fields as well.

Please feel free to ask if you have more questions or concerns. We are happy to answer.

We plan to release 8.10 no sooner than 17 Jun. How much time do you think testing your app may take? How much notice would you like to get?

marcin.kosmala · June 2, 2020, 9:22am

Hi @DamianKedzierski ,

What do you mean by “when the field is visible”? Does it mean that if a field is not on any screen then indexers won’t be executed?

Thanks,
Marcin

yvesriel · June 2, 2020, 11:14am

Hi @DamianKedzierski,

Thanks for the reply! I must admit that I’m still a little bit confused. When you say:

Optimization 2 can possibly affect your app since we stop calling indexers for ALL field types (also the custom types you can create) for the fields which are not visible or are not in the context of a specific issue.

Can you explain exactly what visibility means? Since screen assignments are not considered, what’s left? “Hidden” as defined in a screen scheme?
I thought that issues not in the custom field’s context were already not indexed. This is what we suggest to customers with very large instances to reduce their index file size, and it does affect the size of the index.

As for the release date, it’s not that it takes very long to test the changes, it’s the fact that vendors are typically very small teams who are already working on something, so we need some time to schedule these. Three weeks’ notice is already a good starting point.

Cheers,

Yves

DamianKedzierski · June 2, 2020, 1:37pm

@yvesriel @marcin.kosmala

What do you mean by “when the field is visible”? Does it mean that if a field is not on any screen then indexers won’t be executed?

Can you explain exactly what visibility means?

A custom field is considered visible when it is not hidden in the Field Configuration menu. Screens are not used to calculate a field’s visibility: the field may not be assigned to any screen and still be considered visible.

I thought that issues not in the custom field’s context were already not indexed.

Unfortunately, it did not work this way. Before 8.10, field indexers were called every time and had to decide themselves what should be stored in the Lucene index.

When implementing the FieldIndexer interface directly, the indexer has to check field visibility on its own in the addIndex() method.

On the other hand, indexers extending AbstractCustomFieldIndexer have to implement two methods: addDocumentFieldsSearchable() and addDocumentFieldsNotSearchable().
The second one is called when the field is out of context or hidden.
This behaviour was ambiguous and error-prone: some apps could always store values in the Lucene index for safety.

With 8.10, FieldIndexer.addIndex() will be called only for fields in the context and not hidden, while AbstractCustomFieldIndexer.addDocumentFieldsNotSearchable() will not be called anymore.
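
The difference between the two behaviours can be sketched as follows. This is a simplified stand-in, not the Jira classes: SketchIndexer, Dispatch and RecordingIndexer are hypothetical, the document is reduced to a list of strings, and the real methods take different parameters.

```java
import java.util.List;

// Simplified stand-in for an indexer with the two callbacks named above.
abstract class SketchIndexer {
    abstract void addDocumentFieldsSearchable(List<String> doc);
    abstract void addDocumentFieldsNotSearchable(List<String> doc);
}

final class Dispatch {
    // Before 8.10: one of the two callbacks always runs.
    static void before810(SketchIndexer indexer, boolean visibleAndInScope, List<String> doc) {
        if (visibleAndInScope) {
            indexer.addDocumentFieldsSearchable(doc);
        } else {
            indexer.addDocumentFieldsNotSearchable(doc);
        }
    }

    // From 8.10: the indexer is skipped entirely when the field is
    // hidden or out of context, so the second callback never runs.
    static void from810(SketchIndexer indexer, boolean visibleAndInScope, List<String> doc) {
        if (visibleAndInScope) {
            indexer.addDocumentFieldsSearchable(doc);
        }
    }
}

// Example indexer that records which callback wrote to the document.
final class RecordingIndexer extends SketchIndexer {
    @Override
    void addDocumentFieldsSearchable(List<String> doc) {
        doc.add("searchable");
    }

    @Override
    void addDocumentFieldsNotSearchable(List<String> doc) {
        doc.add("not-searchable");
    }
}
```

Any app logic that lives in the second callback, or in an else branch of a visibility check inside addIndex(), is what needs reviewing.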

adam.labus · June 2, 2020, 2:37pm

@DamianKedzierski

Do you plan to add these features to the server version?

yvesriel · June 2, 2020, 2:50pm

Thanks @DamianKedzierski

I learned a lot today. I really think that Atlassian should come up with documentation on how to properly index custom fields. We struggled a lot with this and found many issues the hard way.

Yves

DamianKedzierski · June 3, 2020, 9:18am

@adam.labus At the moment we do not plan to add these features to the Server version.

DamianKedzierski · June 3, 2020, 9:20am

@yvesriel Thanks for your suggestion about creating the documentation for indexing custom fields.
I have added it to our backlog.

DamianKedzierski · June 3, 2020, 10:47am

I have just learned that the EAP we published on Monday is missing some of the latest commits, and a newer version will be released soon. Please hold off on testing until v2 is published.

I am sorry for the inconvenience.

a.belostotskiy · June 3, 2020, 11:02am

So if we would implement the experimental API, we’d lose backward compatibility?

DamianKedzierski · June 3, 2020, 11:29am

Yes, unfortunately, as we could not optimise it without exposing an additional API.

DamianKedzierski · June 4, 2020, 8:11am

EAP v2 is published, so you can proceed with testing.

yvesriel · June 5, 2020, 9:31pm

Hi @DamianKedzierski,

We have started looking into the new API and I have two concerns:

Concern #1

The goal of this new API is to save indexing time and space. In our case, we were not adding anything to the index when we were not supposed to, but we now have to do twice the number of queries:

In the new API call to tell you that we have something to index
In the normal index call

So we will now most likely be disadvantaged by taking more time, and have fingers pointed at us because of the ranking that shows which custom fields take the longest to index. We feel that we will end up being penalized by this. Is there any mechanism by which we could avoid having to do twice the work? E.g. don’t force us to return the list of fields that have values to index by default; instead, treat it as an optimization mechanism. The fact that you now display who takes the longest should, in itself, put pressure on the vendor. If I get pressure from my customer because my field is at the top of the ranking, I will make sure that I implement the API properly to save time. But if I did the job well from the beginning, I shouldn’t have to do extra calculation just to tell you to call my indexers anyway. Please consider this. I’m open to discussing it further.

Concern #2

The new API introduced a new return type, which makes it a breaking change. We always strive to have one version that fits all Jira versions. This will definitely give us headaches, as we now have to compile different versions just for that. Can you somehow not introduce it as a breaking change? Or do you know of a mechanism that could help us keep only one version?

Thanks!

Yves

yvesriel · June 8, 2020, 1:46pm

Hi @DamianKedzierski,

We realize that if we don’t implement the new API, both previously mentioned concerns are avoided and that’s what we plan to do. However, we just want to make sure that not implementing the API will always be possible and that it will not have side effects (e.g. saying which vendor doesn’t implement the API).

Regards,

Yves

DamianKedzierski · June 8, 2020, 1:47pm

@yvesriel Thanks for your feedback.

About #1 - we are aware of this problem and will expose an API that allows reusing calculated values to avoid performance problems. We will keep you updated.

About #2
Unfortunately, we did not find any way to achieve this effect without changing the API. If we could, we would definitely have done that, so as not to complicate vendors’ lives.
I heard about apps using bridges to work with multiple Jira versions.
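
One common ingredient of such a bridge is a runtime check for whether a newer API class is present, so that a single app build can pick the matching implementation. The sketch below shows only that check. ApiProbe is a hypothetical name, the class name passed in would be whatever type the new API actually introduces (not named in this thread), and class loading in an OSGi plugin may need a different class loader than the one assumed here.

```java
final class ApiProbe {
    // Returns true when the named class can be loaded by this class's
    // class loader. The class is located but not initialized.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

The code compiled against the newer API would then live in a separate module that is only touched when the probe returns true.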

DamianKedzierski · June 15, 2020, 7:05am

@yvesriel Currently we plan to make using these APIs voluntary for apps. However, we may want to check for it in the DC apps certification process in the future (once we have validated that it works for vendors and provides the expected performance improvements).

yvesriel · June 15, 2020, 4:10pm

Hi @DamianKedzierski,

Thanks for the info. However, if it comes to a point where this is required for DC apps certification, please make sure that it can be waived if we bring sufficient arguments. We discussed this again this morning in our team, and because our use case with options is different from other custom fields, it’s still better for us not to implement it.

DamianKedzierski · June 17, 2020, 7:24am

@yvesriel Sure, we definitely want to understand vendors’ use cases first before making such decisions.

Is keeping one code base for your app the main concern?
Could you shed some light on how you implemented your custom field(s)?