History / Background
Jira uses the Lucene library as its search engine. Lucene pre-processes all issue data to build an index of the words and values occurring in the issues’ fields.
Jira has a few different types of indexes. These are:
- issues (main issue data)
- comments
- worklogs
- change history
Jira 8.0 changes the way plugins interact with the issue index. This article focuses on the issues index only.
A document in Lucene is an abstraction of what gets found; in the case of the issues index it’s the equivalent of an issue. Each document is a set of key-value pairs called fields. A field in a document is either a system field (like issue id or project key) or a custom field.
The Lucene index is organised in segments, where a single segment contains a set of documents and additional structures. Segments are saved on the filesystem and are read-only.
A plugin performing search can collect results by either:
- doing a search and getting a list of matching documents, then extracting the necessary fields
- using a Lucene collector to retrieve values of fields from the matching documents instead of entire documents
As usual, it’s a tradeoff. The first approach is not recommended when handling big result sets, i.e. more than 1000 issues, unless you are accessing more than 20 fields of each issue. Using collectors is more memory-efficient and can be orders of magnitude faster, although the benefits diminish the more fields you need to read.
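To make the tradeoff concrete, here is a minimal pure-Java sketch (not the Lucene API; all names are illustrative) contrasting the two approaches: loading whole documents pulls every stored field into memory per hit, while collector-style access reads only the one field you need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Conceptual sketch, NOT the Lucene API: why collector-style field access
// is cheaper than fetching whole documents. All names are illustrative.
public class SearchTradeoffSketch {
    public static void main(String[] args) {
        // "Document" approach: each hit is a full document with all its fields.
        List<Map<String, String>> documents = List.of(
                Map.of("key", "JRA-1", "summary", "First issue", "assignee", "alice"),
                Map.of("key", "JRA-2", "summary", "Second issue", "assignee", "bob"));

        // Approach 1: materialize every document, then extract one field.
        List<String> fromDocs = new ArrayList<>();
        for (Map<String, String> doc : documents) {
            fromDocs.add(doc.get("key")); // the other fields were loaded for nothing
        }

        // Approach 2 (collector-style): a per-field column lets us read
        // just the values we need, touching no other field data.
        List<String> keyColumn = List.of("JRA-1", "JRA-2");
        List<String> fromColumn = new ArrayList<>(keyColumn);

        // Both approaches yield the same values; only the memory cost differs.
        System.out.println(fromDocs.equals(fromColumn));
    }
}
```

The more fields you extract per hit, the closer the two approaches get, which is why the collector advantage shrinks as the field count grows.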
Up to Jira 7.13, collectors could collect results either by:
- extracting the necessary fields from documents (using the visitor pattern, meaning opening up document by document)
- accessing the data through FieldCache, which was a way to read field values without loading the entire document
The former had the same drawbacks as the basic search, since it was scanning documents. FieldCache was the preferred way of accessing data from collectors.
Jira 8.0
In Jira 8.0 we are using an upgraded version of Lucene, where FieldCache was removed.
The upgraded version of Lucene (7.3) provides a successor of FieldCache called DocValues. It is a structure that can be used in a similar manner to FieldCache.
DocValues is a data structure in Lucene that lets you access the data of a specific document without unzipping its content (in the new version of Lucene, documents are stored compressed on disk).
Document fields can have one of 6 different types of DocValues, and these can be grouped (by collector usage) into 3 buckets:
- None - a field that doesn’t have its respective DocValues at all.
- Single value (Numeric, Binary, Sorted, SortedNumeric) - these fields have a single value in DocValues.
- Multiple values (SortedSet) - this field can have multiple values, e.g. the Labels field in Jira issues.
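To build intuition for the ordinal-based iteration used with SortedSet DocValues, here is a minimal pure-Java model (not the Lucene API; all names are illustrative) of how a multi-valued field is organised: the segment keeps one sorted dictionary of distinct terms, and each document stores only the ordinals (indices) of its terms.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch, NOT the Lucene API: how SortedSet doc values
// organise a multi-valued field like Labels. Names are illustrative.
public class SortedSetModel {
    public static void main(String[] args) {
        // Sorted dictionary of every distinct label in the segment.
        String[] termDictionary = {"backend", "bug", "frontend", "urgent"};

        // Per-document ordinal lists (doc 0 has labels "bug" and "urgent").
        int[][] ordsPerDoc = {
                {1, 3}, // doc 0 -> bug, urgent
                {0, 2}  // doc 1 -> backend, frontend
        };

        // Equivalent of iterating nextOrd() and dereferencing via lookupOrd(ord).
        List<String> labels = new ArrayList<>();
        for (int ord : ordsPerDoc[0]) {
            labels.add(termDictionary[ord]); // lookupOrd-style dereference
        }
        System.out.println(labels);
    }
}
```

Storing ordinals instead of repeated strings is what keeps this structure compact and fast to scan during collection.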
We’ve prepared a few examples that will help you understand how to use DocValues in Jira 8.0.
You can learn more in our examples repo.
Migration from 7.x to 8.0
Unless you were directly using FieldCache in your Lucene collector, you are safe when migrating from Jira 7.x to 8.0.
If you did use FieldCache in 7.x please grab a coffee and continue reading this section.
If in 7.x you were calling one of the getX(…) methods from FieldCache, then you need to migrate to the DocValues types. Use the following snippet to determine which DocValues type you will be interacting with:
@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // Segments separator
    log.info("=== Next segment ===");
    // Get FieldInfo for each field
    for (FieldInfo fieldInfo : context.reader().getFieldInfos()) {
        // Output the field name and its DocValues type
        log.info("Field name: " + fieldInfo.name
                + " has DocValues type: " + fieldInfo.getDocValuesType());
    }
}
Once you know the DocValues type of your field, extract the value depending on its bucket (None, Single, Multi-value); None is skipped as there is no value to extract.
- Single value
private SortedDocValues fieldDocValues;

/**
 * This method is called every time the search moves to the next segment.
 * Collect with a given docId is always called after this method.
 */
@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // For each segment we need to get the doc values
    fieldDocValues = context.reader().getSortedDocValues(FIELD_ID);
}

/**
 * This method is called for every document that matches the query.
 * @param doc docId
 */
@Override
public void collect(int doc) throws IOException {
    // First we need to verify that a value exists for the given doc
    if (fieldDocValues.advanceExact(doc)) {
        // Get the value for the given doc (call a different method depending on
        // the DocValues type, e.g. .longValue() instead of .binaryValue())
        String value = fieldDocValues.binaryValue().utf8ToString();
    }
}
- Multi value (SortedSet):
private SortedSetDocValues fieldDocValues;

/**
 * This method is called every time the search moves to the next segment.
 * Collect with a given docId is always called after this method.
 */
@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // For each segment we need to get the doc values
    fieldDocValues = context.reader().getSortedSetDocValues(FIELD_ID);
}

/**
 * This method is called for every document that matches the query.
 * @param doc docId
 */
@Override
public void collect(int doc) throws IOException {
    // First we need to verify that a value exists for the given doc
    if (fieldDocValues.advanceExact(doc)) {
        // Iterate over the values
        long ord;
        while ((ord = fieldDocValues.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
            final BytesRef next = fieldDocValues.lookupOrd(ord);
            // Get the next value
            String value = next.utf8ToString();
        }
    }
}
You can learn more in our examples repo.