We have looked into this and here are our findings:
Index replication is complex
There’s much more to index replication than just copying files from one node to another.
In Jira DC, indexes mutate based on events happening in the system. The issue index is updated with any change to an issue. All these changes need to be reflected on all other nodes in order to maintain consistency across the cluster. Jira DC uses the Document Based Replication technique for this.
Since an interaction happens on one node and is then replicated to the other nodes, changes to the indexes happen asynchronously, at different times on different nodes. As a result, indexes are updated in a different order on individual nodes in the cluster.
Let’s unpack that. Imagine two user requests reaching two different nodes to edit two different issues. These two nodes change their local index immediately, and each now has a new, unique state of its index. Both are ahead of all the other nodes in the cluster, which are unaware of the changes. As the information reaches the other nodes, those nodes reflect the changes in their own index. This may happen in a different sequence on each node due to the asynchronous nature of the communication.
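To make this concrete, here is a minimal sketch of the two-node scenario. It is a toy model and not Jira code: the `Node` class, the issue keys and the document strings are all made up for illustration, and the index is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; its local index maps issue key -> document."""
    name: str
    index: dict = field(default_factory=dict)
    applied: list = field(default_factory=list)  # order in which updates arrived

    def apply(self, issue_key: str, document: str) -> None:
        self.index[issue_key] = document
        self.applied.append(issue_key)


node_a, node_b = Node("A"), Node("B")

# Two user requests reach two different nodes and edit two different issues.
node_a.apply("ISSUE-1", "summary v2")
node_b.apply("ISSUE-2", "summary v2")

# At this point each node holds a unique index state the other has not seen.
diverged = node_a.index != node_b.index

# The changes are then delivered asynchronously to the other node.
node_a.apply("ISSUE-2", "summary v2")
node_b.apply("ISSUE-1", "summary v2")

# The content ends up the same, but each node got there in a different order.
converged = node_a.index == node_b.index
different_order = node_a.applied != node_b.applied
```

The point of the sketch is the middle of the run: a copy of the index files taken from either node at that moment would capture a state that no other node has.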
It affects index replication to new nodes
The first problem is that if Apps bring in their own indexes, they need to manage the index state on individual nodes. Even after the index is copied, the App is still responsible for figuring out which state the index is in, and that information is not replicated by just copying the files.
The second problem is that, given any state of the index, the App needs to understand how to catch up in order to bring the out-of-date index up to date.
There are various steps we have to take to manage this in Jira:
- Information on mutations to the issue index is kept in the database
- The index can be queried, per issue, for the freshest update made to it
- Once the index is replicated, regular updates flowing from other nodes are blocked
- The fresh node scans that history and catches up from the latest operation included in the replicated index
- Once that is done, and depending on how long the catch-up took, the previous step may need to be repeated
- Once we have caught up and can start updating the index in real time, we do so, and the replication is done.
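The steps above can be sketched roughly as follows. This is a simplified model under stated assumptions and not how Jira implements it: `Operation`, `Index` and `catch_up` are hypothetical names, the database history is a plain list, operation ids are assumed to increase monotonically, and regular updates from other nodes are assumed to be held back while `catch_up` runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One recorded mutation of the index (hypothetical shape)."""
    op_id: int        # assumed to be a monotonically increasing id
    issue_key: str
    document: str


class Index:
    """Local index: issue key -> (id of freshest applied operation, document)."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def freshest(self, issue_key: str) -> int:
        """Per-issue query for the freshest update the index contains."""
        return self.entries.get(issue_key, (0, None))[0]

    def latest_included(self) -> int:
        """Id of the newest operation contained anywhere in the index."""
        return max((op_id for op_id, _ in self.entries.values()), default=0)

    def apply(self, op: Operation) -> None:
        # Idempotent: an older operation never overwrites a newer document.
        if op.op_id > self.freshest(op.issue_key):
            self.entries[op.issue_key] = (op.op_id, op.document)


def catch_up(snapshot: Index, history: list) -> Index:
    """Bring a copied index up to date from the recorded history."""
    index = Index(snapshot.entries)       # the index files have been copied
    cursor = index.latest_included()      # where the copied index ends
    while True:
        pending = [op for op in history if op.op_id > cursor]
        if not pending:                   # nothing arrived while replaying
            return index                  # ready for real-time updates
        for op in pending:
            index.apply(op)
        cursor = max(op.op_id for op in pending)  # scan again for late arrivals


history = [
    Operation(1, "ISSUE-1", "v1"),
    Operation(2, "ISSUE-2", "v1"),
    Operation(3, "ISSUE-1", "v2"),
    Operation(4, "ISSUE-3", "v1"),
]
snapshot = Index({"ISSUE-1": (1, "v1"), "ISSUE-2": (2, "v1")})
fresh = catch_up(snapshot, history)
```

Here the copied index only knows operations 1 and 2, so the fresh node replays 3 and 4 and then scans once more to confirm nothing else arrived in the meantime. An App with its own index would need an equivalent of each piece: the history, the per-entry freshness marker, and the replay loop.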
To support index replication at a basic, filesystem level, we could simply copy the files without handling any of the other effects. That might give the false promise that copying is enough to handle the scenario, and it would leave a huge gap in the support for it. Because of this complexity, for now it will not be possible to support index replication for Apps from our end.
But, here are some quick fixes for you
If your App handles data connected to issues, consider using Jira Fields to tap into the existing mechanism.
If you want to handle the index replication on your own, put snapshots of your index in the shared home, together with any additional information you need to catch up later. You can also use cluster locks to synchronise the nodes and build the behaviours needed to store the snapshot and to catch up with the index state later.
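As a rough illustration of that second option, the sketch below stores a snapshot together with a catch-up marker and reads it back. Everything in it is an assumption made for the example: the file name, the JSON payload, the `last_op_id` marker and the helper names are invented, a temporary directory stands in for the shared home, and an in-process `threading.Lock` stands in for a real cluster-wide lock.

```python
import json
import tempfile
import threading
from pathlib import Path

# Stand-in only: a real cluster needs a lock that every node can see,
# not an in-process lock like this one.
snapshot_lock = threading.Lock()

SNAPSHOT_NAME = "app-index-snapshot.json"  # hypothetical file name


def store_snapshot(shared_home: Path, index: dict, last_op_id: int) -> Path:
    """Write the index plus the catch-up marker as one snapshot file."""
    payload = {"last_op_id": last_op_id, "index": index}
    target = shared_home / SNAPSHOT_NAME
    with snapshot_lock:
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps(payload))
        tmp.replace(target)  # rename so readers never see a half-written file
    return target


def load_snapshot(shared_home: Path):
    """Return (index, last_op_id) so the caller knows where to catch up from."""
    with snapshot_lock:
        payload = json.loads((shared_home / SNAPSHOT_NAME).read_text())
    return payload["index"], payload["last_op_id"]


with tempfile.TemporaryDirectory() as tmp_dir:
    shared_home = Path(tmp_dir)  # stand-in for the shared home directory
    store_snapshot(shared_home, {"ISSUE-1": "v2"}, last_op_id=3)
    restored_index, restored_marker = load_snapshot(shared_home)
```

The important design choice is that the marker travels with the snapshot: a node restoring the files also learns which state they represent, which is exactly the information that a plain file copy loses.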