In just over 5 hours, our new version has been rolled out to 210 customers. Because we had to do a data migration this time, we could not retain backward compatibility. Arrrgh! The pain! I’m just sharing the joy with y’all.
This sounds like an interesting topic.
Just so I understand the issue: I assume you are referring to a backward-incompatible schema change in your app’s externally hosted data store, and that the problem is that the extended rollout time frame meant different versions of the app were accessing the data store concurrently. Presumably the schema migration occurred at the start of this rollout, which broke the old app installations.
Would it have been possible to roll out the new version of the app with the feature dependent on the new schema version behind a feature flag?
Yes, I think you understand the problem correctly. Our last release fundamentally changed the data model for our app. There were changes to the data store schema and broad changes to the application layer too. We began with the best of intentions: use a feature flag to switch customers from one data model to the other, but it all became too complex. The changes we made were not localized to a feature; they altered the platform on which all of the features depend.
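For anyone following along, here is a rough sketch of what the per-customer flag approach looks like in its simplest form. It is not our actual code: `FlagStore`, `load_legacy`, `load_new`, and `load_record` are all hypothetical names, and the loaders are stand-ins for real data store access.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical per-customer flag store (illustrative only)."""
    migrated: set = field(default_factory=set)

    def uses_new_model(self, customer_id: str) -> bool:
        # A customer is on the new data model once their ID has been added.
        return customer_id in self.migrated


def load_legacy(customer_id: str) -> dict:
    # Stand-in for a read against the old schema.
    return {"model": "legacy", "customer": customer_id}


def load_new(customer_id: str) -> dict:
    # Stand-in for a read against the new schema.
    return {"model": "new", "customer": customer_id}


def load_record(flags: FlagStore, customer_id: str) -> dict:
    # Each data access path has to branch on the flag like this.
    if flags.uses_new_model(customer_id):
        return load_new(customer_id)
    return load_legacy(customer_id)
```

With one read path this is trivial. When the data model sits underneath every feature, a branch like the one in `load_record` is needed on every read and write, which is roughly where the complexity came from for us.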
We decided to bite the bullet and go for a cold swap: a scheduled down-time during the Christmas week. I thought it would be about 4 hours long. It used to take about that long for the new descriptor to roll out across our customer base. I was wrong.
There’s a trade-off, it seems to me, between the complexity cost of feature flags, especially in cases of fundamental data model changes, and the cost of down-time during a release. The extended rollout timeframe tips the scale towards feature flags. The consequence of the extended rollout timeframe is, therefore, increased cost, one way or the other, for us.
Can you verify that rollout timeframes have in fact been extended? I’m not mistaken there, am I? What was the rationale for that change? Did it solve some problem that I’m not aware of?
This makes sense - thanks for elaborating. I wasn’t aware of the rollout period being extended, but I’m not very close to that functionality so your observations are probably correct.