BBS 5.0 Database error on merge check

izymesdev · May 16, 2017, 9:13pm

When pullRequestService is asked to merge a pull request it performs registered merge-checks.
My merge check needs to check on pull request participants, so during the merge check I call pullRequestService.getParticipants(…) - this throws a DB error (see below)

It looks like the hibernate transaction for updating the DB for the merge operation gets mixed up with looking up pull request participants?

2017-05-17 06:11:08,434 ERROR [threadpool:thread-4] userb @3XS0NUx371x5129x0 127.0.0.1 "POST /rest/workzoneresource/latest/approvals/TPRJ/repo/1/sign HTTP/1.1" o.h.e.j.batch.internal.BatchingBatch HHH000315: Exception executing batch [org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1], SQL: update sta_pull_request set closed_timestamp=?, description=?, from_hash=?, locked_timestamp=?, rescoped_timestamp=?, pr_state=?, title=?, to_branch_name=?, to_hash=?, to_branch_fqn=?, updated_timestamp=?, entity_version=? where id=? and entity_version=?
2017-05-17 06:11:08,435 WARN  [threadpool:thread-4] userb @3XS0NUx371x5129x0 127.0.0.1 "POST /rest/workzoneresource/latest/approvals/TPRJ/repo/1/sign HTTP/1.1" c.a.s.i.h.r.DefaultRepositoryHookService [TPRJ/repo[1007]] Error calling com.atlassian.stash.internal.plugin.legacy.MergeRequestCheckAdapter.preUpdate (com.izymes.workzone:wz-approvals-merge-check)
com.atlassian.bitbucket.DataStoreException: A database error has occurred.
	at com.atlassian.stash.internal.aop.ExceptionRewriteAdvice.afterThrowing(ExceptionRewriteAdvice.java:45)
	at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26)
	at com.izymes.workzone.automerge.DefaultMergeManager.getSignedApprovals(DefaultMergeManager.java:350)
	at com.izymes.workzone.automerge.DefaultMergeManager.makeMergeCondition(DefaultMergeManager.java:334)
	at com.izymes.workzone.automerge.DefaultMergeManager.isVeto(DefaultMergeManager.java:264)
	at com.izymes.workzone.hook.ApprovalsMergeCheck.check(ApprovalsMergeCheck.java:63)
	at com.atlassian.stash.internal.plugin.legacy.MergeRequestCheckAdapter.preUpdate(MergeRequestCheckAdapter.java:30)
	at com.atlassian.stash.internal.plugin.legacy.MergeRequestCheckAdapter.preUpdate(MergeRequestCheckAdapter.java:11)
	at com.atlassian.stash.internal.hook.repository.DefaultRepositoryHookService.preUpdate(DefaultRepositoryHookService.java:583)
	at com.atlassian.stash.internal.hook.repository.DefaultRepositoryHookService.preUpdate(DefaultRepositoryHookService.java:284)
...



Caused by: org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:67)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:54)
	at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.checkRowCounts(BatchingBatch.java:141)
	at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution(BatchingBatch.java:116)
	at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.doExecuteBatch(BatchingBatch.java:97)
	at org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.execute(AbstractBatchImpl.java:147)
	at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.executeBatch(JdbcCoordinatorImpl.java:206)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:618)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:463)
	at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:337)
	at org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:50)
	at org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1396)
	at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1481)
	at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1445)
	at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1414)
	at org.hibernate.query.internal.AbstractProducedQuery.uniqueResult(AbstractProducedQuery.java:1457)
	at com.atlassian.stash.internal.pull.HibernatePullRequestDao.findByRepositoryScopedId(HibernatePullRequestDao.java:211)
	at com.atlassian.stash.internal.pull.DefaultPullRequestService.getPullRequestOrFail(DefaultPullRequestService.java:1357)
	at com.atlassian.stash.internal.pull.DefaultPullRequestService.getParticipants(DefaultPullRequestService.java:570)

mheemskerk · May 17, 2017, 12:01am

It looks like your call to getParticipants is triggering other changes in the same transaction to be flushed to the database, in this case the query:

update sta_pull_request set closed_timestamp=?, description=?, from_hash=?, locked_timestamp=?, rescoped_timestamp=?, pr_state=?, title=?, to_branch_name=?, to_hash=?, to_branch_fqn=?, updated_timestamp=?, entity_version=? where id=? and entity_version=?

The fact that returns 0 rows changes means that the pull request has already changed in parallel. Can you elaborate on what the POST /rest/workzoneresource/latest/approvals/TPRJ/repo/1/sign endpoint does and when it’s called. Perhaps there are multiple requests in parallel that update the pull requests?

Also, does this problem happen consistently, or occasionally (as in a race condition)?

izymesdev · May 18, 2017, 9:12am

Hi Michael, good to finally meet the person behind the test repo you get in the atlas-debug instances of Bitbucket

I’ve created a gist that describes the sequence of events and calls including

Signature Rest endpoint approving the PR inside a lockService.getPullRequestLock context.
The PullRequestApproved event handler trying to merge the pull request
The merge-check invoked by pullRequestService.merge() wanting to check on pull request participants …

Would it be possible that approving the pull request via pullRequestService.setReviewerStatus interferes with fetching participants?

UPDATE: … yes this happens all the time, it’s not a race condition.

mheemskerk · May 19, 2017, 7:13am

Hi Ulrich,

Thanks for posting those code snippets, they helped reproduce the problem.

Would it be possible that approving the pull request via pullRequestService.setReviewerStatus interferes with fetching participants?

I don’t think so because the call to setReviewerStatus runs in a separate transaction which will have finished (and committed) by the time the event is raised.

Thanks to the detailed steps, I’ve been able to reproduce the issue with a small plugin. I’ll spend some time to figure out what’s causing it and will add an update when I do.

Cheers,
Michael

izymesdev · May 24, 2017, 6:53pm

Hi Michael,

I’m trying a few options as workarounds here. I’ve added a static ThreadLocal variable that is set just before the automatic merge starts from within workzone. Then when the I’ll-fated merge check is invoked from within the same thread it reads the variable and if it’s present skips the merge check. Then I clean up after the attempted merge (successful or failed) and reset the ThreadLocal variable
Let me know what you think about the workaround w.r.t. thread safety, mem leak and clustering.

mheemskerk · May 26, 2017, 7:23am

Hi Ulrich,

That workaround seems fine to me. I’m assuming that you do something like this?

itsOkFlag.set(true); // or perhaps set "" + pullRequest.getToRef().getRepository().getId() + ":" + pullRequest.getId();
try {
    pullRequestService.merge(..)
} finally {
   itsOkFlag.reset();

Getting this issue resolved is top priority for me next week. Aiming to get a fix in the next 5.0 patch release and the upcoming 5.1 release.

Cheers,
Michael