DC App Performance Toolkit appreciation post!

:wave:

I wanted to take some time to write about my recent interactions with the DC App Performance Toolkit engineering team.

As you all know, I can be very critical of Atlassian. I was very critical of the move to Terraform/Kubernetes for the DC approval process, as this would increase the complexity of the deployment process for Marketplace Partners who did not have any experience with Terraform or Kubernetes.

I became even more vocal when the new solution was introduced without a proper replacement for the old CloudFormation one-click deployment, even though the team had promised one during a meeting on the subject. As a result, I postponed our participation in the program, and our annual review submission ended up 233 days overdue.

In those 233 days, the DC team reached out to me and we had very open conversations. The team listened to my concerns and worked on a solution, which was made available to me on a DEV branch and eventually ended up in the 7.5.0 release of the DC App Performance Toolkit.

I used the solution the team provided to run the performance & scaling tests of 23 DC apps (!) last week, and I was genuinely impressed by the process.

The one-click Docker container solution created by the team was well documented, easy to follow and allowed me to provision the environment without any understanding of Terraform or Kubernetes. Although it is not officially supported, I even managed to run performance tests for 3 products (Jira, Confluence & Bitbucket) simultaneously on the same cluster :exploding_head:

The main benefits of the current Terraform solution:

  • Three simple CLI commands to install, uninstall and terminate the cluster (using Docker)
  • No more manual steps: the Terraform solution provisions the cluster, the instance, the database, the shared storage, everything. All you need to do is change a few variables in a text file
  • It scales both vertically and horizontally, allowing you to run multiple products on the same cluster and saving time when testing multiple products
  • The solution is scriptable: for the next iteration I will be automating the entire process to run from CircleCI
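For anyone who wants to script this the same way, a minimal sketch of a CI-friendly wrapper might look like the following. It assumes the `aws_envs` and `dcapt.tfvars` files and the `install.sh` entry point from the `docker run` command quoted later in this thread, and it assumes a matching `uninstall.sh`. The `dcapt` function name and the `DRY_RUN` switch are hypothetical, and cluster termination is left out because its exact command isn't shown here.

```shell
# Hypothetical helper: source this file in bash, then call `dcapt install`
# or `dcapt uninstall`. Assumes aws_envs and dcapt.tfvars are in $PWD.

# Set DRY_RUN=1 to print the docker command instead of running it.
DRY_RUN="${DRY_RUN:-0}"

dcapt() {
  local action="${1:-}"
  case "$action" in
    install|uninstall) ;;
    *) echo "usage: dcapt install|uninstall" >&2; return 2 ;;
  esac

  # Same mounts as the interactive command, but without -it:
  # CI runners usually have no TTY, and `docker run -it` refuses to start without one.
  # (If either script prompts for confirmation, this non-interactive form needs adjusting.)
  local -a cmd
  cmd=(docker run --pull=always --env-file aws_envs
    -v "$PWD/dcapt.tfvars:/data-center-terraform/config.tfvars"
    -v "$PWD/.terraform:/data-center-terraform/.terraform"
    -v "$PWD/logs:/data-center-terraform/logs"
    atlassianlabs/terraform "./${action}.sh" -c config.tfvars)

  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

In a CI job, `dcapt install`, then your test steps, then `dcapt uninstall` would run without an interactive terminal; `DRY_RUN=1` is handy for checking the exact command first.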

The team was also very helpful in dealing with any troubles I ran into and I was able to provide feedback on the scripts.

So well done @OleksandrMetelytsia and team :clap::clap::clap:

I also want to chime in here.

Like Remie, I used the Docker-based cluster setup before it was officially released, and I can only confirm what he said.
The setup process was extremely simple (one Docker command) and the cluster was available and ready to go in less than an hour. That’s a huge step forward compared to previous approaches, where we had to load the dataset ourselves. I’m sure that it will save the ecosystem countless hours of manual cluster setup work.

Therefore, a big ‘thank you’ to the team maintaining the testing toolkit and the whole DC team in general for these improvements and for always being very responsive in both Slack and the ECOHELP tickets.

Cheers,
Jens

2 kudos for the new Terraform/Docker framework here!

  • It’s quite close to a 1-click solution.
  • It would be nice if it auto-deployed the JMeter instance too; launching and installing it currently requires an understanding of AWS.
  • Oleksandr seems to be working 12 hours a day, maybe even 24, since I haven’t found a moment when he wasn’t available.

The framework didn’t work in the default AWS zone; it took me two days of setup to figure that out.

But now that it is working, wow, I can execute the 5 tests in less than one day!!! It used to take two weeks!!! The commands are really easy, there are no parameters to change, and it doesn’t require importing SQL data manually or reindexing a million Confluence pages! It could probably even be scripted, with runs 1 & 2 and runs 3-4-5 handled separately. Either way, it’s great work!

As for scripting, it’s a pity that the framework doesn’t simply have us set up one EC2 machine that then runs Terraform, runs the BZT tests, scales the cluster, runs BZT again, and so on. But the current setup is already excellent and an incredible improvement over the past!
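The single-machine flow described above could be sketched roughly as follows. Every step is a hypothetical stub that only prints what it would do; you would replace the bodies with your real commands (for example the dockerised `install.sh` call, your BZT invocation and your app-install step). The run layout (runs 1 and 2 without and with the app, runs 3-5 on 1, 2 and 4 nodes) is an assumption based on this thread, not the toolkit's documented interface.

```shell
# Hypothetical end-to-end DCAPT pipeline for one "runner" machine (e.g. an EC2 host).
# Each step is a stub that only prints its intent; swap in real commands.

provision()   { echo "provision cluster (1 node)"; }
install_app() { echo "install app under test"; }
scale_to()    { echo "scale cluster to $1 node(s)"; }
run_tests()   { echo "run tests: $1"; }

run_pipeline() {
  provision

  # Performance regression: same cluster, without and then with the app.
  run_tests "run 1 (without app)"
  install_app
  run_tests "run 2 (with app)"

  # Scalability: resize the cluster before each run (assumed 1, 2, 4 nodes).
  local nodes run=3
  for nodes in 1 2 4; do
    scale_to "$nodes"
    run_tests "run $run ($nodes node(s))"
    run=$((run + 1))
  done
}
```

Calling `run_pipeline` prints the ten steps in order; once the stubs call real commands, the same function could serve as the body of a CI job or a script on that EC2 machine.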

Kudos to that team!

Let me say thank you to the DCAPT team. At first, I was also confused by the change from CloudFormation to the Terraform/k8s platform, because our team was used to working with CloudFormation. However, the new procedure is easy to understand and shorter than before. It’s good for us to be able to skip loading the huge initial test dataset. It would be nice if the execution environment were also created automatically. :smile:

DCAPT team’s support is always excellent and prompt, and their Slack channel always helps our team.

Thank you very much :clap:

Thank you for your feedback, community!

We are currently in the process of conducting our annual DC performance testing, and have tried the Terraform/k8s method this time around, instead of the older Quick Start CloudFormation templates.

Overall the experience has been great. We don’t use DCAPT to test our app, as we don’t have a lot of experience with Taurus/JMeter etc. and have never really been able to get it to work properly for us (instead, we have a suite of Cypress browser tests and we measure the end-user perceived times); but in terms of standing up the enterprise scale DC cluster and dataset, the TF/k8s method has been very smooth.

The initial run takes roughly the same time as the older Quick Start method (maybe a little less), presumably because the bottleneck is AWS provisioning the resources, but having the data load automated using an RDS snapshot instead of the old pg_restore script means fewer manual steps.

One thing we have noticed, and we’re not sure if this is expected or something specific to us:

When we run the command to bring up the environment:

```shell
docker run --pull=always --env-file aws_envs \
  -v "$PWD/dcapt.tfvars:/data-center-terraform/config.tfvars" \
  -v "$PWD/.terraform:/data-center-terraform/.terraform" \
  -v "$PWD/logs:/data-center-terraform/logs" \
  -it atlassianlabs/terraform ./install.sh -c config.tfvars
```

…it seems to get stuck at “Acquiring state lock. This may take a few moments…” for roughly 20-30 minutes. The first time, we assumed it had hung, so we killed it (and then had to figure out how to manually reset the Terraform state lock). The next time, we just left it running and found that it eventually completed.

We assumed this was because on first run it has to create everything from scratch.
However when we went to scale the cluster from 1 to 2, and then again to 4 nodes (by editing the confluence_replica_count in the dcapt.tfvars file, and then re-running the above command), it spent the same ~20-30 minutes stuck at the same spot.
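For the tfvars edit itself, a small helper can make the change repeatable. This is a minimal sketch that assumes `confluence_replica_count` sits on its own line in the form `confluence_replica_count = N`; the `set_replica_count` name is hypothetical, and the `docker run ... ./install.sh -c config.tfvars` command still has to be re-run afterwards for the change to take effect.

```shell
# Hypothetical helper: set confluence_replica_count in a DCAPT tfvars file
# before re-running install.sh. Assumes a line like `confluence_replica_count = 1`.
set_replica_count() {
  local file="$1" count="$2"

  # Reject anything that is not a positive integer.
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "replica count must be a positive integer: $count" >&2
    return 1
  fi

  if ! grep -Eq '^[[:space:]]*confluence_replica_count[[:space:]]*=' "$file"; then
    echo "confluence_replica_count not found in $file" >&2
    return 1
  fi

  # Rewrite only the numeric value; keep the original as "$file.bak".
  sed -E -i.bak \
    "s/^([[:space:]]*confluence_replica_count[[:space:]]*=[[:space:]]*)[0-9]+/\1${count}/" \
    "$file"
}
```

For example, `set_replica_count dcapt.tfvars 2` followed by the same `docker run` command should move Confluence from 1 to 2 replicas; it only automates the edit and does not make the Terraform run itself any faster.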

I’m not sure what it’s doing (or whether there’s any way to get a better indicator of progress during this time), but it would be great if there were a quicker way to scale the cluster.

Would it be faster to log into the AWS console and scale it manually? If so, how would we do this? (are we scaling the number of pods in the EKS cluster?)

Or is it not supposed to take this long to apply the Terraform after editing the vars?

This is really our only issue with the new process.