Terraform/Jmeter performance testing - practical experience

Over the last while we've had the opportunity to put our new Jmeter learnings to work.

  • A bug came up that was only evident under load - we were able to reproduce it in our dev environments!  The dev ran Jmeter off his laptop, and that was enough load to trigger the bug.
  • I think I mentioned last time how the simple act of mapping out a Jmeter script revealed excess calls to our middleware - tickets were created to address this.
  • QA has used it to help draw out issues with a new production environment, but...
...the other day they ran out of steam on their laptops.  So we got to come back to the Terraform/Jmeter setup we built a few months back.  Thankfully everything still worked, and within 15 minutes we were back on our feet with 1 master and 6 slaves (c4.large) raising heck.
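Roughly, the shape of that setup is a single apply with the slave count as a variable, wrapped in a few lines of shell - something like the sketch below.  The variable and output names ("slave_count", "master_ip", "slave_ips") are illustrative, not necessarily what's in our .tf files.

    #!/usr/bin/env bash
    # Rough sketch: stand up 1 JMeter master + N slaves from the Terraform config.
    # "slave_count", "master_ip" and "slave_ips" are illustrative names only.
    set -euo pipefail

    SLAVES="${1:-6}"   # today's run: 6 c4.large slaves

    terraform init -input=false
    terraform apply -auto-approve -var="slave_count=${SLAVES}"

    # Record the addresses (one per line) for the helper scripts further down.
    terraform output -json master_ip | jq -r '.'   > master.txt
    terraform output -json slave_ips | jq -r '.[]' > slaves.txt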

Here are the lessons we learned today...
  • Terraform is amazing and was totally worth the time investment
  • If you are testing a cold production environment - ASK ABOUT HOSTS FILE CHANGES!!  We hammered 'real' production for a good 5 minutes before realizing that.  "Huh, where are the logs?"  After the cold sweats passed, everyone had a good giggle.
  • Killing the Jmeter master test run doesn't also kill the slave processes!  Oops.  (there's a cleanup sketch after this list)
  • Manually updating the master/slave nodes quickly gets old (just git pulls, but definitely worth some automation)
  • Manually uploading the reports/data dumps to S3 quickly gets old (even if it's just pasted commands)
  • Manually linking S3 reports/dump files quickly gets old (even if it's just 3 files) - there's a rough script covering all three of these after the list
  • Not having a parameterized Jmeter script (e.g. for the thread count variable) leads to wasted time (although not sure how thread counts work in a master/slave setup - probably won't help that much; see the property sketch after this list)
  • There is a ceiling on thread counts/thread groups beyond which Jmeter blows up - GC overload, heap dump, etc. - we had 15 thread groups with 3-4 requests each, and accidentally put in 417 threads per group.  Oops.  (fwiw, 417 threads with 5 thread groups was ok; a bigger heap pushes that ceiling out - see the note after this list)
  • If you are using AWS ELB IPs and hosts file changes on the slaves - check the IP validity before each test run!  We had an IP expire and didn't notice - it corrupted two of our test runs.  Better yet, find a better solution than hosts file changes!!  (a pre-flight check sketch is after this list)
  • Don't use test accounts/organizations/users for performance testing!  Use real data!  Better yet, log replay!!! (this is our next big step, has to happen)
  • Start with a CLEAR idea of the targets you are testing against.  Don't assume you can easily translate a prod figure into a Jmeter figure!  Better yet, use log replay!!
  • It's hard to learn while performing!
  • Performance testing is hard
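A few rough sketches of the automation called out above.  All of them assume the node addresses ended up in master.txt/slaves.txt (as in the Terraform sketch earlier), and every hostname, user, path, and bucket name is a placeholder rather than our real config.

First, the cleanup for the "master died but the slaves kept going" problem - a blunt pkill across the slaves:

    #!/usr/bin/env bash
    # Make sure no slave keeps generating load after a master run is aborted.
    # Assumes slaves.txt lists one address per line and the slaves run the stock
    # jmeter-server (a java process with ApacheJMeter on its command line).
    while read -r host; do
      echo "Stopping JMeter on ${host}"
      ssh -o StrictHostKeyChecking=no "ec2-user@${host}" "pkill -f ApacheJMeter || true"
    done < slaves.txt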
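The three "manually X quickly gets old" items collapse into a few lines of shell once the addresses are in a file:

    #!/usr/bin/env bash
    # One command instead of three chores: git pull on every node, pull the run
    # output off the master, sync it to S3, and print the resulting keys.
    # "perf-tests" (repo dir) and "perf-results-bucket" are placeholders.
    set -euo pipefail
    RUN_ID=$(date +%Y%m%d-%H%M%S)

    for host in $(cat master.txt slaves.txt); do
      ssh "ec2-user@${host}" "cd ~/perf-tests && git pull --ff-only"
    done

    mkdir -p "results/${RUN_ID}"
    scp "ec2-user@$(cat master.txt):~/perf-tests/results/*" "results/${RUN_ID}/"
    aws s3 sync "results/${RUN_ID}/" "s3://perf-results-bucket/${RUN_ID}/"
    aws s3 ls "s3://perf-results-bucket/${RUN_ID}/" --recursive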
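For the parameterization and blow-up points: JMeter properties handle the thread count - set the Thread Group's thread count to ${__P(threads,10)} and pass -Gthreads=... from the master so the property reaches every slave.  Worth knowing: each slave runs the full thread count, so total load is threads x slaves.  Recent JMeter startup scripts also honour a HEAP variable, which is the knob for the GC/heap ceiling (set it on the slaves too, before starting jmeter-server):

    # Thread Group "Number of Threads" field set to: ${__P(threads,10)}
    # Then, from the master, a non-GUI distributed run:
    export HEAP="-Xms2g -Xmx2g"        # c4.large has ~3.75 GB RAM; size to taste
    jmeter -n -t perf-test.jmx \
           -R "$(paste -s -d, slaves.txt)" \
           -Gthreads=100 \
           -l "results/run-$(date +%Y%m%d-%H%M%S).jtl"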
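And a pre-flight check for the stale ELB IP problem - compare what the ELB resolves to now against what's pinned in each node's hosts file:

    #!/usr/bin/env bash
    # Pre-flight: is the IP pinned in each node's /etc/hosts still one the ELB
    # resolves to?  The ELB DNS name and pinned hostname are placeholders.
    set -euo pipefail
    ELB_DNS="my-elb-123456.us-east-1.elb.amazonaws.com"
    PINNED_HOST="api.example.com"

    current_ips=$(dig +short "${ELB_DNS}")
    for host in $(cat master.txt slaves.txt); do
      pinned=$(ssh "ec2-user@${host}" "awk '/${PINNED_HOST}/ {print \$1}' /etc/hosts")
      if ! grep -qF "${pinned}" <<< "${current_ips}"; then
        echo "STALE pin on ${host}: ${PINNED_HOST} -> ${pinned}; ELB now resolves to:" >&2
        echo "${current_ips}" >&2
        exit 1
      fi
    done
    echo "Hosts file pins look current."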
Our next big steps are probably going to be:
  • Put a front-end on the terraform/jmeter automation, make it accessible to dev/qa
  • Log replay.  Log replay.  Log replay.
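A first cut at log replay probably doesn't need anything fancier than turning an access log into a CSV that a CSV Data Set Config feeds into a single HTTP Request sampler, so the test hits real paths in their real proportions.  Something like this, assuming a combined-format access log (the field handling is the assumption to check):

    # Turn an access log into a JMeter-friendly CSV (method,path), preserving the
    # real traffic mix.  Assumes the request line is the second quoted field:
    #   ... "GET /some/path HTTP/1.1" ...
    awk -F'"' '{ split($2, req, " "); print req[1] "," req[2] }' access.log \
      | grep -E '^(GET|POST|PUT|DELETE),' > replay.csv

    # In the test plan: a CSV Data Set Config reads replay.csv into ${method} and
    # ${path}, and one HTTP Request sampler uses those variables.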

