Terraform/JMeter performance testing - practical experience

Over the last while we've had the opportunity to put our new JMeter learnings to work:

  • A bug came up that was only evident under load - we were able to reproduce it in our dev environments!  The dev ran JMeter off his laptop, and that was enough load to trigger the bug.
  • I think I mentioned last time how the simple act of mapping out a JMeter script revealed excess calls to our middleware - tickets were created to address this.
  • QA has used it to help draw out issues with a new production environment, but...
...the other day they ran out of steam on their laptops.  So we got to come back to the Terraform/JMeter setup we built a few months back.  Thankfully everything still worked, and within about 15 minutes we were on our feet with 1 master and 6 slaves (c4.large) raising heck.

Here are the lessons we learned today:
  • Terraform is amazing and was totally worth the time investment
  • If you are testing a cold production environment - ASK ABOUT HOSTS FILE CHANGES!!  We hammered 'real' production for a good 5 minutes before realizing that.  "Huh, where are the logs?"  After the cold sweats passed, everyone had a good giggle.
  • Killing the JMeter master test run doesn't also kill the slave processes!  Oops.
  • Manually updating the master/slave nodes quickly gets old (just git pulls, but definitely worth some automation)
  • Manually uploading the reports/data dumps to S3 quickly gets old (even if it's just pasted commands)
  • Manually linking S3 reports/dumpfiles quickly gets old (even if it's just 3 files)
  • Not having a parameterized JMeter script (i.e. for the thread count variable) leads to wasted time (note that in a master/slave setup each slave runs the full test plan, so the total thread count is the script's value multiplied by the number of slaves)
  • There is a ceiling on thread count x thread groups beyond which JMeter blows up - GC overload, heap dump, etc - we had 15 thread groups with 3-4 requests each, and accidentally put in 417 threads per group (6,255 threads per test plan).  Oops.  (fwiw, 417 threads with 5 thread groups, i.e. 2,085 threads, was ok)
  • If you are using AWS ELB IPs and hosts file changes on the slaves - check the IP validity before each test run!  We had an IP expire and didn't realize - corrupted two of our test runs.  Better yet, find a better solution than hosts file changes!!
  • Don't use test accounts/organizations/users for performance testing!  Use real data!  Better yet, log replay!!! (this is our next big step, has to happen)
  • Start with a CLEAR idea of the targets you are testing against.  Don't assume you can easily translate a prod figure into a JMeter figure!  Better yet, use log replay!!
  • It's hard to learn while performing!
  • Performance testing is hard
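
On the "check the IP validity before each test run" lesson: here is a rough sketch of what a pre-flight check could look like. It is not what we ran. It assumes the slaves pin hostnames in a standard hosts file, and that you know the ELB's own DNS name for each pinned hostname (the mapping, the function names and the example hostnames are all made up for illustration). The important detail is to resolve the ELB's DNS name and not the pinned hostname, since resolving the pinned hostname on a slave would just hand back the hosts file entry.

```python
import socket


def parse_hosts(text):
    """Return {hostname: ip} for every entry in hosts-file text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name] = ip
    return entries


def resolve_ips(dns_name):
    """Return the set of IPv4 addresses DNS currently reports for dns_name."""
    infos = socket.getaddrinfo(dns_name, None, socket.AF_INET)
    return {info[4][0] for info in infos}


def stale_entries(hosts_text, elb_dns_names, resolver=resolve_ips):
    """Return (hostname, pinned_ip) pairs whose pinned IP the ELB no longer serves.

    elb_dns_names is a hypothetical mapping of pinned hostname -> ELB DNS name.
    A hostname missing from the hosts file is reported with pinned_ip None.
    """
    pinned = parse_hosts(hosts_text)
    stale = []
    for host, elb_name in elb_dns_names.items():
        ip = pinned.get(host)
        if ip is None or ip not in resolver(elb_name):
            stale.append((host, ip))
    return stale
```

Running something like `stale_entries(open("/etc/hosts").read(), {...})` on each slave before kicking off a run, and refusing to start if the list is non-empty, would have saved us those two corrupted test runs.
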
Our next big steps will probably be:
  • Put a front-end on the Terraform/JMeter automation, make it accessible to dev/QA
  • Log replay.  Log replay.  Log replay.
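
We haven't built log replay yet, so take this as a thinking-out-loud sketch and not a description of anything that exists. It assumes the production access logs are in the common/combined log format (ours may not be), and it turns them into a CSV of time offsets, methods and paths that something like JMeter's CSV Data Set Config could feed into a request sampler. The function names and CSV column names are made up.

```python
import csv
import io
import re
from datetime import datetime

# Assumes combined-log-style lines: ... [10/Oct/2023:13:55:36 +0000] "GET /path HTTP/1.1" ...
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')


def parse_requests(lines):
    """Return (timestamp, method, path) tuples sorted by time; unparseable lines are skipped."""
    requests = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        requests.append((ts, match.group("method"), match.group("path")))
    requests.sort(key=lambda r: r[0])
    return requests


def to_replay_csv(requests):
    """Render requests as CSV text with offsets in seconds from the first request."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["offset_s", "method", "path"])
    if requests:
        start = requests[0][0]
        for ts, method, path in requests:
            writer.writerow([int((ts - start).total_seconds()), method, path])
    return out.getvalue()
```

The appeal is that the request mix and pacing come straight from real traffic, which sidesteps the "translate a prod figure into a JMeter figure" guesswork above.
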
