The Trials of Stress Testing

Originally posted here: https://medium.com/@briankimball/the-trials-of-stress-testing-10cd29555129

A little while ago, we set up a new server to connect separate products from our product line. We had an issue where customers using our integrated mail plugin (Bullhorn for Email) for Outlook desktop could not use Google Chrome, because the plugins are loaded with an embedded version of Internet Explorer. The solution we came up with was to send the requests through a web socket instead. The setup was pretty easy: we built a simple Node.js server using socket.io to support our web socket connections, and as a bonus we already had a deployment strategy for containers in an AWS environment.

Unfortunately, we didn't have much experience with web sockets and had no idea how much load a single server could take, so we went to the internet and asked. Luckily, I came across this article by Bocoup and thought, phew, they have done all the work for us. Our initial use case would only serve a few hundred customers, so when Bocoup said they saw no degradation in performance until around 17K concurrent connections, I was sure no one would make me investigate this further. Unfortunately for me, someone thought it was wise for us to run some stress tests ourselves and see if this would hold true for us.

Our resident Gatling expert was going on PTO, and to be honest, stress and load testing is usually a lower priority in the development process. So I put on my thinking cap and tried to figure out how to go about this. We had a few technologies to choose from:

Gatling

  • we already had a few stress test packages written using this framework
  • we knew how to deploy these packages to VMs across multiple instances to prevent the test runner from being overloaded
  • Java/Scala

Bees with Machine Guns

  • This is the project Bocoup used
  • Written in Python

Artillery.io

  • Written in JavaScript
  • Built-in web socket logic
  • we have a winner
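I won't reproduce our actual script here, but as a rough sketch, an Artillery scenario for socket.io connections can look like the following (the target URL, channel name, and phase numbers are placeholders, not our real values):

```yaml
config:
  target: "http://sockets.example.com"  # placeholder target
  phases:
    - duration: 300
      arrivalRate: 5
      rampTo: 50        # ramp new connections per second up over the phase
  socketio:
    transports: ["websocket"]
scenarios:
  - engine: "socketio"  # Artillery's built-in socket.io engine
    flow:
      - emit:
          channel: "relay-request"  # assumed channel name
          data: "ping"
      - think: 60                   # hold the connection open for a while
```

The built-in socket.io engine is exactly the "built-in web socket logic" that made Artillery the winner for us.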

In a matter of minutes I had a stress test package set up and we were running with Artillery.io. Running tests against our server from my laptop wasn't really a good idea: trying to hit a capacity of 10,000 connections showed a lot of failures real quick when my computer became unresponsive. :( I had seen this coming, since we had run into the same issue before with our Gatling tests. I quickly went about creating a Dockerfile and packaged my test script as a container.

I decided to put the containers in Google Cloud because I thought it would be good to keep them out of the AWS infrastructure we were testing, and it is a million times easier to do. I created a new project in GCE, pushed my image, and wrote some deployment scripts you can see here[insert links]. This took the most time to tweak, but in the end we have a project that will connect to a GCE project, create a cluster in three regions (us-east, us-west, europe-west), scale those clusters, and deploy my image as a DaemonSet to each cluster. The test image starts up, immediately begins running, then waits to be shut down.
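The DaemonSet approach means one test container runs per node, so scaling a cluster's node pool scales the generated load. The manifest is roughly this shape (the image path, labels, and env var are placeholders, not our real configuration):

```yaml
# Hypothetical daemonset.yaml sketch: runs one stress-test pod per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: stress-test
spec:
  selector:
    matchLabels:
      app: stress-test
  template:
    metadata:
      labels:
        app: stress-test
    spec:
      containers:
        - name: stress-test
          image: gcr.io/my-project/stress-test:latest  # placeholder image path
          env:
            - name: TARGET_URL                          # assumed env var
              value: "wss://sockets.example.com"
```

Deleting the DaemonSet (or scaling the node pools down) is what "wait to be shut down" amounts to in practice.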

We ramped up the load several times and tried a few different configurations, but in the end we feel we were able to reproduce results similar to Bocoup's. We stopped our test when we saw no performance degradation at 10.5K concurrent connections.

Conclusion

Find the source here: https://github.com/bvkimball/stress-test

Hopefully this article helps you out if you need to do something similar.