Scaling JBoss A-MQ on OpenShift

I frequently get asked by customers if it’s possible to run Red Hat JBoss A-MQ on Red Hat OpenShift, and while the answer has been “yes” for quite a while now, it has always been followed by a few caveats. In particular, the issue of scaling…

But before we get into the issue of scaling, let’s talk a little about how the official image template works. Basically, it can operate in three different modes (as of this blog date).

The first, is persistent with no scaling. This is the equivalent of a single master/slave setup. Only, there is no need for an actual “slave” instance. If the master goes down, OpenShift will detect it and will spin up a new instance on the same or another node. And since the new instance/pod will have the same PersistentVolumeClaim, the broker will come online and see all of its in-flight messages exactly as they were. If I were to attempt to scale-up the instance in this mode, I would basically just spin up a bunch of passive “slaves” since they’ll all try to mount the same PersistentVolumeClaim/KahaDB, and will be unable to get the file-lock (and will resort to polling it until they can). And as described above, the slaves serve no real purpose because OpenShift is already monitoring and will bring up a new instance if needed. But what if I want to scale? What if I want to have many active instances sharing the load?

That brings us to the next mode. Which is non-persistent with scaling. In this mode, all of the instances share the same Service, but have no PersistentVolume attached. That means that clients (both producers and consumers) can be distributed across all of the instances. And, since each of the instances is networked together, the messages will find their way to a valid consumer. The instances will be automatically network together in a mesh configuration, and discover eachother using the same Kubernetes Service abstraction that the clients can use. So theoretically, I can scale this up as large as I need to handle my client load. But as I stated above, there is no PersistentVolume attached. Which means that my in-flight messages could potentially be lost if the owning broker goes down. So what if I want it all? What if I need the ability to scale-up, but also need persistence?

In that case, we would use the third mode. Which is persistent with scaling. In this mode, all of the brokers are networked together (using the same Kubernetes Service discovery mechanism as above), but they all also mount the same PersistentVolumeClaim. So how do they have separate KahaDB’s (and prevent all trying to lock the same one)? It’s actually quite simple… In this mode, they will all mount the same volume, but will use subdirectories inside that mount. So within the mount, you will get a bunch of directories called “split-1”, “split-2”, … and so on. If you want to see exactly how this works, you can open up a remote shell to one of the pods (ie, oc rsh <POD_NAME>) and take a look at the A-MQ start script. It just loops through (starting at 1) each of the directories until it finds one that it can get a file-lock on. Once it does, it starts up a broker instance and uses that sub-directory to store its KahaDB. It’s worth noting here that, since all of the instances will share the same actual PersistentVolume (and it’s underlying filesystem), you will need to use a distributed filesystem (ie, GlusterFS) with ReadWriteMany access so you don’t hit a storage performance bottleneck. Now I can run A-MQ on OpenShift and scale-up as much as I want (or as much as I have resources for anyway).

So what’s the “scaling problem” I mentioned earlier? Well… if I want to scale-up, I’ll probably also want to scale-down at some point. And if I scale down, I now have KahaDB’s sitting in “split-x” folders that could potentially have in-flight messages. And those messages likely can’t wait until I scale back up and happen to get an instance that mounts that particular “split” directory. So really, I need to drain those messages out of that stale KahaDB and push them to one of my remaining active brokers. So how might I go about that?

One solution might be to use a broker plugin that (on shutdown) will automatically drain messages off to other brokers. This could work, but would probably be problematic. First, you would have to make sure that all of your ActiveMQ TransportConnectors are shutdown before you start draining the messages. If you fail to do this, you could potentially be accepting new messages from clients and might never finish actually draining. The second (and probably more important problem) is that you don’t have an infinite amount of time to finish your work. When Kubernetes schedules a pod for shutdown, it does give a max time for it to complete its graceful shutdown. But if you exceed this timeout, it will forcefully shut down. So exactly how much time do you need to drain off all of your messages? It depends… It depends on how many in-flight messages you have stored in that KahaDB. It depends on how fast you can send those messages (maybe they’re large messages). It depends on how much space you have available on the other brokers (because of Producer Flow Control). The point is, there is no valid number for “shutdown timeout”. You need as much time as it takes. So what do we do?

Luckily for you, I’m sure you’ve already read my previous blog on Decommissioning JBoss A-MQ brokers. And in there, you’ve already seen my final proposed solution (and code example) for draining those messages. So really, all we have to do is make that code work on OpenShift. Here’s a first cut at it [https://github.com/joshdreagan/activemq-pv-monitor]. In this example, I use the FIS 2.0 tools to create a simple Red Hat JBoss Fuse app that will monitor the A-MQ “split-x” directories. If it finds a KahaDB that it’s able to get a file-lock on, it will drain its messages to another available broker (which it discovers using the same Kubernetes Service discovery mechanism described above). And since it’s a separate Pod, it can run for as long as it needs to. So no need to worry about pesky timeouts. The example could use some more error handling and various other QA, but it should be a good starting point.

So now we can scale-up, we can scale-down, and if we’re feeling lazy, we can even auto-scale. Cool beans! As always, hopefully you find this useful. And if so, buy me a beer this year at Red Hat Summit. :)

Share