Overloading the Eclipse Job Management System
We’ve had an issue with our Eclipse-based application which caused it to slow down dramatically under stress testing. The problem lay in the job management system: our new, faster code could cause it to bog down under the weight of update jobs.
Eclipse is one of those pieces of software that is rather difficult to explain accurately: it’s a platform, written in Java, for building GUI applications, and it provides lots of services that make writing those applications easier. It’s a pretty good choice for building cross-platform GUI apps. If you are a Java programmer, chances are that you either work inside Eclipse’s Java Development Tools (JDT) plugins or have at least evaluated the platform.
One of our products runs as a plugin within Eclipse, and overall we’re very happy with the platform. Lately, though, we’ve been through a refactoring process to split out the core of our product and some of its dependencies. Coupled with some other fixes, this gave us quite a significant speed increase in our backend.
When we came to stress-test the application to check that it performed well under load, we found that instead of an overall increase in speed, we were seeing quite a significant decrease.
To explain this, we had to look at the Eclipse job system. Internally, the Eclipse platform schedules events from the UI (button presses, pending window updates, etc.) as jobs, and the corresponding updates to UI elements are scheduled in the same way. When a worker thread becomes free, the job manager selects a suitable job (based on priority) and lets it run. Ordinarily there are a handful of jobs in the queue, they are cleared quickly, and the UI remains responsive.
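To make the scheduling behaviour concrete, here is a minimal sketch of a priority-ordered job queue in plain Java. It is a simplified model, not the Eclipse job manager: `SimpleJob`, `SimpleJobQueue` and the convention that a lower number means a more urgent job are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a priority-based job queue.
// Not the Eclipse API: SimpleJob and SimpleJobQueue are illustrative names.
class SimpleJob implements Comparable<SimpleJob> {
    private static final AtomicLong SEQ = new AtomicLong();

    final String name;
    final int priority;   // assumption for this sketch: lower value = more urgent
    final long seq;       // keeps FIFO order among jobs with equal priority
    final Runnable body;

    SimpleJob(String name, int priority, Runnable body) {
        this.name = name;
        this.priority = priority;
        this.body = body;
        this.seq = SEQ.getAndIncrement();
    }

    @Override
    public int compareTo(SimpleJob other) {
        int byPriority = Integer.compare(priority, other.priority);
        return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
    }
}

class SimpleJobQueue {
    // Thread-safe, so several worker threads could poll it concurrently.
    private final PriorityBlockingQueue<SimpleJob> pending = new PriorityBlockingQueue<>();

    void schedule(SimpleJob job) { pending.add(job); }

    int size() { return pending.size(); }

    // Called by a free worker: take the most urgent job, or null if the queue is empty.
    SimpleJob next() { return pending.poll(); }

    // Runs every pending job on the calling thread and returns the order they ran in.
    List<String> runAll() {
        List<String> order = new ArrayList<>();
        SimpleJob job;
        while ((job = next()) != null) {
            job.body.run();
            order.add(job.name);
        }
        return order;
    }
}
```

Scheduling a "decorate" job at priority 50, then "click" at 10, "refresh" at 20 and "click2" at 10, and calling `runAll()`, runs them as click, click2, refresh, decorate: priority first, then submission order.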
Our refactoring had produced such a large increase in speed that the resulting update jobs were flooding the job manager. After a two-minute stress test, the platform would take somewhere in the region of 20 seconds (depending on the CPU cycles available) to clear the job queue and return to a responsive state.
We added debugging to show us the size of the job queue, and figures upwards of 3,000 pending jobs were not uncommon. A single (non-stress) command event telling our backend to do something would generate around 9 internal Eclipse update jobs to update various parts of the UI. With a suitably high injection speed (which the new code was handling fine), the number of corresponding UI update jobs quickly went through the roof.
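A simple fluid model helps show why the backlog kept growing rather than levelling off. It is an idealization assumed for illustration (constant rates, no priorities or batching), not something we measured: let λ be the rate at which command events are injected, k the number of update jobs each command generates, and μ the rate at which the job manager completes update jobs.

```latex
% Illustrative backlog model (assumed, not measured):
%   \lambda = command injection rate (commands/s)
%   k       = update jobs generated per command (about 9 here)
%   \mu     = rate at which the job manager completes jobs (jobs/s)
Q(t) \approx \max\bigl(0,\; (k\lambda - \mu)\,t\bigr)
\qquad
t_{\text{drain}} \approx \frac{Q(T)}{\mu}
\qquad
\text{stable only if } \lambda \le \frac{\mu}{k} \approx \frac{\mu}{9}
```

With k ≈ 9, the job manager has to finish nine update jobs per command just to keep pace. Once the injection rate passes roughly μ/9, every extra command adds to a queue that grows for the whole test (stopping at time T) and then has to be drained afterwards, which is consistent with the slow recovery after the stress test.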
The solution, in the end, was uncomplicated: we throttled the incoming command events (by abandoning some) if the job queue started to get too large. With a suitable throttle value, the backend could be kept busy (but not too busy) without overloading the job manager.
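The throttle can be sketched as a small gate in front of the backend. This is a minimal illustration, not the actual implementation: `CommandThrottle`, `ThrottleDemo` and `maxPendingJobs` are hypothetical names, and the pending-job count is supplied as an `IntSupplier` so that the sketch runs without Eclipse. In a plugin that count would have to come from the job manager (for example via `IJobManager.find`).

```java
import java.util.function.IntSupplier;

// Hypothetical throttle: abandons incoming commands while too many jobs are pending.
// CommandThrottle and maxPendingJobs are illustrative names, not an Eclipse API.
// Assumes commands are submitted from a single thread (counters are not synchronized).
class CommandThrottle {
    private final IntSupplier pendingJobs;  // current size of the job queue
    private final int maxPendingJobs;       // the "suitable throttle value"
    private long accepted;
    private long dropped;

    CommandThrottle(IntSupplier pendingJobs, int maxPendingJobs) {
        this.pendingJobs = pendingJobs;
        this.maxPendingJobs = maxPendingJobs;
    }

    // Runs the command and returns true, or abandons it and returns false.
    boolean offer(Runnable command) {
        if (pendingJobs.getAsInt() >= maxPendingJobs) {
            dropped++;
            return false;
        }
        accepted++;
        command.run();
        return true;
    }

    long accepted() { return accepted; }
    long dropped() { return dropped; }
}

class ThrottleDemo {
    // Deterministic tick simulation: each accepted command queues jobsPerCommand jobs,
    // and the job manager completes at most drainPerTick jobs per tick.
    // Returns the largest backlog seen during the run.
    static int peakBacklog(int ticks, int commandsPerTick, int jobsPerCommand,
                           int drainPerTick, int maxPendingJobs) {
        int[] backlog = {0};
        CommandThrottle throttle = new CommandThrottle(() -> backlog[0], maxPendingJobs);
        int peak = 0;
        for (int t = 0; t < ticks; t++) {
            for (int c = 0; c < commandsPerTick; c++) {
                throttle.offer(() -> backlog[0] += jobsPerCommand);
            }
            peak = Math.max(peak, backlog[0]);
            backlog[0] = Math.max(0, backlog[0] - drainPerTick);
        }
        return peak;
    }
}
```

With 10 commands per tick, 9 jobs per command and 50 jobs drained per tick, `peakBacklog(100, 10, 9, 50, Integer.MAX_VALUE)` (no throttle) climbs to 4050, while a threshold of 100 keeps the peak at or below 108. Because the check happens before each command's jobs are queued, the backlog can exceed the threshold by at most `jobsPerCommand - 1`. Abandoning commands, rather than blocking the sender, follows the approach described above; with several submitting threads the counters and the check-then-act test would need synchronization, and the bound would only be approximate.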
Kudos to my teammate Peter Bailey for doing excellent work on this problem.
Yeah, I remember the headaches with Eclipse plugins myself when we started out on it. It was one of those things that are very powerful if you know how to use them properly, but there was so little documentation about it at the time. Long live the web and code snippets, or else we would never have had the first version :)
Right. I think the biggest lesson from that is that you can’t second-guess the Eclipse Java contracts; you have to implement them exactly as specified. And even then you end up making faulty assumptions about the order in which things are called. It’s a great platform, but you pay for it in complexity.