AWS Elastic Beanstalk – Pitfalls

It’s pretty easy to get lost in the immense product palette offered by Amazon Web Services.  There’s everything from bare-bones virtual machines (EC2) through fast database provisioning (RDS) to Machine Learning (AML).  For any given requirement, you’re probably going to find yourself spoiled for choice and a little overwhelmed at the sheer number of building blocks available to you in AWS.

Two of those components are CloudFormation and Elastic Beanstalk.  Both are orchestration systems used to configure and manage AWS resources.

CloudFormation allows fine-grained control of pretty much every aspect of every component available in AWS.  It’s the Rolls-Royce of orchestrators, but it’s pretty difficult to use.  Enter Elastic Beanstalk, or EB.

EB allows you to set up a cluster of servers, an autoscaler, a load balancer, a software source for your deployment, a database tier, basic health checks and simple notifications.  That’s basically everything to get you going with a fairly sophisticated failure-tolerant environment.  Under the hood, EB translates your requirements into CloudFormation templates, and uses CloudFormation to actually deploy resources.

We use EB deployments extensively in the Multicontainer Docker mode.  EB is somewhat biased toward web services, but in this mode, provided you can get your software into a Docker container, you can have it managed by EB.  EB even offers parallel environments (sharing the same pool of source software) and URL switching for Blue/Green deployments.

There are a couple of pitfalls we ran into, which you should be aware of before you go for Beanstalk.

Don’t use the built-in database tier.

The built-in database tier defers the actual data storage to AWS’ Relational Database Service (RDS) product.  This is a great product, but attaching it to a Beanstalk is a bad idea.  When you delete the Beanstalk, it deletes the database too.  There’s an option to snapshot the DB first, but you can imagine the chaos that ensues if you delete the Beanstalk by accident.

As your environment evolves, you’re going to want to have other resources access the DB too.  This can be achieved by adding appropriate security groups to the DB and the client EC2 resources, but that raises a logical issue in your architecture: now the DB is in the wrong place.  It should be a standalone RDS instance, not tied to a Beanstalk.
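
As a rough illustration of what "standalone" could look like, here is a minimal sketch that builds a CloudFormation template for an RDS instance owned by no Beanstalk at all.  The function name, the logical ID AppDatabase, the postgres engine, the appuser login and the DbPassword parameter are all assumptions made up for this example; only the resource type, the property names and the Snapshot deletion policy come from CloudFormation itself.

```python
import json


def standalone_rds_template(instance_class, storage_gb, db_security_group_ids):
    """Build a minimal CloudFormation template (as a dict) for an RDS
    instance that lives outside any Beanstalk environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical parameter: the password is supplied at deploy time.
            "DbPassword": {"Type": "String", "NoEcho": "true"},
        },
        "Resources": {
            "AppDatabase": {  # hypothetical logical ID
                "Type": "AWS::RDS::DBInstance",
                # Take a final snapshot if the stack is ever deleted.
                "DeletionPolicy": "Snapshot",
                "Properties": {
                    "Engine": "postgres",  # assumption: swap in your engine
                    "DBInstanceClass": instance_class,
                    "AllocatedStorage": str(storage_gb),
                    "MasterUsername": "appuser",  # assumption
                    "MasterUserPassword": {"Ref": "DbPassword"},
                    # Security groups attached to the DB itself; every client
                    # (Beanstalk or otherwise) is then allowed in via these.
                    "VPCSecurityGroups": list(db_security_group_ids),
                },
            }
        },
    }


if __name__ == "__main__":
    template = standalone_rds_template("db.t3.medium", 20, ["sg-0123456789abcdef0"])
    print(json.dumps(template, indent=2))
```

Because the database sits in its own stack, deleting or rebuilding a Beanstalk never touches it, and the security groups that let other clients in are declared in one place.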

Adding security groups to the RDS database also brings us to the second pitfall, namely…

Don’t mess around with Beanstalk-managed resources.

When you use Beanstalk to orchestrate your deployment, the authoritative source of configuration for these resources is Beanstalk.  Any changes you make to these resources will – at best – be overwritten during the next deployment.

At worst, they will cause the deployment to fail; possibly into a state you can’t get back out of (more later).

Things like tweaking the Auto Scaling Groups or Load Balancer are pretty much “at your own risk” operations.  Beanstalk is at liberty to revert these changes during the next deploy.  Unfortunately, because Beanstalk simplifies a lot of things, there are many configuration elements of these resources that can’t be configured by Beanstalk.  You’ll need to evaluate Beanstalk carefully to see if you can live with this.
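
One way to catch this kind of drift before a deploy trips over it is to compare what Beanstalk declares against what is actually live.  The sketch below is only the comparison step: find_drift is a hypothetical helper, and it assumes you have already collected both sides into dictionaries keyed by (namespace, option name), which is how Beanstalk identifies its option settings.  How you gather the live values is left out.

```python
def find_drift(declared, live):
    """Return {key: (declared_value, live_value)} for every setting whose
    live value differs from what Beanstalk declares.

    Both arguments map (namespace, option_name) tuples to string values.
    A setting missing from `live` is reported with a live value of None.
    """
    drift = {}
    for key, want in declared.items():
        have = live.get(key)
        if have != want:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Example values are made up for illustration.
    declared = {
        ("aws:autoscaling:asg", "MaxSize"): "4",
        ("aws:elb:listener:443", "SSLCertificateId"): "arn:example:old-cert",
    }
    live = {
        ("aws:autoscaling:asg", "MaxSize"): "4",
        ("aws:elb:listener:443", "SSLCertificateId"): "arn:example:new-cert",
    }
    for (namespace, option), (want, have) in find_drift(declared, live).items():
        print(f"{namespace} / {option}: Beanstalk says {want!r}, live is {have!r}")
```

An empty result means the resources still match Beanstalk’s view of the world; anything else is a change that Beanstalk may revert or choke on.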

Cases where it actually went wrong

Load Balancer Certificate:  I changed the certificate used by the load balancer in the balancer itself, rather than in the Beanstalk.  During the next deploy (which naturally was an urgent fix), Beanstalk failed the deploy action saying the balancer was out of sync with the configuration (technically true) and placed the deployment into the dreaded Grey-State-Of-Death (deployment failed, application not ready).  Unfortunately, changing the configuration requires the application to be in “Ready” state.  This catch-22 meant we had to delete the whole Beanstalk application and recreate it.
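
With hindsight, the certificate change should have gone through Beanstalk as an option setting.  The sketch below builds such a request and checks the environment status first.  It assumes a classic load balancer with an HTTPS listener on port 443; the two function names are hypothetical, while the keys follow the keyword arguments of boto3’s Elastic Beanstalk update_environment call.

```python
def certificate_update_request(environment_name, certificate_arn):
    """Build the keyword arguments for changing the load balancer
    certificate through Beanstalk instead of on the balancer itself.

    Assumption: classic load balancer, HTTPS listener on port 443.
    """
    return {
        "EnvironmentName": environment_name,
        "OptionSettings": [
            {
                "Namespace": "aws:elb:listener:443",
                "OptionName": "SSLCertificateId",
                "Value": certificate_arn,
            }
        ],
    }


def can_update(environment_status):
    """Beanstalk only accepts configuration changes while the
    environment reports the status "Ready"."""
    return environment_status == "Ready"


if __name__ == "__main__":
    # Environment name and ARN are placeholders for illustration.
    request = certificate_update_request("my-env", "arn:example:new-cert")
    if can_update("Ready"):
        print("would send:", request)
```

With boto3 installed, the request could be sent as boto3.client("elasticbeanstalk").update_environment(**request), after reading the current status from describe_environments.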

RDS Scale-Up.  I wanted to scale up the DB attached to a Beanstalk, so I made the configuration change in the Data Tier of the Beanstalk and committed that.  While I was waiting for that operation to complete (which was taking on the order of thirty minutes), I configured a couple of security groups on the database.  Beanstalk (for some reason) then marked the whole deployment as having failed and rolled it back, which in this case meant scaling the DB back to its original size (another thirty minutes).  The solution here was to snapshot the DB, then create a new standalone (not Beanstalk-attached) RDS instance out of the snapshot.  This gives us an independent RDS DB that we can configure and scale without worrying about Beanstalks.
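
The snapshot-and-restore escape route can be sketched in a few lines.  This is not the exact procedure we ran, just a minimal outline: detach_database is a hypothetical helper that expects an RDS client shaped like the one boto3.client("rds") returns, and all identifiers are supplied by the caller.

```python
def detach_database(rds, source_db_id, snapshot_id, new_db_id, new_instance_class):
    """Snapshot a Beanstalk-attached DB and restore the snapshot as a
    new, standalone RDS instance.  Returns the new instance identifier.

    `rds` is assumed to be a boto3 RDS client (or anything with the
    same methods).
    """
    # 1. Snapshot the database that Beanstalk owns.
    rds.create_db_snapshot(
        DBInstanceIdentifier=source_db_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

    # 2. Restore it as an independent instance, already at the larger size.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_db_id,
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceClass=new_instance_class,
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_db_id)
    return new_db_id
```

Any writes that reach the old database after the snapshot is taken will be missing from the new one, so stop the application (or put it in read-only mode) before step 1, and point it at the new endpoint once the restore completes.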

Conclusion

EB is a great balance between simplicity and power.  You get all the power of the underlying technologies (EC2 Container Service, Auto Scaling Groups, Elastic Load Balancer, etc.) without having to orchestrate them yourself.  If you can live with the compromises (the Docker beanstalk types are brilliant, by the way, and will get even better when the EC2 Container Registry service starts in more regions), then EB is a first-rate solution.

If you can’t live with the compromises, you’ll need to dig into CloudFormation or OpsWorks (Chef) to put your environments together.  Either way, AWS is a great choice for cloud environments.
