Queue Management Using Amazon SQS

Personal Capital aggregates financial data from a wide range of third-party financial institutions using web service API calls. Each financial institution can have its own response time interval and polling time to retry later. Since each web service call might take more than 5-30 seconds to respond, we don’t want the user threads in our front-end application to block on the request. These requirements make our financial data aggregation service a good candidate for using asynchronous communication.  Asynchronous communication allows clients (user agents, like browsers or mobile apps) to initiate aggregation requests that proceed asynchronously without blocking user interaction, and then when the aggregated data is available, polling threads in the client software will update the data displayed to the user.  One widely used architectural pattern for asynchronous communication is messaged-based processing, using message queues to decouple processing steps.

In our first implementation of our message queues, we used a MySQL database as a simple way to accept the request from the front-end application and let the back-end workload application to pick up the request and process. Over time, as load increased, both from the messaging infrastructure, and from application code accessing the shared database, we had concurrent request problems and the “queue” did not scale well.  We then evaluated two purpose-built message queue systems: Apache ActiveMQ, and Amazon Simple Queue Service (SQS). In our evaluation we found that Amazon SQS works well, and we got it running within 10 minutes.   Amazon SQS offers a reliable, highly scalable, hosted queue for securely sending/receiving/deleting/storing messages as they travel between application servers.  And it operates as a service, with no need to deploy or maintain server instances or software. ActiveMQ is a mature framework, and is one of the leading message queue frameworks in the open-source world.  It is scalable and high-performance, and offers a more sophisticated feature set than Amazon SQS. However, ActiveMQ needs separate server, configuration, and queue maintenance for each environment and associated tuning of the system and infrastructure stacks for performance and scalability.   Since our infrastructure is hosted in Amazon WS, we decided to start with SQS for simplicity and see how far it could take us.

Our requirements for our message queue system include:  delayed processing of messages; ordered sequence; batch processing; and message priority. Amazon SQS met all of our functional requirements except for priority queues.  We achieved equivalent functionality of priority queues by using a pair of queues, QUEUE_TOP and QUEUE_NORMAL, for each application queue. Our front-end applications send messages to QUEUE_TOP and batch applications send messages to QUEUE_NORMAL.  We wrote a generic receiver endpoint, which listens to both queues in a pair, processing the QUEUE_TOP messages first and then processing QUEUE_NORMAL messages only when QUEUE_TOP is empty.

When user logs in to our system, the front-end application accepts the request and creates N number of queue messages and put in Amazon SQS.  We use JSON as the data format for the message payloads, and the Command pattern for handling messages that invoke specific workload tasks.   The queue server checks for available threads and reads available messages from the SQS queue(s), up to a configurable limit (typically 10).  The queue server checks the top-priority queue first and then the normal-priority queue, so that top priority messages are always processed faster.   One challenge with using SQS is that it does not support blocking reads, so an empty queue must be polled by your application code.  In order to not have our app servers fast-poll an empty queue (and use up all the CPU), we use a configurable polling interval (typically from 100 to 1000 msecs) with some internal buffering of messages in the app server, to ensure that worker threads stay busy between polling calls to the SQS queue.

Our implementation has shown us that Amazon SQS is solid alternative to ActiveMQ for message queues that are used for simple workflow decomposition.   Of course, one fundamental constraint is that Amazon SQS is only available if your infrastructure is deployed in Amazon Web Services.  For other hosting providers, or if operating your own data center, open-source software like ActiveMQ or RabbitMQ are more appropriate choices.

Since introducing SQS into our architecture, we have extended our use of dedicated workload servers, reading SQS queues, for:  integration with financial services partners; integration with SalesForce.com; and compute- and DB-intensive batch processes.  Stepping into the use of message queues to solve a specific problem (asynchronous processing of account aggregation), and succeeding there, has led us to a queue-based architecture for most of our back-end processing, with great improvements in scalability, flexibility, and reliability.