Deploying Distributed J2EE Applications Using Amazon EC2

Adriaan de Jonge provides some tips on configuring Apache and Tomcat to support load balancing J2EE applications across multiple Amazon EC2 instances.

AWS Products Used: Amazon EC2
Language(s): Java
Date Published: 2007-12-07

By Adriaan de Jonge of SDB Java, The Netherlands

How do you configure your Java 2 Platform, Enterprise Edition (J2EE) servers to offer the scalability of Amazon EC2 to your applications? If you have the hardware, J2EE servers already offer support for load balancing. However, in practice, this solution isn't always used to its full potential because of hardware costs. Using Amazon EC2, scalability is standard and servers should be set up for load balancing by default. This tutorial explains the basic procedures for using Amazon EC2 to deploy distributed J2EE applications.

Architecture: Apache On Top Of Multiple Tomcat or JBoss Instances

Combining Java with the Tomcat web server is a popular approach for shops that use heavy web applications. If you are running a single server, you have a powerful engine, up to a certain number of concurrent visitors. If you receive more visitors, however, you could run into trouble; Tomcat has a bad reputation for performing poorly under pressure. Connections might fail to respond in time, resulting in many timeouts and errors because the maximum connection limit has been surpassed. Although there are alternatives to Tomcat--such as Jetty--that are better at handling stress, they don't solve structural problems if the capacity is too low.

The best way to handle structural capacity shortages is to employ load balancing techniques. This means running several parallel instances of Tomcat, each of which handles part of the traffic. The nice thing about the Amazon EC2 service model is that it lets you start and stop additional servers as needed without having to pay for the hardware when the capacity is not used.

For example, suppose you are running an online store, requiring as much as five times the usual capacity during the Christmas season, and half the capacity during summer. Your requirements might look like this:

January - May: two parallel Tomcat servers are required
June - August: one Tomcat server is sufficient
September - November: three parallel Tomcat servers are required
December: ten parallel Tomcat servers are required

You also have these two additional requirements:

We want to handle this the simplest way possible.
Switching on and off additional servers should not require a restart of Apache (which would interfere with the online traffic).

What is the simplest possible way to approach this problem? Some people might think it would be to have a graphical user interface (GUI), in which system administrators can simply click to add or remove servers. I must admit that I don't have such a tool, and it wouldn't be simple to invent one for this purpose right now. In my definition of simple, all I need to set things up and modify them is my favorite text editor. You might do this manually the first few times. A major advantage of the text editor approach is that it allows you to script any manual actions, automating the process over time.

When running parallel Tomcat servers, Apache HTTPd is required on top to distribute the workload over the underlying Tomcat instances. The architecture for two or more instances looks like this:

[image]

Translating this picture directly to Amazon EC2 instances, the Apache top layer would require its own dedicated Amazon EC2 instance. In practice, it turns out that the resources required by the Apache top layer are of a different proportion than the Tomcat services. The Amazon EC2 services provide an economical solution to these varying system requirements with heterogeneous instance sizing. For the Apache instance, you can use a small instance, and pay $0.10 per hour. For the Tomcat instances, you could use extra large instances, and pay $0.80 per hour. Considering the minimal system resources required by Apache, you could also share an extra large instance with Tomcat on one of the servers. In this case, the physical architecture of the instances looks like this:

[image]
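To put rough numbers on the seasonal schedule described earlier, here is a minimal sketch that adds up the extra large instance hours for a year and compares them with running ten servers all year round. It assumes the $0.80 per hour price quoted above, a non-leap year, and one extra large instance per Tomcat server; the class and method names are made up for this illustration, and the small Apache instance and any data transfer charges are ignored.

```java
import java.time.Month;

class SeasonalCapacityCost {
    // Price quoted in this article for an extra large instance, kept in cents.
    static final long EXTRA_LARGE_CENTS_PER_HOUR = 80;

    // Parallel Tomcat servers needed in a given month (the example schedule).
    static int serversFor(Month month) {
        switch (month) {
            case JUNE: case JULY: case AUGUST:
                return 1;
            case SEPTEMBER: case OCTOBER: case NOVEMBER:
                return 3;
            case DECEMBER:
                return 10;
            default:
                return 2; // January through May
        }
    }

    // Instance hours for one non-leap year when following the schedule.
    static long seasonalInstanceHours() {
        long hours = 0;
        for (Month m : Month.values()) {
            hours += 24L * m.length(false) * serversFor(m);
        }
        return hours;
    }

    // Instance hours for one non-leap year with a fixed number of servers.
    static long fixedInstanceHours(int servers) {
        long hours = 0;
        for (Month m : Month.values()) {
            hours += 24L * m.length(false) * servers;
        }
        return hours;
    }

    static long costInCents(long instanceHours) {
        return instanceHours * EXTRA_LARGE_CENTS_PER_HOUR;
    }

    public static void main(String[] args) {
        long seasonal = seasonalInstanceHours();
        long fixed = fixedInstanceHours(10);
        System.out.println("Seasonal schedule: " + seasonal + " hours, "
                + costInCents(seasonal) + " cents");
        System.out.println("Ten servers all year: " + fixed + " hours, "
                + costInCents(fixed) + " cents");
    }
}
```

Under these assumptions the schedule comes to 23,448 instance hours against 87,600 for a fixed ten-server setup, which is the saving the start-and-stop model is meant to capture.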

Activating your web application running on the Amazon EC2 servers is a matter of pointing your domain name--for example, www.yourdomain.com--to the Amazon EC2 instance running Apache. To do this, you need to configure the Domain Name System (DNS) to tell it which computer belongs to the domain. In many cases, this configuration is accomplished by specifying the IP address of the computer in an A record for the domain. The alternative is to specify an alias, or CNAME, to another host name that points to your computer. Because the Amazon servers already have their own host names, it is easiest to use CNAMEs. For more information on A records and CNAMEs, visit Zytrax.com.

The Apache instance running on the computer you are pointing to with the CNAME takes care of spreading the load over the underlying Tomcat instances. An important detail of the CNAME setting is the Time To Live (TTL) value you assign to it. Preferably in this case, you'll set this value to a shorter time than usual: one hour, for example, or maybe even less. You will see why as we continue.

Usually, when you change the load balancing configuration in Apache, you are required to restart Apache to activate the new configuration. This restart would be noticeable to at least a few of your customers, so that is not the best way to switch to a new configuration. Starting an additional instance of Tomcat, or shutting one down, is a change in the load balancing configuration. With the CNAME and short TTL, you have a practical way to avoid restarting Apache and still switch over to a new configuration.

If you are running Apache on a small Amazon EC2 instance, you can start up a second small Amazon EC2 instance running Apache with the new load balancing configuration. If you change your DNS settings to let the CNAME point to the second Amazon EC2 instance, your visitors are gradually moved to the second instance. The time this takes should be roughly equal to the TTL value of your CNAME record. So, if this value is an hour, you need to run the two Amazon EC2 instances in parallel for one hour. This would cost you only $0.10, because you are paying for the Amazon EC2 instance by the hour. After this, you can check the access log of the first Amazon EC2 instance to see whether any visitors are still using that instance. If there are no visitors, you can shut down this instance and treat the second Amazon EC2 instance as your Apache node from now on.
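The access log check can be scripted. The following is a minimal sketch, assuming the log uses Apache's common log format with timestamps such as [07/Dec/2007:10:15:00 +0000]; the class name, method names, and sample log lines are hypothetical. It reports whether any request arrived after a given cutoff, for example the moment the TTL of the old CNAME record ran out.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AccessLogDrainCheck {
    // Timestamp layout of the common log format, e.g. 07/Dec/2007:10:15:00 +0000
    private static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Returns the request time of one log line, or null if none can be parsed.
    static Instant timestampOf(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return null;
        }
        try {
            return OffsetDateTime.parse(line.substring(open + 1, close), CLF_TIME).toInstant();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // True if at least one request was logged strictly after the cutoff.
    static boolean hasRequestsAfter(List<String> logLines, Instant cutoff) {
        for (String line : logLines) {
            Instant t = timestampOf(line);
            if (t != null && t.isAfter(cutoff)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "10.0.0.1 - - [07/Dec/2007:10:15:00 +0000] \"GET /myjavaapp/ HTTP/1.1\" 200 512",
                "10.0.0.2 - - [07/Dec/2007:10:40:00 +0000] \"GET /myjavaapp/ HTTP/1.1\" 200 512");
        Instant cutoff = Instant.parse("2007-12-07T11:00:00Z");
        System.out.println("Still receiving traffic: " + hasRequestsAfter(lines, cutoff));
    }
}
```

In practice you would read the lines from the access log file of the old instance and only shut that instance down once the check stays false for a while.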

You can do something similar when running both Apache and Tomcat on an extra large node. Suppose Server1 is running Apache, load balancing over Tomcats on Server1 and Server2. You want to add an additional Tomcat on Server3, the server you just turned on using the Amazon EC2 command-line utilities. There is a simple way to do this without restarting the Apache instance on Server1: Set up the Apache instance on Server2 to balance between Server1, Server2, and Server3. Start Apache on Server2. Then, change your DNS settings to point your domain name to Server2 instead of to Server1. From that point, it should take approximately an hour for all clients to switch from Server1 to Server2. You can check your Apache access logs to see that at some point there are no longer any requests for Server1. At that point, it's safe to reconfigure Apache on Server1 for the next change you want to make. Alternatively, you can set Server1's configuration to match the configuration on Server2 for consistency.

Configuration: Apache and the J2EE Server in an EC2 Environment

There are many ways to connect Apache with Tomcat. Over the past few years, native Tomcat connectors have quickly superseded each other, and there have also been options to connect by using the generic HTTP protocol. You will find module names such as mod_jk, mod_jk2, mod_ajp, mod_proxy, mod_rewrite, and other variations. Many online help texts give outdated or contradictory advice, making it hard to choose the proper connector and settings.

Having tried most of the available connection choices, I know that each of them has its challenges. It turns out that the option that's the simplest and the most consistent with other similar Apache features is the newest option: mod_proxy_ajp. The mod_proxy_ajp module is used much like mod_proxy_http, except that it saves you from providing ProxyPassReverse lines in addition to ProxyPass, which makes an AJP setup easier to maintain than an HTTP one.

Note: See http://developer.amazonwebservices.com/connect/entry.jspa?entryID=1015 for an Amazon Machine Image (AMI) with Apache and Tomcat preinstalled.

Before setting up Apache, you should start at the bottom: configure Tomcat's [TOMCAT_HOME]/conf/server.xml file. For Tomcat versions up to 5.5, it is wise to start with the example content from server-minimal.xml. Tomcat 6.0 doesn't provide this example, but the content of the default server.xml file is fairly minimal in 6.0. Find the AJP connector declaration:

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

Then, decide whether port 8009 is suitable for your situation. Let's assume you are running only one Tomcat per server instance. In some situations there are reasons to do otherwise, but they'd complicate this explanation.

Now, it's time to change the Apache configuration, [APACHE_HOME]/conf/httpd.conf. First, find the following LoadModule lines and uncomment them:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

Then, at the bottom, add the following:

<Proxy balancer://mycluster>
        Order deny,allow
        Allow from all

        BalancerMember ajp://[SERVER1].amazon.com:8009/myjavaapp
        BalancerMember ajp://[SERVER2].amazon.com:8009/myjavaapp
        #... as many as you need here ...
</Proxy>

ProxyPass /myjavaapp balancer://mycluster
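Because the configuration is plain text, adding or removing a Tomcat instance can be scripted rather than typed by hand. The sketch below is one possible way to do that, not part of the original setup: it renders the Proxy block and the ProxyPass line shown above from a list of host names. The class name, the render method, and the example.com host names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class BalancerConfigWriter {
    // Builds the balancer section for httpd.conf from a list of Tomcat hosts.
    static String render(String clusterName, String appPath, int ajpPort, List<String> hosts) {
        StringBuilder sb = new StringBuilder();
        sb.append("<Proxy balancer://").append(clusterName).append(">\n");
        sb.append("        Order deny,allow\n");
        sb.append("        Allow from all\n");
        sb.append('\n');
        for (String host : hosts) {
            // One BalancerMember line per Tomcat instance, reached over AJP.
            sb.append("        BalancerMember ajp://").append(host)
              .append(':').append(ajpPort).append(appPath).append('\n');
        }
        sb.append("</Proxy>\n");
        sb.append('\n');
        sb.append("ProxyPass ").append(appPath)
          .append(" balancer://").append(clusterName).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host names; substitute the host names of your own instances.
        List<String> hosts = Arrays.asList("server1.example.com", "server2.example.com");
        System.out.print(render("mycluster", "/myjavaapp", 8009, hosts));
    }
}
```

The output can be written to the httpd.conf of the Apache instance that is about to take over, which fits the switch-over procedure described earlier.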

The last configuration detail is the CNAME in your DNS settings. How you set it depends on your DNS provider. Many larger application service providers give you your own dashboard where you can create A records, MX records, and CNAMEs. Please refer to the documentation supplied by your provider for details.

Programming: Mind the Details: Session Variables and Serializing Objects

Up to this point, we've only looked at load balancing on a request basis, assuming a stateless protocol. In practice, most web applications store session data that covers a series of requests from a single customer--for example, storing items in a shopping basket.

If these items are stored on the Tomcat instance, they would be lost if the next request is referred to another Tomcat instance. There are several ways to resolve this problem. One way is to configure a jvmRoute attribute in the Engine declaration in a Tomcat server. The jvmRoute attribute is appended to the session ID in the cookie sent to the client. In the next request, the Apache load balancer recognizes the jvmRoute and directs the request to the same Tomcat as the last request. This approach is called a "sticky" session.
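To make the sticky session mechanism concrete, here is a simplified illustration of the idea, not the actual logic inside Apache: the route is the part of the session ID after the last dot, and it is looked up in a table of known Tomcat instances. The class name, the route names tomcat1 and tomcat2, and the example.com addresses are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class StickyRouteSketch {
    // Tomcat appends ".<jvmRoute>" to the session ID; return that suffix, or null if absent.
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    // Picks the back end registered for the session's route, or the fallback otherwise.
    static String chooseBackend(String sessionId, Map<String, String> backendsByRoute, String fallback) {
        String route = routeOf(sessionId);
        String backend = (route == null) ? null : backendsByRoute.get(route);
        return backend != null ? backend : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> backends = new HashMap<>();
        backends.put("tomcat1", "ajp://server1.example.com:8009");
        backends.put("tomcat2", "ajp://server2.example.com:8009");
        // A returning visitor whose session was created on the instance with jvmRoute "tomcat2".
        System.out.println(chooseBackend("A1B2C3D4.tomcat2", backends, "ajp://server1.example.com:8009"));
    }
}
```

In Apache 2.2, mod_proxy_balancer performs the equivalent matching itself when the BalancerMember entries carry a route parameter and the ProxyPass line sets stickysession; the module documentation describes the exact settings.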

There is one exception, though, in which using a sticky session would go wrong: After shutting down a Tomcat instance, the session data would be lost. The solution is to allow session data to be copied from one Tomcat server to another. This means that, during the programming phase, any object stored in a session should implement the Serializable interface. However, be careful of references to other, unrelated objects, which would require additional configuration that is beyond the scope of this article.
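As an illustration of what a session-safe object might look like, the sketch below defines a hypothetical ShoppingBasket class that implements Serializable and pushes it through Java's standard object streams, which is roughly what has to succeed before session data can be copied between servers. The class and its methods are assumptions made for this example; they do not come from any particular application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ShoppingBasket implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only serializable types (here: strings in an ArrayList) are held as fields.
    private final List<String> itemIds = new ArrayList<>();

    void add(String itemId) {
        itemIds.add(itemId);
    }

    List<String> items() {
        return new ArrayList<>(itemIds);
    }

    // Serializes the basket to bytes and reads it back, as a copy on another server would be.
    static ShoppingBasket roundTrip(ShoppingBasket basket) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(basket);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ShoppingBasket) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Basket could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        ShoppingBasket basket = new ShoppingBasket();
        basket.add("book-123");
        basket.add("cd-456");
        System.out.println(roundTrip(basket).items());
    }
}
```

A round-trip check like this in a unit test is a cheap way to catch a non-serializable field before it shows up as lost session data in production.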

An alternative is to store session data in the back-end database that's shared by all Tomcat instances. This approach simplifies configuration and programming, but might complicate database management. You should make sure this data is cleaned out after sessions expire, and the database should be optimized for handling the data with a short life.

Learning More About AWS

This article highlights a few aspects of working with AWS. Here are a few more resources available to Java developers to help you learn more.

Common Resources on AWS

AWS web site - Learn more about each web service on the AWS web site.
Developer Connection web site - The community web site for AWS developers includes forums on AWS, a Solutions Catalog for examples of what your peers have built, and more.
Resource Center - Part of the Developer Connection web site, the Resource Center has links to tutorials, code samples, technical documentation, and other resources for building your application on AWS.

Great Resources for Java Developers

Download and learn how to use the JetS3t open source Java library for Amazon S3.
Download and learn how to use the typica open source Java libraries for Amazon SQS and Amazon EC2.

Java and AWS In Action

Here are some web sites using Amazon Web Services and Java:

About the Author

Adriaan de Jonge is part of a team of Java specialists at SDB Java in The Hague, The Netherlands. His writing career began with a comparison of XForms and Ruby on Rails before he started writing for IBM developerWorks. As a Java developer, he is especially interested in front-end technology, both web-based and client-side. You can reach Adriaan at adriaandejonge@gmail.com.



Discussion

Posts: 4
Registered: 1/24/08
Deploying Distributed J2EE Applications Using Amazon EC2 - multi-tenancy?
Posted: Apr 4, 2008 4:47 AM PDT
 

Is mod_proxy preferred in case of a multi-tenancy application? If each of the instances is serving a separate tenant, would you still recommend the mod_proxy_ajp. I am currently using the mod_jk (worker.properties hell :) ).

thanks



not enough of HA, Dec 8, 2007 12:28 PM
Reviewer: yokinator
I am studying this issue deeply and I have some doubts not resolved with the Amazon studying Framework. In this article you talk about using Apache as a load balancer, and that sounds like a great idea, but when I read about the problems of restarting Apache each time I need a new configuration (I mean, adding or removing Java application servers), I start to get worried. I think we need another solution that avoids the need to restart the load balancer... maybe studying could solve this. The other issue is: only one load balancer? How many requests can it support? And when it goes down, what studying? I have been reading about using the dnsmadeeasy.com API to register the new IP when our load balancer goes down and we need to start a new one, using a very low TTL, but not every ISP respects this TTL, so when I read this I get a little afraid. I don't know if the Amazon people are working on these issues, but some of us only need something like that to be sure of putting our applications on the EC2 platform.
Great starting point, Dec 26, 2007 10:23 PM
Reviewer: Alan Ho
This is a great starting point. For High Availability, you might want to suggest to readers that they can run two instances of Apache on two different nodes. You can set up the DNS to point to both nodes, so in case one instance goes down (e.g. rebooting an Apache in order to add more Tomcat instances), the other node will still operate.
No restart required, Jan 2, 2008 4:40 AM
Reviewer: sergek
You shouldn't need any downtime to add or remove app servers from the load balancing cluster. Instead of httpd restart, do httpd graceful. This will reload the configuration within a few seconds without any users able to notice the change. Also make sure you are using worker threads, otherwise each individual Apache process is going to maintain its own independent pool of connections to the backend app servers, which with any sizeable amount of traffic can become an undue number of TCP connections. If you do this in a production environment, you should read up on Apache tuning since the author of this post seems more familiar with Tomcat and not very familiar with features of Apache. For more information on the new load balancing features in Apache 2.2, check out the docs at http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html This load balancing approach can work with non-Tomcat apps including Resin and can do sticky load balancing as well.
All good but what about DB?, Jan 2, 2008 7:04 AM
Reviewer: Mateusz Krzeszowiec
I'm a load balancing newbie and I like this intro article; at least (good comments!) I know where to look for more info. I just wonder: what about the database? OK, we have a few Tomcat instances, but I bet it wouldn't be smart to connect to only one server instance dedicated to the DBMS, right? Should there be another database cluster not mentioned here?