Friday 23 August 2013

How to use Apache as Software Load Balancer [ mod_jk implementation of Apache ]

Using Apache as Software Load Balancer


In this post I explain how Apache and mod_jk can be used together as a software load balancer.

Note: I am assuming the reader has a basic working knowledge of Apache and how to configure it.

There are three main files, all of which can go in Apache's conf.d folder (/etc/httpd/conf.d):
  1. mod_jk.conf
  2. ssl.conf [For https content]
  3. workers.properties
I am leaving "httpd.conf" out of this; you can keep its original settings. I am also leaving out "uriworkermap.properties", because instead of defining the context path (the name of your war file) there, you can simply run your application as ROOT.war: rename "xyz.war" to "ROOT.war" and start your Tomcat server. Once the application is deployed as ROOT.war, you no longer have to type the context path when accessing it, and you have one file less ("uriworkermap.properties") to maintain while setting up Apache as a software load balancer. Isn't this a good deal :).

E.g:

How to access your application if the name of your war file is:

xyz.war --> http://localhost:8080/xyz
ROOT.war --> http://localhost:8080  [No need to mention the context path explicitly; 8080 is the port on which your Tomcat server is running]
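
The naming rule above can be sketched as a small shell function. This is only an illustration: context_path is a made-up helper name, not a Tomcat command, and it covers just the simple case of a single-level war file name.

```shell
#!/bin/sh
# context_path WAR: print the context path Tomcat derives from a war file name.
# ROOT.war maps to the empty (root) context; any other name maps to /<name>.
# Hypothetical helper for illustration only.
context_path() {
    name=$(basename "$1" .war)
    if [ "$name" = "ROOT" ]; then
        echo ""
    else
        echo "/$name"
    fi
}

echo "http://localhost:8080$(context_path xyz.war)"    # http://localhost:8080/xyz
echo "http://localhost:8080$(context_path ROOT.war)"   # http://localhost:8080
```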

Anyhow, I will write down the settings of "uriworkermap.properties" too, in case anybody still needs them.

# worker configuration file
# Mount the Servlet context to the ajp13 worker
/xyz/=loadbalancer
/xyz/*=loadbalancer

Below are the contents of the three files mentioned above, which are used to set up Apache as a software load balancer.

1) mod_jk.conf

# Load mod_jk module
LoadModule jk_module modules/mod_jk.so

# It's better to define the Listen port here and comment it out in httpd.conf
Listen 80  
<VirtualHost *:80>
#ServerName 10.34.24.54
DocumentRoot /var/www/html

RewriteEngine On
RewriteLogLevel 1
RewriteLog logs/rewrite.log
  
JkMount /jkmanager/* jkstatus
JkMount /* loadbalancer
# Static images are served by Apache directly
# (Apache does not allow a comment on the same line as a directive)
JkUnMount /siteimages/* loadbalancer
JkUnMount /robots.txt loadbalancer
</VirtualHost>

# Where to find workers.properties
JkWorkersFile conf.d/workers.properties
# Where to put jk logs
JkLogFile logs/mod_jk.log
# Set the jk log level [debug/error/info]
JkLogLevel info
# Select the log format
JkLogStampFormat  "[%a %b %d %H:%M:%S %Y]"
# JkOptions indicates to send SSL KEY SIZE
JkOptions +ForwardKeySize -ForwardDirectories
# JkRequestLogFormat
JkRequestLogFormat "%w %V %T"
# Mount your applications
#JkMount /* loadbalancer
# You can use external file for mount points.
# It will be checked for updates each 60 seconds.
# The format of the file is: /url=worker
# /examples/*=loadbalancer
#JkMountFile conf.d/uriworkermap.properties
# Add shared memory.
# This directive is present with 1.2.10 and
# later versions of mod_jk, and is needed
# for load balancing to work properly
JkShmFile logs/jk.shm

# Add jkstatus for managing runtime data
#<Location /jkstatus/>
#    JkMount status
#    JkUnMount status
#    Order deny,allow
#    Allow from all
#</Location>

# Add the jkstatus mount point
#JkMount /jkmanager/* jkstatus
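# Enable JK manager access. Note: "Allow from all" below opens it to everyone;
# restrict it to trusted addresses (e.g. localhost only) outside test setups.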
<Location /jkmanager/>
  JkMount jkstatus
  Order deny,allow
  Allow from all
</Location>

<IfModule mod_expires.c>
        <FilesMatch "\.(ico|pdf|flv|jpe?g|png|gif|js|css)$">
                ExpiresActive On
                ExpiresDefault "access plus 2 year"
        </FilesMatch>
</IfModule>

#SetEnvIf Request_URI "^/check.txt$" dontlog
SetEnvIf Request_URI "(\.gif$|\.jpg$|\.JPG$|\.Jpg$|\.png$|\.js$|\.css$|\.woff$|\.ttf$|\.eot$|\.ico$|server-status$|jkmanager)" dontlog

LogFormat "@%{X-Forwarded-For}i@ %h %l %u %t %T \"%r\" %>s %b %T \"%{Referer}i\" \"%{User-Agent}i\" \"%{JSESSIONID}C\" \"%{HS_ID}C\" " custom

CustomLog "|/usr/local/sbin/cronolog  -S /var/log/httpd/access.log  --period='1 days' /var/log/httpd/access.%Y%m%d.log" custom   env=!dontlog




2) ssl.conf


I am not going to explain this one in detail. Just make sure the certificate paths are correct. A self-signed certificate is enough if you are setting this up in a dev/preprod environment.

And don't forget to add this line to this file too:
JkMount /* loadbalancer


3) workers.properties


# Define list of workers that will be used for mapping requests

worker.list=loadbalancer,jkstatus

# Define the first member worker
worker.jvmnode1.port=8010                     
# 8010 is the AJP port, not the HTTP port
worker.jvmnode1.host=10.34.24.54
worker.jvmnode1.type=ajp13
worker.jvmnode1.lbfactor=1            
# lbfactor is a crucial parameter: it decides how the load is distributed. With both workers at 1, the split is 50:50


# Define the second member worker
worker.jvmnode2.port=8010              
worker.jvmnode2.host=10.34.24.55
worker.jvmnode2.type=ajp13
worker.jvmnode2.lbfactor=1


worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=jvmnode1, jvmnode2

worker.loadbalancer.sticky_session=1
worker.jkstatus.type=status
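
To get a feel for lbfactor, here is a minimal sketch that turns a list of lbfactor values into the approximate percentage of requests each worker should receive. lb_shares is a hypothetical helper (not part of mod_jk), the 3:1 weighting is an invented example, and with sticky sessions enabled the real split will only approximate these numbers.

```shell
#!/bin/sh
# lb_shares FACTOR...: print each worker's expected share of requests (percent),
# in the same order as the lbfactor values given. Illustrative helper only.
lb_shares() {
    echo "$@" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        for (i = 1; i <= NF; i++)
            printf "%s%.0f", (i > 1 ? " " : ""), 100 * $i / total
        printf "\n"
    }'
}

lb_shares 1 1   # jvmnode1 and jvmnode2 as configured above: 50 50
lb_shares 3 1   # hypothetical: jvmnode1 weighted three times higher: 75 25
```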


Software Vs Hardware Load Balancer?


Load balancing is a crucial piece of your whole architecture. A load balancer plays an important role in maintaining application and network performance: it distributes load dynamically, so that requests and resources are spread evenly and optimally across the available servers.

The basic question which haunts us frequently is: how do we decide whether to use a hardware load balancer or a software load balancer, and on what criteria should the decision be made?

This question is a complicated one, and the answer differs from person to person and from requirement to requirement. There is no end to a technical argument if you don't have any data to validate what you are suggesting ;). So whatever I propose in this blog is my personal view, and I leave it up to you to agree or disagree.

We should be cognizant of our exact requirements (never over-engineer) and then use our technical skills and experience in the right direction to get the most out of them. If you are a novice to load balancers and their use/implementation, then hopefully this is a good place to start. You can drop me a mail if you have any doubts about the explanation below; I will try my best to clarify them.

Now, coming back to the original question: how do we decide whether to go ahead with an HLB or an SLB (Hardware Load Balancer or Software Load Balancer)?

The obvious difference is that one is software and so needs to be installed on a server. That server can be of any configuration and could therefore be fast or slow, reliable or not, and so on (it depends entirely on the server configuration, such as RAM size and CPU cores). A hardware load balancer is a dedicated piece of hardware that you install in your server rack.

The question comes down to your budget, experience and the importance of your product. If you have the money to spend, dedicated load balancers are the easy and safe way to go. But in a pinch, a dedicated server running load balancing software should be fine if you're able to configure it (depending on the hardware and traffic demands). Some of the things to keep in mind before making a decision are listed below:
 
  1. Concurrent requests on your site.
  2. Whether HA (high availability) is required for your site
  3. Performance/speed (an HLB definitely improves the performance of the site in many ways)
  4. Costs
  5. Downtime (Can you afford this?)
  6. In-house skill sets to set up and troubleshoot issues if you are setting up the SLB yourself.
  7. Scalability (The most important parameter, IMO)

To terminate SSL, most companies run SSL connections into Linux boxes running Apache. This setup is convenient and easy to set up, but it is not optimized for SSL, so it is a slow and costly operation: much of the capacity of these servers is consumed by SSL processing. Hardware load balancers, on the other hand, have crypto cards that terminate SSL very efficiently in hardware. Moving SSL termination to the HLB takes that work off Apache, so a whole lot of CPU can be reclaimed and Apache performance improves. Client performance also improves greatly, because SSL accelerators are faster at SSL than generic Linux boxes. You can argue that Nginx could handle SSL and leave the rest of the work to Apache, but then again Nginx needs a dedicated Linux box of its own.

An SLB is a single point of failure. We can use HAProxy in front of Apache. HAProxy and Apache can both act as an SLB, but HAProxy is more robust than Apache and is a reliable, high-performance TCP/HTTP load balancer. It will not let you down. And the most important thing is that it's open source or, to be more specific, it's FREE!!!!! FREE!!!!! FREE!!!!! And you know FREE is always better and gives us eternal happiness :). Moreover, you don't have to depend on a third party to make changes at the LB level. Generally, LBs are maintained by a third party or by a sysadmin, and you have limited or no access to them. So better go for HAProxy as the load balancer and Nginx for serving HTTPS requests.

But as your organization grows bigger and bigger, you have to go for a robust and scalable architecture. Scalability is the biggest factor for any growing organization. HLBs are dedicated load balancers; they exist for a purpose and not merely to earn money for big fish like Cisco. A normal person can't even think of buying a Cisco/F5/F20 load balancer because they are so costly. There are definitely open source LBs available, but they don't cover every need on their own. To put it in simple words: being able to cook doesn't make you a chef ;). So hardware definitely has its own 1001 advantages, but you should know when to go for it, unless you have money to burn ;). To be very practical, the debate here isn't really "HLB" vs "SLB". It is "buy a proven/tested/certified technology stack as an appliance" versus "build it yourself using open source".

I have tried my best not to be biased and have given an overview of both SLB and HLB. Now the call is completely yours.

My personal point of view:

Having a bit of knowledge of performance engineering (I am aware that an HLB will definitely boost application performance ;) ), being a bit of a developer, with a bit of sysadmin knowledge, a bit of experience supporting and troubleshooting production issues, and doing production deployments frequently, I would give my vote to the software load balancer. It's a one-time pain to set everything up initially, but once it is fully set up you are the undisputed owner of it and can play around with it. Things are always easier when you control them. Believe me, speaking from personal experience, it's a pain in the a** to go to a sysadmin and request changes at the LB level. These devices (HLBs) are out of your control and are difficult to debug, provision, and test. So my vote goes to the software load balancer (SLB) and, to be more specific, HAProxy.

I have tried to cover how to use Apache as a software load balancer (using the mod_jk configuration) here: http://kulshresht-gautam.blogspot.in/2013/08/how-to-use-apache-as-software-load.html


Abbreviations used in this blog:

Load Balancer: LB
Hardware Load Balancer: HLB
Software Load Balancer: SLB

Wednesday 7 August 2013

How to Monitor And Debug Memcached Server Using phpMemCachedAdmin


Download phpMemCachedAdmin tarball


# wget http://phpmemcacheadmin.googlecode.com/files/phpMemcachedAdmin-1.2.2-r262.tar.gz


Install phpMemCachedAdmin in the `/var/www/html/memcached/` directory: create a folder named memcached inside /var/www/html/, extract the phpMemcachedAdmin tarball into it, and change the permissions of Memcache.php.

# mkdir -p /var/www/html/memcached
# tar -xvzf phpMemcachedAdmin-1.2.2-r262.tar.gz -C /var/www/html/memcached/
# chmod a+rwx /var/www/html/memcached/Config/Memcache.php

Apache Configuration

Create a file named "memcached.conf" inside /etc/httpd/conf.d.

# cd /etc/httpd/conf.d
# vim memcached.conf

Add the configuration below to memcached.conf:

Listen 85
<VirtualHost *:85>
    ServerName   10.23.11.11
    UseCanonicalName Off
    ServerAdmin  "kulshresht@xyz.com"
    DocumentRoot "/var/www/html/memcached"
    CustomLog  /var/log/httpd/memcached-access_log common
    ErrorLog   /var/log/httpd/memcached-error_log
</VirtualHost>

P.S: You can use any free port for phpMemCachedAdmin. In my case 80 and 81 were already in use, so I used 85.

Restart Apache: /etc/init.d/httpd restart
P.S: Do not forget to install PHP, otherwise you will not see the UI -> [ sudo yum install php ]

[Screenshots of the phpMemCachedAdmin UI were attached here for reference.]

If you want to explore the Memcached installation steps further, go to the link below:
http://kulshresht-gautam.blogspot.in/2013/07/memcached-installation.html


Thursday 1 August 2013

Shell script to find total request vs error count in Apache logs

Tweak the script below to suit your log format.

#!/bin/sh
echo "" > /tmp/error.csv

outFile="/tmp/Error.txt"
outFile1="/tmp/Error_Count.txt"
outFile2="/tmp/Total.txt"
outFile3="/tmp/Total_Request.txt"

# Total number of requests (with this log format, field 10 is the HTTP status code)
a=`awk '{print $10}' /var/log/access.log | wc -l`
# Requests per hour: the hour is the 2nd ':'-separated piece of the timestamp in field 4
awk '{print $4":"$10}' /var/log/access.log | awk -F':' '{print $2}' | sort | uniq -c > $outFile2

# uniq -c prints "count hour"; write "hour  count*3" instead
awk '{print $2 "  " $1*3}' $outFile2 > $outFile3

# Same two steps again, restricted to 5xx responses
b=`awk '{print $10}' /var/log/access.log | grep "^5" | wc -l`
awk '{print $4":"$10}' /var/log/access.log | awk -F':' '{if($NF>=500){print $2}}' | sort | uniq -c > $outFile

awk '{print $2 "  " $1*3}' $outFile > $outFile1

mailsend -smtp mailserver.production.xyz.lan -port 25 -M "Total request count/Error count for `date +%m:%d:%Y` : $(expr $a \* 3) / $(expr $b \* 3) " -t kulshresht.gautam@xyz.com -f kulshresht.gautam@xyz.com -sub "Request / Error count" -attach $outFile1,text,a -attach $outFile3,text,a

Log format Sample:

1.2.3.4 - - [31/Jul/2013:00:00:00 +0530] 0 "GET  HTTP/1.1" 200 20 0 "http://www.xyz.com/adsjskds" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2)" "8D8E535ED8CBE15ED60C365.jvmnode1"



**mailsend should be installed and configured on your system.

**I am multiplying the output by 3 because we have 3 Apache web servers, all 3 under one load balancer, so the log of one server is taken as roughly a third of the total traffic.
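
The same counts can also be collected in a single awk pass. The sketch below is an alternative that assumes the log format shown above (timestamp in field 4, HTTP status in field 10). summarize_log is a hypothetical function name, and its optional second argument is the number of web servers behind the load balancer (3 in the setup described above).

```shell
#!/bin/sh
# summarize_log FILE [SERVERS]
# Prints one "hour total errors" line per hour, then a "TOTAL total errors" line.
# All counts are multiplied by SERVERS (default 3). Errors are statuses >= 500.
# Example: summarize_log /var/log/access.log 3
summarize_log() {
    awk -v n="${2:-3}" '
    {
        split($4, t, ":")            # $4 looks like [31/Jul/2013:00:00:00
        hour = t[2]
        total[hour]++
        if ($10 >= 500) errors[hour]++
    }
    END {
        for (h in total) {
            printf "%s %d %d\n", h, total[h] * n, errors[h] * n
            all += total[h]
            bad += errors[h]
        }
        printf "TOTAL %d %d\n", all * n, bad * n
    }' "$1" | LC_ALL=C sort
}
```

The output is sorted so that the hours come first and the TOTAL line last; it can be redirected to a file and attached to the mail in the same way as the files in the original script.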