The requirement was to set up a highly available (HA) application environment. We had two Tomcat servers as backend nodes (application hosting servers). An nginx server was put in front of these two servers to provide two functions: load balancing and reverse proxying.

Two nginx servers were set up, one of which acts as a backup node if the primary server fails. This failover mechanism was achieved using the keepalived tool.

Keepalived can be installed using yum on CentOS 7. The version of keepalived provided by CentOS 7 at the time of writing is v1.2.13.

Environment diagram:

 

 

VIP is 192.168.7.22

Master node is 192.168.7.47

Backup node is 192.168.7.44

 

In keepalived we have a master server and one or more backup servers. Backup servers act as failover points according to the priority set for them. keepalived uses a protocol called VRRP (Virtual Router Redundancy Protocol) to communicate between the master node and the backup nodes, so VRRP traffic must be allowed between the servers in firewalld, in both directions. At an interval of 1 second (the default value), the master server multicasts advertisement packets to the network. Backup nodes on the same network recognise these packets by a parameter in keepalived.conf called “virtual_router_id”, a number between 1 and 255 that identifies the VRRP group and must be unique on that network. Make sure this value is the same on the master and backup nodes.
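Since a mismatched “virtual_router_id” is an easy mistake to make, here is a minimal sketch of a check that compares the value in two configuration files. The function names get_vrid and same_vrid, and the file names in the usage comment, are my own and purely illustrative; the sketch also assumes each file defines a single vrrp_instance.

```shell
# get_vrid FILE: print the first virtual_router_id value found in FILE.
get_vrid() {
    awk '$1 == "virtual_router_id" { print $2; exit }' "$1"
}

# same_vrid FILE_A FILE_B: succeed only if both files define the same,
# non-empty virtual_router_id.
same_vrid() {
    a=$(get_vrid "$1")
    b=$(get_vrid "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (hypothetical local copies of each node's configuration):
#   same_vrid master-keepalived.conf backup-keepalived.conf && echo "IDs match"
```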

We also need to set up a virtual IP address (VIP), which keepalived moves to the backup node if the master fails. This does not have to be specially provisioned by the network admin; we just need to specify a free IP address in keepalived.conf and keepalived will start using it as the VIP. Note that we do not need to configure this IP on a new interface, as Linux can assign multiple IP addresses to the same network interface. You can see the VIP being assigned to the active node automatically whenever a failover happens using the command:

# ip a s

In the output, look for a line like this:

inet 192.168.7.22/25 scope global secondary eth0
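If you want to script that check, a small filter over the “ip a s” output is enough. This is only a sketch: has_vip is a name I made up, and it simply looks for an “inet” line whose address part equals the VIP exactly.

```shell
# has_vip VIP: read `ip a s` style output on stdin and succeed if VIP
# appears as the address of an "inet" line.
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Usage on a node:
#   ip a s | has_vip 192.168.7.22 && echo "this node currently holds the VIP"
```

Comparing the address field exactly (rather than using a plain grep) avoids matching 192.168.7.221 when looking for 192.168.7.22.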

 

 

How to open VRRP traffic between CentOS servers?

Use the below commands:

firewall-cmd --direct --add-rule ipv4 filter INPUT 0 -i eth0 -d 224.0.0.0/8 -j ACCEPT
firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 -i eth0 -d 224.0.0.0/8 -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter INPUT 0 -p vrrp -i eth0 -j ACCEPT
firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 -p vrrp -i eth0 -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter OUTPUT 0 -p vrrp -o eth0 -j ACCEPT
firewall-cmd --direct --permanent --add-rule ipv4 filter OUTPUT 0 -p vrrp -o eth0 -j ACCEPT

firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface eth0 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
firewall-cmd --direct --permanent --add-rule ipv4 filter OUTPUT 0 --out-interface eth0 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
firewall-cmd --reload

(Change eth0 to the name of the interface on your server.)

That covers all the rules required for the nodes to communicate. Note that “224.0.0.18” is the multicast address reserved for VRRP, and 224.0.0.0/8 is one block within the IPv4 multicast range (the full range is 224.0.0.0/4).
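To make the address ranges concrete: IPv4 multicast is the block 224.0.0.0/4, which means any address whose first octet is between 224 and 239. A minimal sketch of that test follows; is_multicast is an illustrative name of my own and it only looks at the first octet, without validating the rest of the address.

```shell
# is_multicast IP: succeed if the IPv4 address falls in 224.0.0.0/4,
# i.e. its first octet is between 224 and 239.
is_multicast() {
    first=${1%%.*}
    case $first in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$first" -ge 224 ] && [ "$first" -le 239 ]
}

# Usage:
#   is_multicast 224.0.0.18 && echo "VRRP address is multicast"
```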

Now, once we have all the firewalld rules in place, install the keepalived package using yum:

# yum install keepalived -y

(Install keepalived on both nginx nodes.)

Now, place keepalived.conf on the master server with the configuration below:

# vi /etc/keepalived/keepalived.conf

vrrp_script chk_nginx {              # Requires keepalived-1.1.13
    script "pidof nginx"
    interval 2                       # check every 2 seconds
    weight 4
}

vrrp_instance monitor {
    state MASTER
    interface eth0
    virtual_router_id 77
    priority 105                     # 105 on master, 104 on backup
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.7.22/25
    }
    track_script {
        chk_nginx
    }
}

Place keepalived.conf on the backup server with the configuration below:

vrrp_script chk_nginx {              # Requires keepalived-1.1.13
    script "pidof nginx"
    interval 2                       # check every 2 seconds
    weight 4
}

vrrp_instance monitor {
    state BACKUP
    interface eth0
    virtual_router_id 77
    priority 104                     # 105 on master, 104 on backup
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.7.22/25
    }
    track_script {
        chk_nginx
    }
}

 

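Note: the only differences between the backup and master node configurations are the “state” and “priority” directives. The priority value on the master server must always be greater than the priority on the backup server. For the failover to happen when nginx stops, the vrrp_script “weight” (4 here) must also be larger than the gap between the two priorities (1 here), because a positive weight is added to a node's priority only while its check succeeds.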
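The priority arithmetic is easier to follow with the numbers written out. The sketch below assumes, as I understand keepalived's track_script handling, that a positive weight is added to the base priority while the check passes and is not added while it fails. effective_prio is just an illustrative helper name.

```shell
# effective_prio BASE WEIGHT CHECK_OK
# Print BASE + WEIGHT when CHECK_OK is 1 (check passing), else BASE.
# Models a positive vrrp_script weight only.
effective_prio() {
    if [ "$3" -eq 1 ]; then
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}

# With the configuration above:
#   effective_prio 105 4 1   -> master, nginx running
#   effective_prio 104 4 1   -> backup, nginx running
#   effective_prio 105 4 0   -> master, nginx stopped
```

With nginx running on both nodes the master stays ahead; once nginx stops on the master, its priority falls back to the base value, which is lower than the backup's boosted priority, so the backup takes over.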

 

Start and enable keepalived on both servers:

# systemctl start keepalived.service

# systemctl enable keepalived.service

How do you know it is working?

keepalived does not have a separate log file by default. Its messages are written to the system message log (/var/log/messages on CentOS 7) and, unfortunately, are not very detailed when the HA feature is not working as expected.

To view the multicast packets being sent out by the active master server, use tcpdump:

# tcpdump -i eth0 vrrp

Output:

18:03:30.807308 IP 192.168.7.47 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 77, prio 108, authtype simple, intvl 1s, length 20
18:03:31.807212 IP 192.168.7.47 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 77, prio 108, authtype simple, intvl 1s, length 20
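When watching a longer capture, it can help to reduce each advertisement to the fields that matter. The following is a rough sketch that assumes the tcpdump line format shown above; advert_fields is a hypothetical helper name.

```shell
# advert_fields: read tcpdump VRRP lines on stdin and print
# "source-ip vrid priority" for every advertisement found.
advert_fields() {
    sed -n 's/^.* IP \([0-9.]*\) > .*vrid \([0-9]*\), prio \([0-9]*\),.*$/\1 \2 \3/p'
}

# Usage:
#   tcpdump -l -i eth0 vrrp | advert_fields
```

Lines that do not look like VRRP advertisements are dropped, so the filter can sit at the end of any tcpdump pipeline.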

How does the failover happen?

In my configuration I use a “vrrp_script” block that runs the command “pidof nginx”. This command exits with status “1” if no nginx process is present. It can be replaced by any script that suits our needs; keepalived looks only at the exit status of that script.
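As an illustration of swapping in a custom script, here is a minimal sketch that wraps the same process check in a function. The name check_service and the script path in the comment are hypothetical; the only contract that matters to keepalived is the exit status, 0 for healthy and non-zero for failed.

```shell
# check_service NAME: exit status 0 if at least one process called NAME
# is running, non-zero otherwise.
check_service() {
    pidof "$1" >/dev/null 2>&1
}

# In a standalone script (hypothetical path /usr/local/bin/chk_nginx.sh),
# the last line would simply be:
#   check_service nginx
# and the vrrp_script block would point its "script" directive at that path.
```

More checks (for example, a request against a local status URL) could be chained with && inside the function, as long as the final exit status still reflects overall health.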

When the exit value is “1”, keepalived marks the check as failed and the master's priority drops below that of the backup server. The backup server then takes over the VIP and becomes the new master.

The message log on the master server displays something like this:

----------------------------------------

Mar 14 18:03:27 puppetserver Keepalived_vrrp[2874]: VRRP_Script(chk_nginx) failed
Mar 14 18:03:28 puppetserver Keepalived_vrrp[2874]: VRRP_Instance(monitor) Received higher prio advert
Mar 14 18:03:28 puppetserver Keepalived_vrrp[2874]: VRRP_Instance(monitor) Entering BACKUP STATE
Mar 14 18:03:28 puppetserver Keepalived_vrrp[2874]: VRRP_Instance(monitor) removing protocol VIPs.
Mar 14 18:03:28 puppetserver avahi-daemon[669]: Withdrawing address record for 192.168.7.22 on eth0.
Mar 14 18:03:28 puppetserver Keepalived_healthcheckers[2873]: Netlink reflector reports IP 192.168.7.22 removed

----------------------------------------
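To pull only the state changes out of a busy message log, a simple filter will do. This sketch assumes the log line format shown above; vrrp_transitions is an illustrative name of my own.

```shell
# vrrp_transitions: read syslog lines on stdin and print only the
# keepalived VRRP state changes (MASTER, BACKUP or FAULT).
vrrp_transitions() {
    grep 'Keepalived_vrrp' | grep -E 'Entering (MASTER|BACKUP|FAULT) STATE'
}

# Usage:
#   vrrp_transitions < /var/log/messages
```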

After the backup server becomes the new master, the original master keeps running its nginx check. As soon as nginx comes back online there, its priority rises above the backup's again, it takes the VIP back, and the original master server is active once more. If you want the currently active server to keep the VIP even after the service on the original master has come back online, add the directive “nopreempt” to the vrrp_instance block of the higher-priority node. As the keepalived.conf man page puts it, “nopreempt” allows the lower priority machine to maintain the master role, even when a higher priority machine comes back online. Note that for this to work, the initial state of that instance must be BACKUP.

Important notes:

  1. For keepalived to work, make sure that you have the values below in the sysctl.conf file (apply them with “sysctl -p”):
    net.ipv4.ip_nonlocal_bind = 1
    net.ipv4.ip_forward = 1
  2. In some odd cases, the master and backup nodes need a reboot before the VIP functions properly when keepalived is configured for the first time. Also make sure that the keepalived daemon starts at boot by enabling it:
    systemctl enable keepalived.service
  3. For more details on the available keepalived settings, check the man page:
    # man keepalived.conf
  4. We don’t want the VIP to be recorded in the ARP table of the servers. To avoid that, it is recommended to block the VIP in each server. You can refer to this page: http://kb.linuxvirtualserver.org/wiki/Using_arptables_to_disable_ARP . List the ARP table using the command: “arp -n”
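For note 1, a quick way to confirm what a sysctl.conf-style file actually sets is to parse it. This is only a sketch: sysctl_conf_value is a name I made up, and it reads the file rather than the live kernel value (use “sysctl -n KEY” for that).

```shell
# sysctl_conf_value KEY FILE: print the value assigned to KEY in a
# sysctl.conf-style FILE (the last assignment wins); print nothing if unset.
sysctl_conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)               # strip blanks around the key
            gsub(/[ \t]/, "", v)               # strip blanks around the value
            if (k == key) val = v
        }
        END { if (val != "") print val }
    ' "$2"
}

# Usage:
#   sysctl_conf_value net.ipv4.ip_nonlocal_bind /etc/sysctl.conf
```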

 

(Content written after referring to many blogs. Please feel free to correct any information that is incorrect or incomplete.)

Good Luck!