|
Kai-Steffen Hielscher 22/01/02
|
Our cluster consists of 13 PCs that were built using commodity off-the-shelf components and serve as a distributed webserver running under Linux with kernel modifications from the Linux Virtual Server project. The primary goal of this project is to provide a model of a distributed webserver that can be verified and validated using measurements to obtain real-world data.
Two of the PCs are used as load balancers, ten work as real servers where the HTTP server software is running, and one is used for clock synchronization.
The cluster is connected to the internet using three 1 Gbit/s Ethernet interfaces. Incoming traffic is routed to one of the real servers by one of the load balancers. The other load balancer is a standby machine that monitors the state of the active load balancer in order to detect failures and replace the defective computer. The route taken by the responses of the real servers depends on the technology used for load distribution in the cluster and thus on the topology of the network. We can realize three different methods for this purpose: NAT (Network Address Translation), Tunneling or Direct Routing.|
NAT |
When using NAT for load balancing, packets with the VIP as the destination IP address are passed to the load balancer (the VIP is the IP address of the external network interface of the load balancer). The load balancer changes the destination IP address of incoming requests to point to one real server determined by a scheduling algorithm. This can be accomplished using standard NAT methods. The real server processes the request and passes the responses back to the load balancer, which in turn changes the source IP address of the answer packets to its own IP address, the VIP. The load balancer works as the default gateway for all real servers. Both requests and responses pass through the load balancer, and the IP addresses of both must be changed using NAT. This method is therefore referred to as double rewriting.
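The double rewriting described above can be illustrated with a minimal Python sketch. It is not the Linux Virtual Server implementation: the class `NatBalancer`, the `Packet` type, the addresses, and the per-client connection table are all hypothetical, and round robin is used only as a stand-in because the text does not name the scheduling algorithm.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: str      # source IP address
    dst: str      # destination IP address
    payload: str


# Hypothetical addresses chosen for illustration only.
VIP = "203.0.113.10"
REAL_SERVERS = ["192.168.1.11", "192.168.1.12"]


class NatBalancer:
    """Toy model of a NAT load balancer (assumed design, not LVS code)."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        # Round robin is an assumed stand-in for the scheduling algorithm.
        self._next_server = cycle(real_servers)
        # Simplification: one table entry per client address.
        self._connections = {}

    def inbound(self, pkt):
        """First rewrite: destination VIP -> chosen real server."""
        server = self._connections.get(pkt.src)
        if server is None:
            server = next(self._next_server)
            self._connections[pkt.src] = server
        return replace(pkt, dst=server)

    def outbound(self, pkt):
        """Second rewrite: source real server -> VIP."""
        return replace(pkt, src=self.vip)


lb = NatBalancer(VIP, REAL_SERVERS)
request = Packet(src="198.51.100.7", dst=VIP, payload="GET /")
forwarded = lb.inbound(request)
response = Packet(src=forwarded.dst, dst=forwarded.src, payload="200 OK")
answer = lb.outbound(response)
```

In this sketch the client only ever sees the VIP: `forwarded` carries a private destination address, while `answer` carries the VIP as its source again.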
The advantage of using NAT is that the real servers can run any operating system without modification, that they can use private IP addresses, and that only one public IP address, the VIP assigned to the load balancer, is needed. The disadvantage of this approach is that all traffic has to pass through the load balancer in both directions, so the load on the load balancer can become high, and the number of real servers should not exceed a certain limit so as not to degrade the performance of the system. All machines have to be in one (logical) IP network, preferably with private addresses (e.g. 192.168.x.0/24).
|
|
Tunneling |
The tunneling method requires all real servers to have tunneling interfaces set up with the VIP as their IP address. One interface of the load balancer also has the VIP assigned to it. Since the tunneling interfaces do not respond to ARP requests, the MAC address of the load balancer's interface is the one that appears in the ARP table of the router connecting the system to the internet. This is why incoming requests arrive at the load balancer, which encapsulates them in an IP packet with the destination IP address of one of the real servers. There the request is decapsulated and processed by the server software. The answers can be passed back to the client without any modification by the load balancer, because the real server can fill in the VIP as the source address of the response packets, since it is assigned to the tunneling interface. This method is called a single rewriting method.
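The encapsulation step can be sketched in Python as follows. This is a simplified model under stated assumptions, not the kernel's tunneling code: the `IpPacket` type, the function names, and all addresses are hypothetical, and the inner packet is simply carried as the payload of the outer one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPacket:
    src: str
    dst: str
    payload: object  # application data or an encapsulated IpPacket


# Hypothetical addresses chosen for illustration only.
VIP = "203.0.113.10"
BALANCER_ADDR = "192.0.2.1"
REAL_SERVER_ADDR = "192.0.2.21"


def encapsulate(request, balancer_addr, real_server_addr):
    """Load balancer: wrap the untouched request in an outer IP packet."""
    return IpPacket(src=balancer_addr, dst=real_server_addr, payload=request)


def decapsulate(outer):
    """Real server: recover the original request from the tunnel packet."""
    inner = outer.payload
    if not isinstance(inner, IpPacket):
        raise ValueError("payload is not an encapsulated IP packet")
    return inner


def respond(request, body):
    """Real server: answer directly, using the VIP as source address.

    request.dst is the VIP, which the real server carries on its
    tunneling interface, so no rewriting by the load balancer is needed.
    """
    return IpPacket(src=request.dst, dst=request.src, payload=body)


request = IpPacket(src="198.51.100.7", dst=VIP, payload="GET /")
tunneled = encapsulate(request, BALANCER_ADDR, REAL_SERVER_ADDR)
received = decapsulate(tunneled)
response = respond(received, "200 OK")
```

The inner packet is identical before and after the tunnel, and `response` goes from the VIP straight to the client address without touching the load balancer.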
By routing the response packets directly to the client, this architecture can handle a large number of requests per second and can be used with a large number of real servers for high-performance applications. However, the operating system of the real servers needs to support IP tunneling. The real servers can be geographically distributed.
|
|
Direct Routing |
Here the VIP is assigned to one interface of the load balancer and to alias devices for the network devices of the real servers. The alias interfaces must not answer ARP requests. For this purpose, a special patch has to be applied to modern Linux kernels. Since the MAC address of the load balancer's interface is the one in the ARP table entry for the VIP in the internet router, request packets arrive at the load balancer first. The load balancer has neither to rewrite nor to encapsulate the packet. It passes the request to the real server determined by the scheduling algorithm by inserting the MAC address of this real server as the destination of the generated link layer frame (e.g. Ethernet frame). When the packet arrives at the real server, it is accepted there because of the alias carrying the VIP. The responses generated by the server software can be sent to the client using the VIP as the source address without having to pass the load balancer a second time.
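A minimal Python sketch of the forwarding step, assuming a simplified `Frame` type that holds both the link layer and the IP addresses, might look like this. The names and addresses are hypothetical; the point is only that the destination MAC address changes while the IP header stays untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    ip_src: str
    ip_dst: str
    payload: str


# Hypothetical addresses chosen for illustration only.
VIP = "203.0.113.10"
ROUTER_MAC = "02:00:00:00:00:01"
BALANCER_MAC = "02:00:00:00:00:02"
REAL_SERVER_MAC = "02:00:00:00:00:11"


def forward_direct(frame, balancer_mac, real_server_mac):
    """Re-address the frame at the link layer only.

    The IP source and destination (client and VIP) are left as they are,
    so the real server accepts the packet on its VIP alias.
    """
    return replace(frame, src_mac=balancer_mac, dst_mac=real_server_mac)


incoming = Frame(
    src_mac=ROUTER_MAC,
    dst_mac=BALANCER_MAC,
    ip_src="198.51.100.7",
    ip_dst=VIP,
    payload="GET /",
)
forwarded = forward_direct(incoming, BALANCER_MAC, REAL_SERVER_MAC)
```

Compared with the NAT and tunneling cases, nothing above the link layer is modified here, which is consistent with the requirement that all machines share one physical network segment.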
This method of load balancing works without the tunneling overhead and gives the highest performance of the three mechanisms described. There is no real packet rewriting involved; only the translation of IP addresses to MAC addresses is changed dynamically. One drawback is that all machines must be in the same physical network segment.
|
|
Time Synchronization |
| We can obtain event traces either by using hybrid monitoring with the ZM4 system developed at the chair (100 ns resolution) or by distributed software monitoring. Since the hardware monitor is more suitable for high-precision, low-volume measurements, most of the timestamps for performance analyses of the webserver are obtained using software monitoring. We use a combination of NTP and the PPS API to synchronize the clocks of the object system to the time obtained by one of our four GPS receivers. We found that reading the time directly from the GPS clock was far too slow for our purposes, and that NTP over the network alone does not synchronize our machines' clocks accurately enough. By feeding the PPS (pulse per second) signal generated by our GPS receiver to the serial port of all PCs and using the PPS Linux kernel modifications together with NTP, we can improve the accuracy of the synchronization. [Details] |
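The idea behind combining NTP with a PPS signal can be sketched in Python. This is an illustration only and does not use the kernel PPS API: the function name `pps_offset_ns` is hypothetical, and the sketch assumes that the pulse marks the exact start of a second and that the local clock is already within half a second of the true time (for example through NTP), so that the second in question is unambiguous.

```python
NS_PER_S = 1_000_000_000


def pps_offset_ns(local_ns: int) -> int:
    """Estimate the local clock offset from one PPS pulse.

    local_ns is the local clock reading, in nanoseconds, captured when
    the pulse arrived. Because the pulse marks a full second, the
    sub-second part of the reading is the clock error. The result lies
    in [-NS_PER_S / 2, NS_PER_S / 2); positive means the local clock
    is ahead.
    """
    frac = local_ns % NS_PER_S
    if frac >= NS_PER_S // 2:
        return frac - NS_PER_S
    return frac


# Hypothetical readings: one clock 250 microseconds ahead of the pulse,
# one clock 200 microseconds behind it.
ahead = pps_offset_ns(1_000 * NS_PER_S + 250_000)
behind = pps_offset_ns(1_000 * NS_PER_S - 200_000)
```

The pulse alone carries no information about which second it belongs to, which is one way to see why it is used together with NTP rather than instead of it.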
|
Components |
||||||||
|
|
Load Balancers |
||||||||||
|
|
Real Servers |
||||||
|
|
Main Ethernet Switch |
| Cisco Catalyst 3500XL with two 1000-Base-FX GBIC modules |
|
GPS receiver system |
||||||||
|
| Contact |
|
Dipl.-Inf. Kai-Steffen Hielscher
|