
Overview

A hardware or software load balancer can be used to scale the query capacity of the Attivio Intelligence Engine (AIE) and to provide high-availability querying against a multi-node AIE installation. Both use cases can be accomplished by setting up a virtual IP address that provides HTTP load balancing across multiple query receiver IP addresses/ports.


Query Load Balancing Architecture

The load balancer should provide HTTP load balancing between the query receiver node URLs. A virtual IP address should be created that maps to the query receiver nodes.

The query receiver port is typically attivio.baseport + 1 for an AIE installation (for example, 17001 for an AIE instance started on port 17000).
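The port relationship above can be expressed as a one-line helper (a minimal sketch; the port numbers are the defaults from this document):

```python
def query_receiver_port(baseport: int) -> int:
    """The HTTP query receiver typically listens on attivio.baseport + 1."""
    return baseport + 1

# An AIE instance started on the default baseport of 17000
# exposes its query receiver on port 17001.
print(query_receiver_port(17000))  # 17001
```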

The query client should be configured to send query requests to the virtual IP address. As new queries come in, the load balancer handles distributing the queries to all of the AIE query receiver nodes that are running.

Example

  • A three-server installation
  • All instances are running on the default baseport of 17000.
  • A load balancer set up to dispatch requests round-robin across the following hosts and ports:

LoadBalancer

http://host1:17001/
http://host2:17001/
http://host3:17001/

The load balancer then exposes those three endpoints behind a single virtual IP address:

http://vip:17001/

The front end application and corresponding query client should be updated to use the virtual IP address for sending query requests to the AIE installation.
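The round-robin dispatch described above can be sketched in Python. This is a simplified model of what the load balancer does, not its actual implementation; the host names are the placeholders from the example:

```python
from itertools import cycle

# Query receiver endpoints from the three-server example above.
BACKENDS = [
    "http://host1:17001/",
    "http://host2:17001/",
    "http://host3:17001/",
]

def make_round_robin(backends):
    """Return a function that yields the next backend for each request."""
    pool = cycle(backends)
    return lambda: next(pool)

next_backend = make_round_robin(BACKENDS)
# The first six requests are spread evenly across the three hosts:
for _ in range(6):
    print(next_backend())
```

Each incoming query is simply handed to the next endpoint in rotation, so load is spread evenly when all nodes are up.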

Configuration

There are a variety of load-balancing solutions available; this example demonstrates configuring a software load balancer using the Apache web server and the mod_proxy_balancer module.

A dedicated hardware load balancer is recommended over a software load balancer.

The Apache web server configuration file is typically located in /etc/apache2/http.conf or /etc/httpd/conf/httpd.conf.

Ensure that the mod_status, mod_proxy, and mod_proxy_balancer modules are loaded by adding or uncommenting the following to the httpd.conf file:

httpd.conf
LoadModule status_module modules/mod_status.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so

Set the Apache web server to listen on the AIE searcher nodes' baseport and HTTP receiver port (baseport + 1) rather than the default HTTP port by, for example, replacing the 'Listen 80' line with these entries:

httpd.conf
Listen 17000
Listen 17001

Next, define load-balancing groups for the /queryReceiver (Java Client API) and /query (XML REST API) endpoints on the HTTP receiver port and the /rest (JSON REST API) endpoint on the baseport, directing requests to the matching endpoints on your searcher nodes. You can omit one or more of these groups if you do not intend to use the associated query API.

httpd.conf
# Used for Java Client API's SearchClient class
<Proxy "balancer://queryReceivers">
  BalancerMember "http://host1:17001/queryReceiver"
  BalancerMember "http://host2:17001/queryReceiver"
  BalancerMember "http://host3:17001/queryReceiver"
</Proxy>
 
# Used for XML REST API
<Proxy "balancer://query">
  BalancerMember "http://host1:17001/query"
  BalancerMember "http://host2:17001/query"
  BalancerMember "http://host3:17001/query"
</Proxy>
 
# Used for JSON REST APIs
<Proxy "balancer://rest">
  BalancerMember "http://host1:17000/rest"
  BalancerMember "http://host2:17000/rest"
  BalancerMember "http://host3:17000/rest"
</Proxy>

Configure a management URI. Be sure to secure this location properly; this example restricts access to localhost.

httpd.conf
<Location /balancer-manager>
  SetHandler balancer-manager

  Order Deny,Allow
  Deny from all
  Allow from localhost
</Location>

Finally, route all /queryReceiver, /query, and /rest requests to the appropriate load-balancing group:

httpd.conf
ProxyPass /balancer-manager !
 
# Used for Java Client API's SearchClient class
ProxyPass "/queryReceiver" "balancer://queryReceivers"
ProxyPassReverse "/queryReceiver" "balancer://queryReceivers"
 
# Used for XML REST API
ProxyPass "/query" "balancer://query"
ProxyPassReverse "/query" "balancer://query"
 
# Used for JSON REST APIs
ProxyPass "/rest" "balancer://rest"
ProxyPassReverse "/rest" "balancer://rest"

Requests to http://vip:17001/balancer-manager will now display the status of clustered machines, balancing statistics, and the algorithm used. Requests to http://vip:17001/queryReceiver, http://vip:17001/query, or http://vip:17000/rest will be distributed among the three clustered searcher nodes.

Monitoring Nodes

Most load balancers, including the example above, can detect a connection failure and adjust their balancing accordingly. More sophisticated load balancers can also periodically examine the output from nodes in addition to verifying the connection. If the load balancer supports sending an HTTP request and examining the response to determine endpoint state, configure it to check the following URL:

http://<attivio_host>:<baseport+1>/query

The load balancer can verify that AIE is up and accepting queries by confirming that the HTTP response includes the string "Attivio CGI Interface".
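A health probe along these lines can be sketched in Python. This is a hypothetical check script, not a supported AIE tool; the marker string is the one documented above, and the host/port arguments are placeholders:

```python
from urllib.request import urlopen

MARKER = "Attivio CGI Interface"

def is_query_endpoint_up(body: str) -> bool:
    """A node is healthy if the /query response body contains the marker."""
    return MARKER in body

def probe(host: str, baseport: int = 17000) -> bool:
    """Fetch http://host:<baseport+1>/query and inspect the response body."""
    url = f"http://{host}:{baseport + 1}/query"
    try:
        with urlopen(url, timeout=5) as resp:
            return is_query_endpoint_up(resp.read().decode("utf-8", "replace"))
    except OSError:
        return False  # connection failure counts as down
```

A load balancer with HTTP health checks would apply the same test natively; this script is only useful for ad hoc verification from the command line or cron.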

Main Article: System Monitoring

High Availability Load Balancers

Load balancing among several nodes provides scalability and allows individual query receivers to fail without interrupting service to end users; however, for true high availability, the load balancer itself must not be a single point of failure.

Redundancy at the load balancer is accomplished by configuring two or more machines to be capable of managing the virtual IP address described above. One serves as the master, and the other(s) will monitor the master and assume control of the IP resource should a failure be detected.

Configuration

This example demonstrates the heartbeat software provided by the High Availability Linux Project to complement the Apache-based load balancer described above.

First, both load balancer machines should be configured identically as described in the previous example. Ideally, they should be connected locally via crossover cable (or serial) in addition to their primary network adapters.

Install the heartbeat software. It is available in most standard yum or apt repositories, or can be downloaded from the Linux-HA project website.

Next, edit the main heartbeat service configuration file - typically /etc/ha.d/ha.cf. Assuming that 'uname -n' on one node is 'node01' and the other is 'node02', the configuration should look like this:

/etc/ha.d/ha.cf
# Node communication
bcast		eth0
udpport	        694
autojoin	any

# Logging
logfile         /var/log/ha-log
logfacility     local0

# Timeouts
keepalive	1
warntime	6
deadtime	10
initdead	15

# Nodes
auto_failback on
node node01
node node02

This configuration tells the heartbeat service that there is a two-node cluster communicating over UDP port 694. To secure communication between the two nodes, edit /etc/ha.d/authkeys and specify an authentication scheme and password:

/etc/ha.d/authkeys
auth 1
1 sha1 HeartbeatPassword

Ensure that the authkeys file has restrictive permissions: 'chmod 600 /etc/ha.d/authkeys'.
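The keepalive/warntime/deadtime settings in ha.cf drive the failover decision. A simplified model of the standby's logic (illustrative only, not the heartbeat implementation; the constants are the values configured above):

```python
KEEPALIVE = 1   # seconds between heartbeats, as configured above
WARNTIME = 6    # log a warning after this long without a heartbeat
DEADTIME = 10   # declare the master dead after this long

def master_state(seconds_since_last_heartbeat: float) -> str:
    """Classify the master node based on how long it has been silent."""
    if seconds_since_last_heartbeat >= DEADTIME:
        return "dead"   # standby takes over the virtual IP
    if seconds_since_last_heartbeat >= WARNTIME:
        return "late"   # heartbeat logs a warning
    return "alive"

print(master_state(0.5))   # alive
print(master_state(7))     # late
print(master_state(12))    # dead
```

With heartbeats every second, the standby waits through roughly ten missed beats before assuming control, which avoids spurious failovers on brief network hiccups.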

Finally, tell the heartbeat service that it needs to manage the virtual IP HTTP service, with node01 as the preferred node. Assuming that 10.0.0.1 is the virtual IP address which corresponds to vip, add the following to /etc/ha.d/haresources:

/etc/ha.d/haresources
node01 10.0.0.1:17001 httpd

Modify the 'Listen 17001' line added above to listen on the virtual address:

httpd.conf
Listen 10.0.0.1:17001

There is no need to explicitly create a virtual network interface for the shared IP - the heartbeat service does this for you.

Ensure both machines are configured identically, and start heartbeat on both of them:

/etc/init.d/heartbeat start

Now either load balancer machine can fail and the cluster will continue to serve traffic uninterrupted.
