Overview
A hardware or software load balancer can be used to scale the query capacity of the Attivio Intelligence Engine (AIE) and to provide high-availability querying against a multi-node AIE installation. Both use cases are accomplished by setting up a virtual IP address that provides HTTP load balancing across multiple query receiver IP addresses and ports.
Query Load Balancing Architecture
The load balancer should provide HTTP load balancing between the query receiver node URLs. A virtual IP address should be created that maps to the query receiver nodes.
The query receiver port is typically attivio.baseport + 1 for an AIE installation (e.g. 17001 for AIE started on port 17000).
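The port convention above can be captured in a couple of helper functions. This is a minimal sketch: `query_receiver_port` and `query_receiver_url` are hypothetical names, and it assumes the default "baseport + 1" rule holds for every node.

```python
# Hypothetical helpers (not part of AIE): derive a node's query receiver
# endpoint from its attivio.baseport, assuming receiver port = baseport + 1.


def query_receiver_port(baseport: int) -> int:
    """HTTP query receiver port, used by /queryReceiver and /query."""
    return baseport + 1


def query_receiver_url(host: str, baseport: int) -> str:
    """Base URL of the query receiver on a given host."""
    return f"http://{host}:{query_receiver_port(baseport)}/"


if __name__ == "__main__":
    print(query_receiver_url("host1", 17000))  # http://host1:17001/
```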
The query client should be configured to send query requests to the virtual IP address. As new queries come in, the load balancer distributes them across all running AIE query receiver nodes.
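To make the load balancer's role concrete, here is a minimal round-robin sketch that skips nodes marked as down. `RoundRobinDispatcher` and its methods are hypothetical illustrations; real load balancers, including mod_proxy_balancer, have their own scheduling and failure-detection logic.

```python
import itertools
from typing import Iterable


class RoundRobinDispatcher:
    """Hand out query receiver URLs in rotation, skipping nodes marked down.

    Illustrative sketch only; not the algorithm of any particular product.
    """

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        self._down: set[str] = set()
        self._cycle = itertools.cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        self._down.discard(node)

    def next_node(self) -> str:
        # Inspect each node at most once per call; return the first live one.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node not in self._down:
                return node
        raise RuntimeError("no query receiver nodes are available")


if __name__ == "__main__":
    lb = RoundRobinDispatcher(
        ["http://host1:17001/", "http://host2:17001/", "http://host3:17001/"]
    )
    for _ in range(4):
        print(lb.next_node())
```

Running the usage block prints host1, host2, host3, then host1 again; after `mark_down("http://host2:17001/")`, host2 is skipped until `mark_up` is called.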
Example
- 3-server installation
- All instances run on the default baseport of 17000.
- A load balancer should be set up to dispatch requests round-robin over the following hosts and ports:
  - http://host1:17001/
  - http://host2:17001/
  - http://host3:17001/
The load balancer then exposes those 3 endpoints as a new virtual IP address.
The front end application and corresponding query client should be updated to use the virtual IP address for sending query requests to the AIE installation.
Configuration
There are a variety of load-balancing solutions available; this example demonstrates configuring a software load balancer using the Apache web server and the mod_proxy_balancer module.
A dedicated hardware load balancer is recommended over a software load balancer.
The Apache web server configuration file is typically located at /etc/apache2/httpd.conf or /etc/httpd/conf/httpd.conf.
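Ensure that the mod_status, mod_proxy, mod_proxy_http, mod_proxy_connect, and mod_proxy_balancer modules are loaded by adding or uncommenting the following lines in httpd.conf (the module directory varies by platform; on macOS, for example, it is libexec/apache2/):
LoadModule status_module modules/mod_status.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
On Apache 2.4, mod_proxy_balancer also requires mod_slotmem_shm, and the balancing method is provided by a separate module such as mod_lbmethod_byrequests; load those as well if they are not already enabled.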
Set the Apache web server to listen on the AIE searcher nodes' baseport and HTTP receiver port (baseport + 1) rather than the default HTTP port by, for example, replacing the 'Listen 80' line with these entries:
Listen 17000
Listen 17001
Next, define load-balancing groups for the /queryReceiver (Java Client API) and /query (XML REST API) endpoints on the HTTP receiver port and the /rest (JSON REST API) endpoint on the baseport, directing requests to the matching endpoints on your searcher nodes. You can omit one or more of these groups if you do not intend to use the associated query API.
# Used for Java Client API's SearchClient class
<Proxy "balancer://queryReceivers">
    BalancerMember "http://host1:17001/queryReceiver"
    BalancerMember "http://host2:17001/queryReceiver"
    BalancerMember "http://host3:17001/queryReceiver"
</Proxy>

# Used for XML REST API
<Proxy "balancer://query">
    BalancerMember "http://host1:17001/query"
    BalancerMember "http://host2:17001/query"
    BalancerMember "http://host3:17001/query"
</Proxy>

# Used for JSON REST APIs
<Proxy "balancer://rest">
    BalancerMember "http://host1:17000/rest"
    BalancerMember "http://host2:17000/rest"
    BalancerMember "http://host3:17000/rest"
</Proxy>
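With more than a few searcher nodes, writing the BalancerMember lines by hand gets error-prone. The following sketch generates the three Proxy groups from a host list and baseport. The `GROUPS` table, `proxy_block`, and `balancer_config` are hypothetical; the port offsets follow the convention above (receiver endpoints on baseport + 1, /rest on the baseport).

```python
# Hypothetical generator for the <Proxy "balancer://..."> groups shown above.

HOSTS = ["host1", "host2", "host3"]
BASEPORT = 17000

# (balancer name, endpoint path, port offset from baseport)
GROUPS = [
    ("queryReceivers", "/queryReceiver", 1),  # Java Client API SearchClient
    ("query", "/query", 1),                   # XML REST API
    ("rest", "/rest", 0),                     # JSON REST APIs
]


def proxy_block(name: str, path: str, port: int, hosts: list[str]) -> str:
    """Render one <Proxy> group with a BalancerMember per host."""
    members = "\n".join(
        f'    BalancerMember "http://{h}:{port}{path}"' for h in hosts
    )
    return f'<Proxy "balancer://{name}">\n{members}\n</Proxy>'


def balancer_config(hosts: list[str], baseport: int) -> str:
    """Render all groups, separated by blank lines."""
    return "\n\n".join(
        proxy_block(name, path, baseport + offset, hosts)
        for name, path, offset in GROUPS
    )


if __name__ == "__main__":
    print(balancer_config(HOSTS, BASEPORT))
```

Drop an entry from `GROUPS` to omit a group for a query API you do not use.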
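Configure a management URI. Be sure to secure this location properly; this example restricts access to the localhost:
<Location /balancer-manager>
    SetHandler balancer-manager
    Order Deny,Allow
    Deny from all
    Allow from localhost
</Location>
The Order, Deny, and Allow directives are Apache 2.2 syntax. On Apache 2.4 without mod_access_compat, use 'Require local' inside the Location block instead.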
Finally, redirect all /queryReceiver, /query, and /rest requests to the appropriate load-balancing group. The '!' on the first line excludes the management URI from proxying. ProxyPass rules are checked in configuration order and the first match wins, so keep the /queryReceiver rules ahead of the /query rules:
ProxyPass /balancer-manager !

# Used for Java Client API's SearchClient class
ProxyPass "/queryReceiver" "balancer://queryReceivers"
ProxyPassReverse "/queryReceiver" "balancer://queryReceivers"

# Used for XML REST API
ProxyPass "/query" "balancer://query"
ProxyPassReverse "/query" "balancer://query"

# Used for JSON REST APIs
ProxyPass "/rest" "balancer://rest"
ProxyPassReverse "/rest" "balancer://rest"
Requests to http://vip:17001/balancer-manager will now display the status of clustered machines, balancing statistics, and the algorithm used. Requests to http://vip:17001/queryReceiver or http://vip:17001/query or http://vip:17000/rest will be distributed amongst the 3 clustered searcher nodes.
Monitoring Nodes
Most load balancers, including the example above, can detect a connection failure and adjust their balancing accordingly. More sophisticated load balancers can also periodically examine the output from nodes in addition to verifying the connection. If the load balancer supports sending an HTTP request and examining the response to determine endpoint state, configure it to check the following URL:
http://<attivio_host>:<baseport+1>/query
The load balancer can verify that AIE is up and accepting queries by confirming that the HTTP response includes the string "Attivio CGI Interface".
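The same check can be scripted, for example for an external monitor. This is a sketch assuming that a plain GET of the /query URL with no parameters returns the page containing the marker string, as described above; `check_node` and `is_healthy_response` are hypothetical names.

```python
import http.client
import urllib.request

# Marker string expected in a healthy node's /query response.
HEALTH_MARKER = "Attivio CGI Interface"


def is_healthy_response(body: str) -> bool:
    """True if the response body contains the AIE marker string."""
    return HEALTH_MARKER in body


def check_node(host: str, baseport: int, timeout: float = 5.0) -> bool:
    """GET http://host:(baseport+1)/query and look for the marker."""
    url = f"http://{host}:{baseport + 1}/query"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP error statuses
        # (URLError/HTTPError are OSError subclasses) all count as down.
        return False
    return is_healthy_response(body)


if __name__ == "__main__":
    for h in ["host1", "host2", "host3"]:
        print(h, "up" if check_node(h, 17000) else "down")
```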
Main Article: System Monitoring
High Availability Load Balancers
Load balancing amongst several nodes provides scalability and allows individual query receivers to fail without interrupting service to end users; however, for true high availability, the load balancer itself must not be a single point of failure.
Redundancy at the load balancer is accomplished by configuring two or more machines to be capable of managing the virtual IP address described above. One serves as the master, and the other(s) will monitor the master and assume control of the IP resource should a failure be detected.
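The master/backup arrangement boils down to a timeout: the backup takes over the virtual IP address once it has not heard from the master for a configured period (heartbeat calls this deadtime). The sketch below models only that rule; `BackupMonitor` is a hypothetical class and not heartbeat's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Simplified model of a backup load balancer watching the master.

    Assumes the master sends periodic keepalives and the backup claims the
    virtual IP once none has arrived for `deadtime` seconds.
    """

    deadtime: float
    last_keepalive: float = 0.0
    owns_vip: bool = False

    def on_keepalive(self, now: float) -> None:
        # Record the time of the most recent keepalive from the master.
        self.last_keepalive = now

    def tick(self, now: float) -> bool:
        """Return True if the backup holds the virtual IP after this check."""
        if not self.owns_vip and now - self.last_keepalive >= self.deadtime:
            self.owns_vip = True  # master presumed dead: take over the VIP
        return self.owns_vip
```

With `deadtime=10` and a last keepalive at t=5, the backup still defers to the master at t=14 and takes over at t=15.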
Configuration
This example demonstrates the heartbeat software provided by the High Availability Linux project to complement the Apache-based load balancer described above.
First, both load balancer machines should be configured identically as described in the previous example. Ideally, they should be connected locally via crossover cable (or serial) in addition to their primary network adapters.
Install the heartbeat software. Heartbeat is available in most standard yum or apt repositories, and it can also be downloaded from the High Availability Linux project.
Next, edit the main heartbeat service configuration file, typically /etc/ha.d/ha.cf. Assuming that 'uname -n' returns 'node01' on one node and 'node02' on the other, the configuration should look like this:
# Node communication
bcast eth0
udpport 694
autojoin any

# Logging
logfile /var/log/ha-log
logfacility local0

# Timeouts
keepalive 1
warntime 6
deadtime 10
initdead 15

# Nodes
auto_failback on
node node01
node node02
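This configuration tells the heartbeat service that there is a two-node cluster communicating over UDP port 694. Nodes send a heartbeat every second (keepalive), a warning is logged after 6 seconds of silence (warntime), a node is declared dead after 10 seconds (deadtime), and 15 seconds are allowed at startup (initdead). To authenticate communication between the two nodes, edit /etc/ha.d/authkeys and specify an authentication method and shared key: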
auth 1
1 sha1 HeartbeatPassword
Ensure that the authkeys file has the correct permissions by running 'chmod 600 /etc/ha.d/authkeys'.
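If you provision the load balancers with a script, the permission requirement can be verified before starting heartbeat. A minimal sketch; `authkeys_mode_ok` is a hypothetical helper that checks for exactly mode 600 (owner read/write only).

```python
import os
import stat


def authkeys_mode_ok(path: str = "/etc/ha.d/authkeys") -> bool:
    """True if the file's permission bits are exactly 0o600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


if __name__ == "__main__":
    print("authkeys permissions OK" if authkeys_mode_ok() else "run: chmod 600")
```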
Finally, tell the heartbeat service that it needs to manage the virtual IP address and the httpd service, with node01 as the preferred node. Assuming that 10.0.0.1 is the virtual IP address that corresponds to vip, add the following to /etc/ha.d/haresources:
node01 10.0.0.1 httpd
Modify the 'Listen 17001' line added above to listen on the virtual address:
Listen 10.0.0.1:17001
There is no need to explicitly create a virtual network interface for the shared IP - the heartbeat service does this for you.
Ensure both machines are configured identically, and start heartbeat on both of them:
/etc/init.d/heartbeat start
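Either load balancer can now fail without taking the query service down: the surviving node takes over the virtual IP address, after a pause of up to deadtime seconds while the failure is detected, and continues to serve traffic.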