Connection Manager sqlhosts File

While playing with the IDS 11.5 Connection Manager, I realized that the examples for setting up the sqlhosts file in the IDS V11.5 Information Center are suboptimal. Here is the setup graphic taken from the documentation:

The weak point is the sqlhosts file on the left side (Connection Manager and client machines). After a failover of the primary instance in the MACH cluster, things will still work as long as you don't restart your Connection Manager. The Connection Manager is already connected to every cluster node and routes incoming connection requests to the surviving cluster nodes according to the configured SLAs.

But guess what happens if the Connection Manager has to be restarted for whatever reason?

It will not be able to connect to the MACH-Cluster as long as the original primary node (instance name ids in this example) is down. You would need to manually add the new primary node to the sqlhosts file. This is not desirable for a high availability cluster.

The better approach would be to re-write the sqlhosts file for the connection manager as well as the client machines (left side of the graphic) including all nodes that are possible failover targets according to your FOC-Policy. Those nodes will be combined through a group entry named g_mach. Below you'll find the revised sqlhosts file:

#DBSERVER   PROTOCOL    HOSTNAME   SERVICE    OPTIONS
g_mach      group       -          -          e=ids_sds
ids         onsoctcp    host1      port1      g=g_mach
ids_hdr     onsoctcp    host2      port2      g=g_mach
ids_sds     onsoctcp    host3      port3      g=g_mach
oltp        onsoctcp    cmhost1    cmport1
report      onsoctcp    cmhost1    cmport2 
payroll     onsoctcp    cmhost1    cmport3 

When starting the Connection Manager, you need to set your INFORMIXSERVER environment variable to the new group entry: g_mach.

With this configuration, it doesn't matter which node of the MACH-Cluster is currently the primary instance. The Connection Manager will always connect via the group entry g_mach and down nodes will be automatically skipped by the Informix network driver.
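To make the group behaviour concrete, here is a minimal Python sketch that models it. It parses the revised sqlhosts entries, collects the members of g_mach in file order, and returns the first member that a caller-supplied check reports as reachable. This is only an illustration of the idea, not the Informix network driver's actual implementation; the function names and the `is_up` callback are hypothetical, and host1/port1 etc. are the placeholders from the file above.

```python
SQLHOSTS = """\
#DBSERVER   PROTOCOL    HOSTNAME   SERVICE    OPTIONS
g_mach      group       -          -          e=ids_sds
ids         onsoctcp    host1      port1      g=g_mach
ids_hdr     onsoctcp    host2      port2      g=g_mach
ids_sds     onsoctcp    host3      port3      g=g_mach
oltp        onsoctcp    cmhost1    cmport1
"""


def parse_sqlhosts(text):
    """Return the entries in file order, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name, protocol, host, service = fields[:4]
        # Options look like "g=g_mach" or "e=ids_sds", possibly comma-separated.
        options = {}
        for field in fields[4:]:
            for opt in field.split(","):
                key, _, value = opt.partition("=")
                options[key] = value
        entries.append({
            "name": name,
            "protocol": protocol,
            "host": host,
            "service": service,
            "options": options,
        })
    return entries


def group_members(entries, group):
    """Entries that declare membership in the given group via g=<group>."""
    return [e for e in entries if e["options"].get("g") == group]


def first_reachable(entries, group, is_up):
    """Try group members in file order; return the first one is_up() accepts.

    is_up is a hypothetical callback standing in for a real connection attempt.
    """
    for member in group_members(entries, group):
        if is_up(member):
            return member["name"]
    return None


if __name__ == "__main__":
    entries = parse_sqlhosts(SQLHOSTS)
    # Pretend the original primary "ids" is down.
    print(first_reachable(entries, "g_mach", lambda m: m["name"] != "ids"))
```

With the original primary marked as down, the sketch falls through to ids_hdr, which is the behaviour the group entry is meant to give a restarted Connection Manager.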

Remember that it is also possible to configure several independent Connection Managers and combine them through a group entry as well. This makes the Connection Manager highly available.

I love this technology! Measured against what you achieve by deploying the Connection Manager in combination with an IDS MACH-Cluster, the configuration is a cakewalk.

Connection failover

Hi Eric,

Have you actually tried failing over to one of the secondary servers through the CM?

I have just tested a basic scenario: 1 primary, 1 HDR secondary, 1 RSS, each running on a separate virtual machine (to fully simulate a server crash).

With everything else being identical, I tested two cases, with different results:

1. Primary is shut down gracefully (onmode -ky). Result: CM acknowledges this and redirects client connections successfully to HDR secondary as configured.
2. The primary "server" crashes (virtual machine processes killed). Result: CM _doesn't_ acknowledge the event and tries to connect the client to the crashed primary, with no success of course.

I have waited for an hour or so to see if there's any hidden timeout, but no.

OS: Linux 32-bit
IDS: 11.50.UC2
CSDK: 3.50.UC2

Has anyone tested a similar scenario?
I can post the relevant config (although it doesn't seem to be the issue, since everything works as expected in the first case) and log files.

Thanks

Davorin

Re: Connection failover

Hi Davorin,

I tested it with the following setup:

  • Connection Manager: gepard_cm
  • Primary: gepard
  • HDR: gepard_hdr
  • SDS: gepard_sds1
  • RSS: gepard_rss1

CM as well as the MACH-Nodes all run on a single physical machine.
CM FOC is: FOC HDR+SDS,5.

I killed the CPU-VP of the Primary (kill -9) and the Connection Manager
wrote the following information to its log file:

12:41:46 IBM Informix Connection Manager
12:41:46 Connection Manager name is gepard_cm
12:41:46 Starting Connection Manager...
12:41:46 Current max open fd is 1024
12:41:46 dbservername = gepard
12:41:46 nettype      = onsoctcp
12:41:46 hostname     = apollo4
12:41:46 servicename  = ids_gepard
12:41:46 options      = g=g_mach
12:41:46 switch to daemon mode
12:41:46 listener cm_oltp1 initializing
12:41:46 listener cm_report1 initializing
12:41:46 Listener cm_oltp1=(PRI+SDS) is active with 8 worker threads
12:41:46 Listener cm_report1=(HDR+RSS) is active with 8 worker threads
12:41:46 Connection Manager successfully connected to gepard
12:41:46 Arbitrator FOC string = HDR+SDS,5
12:41:46 FOC[0] = HDR
12:41:46 FOC[1] = SDS
12:41:46 FOC timeout = 5
12:41:46 Arbitrator is active on CM = gepard_cm
12:41:47 Connection Manager successfully connected to gepard_hdr
12:41:47 Connection Manager successfully connected to gepard_rss1
12:41:47 Connection Manager successfully connected to gepard_sds1
12:41:47 Connection Manager started successfully
12:43:06 Connection Manager disconnected from gepard
12:43:18 Arbitrator make primary on node = gepard_hdr successful
12:43:18 Arbitrator FOC string = HDR+SDS,5
12:43:18 FOC[0] = HDR
12:43:18 FOC[1] = SDS
12:43:18 FOC timeout = 5
12:43:39 Connection Manager disconnected from gepard_sds1

The CM successfully established the HDR node as the new Primary. The RSS node automatically re-established its connection to the new Primary. The SDS node was shut down automatically, due to the loss of the (old) Primary.
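For comparing test runs, it can help to pull the failover delay straight out of the CM log. The following is a minimal Python sketch that does this for the two relevant messages in the log above. It assumes the message wording stays exactly as shown there and that both events happen on the same day; the function name is hypothetical.

```python
from datetime import datetime

# Three lines copied from the Connection Manager log above.
LOG_EXCERPT = """\
12:41:46 FOC timeout = 5
12:43:06 Connection Manager disconnected from gepard
12:43:18 Arbitrator make primary on node = gepard_hdr successful
"""


def failover_delay_seconds(log_text):
    """Seconds between the first 'disconnected' line and the next promotion.

    Returns None if either event is missing. Timestamps carry no date, so a
    failover that spans midnight is not handled.
    """
    lost = promoted = None
    for line in log_text.splitlines():
        stamp, _, message = line.partition(" ")
        if lost is None and message.startswith("Connection Manager disconnected from"):
            lost = datetime.strptime(stamp, "%H:%M:%S")
        elif lost is not None and message.startswith("Arbitrator make primary on node"):
            promoted = datetime.strptime(stamp, "%H:%M:%S")
            break
    if lost is None or promoted is None:
        return None
    return int((promoted - lost).total_seconds())


if __name__ == "__main__":
    print(failover_delay_seconds(LOG_EXCERPT))  # 12
```

In this run the promotion was logged 12 seconds after the disconnect, with a configured FOC timeout of 5. The sketch only measures the gap; it does not try to explain how the remaining time is spent.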

I don't know what happened in your configuration. You may need to set DEBUG 1 in $INFORMIXDIR/etc/cmsm.cfg to find out. My IDS version is 11.50.FC2DE.

HTH.

Re: Connection failover

Hi Eric,

OK, I'll have to test it again. However, there is one difference between our tests: all your instances and the CM run on the same server, while mine run on separate machines. I'm not sure whether this could be the cause, though.
Anyway, I will post an update as soon as I have time to do some more testing.

Cheers

Davorin