Issue: Citrix ADM Database Streaming Channel Broken Between HA Nodes

Issue and Background

Recently we worked with a customer to deploy Citrix ADM (Application Delivery Management) in a high availability (HA) setup at one of their data centers. ADM was to be used for network reporting, SSL certificate management, and as a single centralized location for managing their large deployment of Citrix ADCs. We deployed ADM in their QA environment as a standalone appliance without any issues, but we ran into problems when deploying Citrix ADM (v13.0 build 82.41) as an HA pair. The floating IP for the HA deployment appeared to be up; however, the secondary node failed to come back on the network and was inaccessible with the nsroot credentials. On the primary ADM appliance, we saw the error “Database streaming Channel Broken between HA nodes”.

Root Cause

Upon further inspection, the root cause was determined to be corruption of the PostgreSQL database on the secondary Citrix ADM appliance.

Resolution 

Step 1

  • Our first step in troubleshooting this issue was to recover nsroot access to the secondary appliance. We followed the steps listed in our article on Recovering from Citrix ADM 13.0 database corruption. We found that connectivity to the PostgreSQL database was broken on the secondary Citrix ADM appliance, and the mas_recovery.py script did not restore access to it.

Step 2

  • For our second step, we verified the file system integrity of the ADM appliance as per this Citrix article.

Step 3

  • Once the file system integrity of the ADM appliance was verified, we checked whether the masd process was running on the secondary ADM appliance by executing the following command:
ps -ax | grep masd

The masd process is the Citrix ADM subsystem process responsible for the GUI. It was found to be in a stopped state. We tried running the masd start command to start the service, but the service did not start.
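One caveat with the check above: ps -ax | grep masd can also match the grep process itself, so a line of output does not by itself prove that masd is running. A minimal sketch of a stricter check follows. The function name masd_state is our own, chosen for illustration, and is not part of Citrix ADM.

```shell
# Hypothetical helper (not an ADM command): reads `ps -ax` output on stdin
# and prints "running" if a masd line is present, ignoring grep's own entry.
masd_state() {
    if grep -v 'grep' | grep -q 'masd'; then
        echo "running"
    else
        echo "stopped"
    fi
}
```

On the appliance this would be used as ps -ax | masd_state.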

Step 4

  • We then restored the secondary Citrix ADM appliance to its default configuration by following the steps listed in CTX216121. The following commands were executed to restore the appliance to its default settings:
masd stop 
killall postgres 
sh /mps/scripts/pgsql/deleteuser.sh 
sh /mps/scripts/pgsql/createuser.sh 
chown -R mpspostgres /var/mps/db_pgsql/ 
rm -rf /var/mps/db_pgsql/data 
su -l mpspostgres -c "sh /mps/scripts/pgsql/initpgsql.sh" 
cp -f /mps/postgresql.conf /var/mps/db_pgsql/data/ 
su -l mpspostgres -c "sh /mps/scripts/pgsql/startpgsql.sh" 
su -l mpspostgres -c "sh /mps/scripts/pgsql/drop_pgsql_db.sh" 
su -l mpspostgres -c "sh /mps/scripts/pgsql/create_pgsql_db.sh" 
su -l mpspostgres -c "sh /mps/scripts/pgsql/drop_pgsql_user.sh" 
su -l mpspostgres -c "sh /mps/scripts/pgsql/create_pgsql_user.sh" 
su -l mpspostgres -c "sh /mps/scripts/pgsql/stoppgsql.sh"
touch /mpsconfig/.recover 
masd start
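
Because the sequence above deletes and rebuilds the database directory, it can help to stop at the first command that fails rather than carry on with a half-built database. The sketch below shows one way to do that. run_step is a hypothetical helper of ours and does not come from CTX216121.

```shell
# Hypothetical wrapper: echo the command, run it, and report a failure
# so the caller can stop instead of continuing with later steps.
run_step() {
    echo "==> $*"
    if ! "$@"; then
        echo "FAILED: $*" >&2
        return 1
    fi
}
```

The commands could then be chained, for example run_step masd stop && run_step chown -R mpspostgres /var/mps/db_pgsql/ and so on. Commands such as killall postgres may return a non-zero status when no matching process is left running, so they are better run without the wrapper.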

Step 5

  • Once the restore was complete, the secondary ADM appliance came back up on the network and was accessible with the nsroot credentials. However, because it had been restored to its default configuration, the appliance had to be added back to the HA pair. We redeployed both ADM servers as a high availability pair by going through the steps listed here, and both appliances were rebooted.

Step 6

  • At this point, the secondary server was accessible with the nsroot credentials and the database status on the secondary ADM showed as up. However, the console still showed the “Database streaming Channel Broken between HA nodes” error. We tried clicking Sync Database under System > Deployment > High Availability Deployment in the ADM GUI to restore the database, but this did not work. We restored the database streaming channel between the Citrix ADM HA nodes by running the following commands on the CLI of the primary Citrix ADM appliance, logged in with the nsrecover credentials (replace SecondaryIP and PrimaryIP with the IP addresses of the respective nodes):
cd /var/mps/db_pgsql/data/pg_log
chown mpspostgres *
nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP > /var/mps/log/join_streaming_replication_console.log 2>&1 &

To verify that the database streaming channel was up, we ran the following command a couple of hours after running the commands above:

ps -ax | grep -i wal
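
Note that grep -i wal also matches PostgreSQL's WAL writer process, which runs whether or not replication is working. Below is a sketch of a narrower check. It assumes the standard PostgreSQL process titles, where a WAL sender runs on the primary and a WAL receiver runs on the secondary; the function name wal_role is hypothetical.

```shell
# Hypothetical helper: reads `ps -ax` output on stdin and reports which
# streaming replication process, if any, is present on this node.
# Assumes PostgreSQL's usual titles ("wal sender"/"walsender", etc.).
wal_role() {
    wal_ps_output=$(cat)
    if printf '%s\n' "$wal_ps_output" | grep -Eiq 'wal ?sender'; then
        echo "sender"
    elif printf '%s\n' "$wal_ps_output" | grep -Eiq 'wal ?receiver'; then
        echo "receiver"
    else
        echo "none"
    fi
}
```

Running ps -ax | wal_role on each node should then print sender on the primary and receiver on the secondary once streaming is established, under the assumptions above.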

 
