Cluster information expiry and AMQRFDM – utility to generate cluster dump.
#1
Queue managers retain information about a clustered object for 30 days. This information is refreshed automatically: updates to objects shared in the cluster are sent to the full repositories and, through them, to the partial repositories that subscribe to the clustered object.

When a queue manager joins a cluster, it sends information about itself to the full repository. The full and partial repositories store this information for 30 days. If the full repository receives no update about the clustered object or the clustered queue manager within those 30 days, the information expires. However, the expired information is not removed from the repository immediately. It remains for a further grace period of 60 days, after which it is removed completely.

Now, you might be wondering why the information should be removed at all. After all, when a queue manager or queue is removed from the cluster properly, it updates the repository to remove all entries about itself, and expiry never comes into play.

To understand this, consider the cases below:

• If a queue manager is deleted without first being removed from the clusters it belongs to, the full repository will still think the queue manager is part of the cluster. The full repository therefore has to remove the information about the deleted queue manager after a period of time, and that period is 90 days, as explained above.

• If a queue manager is stopped (which it shouldn't be for long if it is active), if the cluster repository process (amqrrmfa) has issues, or if the cluster channels are not running, the full repository won't receive an update and will therefore remove the information after 90 days. This is why it is never recommended to stop queue managers completely in DR regions, which is sometimes done unknowingly.

To prevent the information from expiring, queue managers automatically resend all information about themselves to the full repository after 27 days. If, after 30 days, a queue manager finds that it has not received an update, it still keeps the information for another 60 days before removing it completely.
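The timeline above is easy to work out for a given last-update date. Here is a minimal sketch in Python, assuming the 27/30/60-day figures quoted in this post; the function name and the sample date are made up for illustration, and nothing here is read from a real queue manager.

```python
from datetime import date, timedelta

# Intervals in days, taken from the figures quoted in the post.
RESEND_AFTER_DAYS = 27   # queue manager resends its own information
EXPIRY_AFTER_DAYS = 30   # information expires if no update arrived
GRACE_PERIOD_DAYS = 60   # expired information is kept this much longer


def cluster_record_timeline(last_update: date) -> dict:
    """Return the key dates for a cluster record last updated on last_update."""
    expires = last_update + timedelta(days=EXPIRY_AFTER_DAYS)
    return {
        "resend_due": last_update + timedelta(days=RESEND_AFTER_DAYS),
        "expires": expires,
        "removed": expires + timedelta(days=GRACE_PERIOD_DAYS),
    }


# Hypothetical last-update date, used only to show the arithmetic.
timeline = cluster_record_timeline(date(2024, 1, 1))
for name, day in timeline.items():
    print(f"{name}: {day.isoformat()}")
```

The "removed" date always falls 90 days after the last update, which is the total referred to throughout this post.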

Also, in some very rare cases, a partial repository might not receive any information about a clustered queue from the full repository, or a full repository might not receive updates, even though the queue is being used actively. If you find that a repository has not received any updates for 30 days, a refresh cluster is required. (I have seen this happen in version 6.)

If the ‘refresh cluster’ is not done in such cases, then after 90 days applications trying to put messages to or get messages from the clustered queue will get a 2085 (MQRC_UNKNOWN_OBJECT_NAME, object not found) error, since the information about the object will have been removed completely.

This can still be recovered with the ‘refresh cluster’ command, since the queue that has expired is still part of the cluster. The queue manager on which refresh cluster must be run depends on where exactly the cluster information was not updated. The cases are explained below.

• If the full repository itself has not received an update about a clustered object, the queue manager hosting the clustered object has to push the information to the full repository again. Hence a refresh cluster is required on the queue manager where the clustered object is defined.
• If a partial repository that subscribes to the queue has not received an update, a refresh cluster is required on that queue manager, since it needs to receive fresh updates from the full repository again.
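The two cases above amount to a small decision rule. Here is a rough Python sketch of it; the function names and the "full"/"partial" labels are my own illustration, and only the REFRESH CLUSTER command text comes from this post.

```python
def refresh_target(stale_repository: str, owning_qmgr: str, partial_qmgr: str) -> str:
    """Pick the queue manager on which REFRESH CLUSTER should be run.

    stale_repository: "full" if the full repository lost the object,
                      "partial" if only a partial repository lost it.
    owning_qmgr:      queue manager where the clustered object is defined.
    partial_qmgr:     partial repository that is missing the object.
    """
    if stale_repository == "full":
        # The owner must push its definitions to the full repository again.
        return owning_qmgr
    if stale_repository == "partial":
        # The partial repository must pull fresh data from the full repository.
        return partial_qmgr
    raise ValueError("stale_repository must be 'full' or 'partial'")


def refresh_command(cluster: str) -> str:
    """Build the MQSC command text for the given cluster name."""
    return f"REFRESH CLUSTER({cluster})"


# Names from the scenario later in this post: TEST is defined on QMC,
# QMB is a partial repository, ACLUSTER is the cluster.
print(refresh_target("full", "QMC", "QMB"), "->", refresh_command("ACLUSTER"))
```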


The refresh cluster command is explained in the refresh cluster post. Please go through that post, since it is important to know the consequences of running the command.

So now comes the question: how do we know whether a clustered queue is expiring? If it is expiring when it is not supposed to, it will cause a problem.

• The MQ error logs are always updated with information about expiring queues, so it is important for the monitoring tool to track them.
• There is an interesting utility called amqrfdm which comes with every MQ installation. It can be used to see some information that cannot be seen anywhere else or with any other utility.

However, there is not much documentation available for this command.

Command –

amqrfdm -m Queuemanager

Result
Code:
#) Summary
q) Print Cluster Queues
T) Print Cluster Topics
S) Print Cluster TopicStrings
s) Print Subscriptions
m) Print Cluster Queue Managers
o) Output entire cache
a) Print Free Areas
r) Print Registrations
d) Dump Area of repository
t) Browse Transmission Queue
D) Disconnect
f) Set Filters (Off)
+) Set Details level
Enter option character                        Q – Quit

To dump the complete cluster info to a file, run the below command.

amqrfdm -m myqmgr < amqrfdm_options.txt > amqrfdm_results.txt

The amqrfdm_options.txt should have the below text.

Code:
+
>

s
q
m
#
t
Q
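If you would rather script the dump than keep a separate options file, here is a minimal Python sketch of the same idea: feed the menu keystrokes to amqrfdm on standard input and save whatever it prints. The keystrokes are copied as-is from the options file above, and "amqrfdm -m" is the only invocation used. The function names are my own, and I am assuming amqrfdm is on the PATH of a user allowed to run it.

```python
import shutil
import subprocess

# Menu keystrokes copied as-is from the options file in the post,
# one entry per line of that file (the empty string is the blank line).
POST_OPTIONS = ["+", ">", "", "s", "q", "m", "#", "t", "Q"]


def build_options_text(options):
    """Join menu keystrokes into the text that is fed to amqrfdm on stdin."""
    return "".join(option + "\n" for option in options)


def dump_cluster_cache(qmgr, options, out_path):
    """Run 'amqrfdm -m <qmgr>' with the keystrokes on stdin and save stdout."""
    completed = subprocess.run(
        ["amqrfdm", "-m", qmgr],
        input=build_options_text(options),
        capture_output=True,
        text=True,
        check=False,
    )
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(completed.stdout)
    return completed.returncode


if __name__ == "__main__":
    # Only attempt the dump when the utility is actually installed.
    if shutil.which("amqrfdm") is None:
        print("amqrfdm not found on PATH; nothing was run")
    else:
        dump_cluster_cache("myqmgr", POST_OPTIONS, "amqrfdm_results.txt")
```

Keeping the keystrokes in a list makes it easy to drop options you don't need, for example leaving out t if you are not interested in the transmission queue.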

Scenario: To simulate the expiry of cluster information, I am going to change the system date to a future date. This forces the queue managers to think that they have not received cluster information for the past 90 days, and hence the clustered queue information will be removed.

Setup: Queue managers QMA, QMB and QMC are in cluster ACLUSTER, with QMA as the full repository. A local queue ‘TEST’ is shared in the cluster on the QMC queue manager.


Step 1: Complete the cluster setup.

The cluster setup and the commands for it are explained in the refresh cluster post.

Step 2: Create a queue ‘TEST’ on the QMC queue manager and share it in the cluster.

Define ql(TEST) cluster(ACLUSTER)

Step 3: Put a blank message from the QMB queue manager to the clustered queue ‘TEST’ so that QMB learns about the queue.

amqsput TEST QMB
runmqsc QMB
dis qc(*)


[Image: 2mk0hLT.png]

Step 4: Run amqrfdm to check the last update of the queue ‘TEST’ and the expiry date.

amqrfdm -m QMB
Type +
Type T
 

Note: Capital ‘T’ displays the time of cluster update.

[Image: ENR6ZDK.png]

Hit enter key
Type q  

Note: q is to display clustered queues

Type m
Note: m is to display clustered queue managers

[Image: Xedj6eN.png]

Observe the last update and expiry dates for the clustered queue ‘TEST’ and the clustered queue managers.
Observe that the queue expires exactly 30 days after the last update.

Step 5: Change the system date to a future date more than 90 days ahead.

[Image: xhgEMvr.png]

Step 6: Put a message from the QMB queue manager to the clustered queue ‘TEST’ to confirm that the entry has been removed from its repository.

Note: The queue managers might need a restart to pick up the new time. I had to restart the QMB queue manager. This probably triggered the repository queue manager to pick up the new time as well, and both QMB and QMA lost the entry for the ‘TEST’ clustered queue.

amqsput TEST QMB
runmqsc QMB
dis qc(*)


[Image: DnLuFkW.png]

Observe that we get a 2085 error, which means that the queue was not found. Hence we can infer that it has been removed from the local repository.

Step 7:  Run amqrfdm to check the clustered queue ‘TEST’.

Follow the procedure given in step 4.

[Image: MWh1DkO.png]

Since the queue information has been removed, you can observe that amqrfdm gives a blank result for clustered queues (option q).

Step 8: Resolve the issue by running the ‘REFRESH CLUSTER’ command.

Since the full repository itself has removed the information about the clustered queue, a refresh cluster is required on the QMC queue manager to propagate the cluster information to the full repository QMA once again.

runmqsc QMC
refresh cluster(ACLUSTER)


Step 9: Put a blank message from the QMB queue manager to the clustered queue ‘TEST’ to confirm that the queue is visible to QMB again.

amqsput TEST QMB
runmqsc QMB
dis qc(*)


Since the full repository QMA once again has the information about the ‘TEST’ queue (which it had removed), the queue is now visible on the QMB queue manager.

Cheers,
Vinyas
#2
This is excellent information and explained very well. Thank you for this.
#3
Thank you vinay for your valuable information. Can you please explain what troubleshooting steps we need to take when a cluster sender channel goes into retrying state?