[gpfsug-discuss] AFM cache rolling upgrade with minimal impact / no directory scan

Wed Aug 26 04:04:09 BST 2020

Billich,

> The cache filesets hold about 500M used inodes. Does a specific
> procedure exist, or is it good enough to just shut down Scale on the
> node I want to update? And maybe flush the queues first as far as
> possible?

If the upgrade is short, the recommended approach is to stop the filesets
(mmafmctl device stop) and then perform the upgrade. If the upgrade takes
too long for that, the gateway node can simply be shut down; the other
active gateway node(s) then run recovery automatically for the filesets
owned by the gateway that was shut down.
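
The two paths above could be sketched roughly as follows. This is only a
dry-run illustration, not a tested procedure: the file system name fs1, the
fileset names, and the node name gw1 are hypothetical placeholders, and the
per-fileset -j option and the start counterpart of stop are assumed from
the mmafmctl command syntax, so please check the man page for your release.
By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them unless RUN=1.
# fs1, cache01/cache02 and gw1 are hypothetical placeholders.
DEVICE="${DEVICE:-fs1}"
FILESETS="${FILESETS:-cache01 cache02}"
GATEWAY="${GATEWAY:-gw1}"

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Short upgrade: stop each fileset, upgrade, then start it again.
short_upgrade() {
    for fset in $FILESETS; do
        run mmafmctl "$DEVICE" stop -j "$fset"
    done
    echo "# ... upgrade the gateway node here ..."
    for fset in $FILESETS; do
        run mmafmctl "$DEVICE" start -j "$fset"
    done
}

# Long upgrade: shut down GPFS on the gateway; the remaining active
# gateway node(s) run recovery for its filesets automatically.
long_upgrade() {
    run mmshutdown -N "$GATEWAY"
    echo "# ... upgrade the gateway node here ..."
    run mmstartup -N "$GATEWAY"
}
```

Calling short_upgrade or long_upgrade prints the corresponding command
sequence; setting RUN=1 would execute it instead.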

> If a fileset has a zero length queue of pending transactions to home,
> will this avoid any policy scan when a second afm node takes
> responsibility for the fileset?

The active gateway node(s) always run recovery with a policy scan, even if
the queue length was zero on the other gateway node(s). Recovery may
therefore trigger on many filesets at the same time (assuming 20 filesets
in this case), which may impact system performance. You can limit the
number of parallel recoveries with the afmMaxParallelRecoveries option.
For example, set mmchconfig afmMaxParallelRecoveries=5 -i (the default, 0,
runs recovery on all filesets in parallel), and reset it to the default
later.
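
As a rough sketch of the set-and-reset sequence described above (dry-run by
default; the helper names are made up for illustration, and resetting by
writing the value 0 back is an assumption based on the default stated
above):

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Cap the number of concurrent AFM recoveries (5 if no value is given)
# before the gateway node is taken down.
limit_recoveries() {
    run mmchconfig "afmMaxParallelRecoveries=${1:-5}" -i
}

# Restore the default once the upgrade is done
# (0 = run recovery on all filesets in parallel).
reset_recoveries() {
    run mmchconfig afmMaxParallelRecoveries=0 -i
}
```

Run limit_recoveries before shutting down the gateway and reset_recoveries
after the upgraded node is back and the recoveries have finished.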

~Venkat (vpuvvada at in.ibm.com)

From:   "Billich  Heinrich Rainer (ID SD)" <heinrich.billich at id.ethz.ch>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   08/25/2020 08:43 PM
Subject:        [EXTERNAL] [gpfsug-discuss] AFM cache rolling upgrade with minimal impact / no directory scan
Sent by:        gpfsug-discuss-bounces at spectrumscale.org

Hello,

We will upgrade a pair of AFM cache nodes which serve about 40 SW
filesets. I want to do a rolling upgrade, and I wonder if I can minimize
the impact of the failover when filesets move to the other AFM node. I
can't stop replication during the upgrade: the update will take too long
(OS, MOFED, FW, Scale) and we want to preserve the ability to recall files
(?). Mostly I want to avoid policy scans of all inodes on cache (and maybe
even lookups of files on home??).

I can stop replication for a short time. Also, most of the time the queues
are empty or contain just a few hundred entries. The cache filesets hold
about 500M used inodes. Does a specific procedure exist, or is it good
enough to just shut down Scale on the node I want to update? And maybe
flush the queues first as far as possible?

If a fileset has a zero length queue of pending transactions to home, will 
this avoid any policy scan when a second afm node takes responsibility for 
the fileset?

Maybe I already asked this before. Unfortunately the manual isn't as
explicit as I would prefer when it talks about rolling upgrades.

Thank you,

Heiner

-- 
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================

