[gpfsug-discuss] Tuning Spectrum Scale AFM for stability?
Venkateswara R Puvvada
vpuvvada at in.ibm.com
Tue Apr 28 12:37:24 BST 2020
Hi,
What do you mean by a lock down of the AFM fileset? Are the messages in a
requeued state, with AFM not replicating any data? I would recommend
opening a ticket and collecting the logs and an internaldump from the
gateway node while the replication is stuck.
You can also try increasing the value of the afmAsyncOpWaitTimeout option
to see whether this solves the issue:
mmchconfig afmAsyncOpWaitTimeout=3600 -i
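A minimal Python sketch of applying the change and reading it back. The value is in seconds, so 3600 is one hour. It assumes `mmlsconfig <attribute>` prints one `<attribute> <value>` pair per line (check this on your release); `parse_config_value` and `set_and_verify` are hypothetical helper names, not part of Spectrum Scale.

```python
import subprocess
from typing import Optional

ATTR = "afmAsyncOpWaitTimeout"


def parse_config_value(output: str, name: str) -> Optional[str]:
    """Return the value of `name` from "attribute value" lines, or None if absent.

    Assumes mmlsconfig prints one "<attribute> <value>" pair per line.
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return parts[1]
    return None


def set_and_verify(seconds: int = 3600) -> bool:
    """Hypothetical helper: apply the timeout immediately (-i), then read it back."""
    subprocess.run(["mmchconfig", f"{ATTR}={seconds}", "-i"], check=True)
    shown = subprocess.run(["mmlsconfig", ATTR], check=True,
                           capture_output=True, text=True).stdout
    return parse_config_value(shown, ATTR) == str(seconds)
```

Run `set_and_verify(3600)` on a node with admin rights; the parsing is kept separate so it can be checked without touching the cluster.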
~Venkat (vpuvvada at in.ibm.com)
From: Andi Christiansen <andi at christiansen.xxx>
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: 04/28/2020 12:04 PM
Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability?
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hi All,
Can anyone share some thoughts on how to tune AFM for stability? At the
moment we have OK performance between our sites (5-8 Gbit/s with 34 ms
latency), but the cache fileset locks down about once a week; before we
tuned the settings below, it happened every day. Is there any further AFM
tuning that I haven't found?
Cache site only (TCP settings):
sunrpc.tcp_slot_table_entries = 128

Home and cache (AFM/GPFS settings):
maxBufferDescs=163840
afmHardMemThreshold=25G
afmMaxWriteMergeLen=30G
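For context on the TCP setting, the bandwidth-delay product (BDP) is the amount of data that must be in flight to keep the link full. A minimal Python sketch for the quoted 5-8 Gbit/s, assuming 34 ms is the round-trip time and assuming a 1 MiB NFS transfer size (neither is confirmed in this thread) when converting the BDP into a count of concurrent RPCs, which is roughly what the sunrpc slot table governs:

```python
import math

MIB = 1 << 20


def bdp_bytes(gbit_per_s: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes: (bits per second * RTT in seconds) / 8."""
    return gbit_per_s * 1e9 * rtt_ms / 1000 / 8


def rpcs_needed(gbit_per_s: float, rtt_ms: float, xfer_bytes: int = MIB) -> int:
    """Outstanding RPCs of `xfer_bytes` each needed to keep the link full (assumed 1 MiB)."""
    return math.ceil(bdp_bytes(gbit_per_s, rtt_ms) / xfer_bytes)


for rate in (5, 8):
    print(f"{rate} Gbit/s: BDP {bdp_bytes(rate, 34) / MIB:.1f} MiB, "
          f"{rpcs_needed(rate, 34)} x 1 MiB RPCs in flight")
```

Under those assumptions it reports about 20.3 MiB (21 RPCs) at 5 Gbit/s and 32.4 MiB (33 RPCs) at 8 Gbit/s, so a slot table of 128 should not be the limiting factor; a smaller transfer size raises the RPC count proportionally.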
Cache fileset:
Attributes for fileset AFMFILESET:
================================
Status Linked
Path /mnt/fs02/AFMFILESET
Id 1
Root inode 524291
Parent Id 0
Created Tue Apr 14 15:57:43 2020
Comment
Inode space 1
Maximum number of inodes 10000384
Allocated inodes 10000384
Permission change flag chmodAndSetacl
afm-associated Yes
Target nfs://DK_VPN/mnt/fs01/AFMFILESET
Mode single-writer
File Lookup Refresh Interval 30 (default)
File Open Refresh Interval 30 (default)
Dir Lookup Refresh Interval 60 (default)
Dir Open Refresh Interval 60 (default)
Async Delay 15 (default)
Last pSnapId 0
Display Home Snapshots no
Number of Read Threads per Gateway 64
Parallel Read Chunk Size 128
Parallel Read Threshold 1024
Number of Gateway Flush Threads 48
Prefetch Threshold 0 (default)
Eviction Enabled yes (default)
Parallel Write Threshold 1024
Parallel Write Chunk Size 128
Number of Write Threads per Gateway 16
IO Flags 0 (default)
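The parallel read and write values above can be sanity-checked with a rough model. A minimal Python sketch, assuming the fileset values are in MiB (the mmfsadm dump afm output below shows pReadThresh: 1073741824, which is 1024 MiB) and assuming a file at or above the threshold is split into ceil(size / chunk) pieces; `parallel_chunks` is a hypothetical name.

```python
import math

MIB = 1 << 20


def parallel_chunks(file_bytes: int, threshold_mib: int = 1024,
                    chunk_mib: int = 128) -> int:
    """Rough model of how many chunks a file is split into for parallel I/O.

    Files below the threshold are treated as a single transfer. Whether AFM
    compares with >= or > is an assumption of this sketch.
    """
    if file_bytes < threshold_mib * MIB:
        return 1
    return math.ceil(file_bytes / (chunk_mib * MIB))


print(parallel_chunks(10 * 1024 * MIB))  # a 10 GiB file -> 80
```

With four CES nodes in the export map, those 80 chunks would be shared among them; how AFM actually schedules chunks across gateways is not modeled here.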
mmfsadm dump afm:
AFM Gateway:
RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072
readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648
readBypassThresh 67108864
QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600
Ping thread: Started
Fileset: AFMFILESET 1 (fs02)
mode: single-writer queue: Normal MDS: <c0n1> QMem 0 CTL 577
home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16
handler: Mounted Dirty refCount: 1
queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1
terminate: 0 psnapWait: 0
remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0
queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78
handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0
InflightAsyncLookups 0
lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200
i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728
pReadThreads 64
i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824
i/o: prefetchThresh 0 (Prefetch)
Mnt status: 0:0 1:0 2:0 3:0
Export Map: 10.110.5.10/<c0n0> 10.110.5.11/<c0n1> 10.110.5.12/<c0n2> 10.110.5.13/<c0n9>
Priority Queue: Empty (state: Active)
Normal Queue: Empty (state: Active)
Cluster Config Cache:
maxFilesToCache 131072
maxStatCache 524288
afmDIO 2
afmIOFlags 4096
maxReceiverThreads 32
afmNumReadThreads 64
afmNumWriteThreads 8
afmHardMemThreshold 26843545600
maxBufferDescs 163840
afmMaxWriteMergeLen 32212254720
workerThreads 1024
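The byte values in the cluster config cache can be matched against the settings listed earlier. A small Python sketch, assuming size suffixes are binary multiples of 1024. The dump bears this out for G: 25 x 2^30 = 26843545600 (afmHardMemThreshold, also shown as HardQMem), 30 x 2^30 = 32212254720 (afmMaxWriteMergeLen) and SoftQMem 10737418240 is 10 x 2^30. The K/M/T entries are an assumption.

```python
# Binary (1024-based) multipliers; only "G" is confirmed by the dump above.
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def to_bytes(size: str) -> int:
    """Convert a size string such as '25G' to bytes; plain numbers pass through."""
    size = size.strip().upper()
    if size and size[-1] in SUFFIXES:
        return int(size[:-1]) * SUFFIXES[size[-1]]
    return int(size)


settings = {"afmHardMemThreshold": "25G", "afmMaxWriteMergeLen": "30G"}
dumped = {"afmHardMemThreshold": 26843545600, "afmMaxWriteMergeLen": 32212254720}
for name, configured in settings.items():
    # True means the running value matches what was configured.
    print(name, to_bytes(configured) == dumped[name])
```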
The GPFS log entries state "AFM: Home is taking longer to respond...", but
only AFM and the cache AFM fileset enter a locked state. We also have the
same NFS exports from home mounted on the same gateway nodes, so we can
check when a file has been transferred, and those mounts all work fine
while the AFM lock is happening. A simple GPFS restart on the AFM master
node is enough to make AFM restart and continue for another week.
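Since the fileset recovers after a restart, it may help to detect the lock early. A hypothetical heuristic, not an IBM-documented check: take two snapshots of the `queue:` line from `mmfsadm dump afm` a few minutes apart and flag the fileset if work is queued but numExec has not advanced. `mmafmctl <device> getstate` remains the supported way to see cache state and queue length; the function names below are illustrative.

```python
def parse_queue_line(line: str) -> dict:
    """Parse 'queue: key value key value ...' into a dict of integers.

    A 'QLen a+b' value is summed. The format is copied from the dump above;
    mmfsadm output is a debugging aid and may differ between releases.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "queue:":
        raise ValueError("not an AFM queue line")
    # Keys sit at odd positions, values at the following even positions.
    return {key: sum(int(part) for part in raw.split("+"))
            for key, raw in zip(tokens[1::2], tokens[2::2])}


def looks_stuck(before: dict, after: dict) -> bool:
    """Hypothetical heuristic: work is queued but numExec did not advance."""
    return after["QLen"] > 0 and after["numExec"] == before["numExec"]
```

Feed it the whole queue line (joined back together if it wrapped), for example `queue: delay 15 QLen 0+0 ... numExec 8772518 ... err 78`.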
The home target is exported via CES NFS from 4 CES nodes, and a mapping
is created at the cache site to make use of the parallel write feature.
If anyone has ideas or knowledge on how to tune this further for
stability, I would be happy if you could share your thoughts! :-)
Many Thanks in Advance!
Andi Christiansen