[gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions
Nathan Falk
nfalk at us.ibm.com
Mon Feb 10 14:57:13 GMT 2020
Hello Walter,
If you anticipate that the number of clients accessing this file system
may grow as high as 5000, then that is probably the value you should use
when creating the file system. The data structures (regions for example)
are allocated at file system creation time (more precisely at storage pool
creation time) and are not changed later.
The mmcrfs doc explains this:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm
-n NumNodes
The estimated number of nodes that will mount the file system in the local
cluster and all remote clusters. This is used as a best guess for the
initial size of some file system data structures. The default is 32. This
value can be changed after the file system has been created, but it does
not change the existing data structures. Only newly created data
structures (for example, a new storage pool) are affected by the new value.
When you create a GPFS file system, you might want to overestimate the
number of nodes that will mount the file system. GPFS uses this
information for creating data structures that are essential for achieving
maximum parallelism in file system operations (for more information, see
GPFS architecture). If you are sure there will never be more than 64
nodes, allow the default value to be applied. If you are planning to add
nodes to your system, you should specify a number larger than the default.
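To make that concrete, a minimal sketch of setting -n at creation time and checking it afterwards (the stanza file path below is a placeholder, not from this thread, and only the flags discussed here are shown):

# Create the file system with an overestimated node count up front.
# /tmp/data_nsd.stanza is a hypothetical NSD stanza file.
mmcrfs data -F /tmp/data_nsd.stanza -B 4M -n 5000 -T /gpfs/data

# -n can be changed later, but only storage pools created afterwards
# pick up the new value; existing pools keep their original structures.
mmchfs data -n 5000

# Check the current setting.
mmlsfs data -n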
Thanks,
Nate Falk
IBM Spectrum Scale Level 2 Support
Software Defined Infrastructure, IBM Systems
Phone: 1-720-349-9538 | Mobile: 1-845-546-4930
E-mail: nfalk at us.ibm.com
From: Walter Sklenka <Walter.Sklenka at EDV-Design.at>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 02/09/2020 04:59 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other
commands long running // influence of n and B on number of regions
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hi!
At the time of writing we have set -n to 1200, but we are not sure whether
it would be better to overestimate and set it to 5000.
We use 6 backend nodes.
The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H. We
will finally use 2 filesystems: data and home.
FS "data" consists of 12 metadata-only NSDs and 72 data-only NSDs.
We have enough space to add NSDs (finally the fs
[root@nsd75-01 ~]# mmlspool data
Storage pools in file system at '/gpfs/data':
Name       Id     BlkSize  Data  Meta  Total Data in (KB)  Free Data in (KB)      Total Meta in (KB)  Free Meta in (KB)
system      0     4 MB     no    yes   0                   0 (  0%)               12884901888         12800315392 ( 99%)
saspool  65537    4 MB     yes   no    1082331758592       1082326446080 (100%)   0                   0 (  0%)
[root@nsd75-01 ~]# mmlsfs data
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 1200                     Estimated number of nodes that will mount file system
 -B                 4194304                  Block size
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    fileset                  Default quotas enabled
 --perfileset-quota Yes                      Per-fileset quota enforcement
 --filesetdf        Yes                      Fileset df enabled?
 -V                 21.00 (5.0.3.0)          File system version
 --create-time      Fri Feb 7 15:32:05 2020  File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      1342177280               Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 512              Number of subblocks per full block
 -P                 system;saspool           Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06;
 -d                 de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06;
 -d                 de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06;
 -d                 de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06;
 -d                 de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06;
 -d                 de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06;
 -d                 globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs/data               Default mount point
 --mount-priority   0                        Mount priority
##
For FS "home" we use 24 dataAndMetadata disks, all on flash.
[root@nsd75-01 ~]# mmlspool home
Storage pools in file system at '/gpfs/home':
Name       Id     BlkSize   Data  Meta  Total Data in (KB)  Free Data in (KB)     Total Meta in (KB)  Free Meta in (KB)
system      0     1024 KB   yes   yes   25769803776         25722931200 (100%)    25769803776         25722981376 (100%)
[root@nsd75-01 ~]#
[root@nsd75-01 ~]# mmlsfs home
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 1200                     Estimated number of nodes that will mount file system
 -B                 1048576                  Block size
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    fileset                  Default quotas enabled
 --perfileset-quota Yes                      Per-fileset quota enforcement
 --filesetdf        Yes                      Fileset df enabled?
 -V                 21.00 (5.0.3.0)          File system version
 --create-time      Fri Feb 7 15:31:28 2020  File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      25166080                 Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128              Number of subblocks per full block
 -P                 system                   Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs/home               Default mount point
 --mount-priority   0                        Mount priority
[root@nsd75-01 ~]#
Kind regards
Walter Sklenka
Technical Consultant
EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at
From: gpfsug-discuss-bounces at spectrumscale.org
<gpfsug-discuss-bounces at spectrumscale.org> On behalf of José Filipe
Higino
Sent: Saturday, February 8, 2020 1:00 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] mmdf and maybe other commands long running
// influence of n and B on number of regions
How many backend nodes for that cluster? And how many filesystems for
that same access... and how many pools for the same data access type? (12
ndisks sounds very LOW to me for that size of a cluster; probably no
other filesystem can do more than that.) On GPFS there are so many
different ways to access the data that it is sometimes hard to start a
conversation. And you did a very good job of introducing it. =)
We (I am a customer too) do not have that many nodes, but from experience
I know some clusters (and also multicluster configs) depend mostly on how
much metadata you can service in the network and how fast (latency wise)
you can do it, to accommodate such an amount of nodes. There is never a
by-the-book design that can guarantee something will work 100% of the
time. But the beauty of it is that GPFS allows lots of aspects to be
resized at your convenience to facilitate what you need the system to do most.
Let us know more...
On Sun, 9 Feb 2020 at 00:40, Walter Sklenka <Walter.Sklenka at edv-design.at>
wrote:
Hello!
We are designing two filesystems where we cannot anticipate whether there
will be 3000, or maybe 5000 or more, nodes in total accessing these filesystems.
What we saw was that mmdf can take 5-7 minutes to run.
We opened a case and were told that commands such as mmdf, mmfsck,
mmdefragfs and mmrestripefs must scan all regions, and that this is the
reason why they take so long.
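(For reference, a minimal way to reproduce such a timing, assuming the standard Spectrum Scale installation path under /usr/lpp/mmfs/bin and the "data" file system from this thread:)

# time how long mmdf takes to walk all regions of the file system
time /usr/lpp/mmfs/bin/mmdf data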
The technician also said that, as a rule of thumb, there should be
(-n) * 32 regions; this would then be enough (n=5000 -> 160000 regions per
pool?).
(Does the block size also have an influence on the number of regions?)
# mmfsadm saferdump stripe
gives the number of regions:
storage pools: max 8
alloc map type 'scatter'
0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0
   regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192
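For what it's worth, a quick back-of-the-envelope check of that rule of thumb against the dump above (the *32 factor is only what support quoted to us, not something taken from the documentation):

# rule of thumb from the case: regions per pool >= (-n) * 32
echo $(( 5000 * 32 ))   # prints 160000
# the dump above reports 'regns 170413' for the system pool,
# which is already above that estimate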
We also saw that when we create the filesystem with a very high -n (5000)
(where mmdf execution time was some minutes) and then change -n to a lower
value, this does not influence the behavior any more.
My questions are: Is the rule of thumb above ((number of nodes) * 32) a
good estimate for the number of regions in a pool?
Is it better to overestimate the number of nodes (and accept longer-running
commands), or is it unrealistic to run into problems when the calculated
number of regions is not reached?
Does anybody have experience with a high number of nodes (>>3000) and with
how to design filesystems for such large clusters?
Thank you very much in advance!
Kind regards
Walter Sklenka
Technical Consultant
EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss