[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Jan-Frode Myklebust
janfrode at tanso.net
Fri Jun 5 13:58:39 BST 2020
It could maybe be interesting to drop the NSD servers and let all nodes
access the storage directly via SRP?
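A quick way to see how far each node is from that today (a sketch; it assumes
srp_daemon and multipath would also be set up on the client nodes):

    # map each NSD to the local device path, as seen from every node in the cluster
    mmlsnsd -M

    # nodes that show no local device for an NSD do all their I/O through
    # the NSD servers instead of going straight to the LUN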
Maybe turn off readahead, since it can cause performance degradation when
GPFS reads 1 MB blocks scattered across the NSDs: the read-ahead then always
reads too much. This might be the cause of the slow reads you see, and you
may also overflow it when reading from both NSD servers at the same time.
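On the Linux side that could look roughly like this on the NSD servers (a
sketch; the multipath device names are placeholders for your two LIO-ORG LUNs):

    # show current readahead (in 512-byte sectors) for the SRP/multipath devices
    blockdev --getra /dev/mapper/mpatha
    blockdev --getra /dev/mapper/mpathb

    # disable OS readahead so only GPFS prefetching decides what gets read
    blockdev --setra 0 /dev/mapper/mpatha
    blockdev --setra 0 /dev/mapper/mpathb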
Plus, it is always nice to give the clients a bit more pagepool than the
default; I would start with 4 GB.
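Something along these lines, for example (a sketch; client1,client2 stand for
your two client nodes):

    # raise the pagepool on the clients from the default to 4 GiB; -i makes
    # the change take effect immediately and persist across restarts
    mmchconfig pagepool=4G -i -N client1,client2

    # verify
    mmlsconfig pagepool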
-jf
On Fri, 5 Jun 2020 at 14:22, Giovanni Bracco <giovanni.bracco at enea.it> wrote:
> In our lab we have received two storage servers, Supermicro
> SSG-6049P-E1CR24L, with 24 HDs each (9 TB SAS3) and an Avago 3108 RAID
> controller (2 GB cache). Before putting them into production for other
> purposes we have set up a small GPFS test cluster to verify whether they
> can be used as storage (our GPFS production cluster is licensed per NSD
> server socket, so it would be interesting to expand the storage size
> just by adding storage servers to an InfiniBand-based SAN, without
> changing the number of NSD servers).
>
> The test cluster consists of:
>
> 1) two NSD servers (IBM x3550M2), each with a dual-port QDR InfiniBand True Scale HCA
> 2) a Mellanox FDR switch used as the SAN switch
> 3) a True Scale QDR switch as the GPFS cluster switch
> 4) two GPFS clients (Supermicro AMD nodes), one QDR port each.
>
> All the nodes run CentOS 7.7.
>
> On each storage server a RAID 6 volume of 11 disks, 80 TB, has been
> configured and is exported over InfiniBand as an SRP target (LIO), so
> that both volumes appear as devices accessed by the srp_daemon on the
> NSD servers, where multipath (not really necessary in this case) has
> been configured for these two LIO-ORG devices.
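For reference, the device view on the NSD servers can be checked with standard
tools like these (device naming will of course differ per system):

    # SCSI devices discovered via srp_daemon
    lsscsi

    # multipath topology of the two LIO-ORG LUNs
    multipath -ll

    # block device tree that GPFS will see
    lsblk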
>
> GPFS version 5.0.4-0 has been installed and RDMA (verbs) has been
> properly configured.
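For completeness, the RDMA settings in effect can be confirmed with something
like this (the log line is quoted from memory and its exact wording may vary
between releases):

    # RDMA-related settings as seen by GPFS
    mmlsconfig verbsRdma verbsPorts

    # at daemon startup each node should log something like "VERBS RDMA started"
    grep -i "VERBS RDMA" /var/adm/ras/mmfs.log.latest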
>
> Two NSDs have been created and a GPFS file system has been configured.
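For reference, a creation sequence consistent with the mmlsfs output below
would look roughly like this (a sketch: the stanza device paths, server names
and usage are illustrative; only the NSD names, block size and mount point
come from the listings below):

    # nsd.stanza
    %nsd: nsd=nsdfs4lun2 device=/dev/mapper/mpatha servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata
    %nsd: nsd=nsdfs5lun2 device=/dev/mapper/mpathb servers=nsdsrv2,nsdsrv1 usage=dataAndMetadata

    mmcrnsd -F nsd.stanza
    mmcrfs gexp2 -F nsd.stanza -B 1M -T /gexp2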
>
> Very simple tests have been performed using lmdd serial write/read.
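Presumably runs of this general form (lmdd comes with lmbench; the file name
is a placeholder):

    # sequential write of a 100 GB file in 1 MB records, fsync at the end
    lmdd if=internal of=/gexp2/testfile bs=1m count=102400 fsync=1

    # sequential read of the same file
    lmdd if=/gexp2/testfile of=internal bs=1m count=102400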
>
> 1) storage-server local performance: before configuring the RAID 6 volume
> as an NSD, a local XFS file system was created on it and lmdd write/read
> performance for a 100 GB file was verified to be about 1 GB/s.
>
> 2) once the GPFS cluster had been created, write/read tests were
> performed directly from one NSD server at a time:
>
> write performance 2 GB/s, read performance 1 GB/s for a 100 GB file.
>
> Checking with iostat showed that the I/O in this case involved only the
> NSD server where the test was performed: when writing, double the base
> performance was obtained, while reading gave the same performance as the
> local file system, which seems correct. Values are stable when the test
> is repeated.
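That observation can be repeated with sysstat's iostat on both NSD servers
while a test runs; only the server under test should show traffic on the
devices backing the NSDs:

    # extended per-device statistics in MB/s, refreshed every 2 seconds
    iostat -xm 2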
>
> 3) when the same test is performed from the GPFS clients, the lmdd
> results for a 100 GB file are:
>
> write - 900 MB/s and stable, not too bad but half of what is seen from
> the NSD servers.
>
> read - 30 MB/s to 300 MB/s: very low and unstable values
>
> No tuning of any kind has been applied to any of the involved systems;
> only default values are used.
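The client-side defaults that usually matter for streaming reads can be dumped
like this (a sketch; the parameter selection is illustrative, not exhaustive):

    # a few relevant cluster configuration values
    mmlsconfig pagepool maxMBpS maxblocksize

    # full view of the values actually in effect on a node
    mmdiag --config | egrep -i 'pagepool|maxMBpS|prefetch'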
>
> Any suggestions to explain the very bad read performance from a GPFS
> client?
>
> Giovanni
>
> Below are the virtual drive configuration on the storage server and the
> file system configuration in GPFS:
>
>
> Virtual drive
> ==============
>
> Virtual Drive: 2 (Target Id: 2)
> Name :
> RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
> Size : 81.856 TB
> Sector Size : 512
> Is VD emulated : Yes
> Parity Size : 18.190 TB
> State : Optimal
> Strip Size : 256 KB
> Number Of Drives : 11
> Span Depth : 1
> Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
> Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy : Disabled
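If the controller-level ReadAhead shown above turns out to hurt the scattered
1 MB reads, it can be switched off per virtual drive with storcli (a sketch;
the binary may be storcli64 and the controller/VD numbers must match your setup):

    # show the current cache policy of virtual drive 2 on controller 0
    storcli /c0/v2 show all

    # disable controller readahead for that virtual drive
    storcli /c0/v2 set rdcache=NoRA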
>
>
> GPFS file system from mmlsfs
> ============================
>
> mmlsfs vsd_gexp2
> flag                value                    description
> ------------------- ------------------------ -----------------------------------
> -f                  8192                     Minimum fragment (subblock) size in bytes
> -i                  4096                     Inode size in bytes
> -I                  32768                    Indirect block size in bytes
> -m                  1                        Default number of metadata replicas
> -M                  2                        Maximum number of metadata replicas
> -r                  1                        Default number of data replicas
> -R                  2                        Maximum number of data replicas
> -j                  cluster                  Block allocation type
> -D                  nfs4                     File locking semantics in effect
> -k                  all                      ACL semantics in effect
> -n                  512                      Estimated number of nodes that will mount file system
> -B                  1048576                  Block size
> -Q                  user;group;fileset       Quotas accounting enabled
>                     user;group;fileset       Quotas enforced
>                     none                     Default quotas enabled
> --perfileset-quota  No                       Per-fileset quota enforcement
> --filesetdf         No                       Fileset df enabled?
> -V                  22.00 (5.0.4.0)          File system version
> --create-time       Fri Apr 3 19:26:27 2020  File system creation time
> -z                  No                       Is DMAPI enabled?
> -L                  33554432                 Logfile size
> -E                  Yes                      Exact mtime mount option
> -S                  relatime                 Suppress atime mount option
> -K                  whenpossible             Strict replica allocation option
> --fastea            Yes                      Fast external attributes enabled?
> --encryption        No                       Encryption enabled?
> --inode-limit       134217728                Maximum number of inodes
> --log-replicas      0                        Number of log replicas
> --is4KAligned       Yes                      is4KAligned?
> --rapid-repair      Yes                      rapidRepair enabled?
> --write-cache-threshold 0                    HAWC Threshold (max 65536)
> --subblocks-per-full-block 128               Number of subblocks per full block
> -P                  system                   Disk storage pools in file system
> --file-audit-log    No                       File Audit Logging enabled?
> --maintenance-mode  No                       Maintenance Mode enabled?
> -d                  nsdfs4lun2;nsdfs5lun2    Disks in file system
> -A                  yes                      Automatic mount option
> -o                  none                     Additional mount options
> -T                  /gexp2                   Default mount point
> --mount-priority    0                        Mount priority
>
>
> --
> Giovanni Bracco
> phone +39 351 8804788
> E-mail giovanni.bracco at enea.it
> WWW http://www.afs.enea.it/bracco
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>