[gpfsug-discuss] IO sizes
Uwe Falke
uwe.falke at kit.edu
Wed Feb 23 18:26:50 GMT 2022
Dear all,
sorry for asking a question which seems not directly GPFS related:
In a setup with 4 NSD servers (old-style, with storage controllers in
the back end), 12 clients and 10 Seagate storage systems, I see in
benchmark tests that just one of the NSD servers sends smaller IO
requests to the storage than the other three (that is, both reads and
writes are smaller).
The NSD servers form two pairs; each pair is connected to 5 Seagate
boxes (one server to the A controllers, the other to the B controllers
of those Seagates).
All 4 NSD servers are set up similarly:
kernel: 3.10.0-1160.el7.x86_64 #1 SMP
HBA: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
driver : mpt3sas 31.100.01.00
max_sectors_kb=8192 (max_hw_sectors_kb=16383, not 16384, as limited by
mpt3sas) for all sd devices and all multipath (dm) devices built on top.
scheduler: deadline
multipath (we actually have 3 paths to each volume, so there is some
asymmetry, but that should not affect the IOs, should it? And if it did,
we would see the same effect in both pairs of NSD servers, which we do
not). A small sketch for collecting these settings follows below.
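Here is a minimal sketch of how I dump the block-layer settings on each
server (Python, assuming the standard sysfs layout; the device glob
patterns are just what fits our setup):

    #!/usr/bin/env python3
    # Dump block-layer queue limits for all sd* and dm-* devices.
    # Reads the standard sysfs attributes; missing files print "n/a".
    import glob
    import os

    def read_attr(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "n/a"

    for dev in sorted(glob.glob("/sys/block/sd*") +
                      glob.glob("/sys/block/dm-*")):
        q = os.path.join(dev, "queue")
        print("%-8s max_sectors_kb=%-6s max_hw_sectors_kb=%-6s scheduler=%s"
              % (os.path.basename(dev),
                 read_attr(os.path.join(q, "max_sectors_kb")),
                 read_attr(os.path.join(q, "max_hw_sectors_kb")),
                 read_attr(os.path.join(q, "scheduler"))))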
All 4 storage systems are also configured the same way (2 disk groups /
pools / declustered arrays, one managed by ctrl A, one by ctrl B, and
8 volumes out of each; makes altogether 2 x 8 x 10 = 160 NSDs).
The GPFS block size is 8 MiB; according to the IO history (mmdiag
--iohist) we see clean IO requests of 16384 sectors (i.e. 8192 KiB)
from GPFS.
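To tally those sizes I simply count the sector column of the mmdiag
--iohist output; a rough sketch (NSEC_FIELD matches the column position
as our release prints it, adjust it if your output is laid out
differently):

    #!/usr/bin/env python3
    # Summarize IO sizes from "mmdiag --iohist" output fed in on stdin:
    #   mmdiag --iohist | python3 iohist_sizes.py
    # NSEC_FIELD is the 0-based index of the sector-count ("nSec") column.
    import sys
    from collections import Counter

    NSEC_FIELD = 4
    sizes = Counter()

    for line in sys.stdin:
        parts = line.split()
        if len(parts) <= NSEC_FIELD or not parts[NSEC_FIELD].isdigit():
            continue  # skip headers, separators and malformed lines
        sizes[int(parts[NSEC_FIELD]) * 512 // 1024] += 1  # sectors -> KiB

    for kib in sorted(sizes):
        print("%8d KiB : %d IOs" % (kib, sizes[kib]))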
My first question - though not the main one: I see, both in iostat and
on the storage systems, that the typical IO requests are about 4 MiB,
not 8 MiB as I would expect from the above settings (max_sectors_kb is
really in terms of KiB, not sectors, cf.
https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt).
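For reference, a small sketch of how such average request sizes can be
derived from /proc/diskstats over a benchmark interval instead of
eyeballing iostat (the device prefixes and the 10 s interval are just
example values):

    #!/usr/bin/env python3
    # Average request sizes per device from two /proc/diskstats samples
    # taken INTERVAL seconds apart, converted to KiB.
    import time

    INTERVAL = 10             # seconds between the two samples
    DEVICES = ("sd", "dm-")   # prefixes of the devices to report

    def sample():
        stats = {}
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                name = fields[2]
                if name.startswith(DEVICES):
                    # reads completed, sectors read,
                    # writes completed, sectors written
                    stats[name] = (int(fields[3]), int(fields[5]),
                                   int(fields[7]), int(fields[9]))
        return stats

    before = sample()
    time.sleep(INTERVAL)
    after = sample()

    for name in sorted(after):
        r, rs, w, ws = (a - b for a, b in zip(after[name], before[name]))
        rd_kib = rs * 512 / 1024 / r if r else 0.0
        wr_kib = ws * 512 / 1024 / w if w else 0.0
        print("%-8s avg read %8.1f KiB  avg write %8.1f KiB"
              % (name, rd_kib, wr_kib))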
But what puzzles me even more: one of the servers issues even smaller
IOs, mostly varying between 3.2 MiB and 3.6 MiB - for both reads and
writes ... I just cannot see why.
I suspect that, when writing to the storage, this causes incomplete
stripe writes on our erasure-coded volumes (8+2p), at least as long as
the controller cannot re-coalesce the data properly - and it seems it
cannot do so completely.
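For illustration, the simple stripe arithmetic behind that suspicion
(the 512 KiB chunk size is just an assumed example, not the real pool
setting, which I still need to check):

    #!/usr/bin/env python3
    # Check observed IO sizes against the full-stripe width of an 8+2p
    # volume. CHUNK_KIB is an assumed example, not the real pool setting.
    CHUNK_KIB = 512            # strip (chunk) size per data disk
    DATA_DISKS = 8             # 8+2p -> 8 data strips per stripe
    FULL_STRIPE_KIB = CHUNK_KIB * DATA_DISKS   # 4096 KiB with these numbers

    for io_kib in (8192, 4096, 3686, 3277):    # 8 MiB, 4 MiB, ~3.6, ~3.2 MiB
        full, rest = divmod(io_kib, FULL_STRIPE_KIB)
        note = "aligned" if rest == 0 else "partial stripe of %d KiB" % rest
        print("%5d KiB IO -> %d full stripe(s), %s" % (io_kib, full, note))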
If any of you have seen this already and/or know a potential
explanation, I'd be glad to learn about it.
And in case some of you wonder: yes, I (was) moved away from IBM and am
now at KIT.
Many thanks in advance
Uwe
--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Scientific Data Management (SDM)
Uwe Falke
Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen
Tel: +49 721 608 28024
Email: uwe.falke at kit.edu
www.scc.kit.edu
Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany
KIT – The Research University in the Helmholtz Association