[gpfsug-discuss] Clarification of mmdiag --iohist output
Sven Oehme
oehmes at gmail.com
Sun Feb 17 18:59:37 GMT 2019
If you run it on the client, it includes local queuing, network, NSD server processing, and the actual device I/O time.
If issued on the NSD server, it contains processing and I/O time; the processing shouldn't really add any overhead, but in some cases I have seen it contribute.
If you correlate the client and server iohist outputs, you can find the server entry based on the tags in the iohist output. This allows you to see exactly how much time was spent on the network versus on the server, to rule out the network as the problem.
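As a rough sketch of that correlation (the exact option and column layout vary by Scale release, and the tag pair below is just a placeholder taken from Aaron's trace further down):

  # Capture iohist around the slow I/O on both sides
  mmdiag --iohist verbose > client_iohist.txt    # run on the client
  mmdiag --iohist verbose > server_iohist.txt    # run on the NSD server

  # Locate the server-side entry for a given client I/O by its tag pair
  grep '743942 108137' server_iohist.txt

  # client time - server time ~= network RTT + client-side queuing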
Sven
From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Aaron Knister <aaron.knister at gmail.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Sunday, February 17, 2019 at 6:26 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Clarification of mmdiag --iohist output
Hi Kevin,
It's funny you bring this up because I was looking at this yesterday. My belief is that it's the time the from when the I/O request was queued internally by the client to when the I/O response was received from the NSD server which means it absolutely includes the network RTT. It would be great to get formal confirmation of this from someone who knows the code.
Here's some trace data showing a single 4K read from an NSD client. I've stripped out a bunch of uninteresting stuff. It's my belief that TRACE_IO marks the point at which the "i/o timer" reported by mmdiag --iohist begins ticking, and the trace data seems to support this. If I'm correct, it shows that the RDMA I/O to the NSD server occurs within the TRACE_IO timing window. The other thing that makes me believe this is that in my testing, mmdiag --iohist on the client shows an average latency of ~230us for a 4K read, whereas mmdiag --iohist on the NSD server shows an average latency of ~170us when servicing those 4K reads from the back-end disk (a DDN SFA14KX).
0.000218276 37005 TRACE_DISK: doReplicatedRead: da 34:490710888
0.000218424 37005 TRACE_IO: QIO: read data tag 743942 108137 ioVecSize 1 1st buf 0x122F000 nsdName S01_DMD_NSD_034 da 34:490710888 nSectors 8 align 0 by iocMBHandler (DioHandlerThread)
0.000218566 37005 TRACE_DLEASE: checkLeaseForIO: rc 0
0.000218628 37005 TRACE_IO: SIO: ioVecSize 1 1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 doioP 0x1806994D300 isDaemonAddr 0
0.000218672 37005 TRACE_FS: verify4KIO exit: code 4 err 0
0.000219106 37005 TRACE_NSD: nsdDoIO enter: read ioVecSize 1 1st bufAddr 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nBytes 4096 isDaemonAddr 0
0.000219408 37005 TRACE_GCRYPTO: EncBufPool::getTmpBuf(): bsize=4096 code=205 err=0 bufP=0x180656C1058 outBufP=0x180656C1058 index=0
0.000220105 37005 TRACE_TS: sendMessage msg_id 22993695: dest <c0n2> 10.3.17.3 sto03
0.000220436 37005 TRACE_RDMA: verbs::verbsSend_i: enter: index 11 cookie 12 node <c0n2> msg_id 22993695 len 92
0.000221111 37005 TRACE_RDMA: verbsConn::postSend: currRpcHeadP 0x7F6D7FA106E8 sendType SEND_BUFFER_RPC index 11 cookie 12 threadId 37005 bufferId.index 11
0.000221662 37005 TRACE_MUTEX: Waiting on fast condvar for signal 0x7F6D7FA106F8 RdmaSend_NSD_SVC
0.000426716 16691 TRACE_RDMA: handleRecvComp: success 1 of 1 nWrSuccess 1 index 11 cookie 12 wr_id 0xB0E6E00000000 bufferP 0x7F6D7EEE1700 byte_len 4144
0.000432140 37005 TRACE_NSD: nsdDoIO_ReadAndCheck: read complete, len 0 status 6 err 0 bufP 0x180656C1058 dioIsOverRdma 1 ioDataP 0x200000BC000 ckSumType NsdCksum_None
0.000432163 37005 TRACE_NSD: nsdDoIO_ReadAndCheck: exit err 0
0.000433707 37005 TRACE_GCRYPTO: EncBufPool::releaseTmpBuf(): exit bsize=8192 err=0 inBufP=0x180656C1058 bufP=0x180656C1058 index=0
0.000433777 37005 TRACE_NSD: nsdDoIO exit: err 0 0
0.000433844 37005 TRACE_IO: FIO: read data tag 743942 108137 ioVecSize 1 1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 err 0
0.000434236 37005 TRACE_DISK: postIO: qosid A00D91E read data disk FFFFFFFF ioVecSize 1 1st buf 0x122F000 err 0 duration 0.000215000 by iocMBHandler (DioHandlerThread)
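As a quick sanity check of that timing window (a sketch, assuming the first trace field is the timestamp in seconds, as in the excerpt above):

  # Delta between QIO (request queued) and FIO (I/O finished) for the read
  awk '/TRACE_IO: QIO: read/ {q=$1} /TRACE_IO: FIO: read/ {print ($1-q)*1000 " ms"}' trace.txt
  # -> ~0.215 ms here, matching "duration 0.000215000" in the postIO line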
I'd suggest looking at "mmdiag --iohist" on the NSD server itself and seeing if/how it differs from the client. The other thing you could do is check whether your NSD server queues are backing up (e.g. run "mmfsadm saferdump nsd" and look for "requests pending" on queues where the "active" field is > 0). That doesn't necessarily mean you need to tune your queues, but if the disk I/O on your NSD server looks healthy (e.g. low latency, not overly taxed), you could benefit from queue tuning.
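A quick way to eyeball that (just a rough filter; the exact saferdump layout differs between releases):

  # Scan the NSD queue dump for queues with outstanding work
  mmfsadm saferdump nsd | grep -E 'requests pending|active'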
-Aaron
On Sat, Feb 16, 2019 at 9:47 AM Buterbaugh, Kevin L <Kevin.Buterbaugh at vanderbilt.edu> wrote:
Hi All,
Been reading man pages, docs, and Googling, and haven’t found a definitive answer to this question, so I knew exactly where to turn… ;-)
I'm dealing with some slow I/Os to certain storage arrays in our environments … like really, really slow I/Os … here's just one example from one of my NSD servers of a 10-second I/O:
08:49:34.943186 W data 30:41615622144 2048 10115.192 srv dm-92 <client IP redacted>
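For reference, my reading of those columns (worth double-checking against the header mmdiag --iohist prints on your release):

  08:49:34.943186  W  data  30:41615622144  2048  10115.192  srv  dm-92  <client IP redacted>
  # start time    R/W  buf  disk:sector    nSec  time (ms) type  device  NSD node/requester
  # 2048 sectors x 512 bytes = 1 MiB, and 10115.192 ms is the ~10 seconds in question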
So here's my question … when mmdiag --iohist tells me that that I/O took slightly over 10 seconds, is that:
1. The time from when the NSD server received the I/O request from the client until it shipped the data back onto the wire towards the client?
2. The time from when the client issued the I/O request until it received the data back from the NSD server?
3. Something else?
I’m thinking it’s #1, but want to confirm. Which one it is has very obvious implications for our troubleshooting steps. Thanks in advance…
Kevin
--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss