[gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x
IBM Spectrum Scale
scale at us.ibm.com
Wed Aug 21 18:46:40 BST 2019
We do appreciate the feedback. Since Spectrum Scale is a cluster-based
solution, we do not consider the failure of a single node significant: the
cluster adjusts to the loss of the node, and access to the file data is
not lost. It seems that in this specific instance the problem had a more
significant impact in your environment. Presumably you have installed the
available fix and are no longer encountering the problem.
Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
.
If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.
The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.
From: Ryan Novosielski <novosirj at rutgers.edu>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 08/21/2019 01:34 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on
dirop.C:4548 in GPFS 5.0.2.x
Sent by: gpfsug-discuss-bounces at spectrumscale.org
If there is any means for feedback, I really think that anything that
causes a crash of mmfsd is absolutely cause to send a notice. Regardless
of data corruption, it makes the software unusable in production under
certain circumstances. There was a large customer impact at our site. We
have a reproducible case if it is useful. One customer workload crashed
every time, though it took almost a full day to get to that point so you
can imagine the time wasted.
> On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale <scale at us.ibm.com>
wrote:
>
> To my knowledge there has been no notification sent regarding this
problem. Generally, we only notify customers about problems that can cause
file system data corruption or data loss. This problem does cause the
GPFS instance to abort and restart (assert), but it does not impact file
system data. It seems that in your case you were encountering the
problem frequently.
>
> Regards, The Spectrum Scale (GPFS) team
>
>
>
>
>
> From: Ryan Novosielski <novosirj at rutgers.edu>
> To: gpfsug main discussion list
<gpfsug-discuss at spectrumscale.org>
> Date: 08/21/2019 01:14 PM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6
on dirop.C:4548 in GPFS 5.0.2.x
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
>
>
> Has there been any official notification of this one? I can’t see
anything about it anyplace other than in my support ticket.
>
> --
>  ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>      `'
>
> > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale <scale at us.ibm.com>
wrote:
> >
> > As was noted, this problem is fixed in the Spectrum Scale 5.0.3 release
stream. Regarding the version number format 5.0.2.0/1, I assume that it
is meant to convey version 5.0.2 efix 1.
> >
> > Regards, The Spectrum Scale (GPFS) team
> >
> >
> >
> >
> >
> > From: Ryan Novosielski <novosirj at rutgers.edu>
> > To: gpfsug main discussion list
<gpfsug-discuss at spectrumscale.org>
> > Date: 08/21/2019 12:04 PM
> > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on
dirop.C:4548 in GPFS 5.0.2.x
> > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >
> >
> >
> > I posted this on Slack, but it’s serious enough that I want to make
sure everyone sees it. Does anyone, from IBM or otherwise, have any more
information about this/whether it was even announced anyplace? Thanks!
> >
> > A little late, but we ran into a relatively serious problem with
5.0.2.3 at our site. The symptom is an mmfsd crash/segfault
related to fs/dirop.C:4548. We ran into this sporadically, but it was
repeatable on the problem workload. From IBM Support:
> >
> > 2. This is a known defect.
> > The problem has been fixed through
> > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock
> > A companion fix is
> > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough
> >
> >
> > The rep further said “It's not an APAR since it's found in internal
testing. It's an internal function at a place it should not assert but a
part of the condition as the code path is specific to the
DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain
file creation code path, but the condition wasn't set strictly for that
code path that some other code path could also run into the assert. So we
cannot predict on which node it would happen.”
> >
> > The fix was setting disableAssert="dirop.C:4548", which can be done
live. Anyone seen anything else about this anyplace? The bug is fixed in
5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number
means; I’ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others).
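
[Editor's note: a disableAssert workaround of this kind is normally applied
with mmchconfig. The sketch below is illustrative only, assuming the exact
support-directed value quoted above; apply such a parameter only as advised
by IBM Support.]

```shell
# Sketch only: disableAssert is an undocumented, support-directed parameter.
# The -i flag makes the change take effect immediately on the running
# daemons (no mmfsd restart needed) and persist across restarts.
mmchconfig disableAssert="dirop.C:4548" -i

# Confirm the live daemon configuration picked up the change.
mmdiag --config | grep -i disableAssert

# After upgrading to a fixed release (5.0.3.x), the override can be
# removed by resetting the attribute to its default.
mmchconfig disableAssert=DEFAULT -i
```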
> >
> > --
> >  ____
> > || \\UTGERS,     |---------------------------*O*---------------------------
> > ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> > ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
> >      `'
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> >
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e=