[gpfsug-discuss] Protection against silent data corruption
Stephen Ulmer
ulmer at ulmer.org
Thu Jun 2 18:55:50 BST 2022
This only adds a checksum to the NSD wire protocol. The question was about detecting data corruption at rest.
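
To illustrate the distinction, here is a minimal Python sketch. It is not GPFS code: the function names (checksum, send_over_wire, pick_good_replica) and the use of CRC32 are assumptions made purely for illustration. A wire checksum is computed over whatever the server read from disk, so a block that was already corrupt at rest still passes. A checksum saved at write time is what would let you tell which of two replicas is the good one.

```python
import zlib


def checksum(block: bytes) -> int:
    """CRC32 stands in for whatever checksum a real system would use."""
    return zlib.crc32(block)


def send_over_wire(block: bytes) -> bytes:
    """Wire-protocol check (hypothetical): protects the transfer only.

    The sender checksums the bytes it read from disk. If those bytes were
    already corrupt at rest, the receiver's verification still succeeds.
    """
    wire_sum = checksum(block)   # computed at send time, over the on-disk bytes
    received = block             # assume the network delivered the bytes intact
    if checksum(received) != wire_sum:
        raise IOError("corrupted in transit, request a retransmit")
    return received


def pick_good_replica(replicas, stored_sum):
    """At-rest check (hypothetical): compare replicas with a write-time checksum.

    Returns the index of the first replica matching stored_sum, or None if
    no replica matches.
    """
    for index, block in enumerate(replicas):
        if checksum(block) == stored_sum:
            return index
    return None


original = b"payload written by the application"
stored_sum = checksum(original)                  # saved alongside the data at write time

corrupt = b"pAyload written by the application"  # replica 0 silently corrupted at rest
replicas = [corrupt, original]

# The wire check passes even though the block is corrupt on disk.
wire_result = send_over_wire(replicas[0])

# The write-time checksum identifies replica 1 as the intact copy.
good_index = pick_good_replica(replicas, stored_sum)
```

Without stored_sum, all you can see is that the two replicas differ, which is the situation described in the original question.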
--
Stephen
> On Jun 2, 2022, at 1:01 PM, Achim Rehor <Achim.Rehor at de.ibm.com> wrote:
>
> hi Stephan,
>
> there is; see the mmchconfig man page:
>
> nsdCksumTraditional
> This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no.
> (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID.
> The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).)
>
> The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the
> server. A checksum error triggers a request to retransmit the message.
>
> When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in
> response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way.
>
> You can change the value of this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes.
>
> Note:
> * Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage.
>
> * To enable checksums for a subset of the nodes in a cluster, issue a command like the following one:
> mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>
>
> The -N flag is valid for this attribute.
>
> --
> Mit freundlichen Grüßen / Kind regards
>
> Achim Rehor
>
> Technical Support Specialist Spectrum Scale and ESS (SME)
> Advisory Product Services Professional
> IBM Systems Storage Support - EMEA
>
> Achim.Rehor at de.ibm.com +49-170-4521194
>
> IBM Deutschland GmbH
> Vorsitzender des Aufsichtsrats: Sebastian Krause
> Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer,
> Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
> Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
>
> -----Original Message-----
> From: Stephan Graf <st.graf at fz-juelich.de>
> Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
> To: gpfsug-discuss <gpfsug-discuss at gpfsug.org>
> Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption
> Date: Thu, 02 Jun 2022 16:31:43 +0200
>
> Hi,
>
> I am wondering if there is an option in SS to enable some checking to
> detect silent data corruption.
>
> From GNR I know that there is end-to-end integrity checking, so a checksum is
> stored in addition to the data.
>
> The background is that we are facing an issue where, in some files (which
> have data replication = 2), mmrestripefile is reporting that one
> block does not match its copy (the storage cluster is running SS
> without GNR).
> We have validated that the copied block is fine, but the original one is
> broken (and this is what is returned on read access).
> SS right now in our installation is unable to determine which is the
> correct one.
> Is there any option to enable this kind of feature in SS? If not, does
> it make sense to create an "IDEA" for it?
>
> Stephan
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org