From st.graf at fz-juelich.de Thu Jun 2 15:31:43 2022
From: st.graf at fz-juelich.de (Stephan Graf)
Date: Thu, 2 Jun 2022 16:31:43 +0200
Subject: [gpfsug-discuss] Protection against silent data corruption
Message-ID: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de>

Hi,

I am wondering if there is an option in SS to enable some checking to detect silent data corruption.

From GNR I know that there is end-to-end integrity, where a checksum is stored in addition to the data.

The background is that we are facing an issue where, for some files (which have data replication = 2), mmrestripefile reports that one block mismatches its copy (the storage cluster is running SS without GNR). We have validated that the copied block is fine, but the original one is broken (and this is what is returned on read access). SS in our installation is currently unable to determine which of the two is correct.

Is there any option to enable this kind of feature in SS? If not, does it make sense to create an "IDEA" for it?

Stephan

--
Stephan Graf
Juelich Supercomputing Centre
Phone: +49-2461-61-6578
Fax: +49-2461-61-6656
E-mail: st.graf at fz-juelich.de
WWW: http://www.fz-juelich.de/jsc/
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------

From Achim.Rehor at de.ibm.com Thu Jun 2 18:01:06 2022
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Thu, 2 Jun 2022 17:01:06 +0000
Subject: [gpfsug-discuss] Protection against silent data corruption
In-Reply-To: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de>
References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de>
Message-ID:

hi Stephan,

there is, see the mmchconfig man page:

nsdCksumTraditional
This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no. (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID. The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).)

The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the server. A checksum error triggers a request to retransmit the message.

When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way.
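For illustration, enabling and verifying it looks roughly like this (a minimal sketch; the node names are placeholders, and the exact output of mmlsconfig can differ between releases):

# Enable NSD client/server checksums cluster-wide, effective immediately (-i),
# without restarting mmfsd:
mmchconfig nsdCksumTraditional=yes -i

# ...or only for a subset of nodes (placeholder node names):
mmchconfig nsdCksumTraditional=yes -i -N client01,client02

# Check the current value of the attribute:
mmlsconfig nsdCksumTraditional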
You can change the value of the this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes. Note: * Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage. * To enable checksums for a subset of the nodes in a cluster, issue a command like the following one: mmchconfig nsdCksumTraditional=yes -i -N The -N flag is valid for this attribute. -- Mit freundlichen Gr??en / Kind regards Achim Rehor Technical Support Specialist S?pectrum Scale and ESS (SME) Advisory Product Services Professional IBM Systems Storage Support - EMEA Achim.Rehor at de.ibm.com +49-170-4521194 IBM Deutschland GmbH Vorsitzender des Aufsichtsrats: Sebastian Krause Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, Gabriele Schwarenthorer, Christine Rupp, Frank Theisen Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -----Original Message----- From: Stephan Graf > Reply-To: gpfsug main discussion list > To: gpfsug-discuss > Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption Date: Thu, 02 Jun 2022 16:31:43 +0200 Hi, I am wondering if there is an option in SS to enable some checking to detect silent data corruption. Form GNR I know that there is End-to-End integrity. So a checksum is stored in addition. The background is that we are facing an issue where in some files (which have data replication = 2) the mmrestripefile is reporting, that one block is mismatching it's copy (the storage cluster is running SS without GNR). We have validated that the copied block is fine, but the original one is broken (and this is what is returned on read access). SS right now in our installation is unable to determine which is the correct one. Is there any option to enable this kind of feature in SS? If not, does it make sense to create an "IDEA" for it? Stephan _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 2 18:55:50 2022 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 2 Jun 2022 13:55:50 -0400 Subject: [gpfsug-discuss] Protection against silent data corruption In-Reply-To: References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> Message-ID: <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org> This only adds a checksum to the NSD wire protocol. The question was about detecting data corruption at rest. -- Stephen > On Jun 2, 2022, at 1:01 PM, Achim Rehor wrote: > > hi Stephan, > > there is, see mmchconfig man page : > > nsdCksumTraditional > This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no. > (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID. > The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).) > > The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the > server. A checksum error triggers a request to retransmit the message. 
> > When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in > response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way. > > You can change the value of the this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes. > > Note: > * Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage. > > * To enable checksums for a subset of the nodes in a cluster, issue a command like the following one: > mmchconfig nsdCksumTraditional=yes -i -N > > The -N flag is valid for this attribute. > > -- > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > Technical Support Specialist S?pectrum Scale and ESS (SME) > Advisory Product Services Professional > IBM Systems Storage Support - EMEA > > Achim.Rehor at de.ibm.com +49-170-4521194 > > IBM Deutschland GmbH > Vorsitzender des Aufsichtsrats: Sebastian Krause > Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, > Gabriele Schwarenthorer, Christine Rupp, Frank Theisen > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > -----Original Message----- > From: Stephan Graf > > Reply-To: gpfsug main discussion list > > To: gpfsug-discuss > > Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption > Date: Thu, 02 Jun 2022 16:31:43 +0200 > > Hi, > > I am wondering if there is an option in SS to enable some checking to > detect silent data corruption. > > Form GNR I know that there is End-to-End integrity. So a checksum is > stored in addition. > > The background is that we are facing an issue where in some files (which > have data replication = 2) the mmrestripefile is reporting, that one > block is mismatching it's copy (the storage cluster is running SS > without GNR). > We have validated that the copied block is fine, but the original one is > broken (and this is what is returned on read access). > SS right now in our installation is unable to determine which is the > correct one. > Is there any option to enable this kind of feature in SS? If not, does > it make sense to create an "IDEA" for it? > > Stephan > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdmaestas at us.ibm.com Fri Jun 3 21:19:25 2022 From: cdmaestas at us.ibm.com (Christopher Maestas) Date: Fri, 3 Jun 2022 20:19:25 +0000 Subject: [gpfsug-discuss] Spectrum Scale 5.1.4 release notes items! Message-ID: Hello everyone! I know I spoke to some of you at ISC 2022 this week about some of these features. They are officially out! Check out: https://www.ibm.com/docs/en/spectrum-scale/5.1.4?topic=summary-changes Summary of changes This topic summarizes changes to the IBM Spectrum Scale licensed program and the IBM Spectrum Scale library. 
Particularly:
---
Control fileset access for remote clusters
Administrators can now configure access to remote cluster nodes for only a subset of filesets instead of the entire file system. For more information, see Fileset access control for remote clusters.

Increase in the number of independent filesets
In IBM Spectrum Scale the maximum number of independent filesets is increased from 1000 to 3000.
---

We'll talk further about this at the Scale user group in a few weeks in London!

-Chris

From xhejtman at ics.muni.cz Fri Jun 3 22:44:22 2022
From: xhejtman at ics.muni.cz (Lukas Hejtmanek)
Date: Fri, 3 Jun 2022 23:44:22 +0200
Subject: [gpfsug-discuss] Spectrum Scale 5.1.4 release notes items!
In-Reply-To:
References:
Message-ID:

Hello,

nice to see that just a single fileset can be exported now.

We are running a Kubernetes platform together with Spectrum Scale. Besides K8s, we also have HPC clusters using GPFS/NFS exports.

We would like to integrate storage from HPC into K8s and vice versa.

Currently, this is a problem because in K8s almost all users are using UID 1000 for running pods, while in HPC they have different UIDs.

As far as I know, there is no possibility to remap UIDs between K8s and HPC on the same Spectrum Scale file system. Running pods with different UIDs is a hard option, as many containers assume they run exactly as UID 1000.

What do you think, is there anything that can be done here?

On Fri, Jun 03, 2022 at 08:19:25PM +0000, Christopher Maestas wrote:
> Hello everyone!
>
> I know I spoke to some of you at ISC 2022 this week about some of these features. They are officially out!
>
> Check out: https://www.ibm.com/docs/en/spectrum-scale/5.1.4?topic=summary-changes
>
> Particularly:
> ---
> Control fileset access for remote clusters
> Administrators can now configure access to remote cluster nodes for only a subset of filesets instead of the entire file system. For more information, see Fileset access control for remote clusters.
>
> Increase in the number of independent filesets
> In IBM Spectrum Scale the maximum number of independent filesets is increased from 1000 to 3000.
> ---
>
> We'll talk further about this at the Scale user group in a few weeks in London!
>
> -Chris

--
Lukáš Hejtmánek

Linux Administrator only because
Full Time Multitasking Ninja
is not an official job title

From jcatana at gmail.com Fri Jun 3 22:51:48 2022
From: jcatana at gmail.com (Josh Catana)
Date: Fri, 3 Jun 2022 17:51:48 -0400
Subject: [gpfsug-discuss] Spectrum Scale 5.1.4 release notes items!
In-Reply-To:
References:
Message-ID:

I force my users to runAsUser their own user ID in order to access storage (enforced by OPA policy) and maintain POSIX compliance. I put the responsibility for being able to run as non-root on the container creator.
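Roughly what that looks like on the pod side, as a sketch only: the UID/GID numbers, image name, and PVC name below are made up, and the assumption is that the admission policy (OPA in our case) rejects pods that do not set runAsUser.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hpc-user-job               # placeholder name
spec:
  securityContext:
    runAsUser: 52345               # the user's real UID on the GPFS/HPC side (made up here)
    runAsGroup: 52345              # matching primary GID
    fsGroup: 52345
  containers:
  - name: work
    image: registry.example.com/tools:latest   # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    persistentVolumeClaim:
      claimName: gpfs-scratch-pvc  # placeholder PVC backed by the Scale CSI driver
EOF

The securityContext overrides whatever UID the image was built to assume, so the files the pod writes on the file system end up owned by the user's real UID.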
I feel like this is growing as standard to run as non-root for things that aren't system level operators in k8s. If they aren't accessing storage, I don't care what UID they run as. On Fri, Jun 3, 2022, 5:46 PM Lukas Hejtmanek wrote: > Hello, > > nice to see that only file set can be exported now. > > We are running Kubernetes platform together with Spectrum Scale. Beside > K8s, > we have also HPC clusters using GPFS/NFS exports. > > We would like to integrate storage from HPC to K8s and vice versa. > > Currently, this is a problem because in K8s almost all users are using UID > 1000 for running pods while in HPC they have different UIDs. > > As far as I know, there is no possibility to remap UIDs between K8s and > HPC on > the same Spectrum Scale file system. Running pods with different UIDs is > hard > option as many containers assume, they run exactly as UID 1000. > > What do you think, is there anything that can be done here? > > On Fri, Jun 03, 2022 at 08:19:25PM +0000, Christopher Maestas wrote: > > Hello everyone! > > > > I know I spoke to some of you at ISC 2022 this week about some of these > features. They are officially out! > > > > Check out: > https://www.ibm.com/docs/en/spectrum-scale/5.1.4?topic=summary-changes > > Summary of changes< > https://www.ibm.com/docs/en/spectrum-scale/5.1.4?topic=summary-changes> > > This topic summarizes changes to the IBM Spectrum Scale licensed program > and the IBM Spectrum Scale library. Within each topic, these markers ( ) > surrounding text or illustrations indicate technical changes or additions > that are made to the previous edition of the information. > > www.ibm.com > > > > Particularly: > > --- > > > > Control fileset access for remote clusters > > Administrators can now configure access to remote cluster nodes for only > a subset of filesets instead of the entire file system. For more > information, see Fileset access control for remote clusters< > https://www.ibm.com/docs/en/STXKQY_5.1.4/com.ibm.spectrum.scale.v5r10.doc/bl1adv_fielsetaccesscontrol.html > >. > > > > Increase in the number of independent filesets > > In IBM Spectrum Scale the maximum number of independent filesets is > increased from 1000 to 3000. > > --- > > > > We'll talk further about this at the Scale user group in a few weeks in > London! > > > > -Chris > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Sat Jun 4 08:26:56 2022 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 4 Jun 2022 17:26:56 +1000 Subject: [gpfsug-discuss] Watch folders Message-ID: Hi all I was wondering if anyone had any scoping suggestions for enabling this feature for multiple filesystems with SMB and NFS shares We are running a standalone kafka cluster, not part of spectrumscale, and each of the multiple file system watches, update this with individual topics for each file system We have noticed file system access being affected negatively by the watches when we were running all the 10 filesystems at the same time. 
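For context, each watch was enabled more or less like this (quoting from memory, so treat the option names as approximate and check the mmwatch man page for your release; broker, topic, and filesystem names below are placeholders):

# one clustered watch per filesystem, each publishing to its own topic on the
# external kafka cluster (option names quoted from memory, verify with the
# mmwatch man page)
mmwatch fs01 enable --event-handler kafkasink \
        --sink-brokers "kafka01.example.com:9092,kafka02.example.com:9092" \
        --sink-topic scale-fs01-watch \
        --events IN_CREATE,IN_CLOSE_WRITE,IN_DELETE

# list the watches that are currently active
mmwatch all list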
All of the filesets are AFM, some to NFS homes, and some to NSD homes.

any feedback appreciated
leslie

From scale at us.ibm.com Tue Jun 7 20:53:00 2022
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Wed, 8 Jun 2022 01:23:00 +0530
Subject: [gpfsug-discuss] Watch folders
In-Reply-To:
References:
Message-ID:

Hi Jake,

Can you or someone from your squad please answer the Watch Folder query below.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "leslie elliott"
To: "gpfsug main discussion list"
Date: 04-06-2022 12.58 PM
Subject: [EXTERNAL] [gpfsug-discuss] Watch folders
Sent by: "gpfsug-discuss"

Hi all

I was wondering if anyone had any scoping suggestions for enabling this feature for multiple filesystems with SMB and NFS shares

We are running a standalone kafka cluster, not part of spectrumscale, and each of the multiple file system watches updates this with individual topics for each file system

We have noticed file system access being affected negatively by the watches when we were running all the 10 filesystems at the same time.

All of the filesets are AFM, some to NFS homes, and some to NSD homes

any feedback appreciated
leslie

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

From scale at us.ibm.com Wed Jun 8 19:35:05 2022
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Thu, 9 Jun 2022 00:05:05 +0530
Subject: [gpfsug-discuss] Protection against silent data corruption
In-Reply-To: <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org>
References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org>
Message-ID:

Hi Stephen,

Currently such a feature is not available in the Spectrum Scale product.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Stephen Ulmer"
To: "gpfsug main discussion list"
Date: 02-06-2022 11.32 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption
Sent by: "gpfsug-discuss"
ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd This only adds a checksum to the NSD wire protocol. The question was about detecting data corruption at rest. -- Stephen On Jun 2, 2022, at 1:01 PM, Achim Rehor wrote: hi Stephan, there is, see mmchconfig man page : nsdCksumTraditional This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no. (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID. The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).) The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the server. A checksum error triggers a request to retransmit the message. When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way. You can change the value of the this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes. Note: * Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage. * To enable checksums for a subset of the nodes in a cluster, issue a command like the following one: mmchconfig nsdCksumTraditional=yes -i -N The -N flag is valid for this attribute. -- Mit freundlichen Gr??en / Kind regards Achim Rehor Technical Support Specialist S?pectrum Scale and ESS (SME) Advisory Product Services Professional IBM Systems Storage Support - EMEA Achim.Rehor at de.ibm.com +49-170-4521194 IBM Deutschland GmbH Vorsitzender des Aufsichtsrats: Sebastian Krause Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, Gabriele Schwarenthorer, Christine Rupp, Frank Theisen Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -----Original Message----- From: Stephan Graf Reply-To: gpfsug main discussion list To: gpfsug-discuss Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption Date: Thu, 02 Jun 2022 16:31:43 +0200 Hi, I am wondering if there is an option in SS to enable some checking to detect silent data corruption. Form GNR I know that there is End-to-End integrity. So a checksum is stored in addition. The background is that we are facing an issue where in some files (which have data replication = 2) the mmrestripefile is reporting, that one block is mismatching it's copy (the storage cluster is running SS without GNR). We have validated that the copied block is fine, but the original one is broken (and this is what is returned on read access). SS right now in our installation is unable to determine which is the correct one. Is there any option to enable this kind of feature in SS? If not, does it make sense to create an "IDEA" for it? 
Stephan _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From st.graf at fz-juelich.de Thu Jun 9 06:59:13 2022 From: st.graf at fz-juelich.de (Stephan Graf) Date: Thu, 9 Jun 2022 07:59:13 +0200 Subject: [gpfsug-discuss] Protection against silent data corruption In-Reply-To: References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org> Message-ID: <101bf257-ee13-11fd-95f4-523135dbb57b@fz-juelich.de> Hi, I have create an IDEA for it: https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851 Stephan Am 08.06.2022 um 20:35 schrieb IBM Spectrum Scale: > Hi Stephen, > > Currently such a feature is not available in Spectrum Scale product. > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > . > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > ?1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Stephen Ulmer" ---02-06-2022 11.32.27 > PM---This only adds a checksum to the NSD wire protocol. The q"Stephen > Ulmer" ---02-06-2022 11.32.27 PM---This only adds a checksum to the NSD > wire protocol. The question was about detecting data corruption > > From: "Stephen Ulmer" > To: "gpfsug main discussion list" > Date: 02-06-2022 11.32 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data > corruption > Sent by: "gpfsug-discuss" > > ------------------------------------------------------------------------ > > > > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. -- Stephen On Jun 2, 2022, at > 1:01 PM, Achim Rehor wrote: hi Stephan, > ???????????????????????????? > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. > > -- > Stephen > > > On Jun 2, 2022, at 1:01 PM, Achim Rehor <_Achim.Rehor at de.ibm.com_ > > wrote: > > hi Stephan, > > there is, see mmchconfig man page : > > nsdCksumTraditional > This attribute enables checksum data-integrity checking between a > traditional NSD client node and its NSD server. 
Valid values are yes > and no. The default value is no. > (Traditional in this context means that the NSD client and server > are configured with IBM Spectrum Scale rather than with IBM Spectrum > Scale RAID. > The latter is a component of IBM Elastic Storage Server (ESS) and of > IBM GPFS Storage Server (GSS).) > > The checksum procedure detects any corruption by the network of the > data in the NSD RPCs that are exchanged between the NSD client and the > server. A checksum error triggers a request to retransmit the message. > > When this attribute is enabled on a client node, the client > indicates in each of its requests to the server that it is using > checksums. The server uses checksums only in > response to client requests in which the indicator is set. A client > node that accesses a file system that belongs to another cluster can > use checksums in the same way. > > You can change the value of the this attribute for an entire cluster > without shutting down the mmfsd daemon, or for one or more nodes > without restarting the nodes. > > Note: > * Enabling this feature can result in significant I/O performance > degradation and a considerable increase in CPU usage. > > * To enable checksums for a subset of the nodes in a cluster, issue > a command like the following one: > ? ?mmchconfig nsdCksumTraditional=yes -i -N > > ? ?The -N flag is valid for this attribute. > > -- > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > Technical Support Specialist S?pectrum Scale and ESS (SME) > Advisory Product Services Professional > IBM Systems Storage Support - EMEA > > _Achim.Rehor at de.ibm.com_ > ?+49-170-4521194 > IBM Deutschland GmbH > Vorsitzender des Aufsichtsrats: Sebastian Krause > Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, > Gabriele Schwarenthorer, Christine Rupp, Frank Theisen > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > -----Original Message----- > *From*: Stephan Graf <_st.graf at fz-juelich.de_ > > > *Reply-To*: gpfsug main discussion list <_gpfsug-discuss at gpfsug.org_ > > > *To*: gpfsug-discuss <_gpfsug-discuss at gpfsug.org_ > > > *Subject*: [EXTERNAL] [gpfsug-discuss] Protection against silent > data corruption > *Date*: Thu, 02 Jun 2022 16:31:43 +0200 > > Hi, > > I am wondering if there is an option in SS to enable some checking to > detect silent data corruption. > > Form GNR I know that there is End-to-End integrity. So a checksum is > stored in addition. > > The background is that we are facing an issue where in some files > (which > have data replication = ?2) the mmrestripefile is reporting, that one > block is mismatching it's copy (the storage cluster is running SS > without GNR). > We have validated that the copied block is fine, but the original > one is > broken (and this is what is returned on read access). > SS right now in our installation is unable to determine which is the > correct one. > Is there any option to enable this kind of feature in SS? If not, does > it make sense to create an "IDEA" for it? 
> > Stephan > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Stephan Graf Juelich Supercomputing Centre Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5360 bytes Desc: S/MIME Cryptographic Signature URL: From scale at us.ibm.com Thu Jun 9 19:45:40 2022 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 10 Jun 2022 00:15:40 +0530 Subject: [gpfsug-discuss] Protection against silent data corruption In-Reply-To: <101bf257-ee13-11fd-95f4-523135dbb57b@fz-juelich.de> References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org> <101bf257-ee13-11fd-95f4-523135dbb57b@fz-juelich.de> Message-ID: Thanks Stephan. This will be looked into and accordingly prioritized by the offering manager team. Incase the IBM team has any further questions on this then we will get back to you. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Stephan Graf" To: Date: 09-06-2022 11.31 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption Sent by: "gpfsug-discuss" Hi, I have create an IDEA for it: https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851 Stephan Am 08.06.2022 um 20:35 schrieb IBM Spectrum Scale: > Hi Stephen, > > Currently such a feature is not available in Spectrum Scale product. > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > < https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > ?1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Stephen Ulmer" ---02-06-2022 11.32.27 > PM---This only adds a checksum to the NSD wire protocol. The q"Stephen > Ulmer" ---02-06-2022 11.32.27 PM---This only adds a checksum to the NSD > wire protocol. The question was about detecting data corruption > > From: "Stephen Ulmer" > To: "gpfsug main discussion list" > Date: 02-06-2022 11.32 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data > corruption > Sent by: "gpfsug-discuss" > > ------------------------------------------------------------------------ > > > > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. -- Stephen On Jun 2, 2022, at > 1:01 PM, Achim Rehor wrote: hi Stephan, > ???????????????????????????? > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. > > -- > Stephen > > > On Jun 2, 2022, at 1:01 PM, Achim Rehor <_Achim.Rehor at de.ibm.com_ > > wrote: > > hi Stephan, > > there is, see mmchconfig man page : > > nsdCksumTraditional > This attribute enables checksum data-integrity checking between a > traditional NSD client node and its NSD server. Valid values are yes > and no. The default value is no. > (Traditional in this context means that the NSD client and server > are configured with IBM Spectrum Scale rather than with IBM Spectrum > Scale RAID. > The latter is a component of IBM Elastic Storage Server (ESS) and of > IBM GPFS Storage Server (GSS).) > > The checksum procedure detects any corruption by the network of the > data in the NSD RPCs that are exchanged between the NSD client and the > server. A checksum error triggers a request to retransmit the message. > > When this attribute is enabled on a client node, the client > indicates in each of its requests to the server that it is using > checksums. The server uses checksums only in > response to client requests in which the indicator is set. A client > node that accesses a file system that belongs to another cluster can > use checksums in the same way. 
> > You can change the value of the this attribute for an entire cluster > without shutting down the mmfsd daemon, or for one or more nodes > without restarting the nodes. > > Note: > * Enabling this feature can result in significant I/O performance > degradation and a considerable increase in CPU usage. > > * To enable checksums for a subset of the nodes in a cluster, issue > a command like the following one: > ? ?mmchconfig nsdCksumTraditional=yes -i -N > > ? ?The -N flag is valid for this attribute. > > -- > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > Technical Support Specialist S?pectrum Scale and ESS (SME) > Advisory Product Services Professional > IBM Systems Storage Support - EMEA > > _Achim.Rehor at de.ibm.com_ > ?+49-170-4521194 > IBM Deutschland GmbH > Vorsitzender des Aufsichtsrats: Sebastian Krause > Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, > Gabriele Schwarenthorer, Christine Rupp, Frank Theisen > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > -----Original Message----- > *From*: Stephan Graf <_st.graf at fz-juelich.de_ > > > *Reply-To*: gpfsug main discussion list <_gpfsug-discuss at gpfsug.org_ > < mailto:gpfsug%20main%20discussion%20list%20%3cgpfsug-discuss at gpfsug.org%3e >> > *To*: gpfsug-discuss <_gpfsug-discuss at gpfsug.org_ > > > *Subject*: [EXTERNAL] [gpfsug-discuss] Protection against silent > data corruption > *Date*: Thu, 02 Jun 2022 16:31:43 +0200 > > Hi, > > I am wondering if there is an option in SS to enable some checking to > detect silent data corruption. > > Form GNR I know that there is End-to-End integrity. So a checksum is > stored in addition. > > The background is that we are facing an issue where in some files > (which > have data replication = ?2) the mmrestripefile is reporting, that one > block is mismatching it's copy (the storage cluster is running SS > without GNR). > We have validated that the copied block is fine, but the original > one is > broken (and this is what is returned on read access). > SS right now in our installation is unable to determine which is the > correct one. > Is there any option to enable this kind of feature in SS? If not, does > it make sense to create an "IDEA" for it? > > Stephan > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Stephan Graf Juelich Supercomputing Centre Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. 
HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- [attachment "smime.p7s" deleted by Huzefa H Pancha/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ulmer at ulmer.org Thu Jun 9 20:47:07 2022 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 9 Jun 2022 15:47:07 -0400 Subject: [gpfsug-discuss] Protection against silent data corruption In-Reply-To: References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org> <101bf257-ee13-11fd-95f4-523135dbb57b@fz-juelich.de> Message-ID: <6423D118-609A-4767-8F96-79B1D8EB4C8F@ulmer.org> Just to be clear: any follow-up should be directed to Stephan, who is requesting the feature. I am well aware that Scale does not provide this feature, and was just clarifying Stephan?s question for Achim, who answered the question with an unrelated reference after which Scale support replied to me. This is also where I notice that for all that is holy, the generated IDEA links point to DeveloperWorks and don?t even get you to the correct forum thread. Sigh. -- Stephen > On Jun 9, 2022, at 2:45 PM, IBM Spectrum Scale wrote: > > Thanks Stephan. > This will be looked into and accordingly prioritized by the offering manager team. Incase the IBM team has any further questions on this then we will get back to you. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > "Stephan Graf" ---09-06-2022 11.31.01 AM---Hi, I have create an IDEA for it: > > From: "Stephan Graf" > To: > Date: 09-06-2022 11.31 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption > Sent by: "gpfsug-discuss" > > > > > Hi, > > I have create an IDEA for it: > https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851 > > Stephan > > > Am 08.06.2022 um 20:35 schrieb IBM Spectrum Scale: > > Hi Stephen, > > > > Currently such a feature is not available in Spectrum Scale product. 
> > > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum > > Scale (GPFS), then please post it to the public IBM developerWroks Forum > > at > > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > > >. > > > > > > If your query concerns a potential software error in Spectrum Scale > > (GPFS) and you have an IBM software maintenance contract please contact > > 1-800-237-5511 in the United States or your local IBM Service Center > > in other countries. > > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. > > > > Inactive hide details for "Stephen Ulmer" ---02-06-2022 11.32.27 > > PM---This only adds a checksum to the NSD wire protocol. The q"Stephen > > Ulmer" ---02-06-2022 11.32.27 PM---This only adds a checksum to the NSD > > wire protocol. The question was about detecting data corruption > > > > From: "Stephen Ulmer" > > To: "gpfsug main discussion list" > > Date: 02-06-2022 11.32 PM > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data > > corruption > > Sent by: "gpfsug-discuss" > > > > ------------------------------------------------------------------------ > > > > > > > > This only adds a checksum to the NSD wire protocol. The question was > > about detecting data corruption at rest. -- Stephen On Jun 2, 2022, at > > 1:01 PM, Achim Rehor wrote: hi Stephan, > > ???????????????????????????? > > > > This only adds a checksum to the NSD wire protocol. The question was > > about detecting data corruption at rest. > > > > -- > > Stephen > > > > > > On Jun 2, 2022, at 1:01 PM, Achim Rehor <_Achim.Rehor at de.ibm.com_ > > >> wrote: > > > > hi Stephan, > > > > there is, see mmchconfig man page : > > > > nsdCksumTraditional > > This attribute enables checksum data-integrity checking between a > > traditional NSD client node and its NSD server. Valid values are yes > > and no. The default value is no. > > (Traditional in this context means that the NSD client and server > > are configured with IBM Spectrum Scale rather than with IBM Spectrum > > Scale RAID. > > The latter is a component of IBM Elastic Storage Server (ESS) and of > > IBM GPFS Storage Server (GSS).) > > > > The checksum procedure detects any corruption by the network of the > > data in the NSD RPCs that are exchanged between the NSD client and the > > server. A checksum error triggers a request to retransmit the message. > > > > When this attribute is enabled on a client node, the client > > indicates in each of its requests to the server that it is using > > checksums. The server uses checksums only in > > response to client requests in which the indicator is set. A client > > node that accesses a file system that belongs to another cluster can > > use checksums in the same way. > > > > You can change the value of the this attribute for an entire cluster > > without shutting down the mmfsd daemon, or for one or more nodes > > without restarting the nodes. > > > > Note: > > * Enabling this feature can result in significant I/O performance > > degradation and a considerable increase in CPU usage. 
> > > > * To enable checksums for a subset of the nodes in a cluster, issue > > a command like the following one: > > mmchconfig nsdCksumTraditional=yes -i -N > > > > The -N flag is valid for this attribute. > > > > -- > > Mit freundlichen Gr??en / Kind regards > > > > Achim Rehor > > > > Technical Support Specialist S?pectrum Scale and ESS (SME) > > Advisory Product Services Professional > > IBM Systems Storage Support - EMEA > > > > _Achim.Rehor at de.ibm.com_ > > > +49-170-4521194 > > IBM Deutschland GmbH > > Vorsitzender des Aufsichtsrats: Sebastian Krause > > Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, > > Gabriele Schwarenthorer, Christine Rupp, Frank Theisen > > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > > > -----Original Message----- > > *From*: Stephan Graf <_st.graf at fz-juelich.de_ > > >> > > *Reply-To*: gpfsug main discussion list <_gpfsug-discuss at gpfsug.org_ > > >> > > *To*: gpfsug-discuss <_gpfsug-discuss at gpfsug.org_ > > >> > > *Subject*: [EXTERNAL] [gpfsug-discuss] Protection against silent > > data corruption > > *Date*: Thu, 02 Jun 2022 16:31:43 +0200 > > > > Hi, > > > > I am wondering if there is an option in SS to enable some checking to > > detect silent data corruption. > > > > Form GNR I know that there is End-to-End integrity. So a checksum is > > stored in addition. > > > > The background is that we are facing an issue where in some files > > (which > > have data replication = 2) the mmrestripefile is reporting, that one > > block is mismatching it's copy (the storage cluster is running SS > > without GNR). > > We have validated that the copied block is fine, but the original > > one is > > broken (and this is what is returned on read access). > > SS right now in our installation is unable to determine which is the > > correct one. > > Is there any option to enable this kind of feature in SS? If not, does > > it make sense to create an "IDEA" for it? > > > > Stephan > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _gpfsug.org_ > > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _gpfsug.org_ >_ > > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > -- > Stephan Graf > Juelich Supercomputing Centre > > Phone: +49-2461-61-6578 > Fax: +49-2461-61-6656 > E-mail: st.graf at fz-juelich.de > WWW: http://www.fz-juelich.de/jsc/ > --------------------------------------------------------------------------------------------- > --------------------------------------------------------------------------------------------- > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht, > Prof. Dr. 
Frauke Melchior > --------------------------------------------------------------------------------------------- > --------------------------------------------------------------------------------------------- > [attachment "smime.p7s" deleted by Huzefa H Pancha/India/IBM] _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Fri Jun 10 09:01:01 2022 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 10 Jun 2022 08:01:01 +0000 Subject: [gpfsug-discuss] Protection against silent data corruption In-Reply-To: <6423D118-609A-4767-8F96-79B1D8EB4C8F@ulmer.org> References: <804f4f79-e852-9713-6253-f006b1920c11@fz-juelich.de> <8359A397-6332-4791-A153-DF6752EE4806@ulmer.org> <101bf257-ee13-11fd-95f4-523135dbb57b@fz-juelich.de> <6423D118-609A-4767-8F96-79B1D8EB4C8F@ulmer.org> Message-ID: Thanks Stephen, for clarifying, i misread the initial question, and thanks Stefan for raising that IDEA. The new address for raising RFEs/IDEAs on GPFS now is : https://ibm-sys-storage.ideas.ibm.com/ideas?project=GPFS -- Mit freundlichen Gr??en / Kind regards Achim Rehor -----Original Message----- From: Stephen Ulmer > Reply-To: gpfsug main discussion list > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption Date: Thu, 09 Jun 2022 15:47:07 -0400 Just to be clear: any follow-up should be directed to Stephan, who is requesting the feature. I am well aware that Scale does not provide this feature, and was just clarifying Stephan?s question for Achim, who answered the question with an ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Just to be clear: any follow-up should be directed to Stephan, who is requesting the feature. I am well aware that Scale does not provide this feature, and was just clarifying Stephan?s question for Achim, who answered the question with an unrelated reference after which Scale support replied to me. This is also where I notice that for all that is holy, the generated IDEA links point to DeveloperWorks and don?t even get you to the correct forum thread. Sigh. -- Stephen On Jun 9, 2022, at 2:45 PM, IBM Spectrum Scale > wrote: Thanks Stephan. This will be looked into and accordingly prioritized by the offering manager team. Incase the IBM team has any further questions on this then we will get back to you. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Stephan Graf" ---09-06-2022 11.31.01 AM---Hi, I have create an IDEA for it: From: "Stephan Graf" > To: > Date: 09-06-2022 11.31 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption Sent by: "gpfsug-discuss" > ________________________________ Hi, I have create an IDEA for it: https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851 Stephan Am 08.06.2022 um 20:35 schrieb IBM Spectrum Scale: > Hi Stephen, > > Currently such a feature is not available in Spectrum Scale product. > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > . > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Stephen Ulmer" ---02-06-2022 11.32.27 > PM---This only adds a checksum to the NSD wire protocol. The q"Stephen > Ulmer" ---02-06-2022 11.32.27 PM---This only adds a checksum to the NSD > wire protocol. The question was about detecting data corruption > > From: "Stephen Ulmer" > > To: "gpfsug main discussion list" > > Date: 02-06-2022 11.32 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data > corruption > Sent by: "gpfsug-discuss" > > > ------------------------------------------------------------------------ > > > > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. -- Stephen On Jun 2, 2022, at > 1:01 PM, Achim Rehor > wrote: hi Stephan, > ???????????????????????????? > > This only adds a checksum to the NSD wire protocol. The question was > about detecting data corruption at rest. > > -- > Stephen > > > On Jun 2, 2022, at 1:01 PM, Achim Rehor <_Achim.Rehor at de.ibm.com_ > > wrote: > > hi Stephan, > > there is, see mmchconfig man page : > > nsdCksumTraditional > This attribute enables checksum data-integrity checking between a > traditional NSD client node and its NSD server. Valid values are yes > and no. The default value is no. > (Traditional in this context means that the NSD client and server > are configured with IBM Spectrum Scale rather than with IBM Spectrum > Scale RAID. > The latter is a component of IBM Elastic Storage Server (ESS) and of > IBM GPFS Storage Server (GSS).) > > The checksum procedure detects any corruption by the network of the > data in the NSD RPCs that are exchanged between the NSD client and the > server. A checksum error triggers a request to retransmit the message. > > When this attribute is enabled on a client node, the client > indicates in each of its requests to the server that it is using > checksums. The server uses checksums only in > response to client requests in which the indicator is set. A client > node that accesses a file system that belongs to another cluster can > use checksums in the same way. 
> > You can change the value of the this attribute for an entire cluster > without shutting down the mmfsd daemon, or for one or more nodes > without restarting the nodes. > > Note: > * Enabling this feature can result in significant I/O performance > degradation and a considerable increase in CPU usage. > > * To enable checksums for a subset of the nodes in a cluster, issue > a command like the following one: > mmchconfig nsdCksumTraditional=yes -i -N > > The -N flag is valid for this attribute. > > -- > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > Technical Support Specialist S?pectrum Scale and ESS (SME) > Advisory Product Services Professional > IBM Systems Storage Support - EMEA > > _Achim.Rehor at de.ibm.com_ > +49-170-4521194 > IBM Deutschland GmbH > Vorsitzender des Aufsichtsrats: Sebastian Krause > Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Nicole Reimer, > Gabriele Schwarenthorer, Christine Rupp, Frank Theisen > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > -----Original Message----- > *From*: Stephan Graf <_st.graf at fz-juelich.de_ > > > *Reply-To*: gpfsug main discussion list <_gpfsug-discuss at gpfsug.org_ > > > *To*: gpfsug-discuss <_gpfsug-discuss at gpfsug.org_ > > > *Subject*: [EXTERNAL] [gpfsug-discuss] Protection against silent > data corruption > *Date*: Thu, 02 Jun 2022 16:31:43 +0200 > > Hi, > > I am wondering if there is an option in SS to enable some checking to > detect silent data corruption. > > Form GNR I know that there is End-to-End integrity. So a checksum is > stored in addition. > > The background is that we are facing an issue where in some files > (which > have data replication = 2) the mmrestripefile is reporting, that one > block is mismatching it's copy (the storage cluster is running SS > without GNR). > We have validated that the copied block is fine, but the original > one is > broken (and this is what is returned on read access). > SS right now in our installation is unable to determine which is the > correct one. > Is there any option to enable this kind of feature in SS? If not, does > it make sense to create an "IDEA" for it? > > Stephan > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _gpfsug.org_ >_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Stephan Graf Juelich Supercomputing Centre Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. 
Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior --------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------- [attachment "smime.p7s" deleted by Huzefa H Pancha/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jun 10 19:30:10 2022 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 11 Jun 2022 00:00:10 +0530 Subject: [gpfsug-discuss] Watch folders In-Reply-To: References: Message-ID: Hi Jake, Just checking if you or someone from you squad got a chance to respond to Leslie's Watch folder query. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS To: "gpfsug main discussion list" , Jacob M Tick/Tucson/IBM at IBMMail Cc: "gpfsug main discussion list" , "gpfsug-discuss" Date: 08-06-2022 01.27 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Watch folders Sent by: "gpfsug-discuss" Hi Jake, Can you or some from your squad please answer the below Watch Folder query. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Hi Jake, Can you or some from your squad please answer the below Watch Folder query. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
Inactive hide details for "leslie elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had any scoping suggest"leslie elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had any scoping suggestions for enabling this From: "leslie elliott" To: "gpfsug main discussion list" Date: 04-06-2022 12.58 PM Subject: [EXTERNAL] [gpfsug-discuss] Watch folders Sent by: "gpfsug-discuss" Hi all I was wondering if anyone had any scoping suggestions for enabling this? feature for multiple filesystems with SMB and NFS shares? We are running a standalone kafka cluster, not part of spectrumscale,? and each of the multiple file system ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Hi all I was wondering if anyone had any scoping suggestions for enabling this feature for multiple filesystems with SMB and NFS shares We are running a standalone kafka cluster, not part of spectrumscale, and each of the multiple file system watches, update this with individual topics for each file system We have noticed file system access being affected negatively by the watches when we were running all the 10 filesystems at the same time. All of the filesets are AFM, some to NFS homes, and some to NSD homes any feedback appreciated leslie _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Fri Jun 10 19:30:10 2022 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 11 Jun 2022 00:00:10 +0530 Subject: [gpfsug-discuss] Watch folders In-Reply-To: References: Message-ID: Hi Jake, Just checking if you or someone from you squad got a chance to respond to Leslie's Watch folder query. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS To: "gpfsug main discussion list" , Jacob M Tick/Tucson/IBM at IBMMail Cc: "gpfsug main discussion list" , "gpfsug-discuss" Date: 08-06-2022 01.27 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Watch folders Sent by: "gpfsug-discuss" Hi Jake, Can you or some from your squad please answer the below Watch Folder query. 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Hi Jake, Can you or some from your squad please answer the below Watch Folder query. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. Inactive hide details for "leslie elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had any scoping suggest"leslie elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had any scoping suggestions for enabling this From: "leslie elliott" To: "gpfsug main discussion list" Date: 04-06-2022 12.58 PM Subject: [EXTERNAL] [gpfsug-discuss] Watch folders Sent by: "gpfsug-discuss" Hi all I was wondering if anyone had any scoping suggestions for enabling this? feature for multiple filesystems with SMB and NFS shares? We are running a standalone kafka cluster, not part of spectrumscale,? and each of the multiple file system ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Hi all I was wondering if anyone had any scoping suggestions for enabling this feature for multiple filesystems with SMB and NFS shares We are running a standalone kafka cluster, not part of spectrumscale, and each of the multiple file system watches, update this with individual topics for each file system We have noticed file system access being affected negatively by the watches when we were running all the 10 filesystems at the same time. All of the filesets are AFM, some to NFS homes, and some to NSD homes any feedback appreciated leslie _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From leslie.james.elliott at gmail.com Fri Jun 10 22:02:20 2022 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 11 Jun 2022 07:02:20 +1000 Subject: [gpfsug-discuss] Watch folders In-Reply-To: References: Message-ID: thanks for chasing this up I will log a support call if that is easier to track this was hoping this was something someone had seen already but doesn't look like it so far leslie On Sat, 11 Jun 2022 at 04:30, IBM Spectrum Scale wrote: > Hi Jake, > > Just checking if you or someone from you squad got a chance to respond to > Leslie's Watch folder query. > > > Regards, The Spectrum Scale (GPFS) team > > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > [image: Inactive hide details for IBM Spectrum Scale---08-06-2022 01.27.29 > AM---Hi Jake, Can you or some from your squad please answer]IBM Spectrum > Scale---08-06-2022 01.27.29 AM---Hi Jake, Can you or some from your squad > please answer the below Watch Folder query. > > From: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS > To: "gpfsug main discussion list" , Jacob M > Tick/Tucson/IBM at IBMMail > Cc: "gpfsug main discussion list" , > "gpfsug-discuss" > Date: 08-06-2022 01.27 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Watch folders > Sent by: "gpfsug-discuss" > ------------------------------ > > > > Hi Jake, Can you or some from your squad please answer the below Watch > Folder query. Regards, The Spectrum Scale (GPFS) team > ------------------------------------------------------------------------------------------------------------------ > > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Hi Jake, > > Can you or some from your squad please answer the below Watch Folder query. > > Regards, The Spectrum Scale (GPFS) team > > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > *https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479* > . > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. 
> > [image: Inactive hide details for "leslie elliott" ---04-06-2022 12.58.48 > PM---Hi all I was wondering if anyone had any scoping suggest]"leslie > elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had > any scoping suggestions for enabling this > > From: "leslie elliott" > To: "gpfsug main discussion list" > Date: 04-06-2022 12.58 PM > Subject: [EXTERNAL] [gpfsug-discuss] Watch folders > Sent by: "gpfsug-discuss" > ------------------------------ > > > > Hi all I was wondering if anyone had any scoping suggestions for enabling > this feature for multiple filesystems with SMB and NFS shares We are > running a standalone kafka cluster, not part of spectrumscale, and each of > the multiple file system > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > Hi all > > I was wondering if anyone had any scoping suggestions for enabling this > feature for multiple filesystems with SMB and NFS shares > > We are running a standalone kafka cluster, not part of spectrumscale, > and each of the multiple file system watches, update this with individual > topics > for each file system > > We have noticed file system access being affected negatively by the > watches > when we were running all the 10 filesystems at the same time. > > All of the filesets are AFM, some to NFS homes, and some to NSD homes > > any feedback appreciated > > leslie > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From leslie.james.elliott at gmail.com Fri Jun 10 22:02:20 2022 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 11 Jun 2022 07:02:20 +1000 Subject: [gpfsug-discuss] Watch folders In-Reply-To: References: Message-ID: thanks for chasing this up I will log a support call if that is easier to track this was hoping this was something someone had seen already but doesn't look like it so far leslie On Sat, 11 Jun 2022 at 04:30, IBM Spectrum Scale wrote: > Hi Jake, > > Just checking if you or someone from you squad got a chance to respond to > Leslie's Watch folder query. > > > Regards, The Spectrum Scale (GPFS) team > > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. 
> > [image: Inactive hide details for IBM Spectrum Scale---08-06-2022 01.27.29 > AM---Hi Jake, Can you or some from your squad please answer]IBM Spectrum > Scale---08-06-2022 01.27.29 AM---Hi Jake, Can you or some from your squad > please answer the below Watch Folder query. > > From: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS > To: "gpfsug main discussion list" , Jacob M > Tick/Tucson/IBM at IBMMail > Cc: "gpfsug main discussion list" , > "gpfsug-discuss" > Date: 08-06-2022 01.27 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Watch folders > Sent by: "gpfsug-discuss" > ------------------------------ > > > > Hi Jake, Can you or some from your squad please answer the below Watch > Folder query. Regards, The Spectrum Scale (GPFS) team > ------------------------------------------------------------------------------------------------------------------ > > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Hi Jake, > > Can you or some from your squad please answer the below Watch Folder query. > > Regards, The Spectrum Scale (GPFS) team > > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > *https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479* > . > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > [image: Inactive hide details for "leslie elliott" ---04-06-2022 12.58.48 > PM---Hi all I was wondering if anyone had any scoping suggest]"leslie > elliott" ---04-06-2022 12.58.48 PM---Hi all I was wondering if anyone had > any scoping suggestions for enabling this > > From: "leslie elliott" > To: "gpfsug main discussion list" > Date: 04-06-2022 12.58 PM > Subject: [EXTERNAL] [gpfsug-discuss] Watch folders > Sent by: "gpfsug-discuss" > ------------------------------ > > > > Hi all I was wondering if anyone had any scoping suggestions for enabling > this feature for multiple filesystems with SMB and NFS shares We are > running a standalone kafka cluster, not part of spectrumscale, and each of > the multiple file system > ZjQcmQRYFpfptBannerStart > *This Message Is From an External Sender * > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > Hi all > > I was wondering if anyone had any scoping suggestions for enabling this > feature for multiple filesystems with SMB and NFS shares > > We are running a standalone kafka cluster, not part of spectrumscale, > and each of the multiple file system watches, update this with individual > topics > for each file system > > We have noticed file system access being affected negatively by the > watches > when we were running all the 10 filesystems at the same time. 
> > All of the filesets are AFM, some to NFS homes, and some to NSD homes > > any feedback appreciated > > leslie > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From chair at gpfsug.org Mon Jun 13 12:12:41 2022 From: chair at gpfsug.org (chair at gpfsug.org) Date: Mon, 13 Jun 2022 12:12:41 +0100 Subject: [gpfsug-discuss] UK Spectrum Scale User Group meeting 30th June 2022 Message-ID: Hi all, Just a reminder that the next UK User Group meeting will be taking place in London (IBM York Road) on 30th June 2022. Registration is open at https://www.eventbrite.co.uk/e/spectrum-scale-user-group-registration-321290978967 The agenda is below 9:30 ? 10:00 Arrivals and refreshments 10:00 ? 10:15 Introductions and committee updates, Paul Tomlinson Group Chair and Caroline Bradley, Group Secretary 10:15 ? 10:35 Strategy Update (IBM) 10:35 ? 10:55 New S3 Access for AI and Analytics (IBM) 10:55 ? 11:20 What is new in Spectrum Scale and ESS (IBM) 11:20 ? 11:40 nvidia GPUDirect Storage (IBM) 11:40 ? 12:00 New Deplyoment using Ansible and Terraform (IBM) 12:00 ? 13:00 Buffet Lunch with viewings of :- Quantum, Immersive Room and AI Cars 13:00 ? 13:20 Migrating Spectrum Scale using Atmepo Software (Atempo) 13:20 ? 13:40 Monitoring and Serviceability Enhancements (IBM) 13:40 ? 14:00 Spectrum Scale and Spectrum Discover for Data Management University of Oslo 14:00 ? 14:30 Performance update (IBM) 14:30 ? 15:00 Tea Break with viewing of Boston Dynamics, Spot the Robot Dog 15:00 ? 15:30 Data orchestration across the global data platform (IBM) 15:30 ? 16:00 AFM Deep Dive (IBM) 16:00 ? 17:00 Group discussion, Challenges, Experiences and Questions Led by Paul Tomlinson 17:00 Drinks reception Thanks Paul From pinto at scinet.utoronto.ca Mon Jun 20 19:04:05 2022 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 20 Jun 2022 14:04:05 -0400 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? Message-ID: I'm wondering if it's possible to shrink GPFS gracefully. I've seen some references to that effect on some presentations, however I can't find detailed instructions on any formal IBM documentation on how to do it. About 3 years ago we launched a new GPFS deployment with 3 DSS-G enclosures (9.6PB usable). Some 1.5 years later we added 2 more enclosures, for a total of 16PB, and only 7PB occupancy so far. Basically I'd like to return to the original 3 enclosures, and still maintain the (8+2p) parity level. Any suggestions? Thanks Jaime --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - www.scinet.utoronto.ca University of Toronto From jonathan.buzzard at strath.ac.uk Mon Jun 20 19:40:16 2022 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 20 Jun 2022 19:40:16 +0100 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: References: Message-ID: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> On 20/06/2022 19:04, Jaime Pinto wrote: > > I'm wondering if it's possible to shrink GPFS gracefully. 
Yes absolutely, been possible since at least version 2.2 and probably older. > I've seen some > references to that effect on some presentations, however I can't find > detailed instructions on any formal IBM documentation on how to do it. > Use mmdeldisk to remove the NSD(s) from a file system. This will take a while so I recommend in the *STRONGEST* possible terms running it in a screen or tmux session. By a while it could be days or even weeks depending on how much data needs to be moved about. Once you have removed the NSD's from a file system then you can use mmdelnsd to wipe the NSD descriptors from the disks if necessary. > About 3 years ago we launched a new GPFS deployment with 3 DSS-G > enclosures (9.6PB usable). > Some 1.5 years later we added 2 more enclosures, for a total of 16PB, > and only 7PB occupancy so far. > > Basically I'd like to return to the original 3 enclosures, and still > maintain the (8+2p) parity level. > > Any suggestions? Not being sarky but really use Google. Say "gpfs remove nsd from file system" and select the first link! JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From luis.bolinches at fi.ibm.com Mon Jun 20 19:54:29 2022 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 20 Jun 2022 18:54:29 +0000 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> Message-ID: <9D80BF69-B3D0-4B9B-86E5-1CABCEF4F63E@fi.ibm.com> Hi I?d like to add that you will be removing, assuming identical systems and networks, 2/5 of your throughput. Hope that is ok too. -- Cheers > On 20. Jun 2022, at 21.42, Jonathan Buzzard wrote: > > ?On 20/06/2022 19:04, Jaime Pinto wrote: >> I'm wondering if it's possible to shrink GPFS gracefully. > > Yes absolutely, been possible since at least version 2.2 and probably older. > >> I've seen some references to that effect on some presentations, however I can't find detailed instructions on any formal IBM documentation on how to do it. >> > > Use mmdeldisk to remove the NSD(s) from a file system. This will take a while so I recommend in the *STRONGEST* possible terms running it in a screen or tmux session. By a while it could be days or even weeks depending on how much data needs to be moved about. > > Once you have removed the NSD's from a file system then you can use mmdelnsd to wipe the NSD descriptors from the disks if necessary. > > >> About 3 years ago we launched a new GPFS deployment with 3 DSS-G enclosures (9.6PB usable). >> Some 1.5 years later we added 2 more enclosures, for a total of 16PB, and only 7PB occupancy so far. >> Basically I'd like to return to the original 3 enclosures, and still maintain the (8+2p) parity level. >> Any suggestions? > > Not being sarky but really use Google. Say "gpfs remove nsd from file system" and select the first link! > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org Unless otherwise stated above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland From pinto at scinet.utoronto.ca Mon Jun 20 20:12:22 2022 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 20 Jun 2022 15:12:22 -0400 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> Message-ID: Thanks JAB and Luis I know, there are mmdelnsd, mmdeldisk, mmrestripefs and a few other correlated mm* commands. They are very high-level in work in bulk discreet fashion (I mean, considering the number of NSDs we have, each deletion will shave 4% of the storage at once, that is too much). Maybe I should have used the term "very gradual" instead of "gracefully" in my original email. I'm just looking to do this in a very gradual and controlled fashion, just delete(or fail) a couple of hard drives at the time. In fact, I'd like to carefully specify which hard drives (not volumes) are removed from the pool, and in which order, and set which drives should remain in read-only mode (since they will be removed later, so no data is written to them during mmrestripefs), and so on. I guess I'm looking for an article or a white paper on how to do this under "my absolute control", if that makes sense. After this exercise I expect the occupancy to be at 68% with the remaining enclosures. I'll them repurpose the left over enclosures/drives to run some experiments, and later on grow the file system again. Thanks Jaime On 6/20/2022 14:40:16, Jonathan Buzzard wrote: > On 20/06/2022 19:04, Jaime Pinto wrote: >> >> I'm wondering if it's possible to shrink GPFS gracefully. > > Yes absolutely, been possible since at least version 2.2 and probably older. > >> I've seen some references to that effect on some presentations, however I can't find detailed instructions on any formal IBM documentation on how to do it. >> > > Use mmdeldisk to remove the NSD(s) from a file system. This will take a while so I recommend in the *STRONGEST* possible terms running it in a screen or tmux session. By a while it could be days or even weeks depending on how much data needs to be moved about. > > Once you have removed the NSD's from a file system then you can use mmdelnsd to wipe the NSD descriptors from the disks if necessary. > > >> About 3 years ago we launched a new GPFS deployment with 3 DSS-G enclosures (9.6PB usable). >> Some 1.5 years later we added 2 more enclosures, for a total of 16PB, and only 7PB occupancy so far. >> >> Basically I'd like to return to the original 3 enclosures, and still maintain the (8+2p) parity level. >> >> Any suggestions? > > Not being sarky but really use Google. Say "gpfs remove nsd from file system" and select the first link! > > > JAB. > --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - www.scinet.utoronto.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From jonathan.buzzard at strath.ac.uk Mon Jun 20 20:19:15 2022 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 20 Jun 2022 20:19:15 +0100 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? 
In-Reply-To: References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> Message-ID: <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> On 20/06/2022 20:12, Jaime Pinto wrote: > > Thanks JAB and Luis > > I know, there are mmdelnsd, mmdeldisk, mmrestripefs and a few other > correlated mm* commands. They are very high-level in work in bulk > discreet fashion (I mean, considering the number of NSDs we have, each > deletion will shave 4% of the storage at once, that is too much). > Then you are goosed. An NSD cannot be changed in size once created and can only ever be in a file system or out a file system. The only way to change the size of a GPFS file system is by adding or removing NSD's. I am not a fan of how the DSS-G creates small numbers of huge NSD's. In fact the script sucks a *lot* from a systems admin perspective. Then again someone at IBM thought redeploying your entire OS every time you want to make a point release upgrade was a good idea. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From luis.bolinches at fi.ibm.com Mon Jun 20 20:47:06 2022 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 20 Jun 2022 19:47:06 +0000 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> Message-ID: <16EFF080-E403-4249-885C-C25532342BD3@fi.ibm.com> Those redeploys days are gone :) And on ESS you get a fix number of vdisks per enclosure. To avoid having a 1PB or bigger vdisk. That makes it as you mentioned ? not manageable. -- Cheers > On 20. Jun 2022, at 22.20, Jonathan Buzzard wrote: > > ?On 20/06/2022 20:12, Jaime Pinto wrote: >> Thanks JAB and Luis >> I know, there are mmdelnsd, mmdeldisk, mmrestripefs and a few other correlated mm* commands. They are very high-level in work in bulk discreet fashion (I mean, considering the number of NSDs we have, each deletion will shave 4% of the storage at once, that is too much). > > Then you are goosed. An NSD cannot be changed in size once created and can only ever be in a file system or out a file system. The only way to change the size of a GPFS file system is by adding or removing NSD's. > > I am not a fan of how the DSS-G creates small numbers of huge NSD's. In fact the script sucks a *lot* from a systems admin perspective. Then again someone at IBM thought redeploying your entire OS every time you want to make a point release upgrade was a good idea. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org Unless otherwise stated above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland From anacreo at gmail.com Mon Jun 20 21:07:03 2022 From: anacreo at gmail.com (Alec) Date: Mon, 20 Jun 2022 13:07:03 -0700 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> Message-ID: Okay if you have double parity why not just resize the disk. 
And let gpfs recover using the parity. And by the way there is a qos setting for maintenance operations and you can give that higher priority to make the recovery/deleting/adding operations quicker. Also I don't know if this matters in gpfs but you may want to change the affinity for disk (distribute the primary/first node for each disk/Christmas tree it) to a different servers to spread the load. I don't know if gpfs will actually use that to distribute load, but worth checking. Alec On Mon, Jun 20, 2022, 12:20 PM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 20/06/2022 20:12, Jaime Pinto wrote: > > > > Thanks JAB and Luis > > > > I know, there are mmdelnsd, mmdeldisk, mmrestripefs and a few other > > correlated mm* commands. They are very high-level in work in bulk > > discreet fashion (I mean, considering the number of NSDs we have, each > > deletion will shave 4% of the storage at once, that is too much). > > > > Then you are goosed. An NSD cannot be changed in size once created and > can only ever be in a file system or out a file system. The only way to > change the size of a GPFS file system is by adding or removing NSD's. > > I am not a fan of how the DSS-G creates small numbers of huge NSD's. In > fact the script sucks a *lot* from a systems admin perspective. Then > again someone at IBM thought redeploying your entire OS every time you > want to make a point release upgrade was a good idea. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Mon Jun 20 21:10:04 2022 From: anacreo at gmail.com (Alec) Date: Mon, 20 Jun 2022 13:10:04 -0700 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> Message-ID: In production we've come up on an FS with missing disks and GPFS just carries on giving io errors on unusable files.. you could simply stop the disk and bring up the FS and see what it looks like, maybe do a full backup to null devices to make sure all the data is truely readable.. then decide if you want to just delete the disk and add in another disk and let GPFS recover the situation. Alec On Mon, Jun 20, 2022, 1:07 PM Alec wrote: > Okay if you have double parity why not just resize the disk. And let gpfs > recover using the parity. And by the way there is a qos setting for > maintenance operations and you can give that higher priority to make the > recovery/deleting/adding operations quicker. Also I don't know if this > matters in gpfs but you may want to change the affinity for disk > (distribute the primary/first node for each disk/Christmas tree it) to a > different servers to spread the load. I don't know if gpfs will actually > use that to distribute load, but worth checking. > > Alec > > On Mon, Jun 20, 2022, 12:20 PM Jonathan Buzzard < > jonathan.buzzard at strath.ac.uk> wrote: > >> On 20/06/2022 20:12, Jaime Pinto wrote: >> > >> > Thanks JAB and Luis >> > >> > I know, there are mmdelnsd, mmdeldisk, mmrestripefs and a few other >> > correlated mm* commands. 
They are very high-level in work in bulk >> > discreet fashion (I mean, considering the number of NSDs we have, each >> > deletion will shave 4% of the storage at once, that is too much). >> > >> >> Then you are goosed. An NSD cannot be changed in size once created and >> can only ever be in a file system or out a file system. The only way to >> change the size of a GPFS file system is by adding or removing NSD's. >> >> I am not a fan of how the DSS-G creates small numbers of huge NSD's. In >> fact the script sucks a *lot* from a systems admin perspective. Then >> again someone at IBM thought redeploying your entire OS every time you >> want to make a point release upgrade was a good idea. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Jun 20 21:44:01 2022 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 20 Jun 2022 21:44:01 +0100 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: <16EFF080-E403-4249-885C-C25532342BD3@fi.ibm.com> References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> <16EFF080-E403-4249-885C-C25532342BD3@fi.ibm.com> Message-ID: <2f503ed4-01a3-e8c6-2e5d-808a99e8bc15@strath.ac.uk> On 20/06/2022 20:47, Luis Bolinches wrote: > > Those redeploys days are gone :) > Not yet for DSS-G unfortunately. However even if they are, it means someone thought it was an acceptable idea at some point. > And on ESS you get a fix number of vdisks per enclosure. To avoid > having a 1PB or bigger vdisk. That makes it as you mentioned ? not > manageable. > The issue I have is what I want to do is reserve a set number of disks per tray/enclosure as spares. Ok not actual disks but the capacity of a disk as a spare. As of last February that was not possible, I had to mess about creating and destroying the vdisks till I got where I wanted to be. What a palaver that was. I was also super unimpressed that the scripts throw a wobbler because my two DSS-G servers where named gpfs1 and gpfs2 which was not a problem on 4.2.x and not actually documented anywhere. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Jun 20 21:56:13 2022 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 20 Jun 2022 21:56:13 +0100 Subject: [gpfsug-discuss] How to shrink GPFS on DSSG's? In-Reply-To: References: <1010bbbc-3da6-3e51-9737-8e1063730c44@strath.ac.uk> <8e9acc09-fd16-c06e-b637-f95551d2bc11@strath.ac.uk> Message-ID: <1794f84a-ce66-32e8-8e2e-5541b3dc1573@strath.ac.uk> On 20/06/2022 21:07, Alec wrote: > Also I don't know if > this matters in gpfs but you may want to change the affinity for disk > (distribute the primary/first node for each disk/Christmas tree it) to a > different servers to spread the load. I don't know if gpfs will actually > use that to distribute load, but worth checking. > Certainly made a difference historically. Perhaps not so much these days as your NSD servers are wildly more powerful than they used to be. JAB. 
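As a minimal sketch of the gradual-removal sequence being discussed in this thread (one NSD at a time): the file system name fs1 and the NSD name dssg_nsd07 are placeholders, the QoS values are only examples, and the exact options should be checked against the mmchqos, mmchdisk, mmrestripefs and mmdeldisk man pages for the release in use.

# optional: use QoS to throttle (or unthrottle) the data movement so user I/O is not starved
mmchqos fs1 --enable pool=*,maintenance=500IOPS,other=unlimited

# stop new allocations on the chosen NSD, but leave it readable
mmchdisk fs1 suspend -d "dssg_nsd07"

# migrate data off all suspended disks (long running, use screen/tmux)
mmrestripefs fs1 -m

# remove the now-empty disk from the file system, then wipe its NSD descriptor
mmdeldisk fs1 "dssg_nsd07"
mmdelnsd "dssg_nsd07"

Repeating this for one NSD (or a small group of NSDs) at a time is about as gradual as it gets: as noted above, the unit of removal at the file system level is the NSD, which on a DSS-G is a whole vdisk rather than an individual hard drive.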
-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From p.childs at qmul.ac.uk Wed Jun 22 10:59:50 2022
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Wed, 22 Jun 2022 09:59:50 +0000
Subject: [gpfsug-discuss] [EXTERNAL] How to shrink GPFS on DSSG's?
In-Reply-To: 
References: 
Message-ID: 

Having only just got an ESS I'm still learning how GNR works. As I read it there are currently two "breeds" of GNR: the version on the "DSS and ESS appliances" and the one in "Erasure Code Edition".

As I understand it from past talks, using mmdeldisk to remove a disk works fine in non-GNR editions but is not the best way to do the task. My understanding is that you should

mmchdisk suspend/empty   # so new data is not put on the disk but the disk remains available for read
mmrestripefs -m          # to move the data off the disk
mmdeldisk                # to actually remove the disk, which should be fast as it's already been emptied

We have done this with success in the past to migrate data between Raid6 arrays.

I believe there are some commands with mmvdisk to re-shape recovery groups in GNR but I've not as yet worked out how they work.

Peter Childs
________________________________________
From: gpfsug-discuss on behalf of Jaime Pinto
Sent: Monday, June 20, 2022 7:04 PM
To: gpfsug-discuss at spectrumscale.org
Subject: [EXTERNAL] [gpfsug-discuss] How to shrink GPFS on DSSG's?

CAUTION: This email originated from outside of QMUL. Do not click links or open attachments unless you recognise the sender and know the content is safe.

I'm wondering if it's possible to shrink GPFS gracefully. I've seen some references to that effect on some presentations, however I can't find detailed instructions on any formal IBM documentation on how to do it. About 3 years ago we launched a new GPFS deployment with 3 DSS-G enclosures (9.6PB usable). Some 1.5 years later we added 2 more enclosures, for a total of 16PB, and only 7PB occupancy so far. Basically I'd like to return to the original 3 enclosures, and still maintain the (8+2p) parity level. Any suggestions? Thanks Jaime --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - www.scinet.utoronto.ca University of Toronto
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

From chair at gpfsug.org Sun Jun 26 11:11:29 2022
From: chair at gpfsug.org (chair at gpfsug.org)
Date: Sun, 26 Jun 2022 11:11:29 +0100
Subject: [gpfsug-discuss] Spectrum Scale Users Group 30th June - Logistics
Message-ID: 

Hi all, For those attending the User Group this week, please bring Photographic ID for entry into the IBM building. Also, we will be meeting in "The Mulberry Bush" pub (https://www.mulberrybushpub.co.uk/) on the Wednesday evening, if anyone wishes to join us. I look forward to seeing you all next week. Regards Paul

From tina.friedrich at it.ox.ac.uk Thu Jun 30 12:31:43 2022
From: tina.friedrich at it.ox.ac.uk (Tina Friedrich)
Date: Thu, 30 Jun 2022 12:31:43 +0100
Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system
Message-ID: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>

Hello everyone, this should be a simple question, but we can't quite figure out how to best proceed. We have some file systems that we want to, basically, empty out. As in remove all files and directories currently on them.
Both contain a pretty large number of files/directories (something like 50,000,000, with sometimes silly characters in the file names). 'rm -rf' clearly isn't the way to go forward. We've come up with either 'mmapplypolicy' (i.e. a policy to remove all files) or removing and re-creating the file systems as options (open to other suggestions!). We want the file systems still; ideally without having to redo the authentication and key swaps etc for the 'remote' clusters using them. This is a Lenovo DSS, but I don't think it makes much of a difference. So - what's the best way to proceed? If it is mmapplypolicy - does anyone have a (tested/known working) example of a policy to simply remove all files? Thanks, Tina -- Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator Research Computing and Support Services IT Services, University of Oxford http://www.arc.ox.ac.uk http://www.it.ox.ac.uk From olaf.weiser at de.ibm.com Thu Jun 30 13:05:33 2022 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 30 Jun 2022 12:05:33 +0000 Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system In-Reply-To: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk> References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk> Message-ID: Hi Tina, I think its much faster to recreate the file system after that .. it is enough to do mmauth grant {RemoteClusterName | all} -f {Device in my case ..its always ... grant all -f all ? and every remote mount will work as before.. the remote cluster key information is in the cluster CCR .. not in the filesystem.. Pay attention.. when you 'll create the file system, it will be created with the current code's version... in Case remote cluster is backlevel.. don't forget to specify --version have fun ? ________________________________ Von: gpfsug-discuss im Auftrag von Tina Friedrich Gesendet: Donnerstag, 30. Juni 2022 13:31 An: 'gpfsug main discussion list' Betreff: [EXTERNAL] [gpfsug-discuss] quickest way to delete all files (and directories) in a file system Hello everyone, this should be a simple question, but we can't quite figure out how to best proceed. We have some file systems that we want to, basically, empty out. As in remove all files and directories currently on them. Both contain a pretty large number of files/directories (something like 50,000,000, with sometimes silly characters in the file names). 'rm -rf' clearly isn't the way to go forward. We've come up with either 'mmapplypolicy' (i.e. a policy to remove all files) or removing and re-creating the file systems as options (open to other suggestions!). We want the file systems still; ideally without having to redo the authentication and key swaps etc for the 'remote' clusters using them. This is a Lenovo DSS, but I don't think it makes much of a difference. So - what's the best way to proceed? If it is mmapplypolicy - does anyone have a (tested/known working) example of a policy to simply remove all files? Thanks, Tina -- Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator Research Computing and Support Services IT Services, University of Oxford http://www.arc.ox.ac.uk http://www.it.ox.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org H -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antony.steel at belisama.com.sg Thu Jun 30 13:26:16 2022 From: antony.steel at belisama.com.sg (Antony Steel) Date: Thu, 30 Jun 2022 22:26:16 +1000 Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system In-Reply-To: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk> References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk> Message-ID: <181b4938702.33b0dff581274.2969139072384838762@belisama.com.sg> Hi, Perhaps use filesets?? Quicker to remove and recreate? Keep safe, Antony Steel CTO Belisama amailto:ntony.steel at belisama.com.sg Singapore: +65 9789 6663 Australia +61 4 1980 3049 http://www.belisama.com.sg ---- On Thu, 30 Jun 2022 21:31:43 +1000 Tina Friedrich wrote --- Hello everyone, this should be a simple question, but we can't quite figure out how to best proceed. We have some file systems that we want to, basically, empty out. As in remove all files and directories currently on them. Both contain a pretty large number of files/directories (something like 50,000,000, with sometimes silly characters in the file names). 'rm -rf' clearly isn't the way to go forward. We've come up with either 'mmapplypolicy' (i.e. a policy to remove all files) or removing and re-creating the file systems as options (open to other suggestions!). We want the file systems still; ideally without having to redo the authentication and key swaps etc for the 'remote' clusters using them. This is a Lenovo DSS, but I don't think it makes much of a difference. So - what's the best way to proceed? If it is mmapplypolicy - does anyone have a (tested/known working) example of a policy to simply remove all files? Thanks, Tina -- Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator Research Computing and Support Services IT Services, University of Oxford http://www.arc.ox.ac.uk http://www.it.ox.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1655907620786000_629392709.png Type: image/png Size: 12431 bytes Desc: not available URL: From stockf at us.ibm.com Thu Jun 30 13:47:45 2022 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 30 Jun 2022 12:47:45 +0000 Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system In-Reply-To: References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk> Message-ID: For speed Olaf?s recommendation is the best option. If you really do not want to remove the file systems and recreate them, and the version of Scale is fairly current, you could use the mmfind command to simplify creating a policy to remove the files. Still removing 50M files will take some time. Fred Fred Stock, Spectrum Scale Development Advocacy stockf at us.ibm.com | 720-430-8821 From: gpfsug-discuss on behalf of Olaf Weiser Date: Thursday, June 30, 2022 at 8:09 AM To: 'gpfsug main discussion list' Subject: [EXTERNAL] Re: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system Hi Tina, I think its much faster to recreate the file system after that .. it is enough to do mmauth grant {RemoteClusterName | all} -f {Device in my case ..its always ... grant all -f all ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
From stockf at us.ibm.com  Thu Jun 30 13:47:45 2022
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 30 Jun 2022 12:47:45 +0000
Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system
In-Reply-To:
References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>
Message-ID:

For speed Olaf's recommendation is the best option. If you really do not want to remove the file systems and recreate them, and the version of Scale is fairly current, you could use the mmfind command to simplify creating a policy to remove the files. Still, removing 50M files will take some time.

Fred
Fred Stock, Spectrum Scale Development Advocacy
stockf at us.ibm.com | 720-430-8821

From: gpfsug-discuss on behalf of Olaf Weiser
Date: Thursday, June 30, 2022 at 8:09 AM
To: 'gpfsug main discussion list'
Subject: [EXTERNAL] Re: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system

Hi Tina,

I think it's much faster to recreate the file system. After that, it is enough to do

mmauth grant {RemoteClusterName | all} -f {Device | all}

(in my case it's always ... grant all -f all) and every remote mount will work as before. The remote cluster key information is in the cluster CCR, not in the filesystem.

Pay attention: when you create the file system, it will be created with the current code's version. In case a remote cluster is backlevel, don't forget to specify --version.

have fun

________________________________
From: gpfsug-discuss on behalf of Tina Friedrich
Sent: Thursday, 30 June 2022 13:31
To: 'gpfsug main discussion list'
Subject: [EXTERNAL] [gpfsug-discuss] quickest way to delete all files (and directories) in a file system

Hello everyone,

this should be a simple question, but we can't quite figure out how to best proceed.

We have some file systems that we want to, basically, empty out. As in remove all files and directories currently on them. Both contain a pretty large number of files/directories (something like 50,000,000, with sometimes silly characters in the file names). 'rm -rf' clearly isn't the way to go forward.

We've come up with either 'mmapplypolicy' (i.e. a policy to remove all files) or removing and re-creating the file systems as options (open to other suggestions!).

We want the file systems still; ideally without having to redo the authentication and key swaps etc for the 'remote' clusters using them.

This is a Lenovo DSS, but I don't think it makes much of a difference.

So - what's the best way to proceed?

If it is mmapplypolicy - does anyone have a (tested/known working) example of a policy to simply remove all files?

Thanks,
Tina

--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
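For anyone who wants the bare policy rather than mmfind, a delete-everything policy can be a single rule. The sketch below is untested, the file system name and policy path are placeholders, and a dry run with -I test is the safe first step before switching to -I yes:

    /* delete-all.pol - remove every regular file the scan finds */
    RULE 'delall' DELETE

    # dry run: only list what would be deleted
    mmapplypolicy fs1 -P /tmp/delete-all.pol -I test -L 2
    # real run; -N and -g can be added to spread the scan over more nodes
    mmapplypolicy fs1 -P /tmp/delete-all.pol -I yes

Once the files are gone, the leftover (empty) directory tree is cheap to remove with a plain rm -rf, which is also what Jaime's script further down in the thread does.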
From pinto at scinet.utoronto.ca  Thu Jun 30 14:55:02 2022
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Thu, 30 Jun 2022 09:55:02 -0400
Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system
In-Reply-To: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>
References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>
Message-ID:

Hi Tina

Please see the attachment for a working version of 'mmrmdir' that I've been using for over 10 years. You may have to tweak it a bit for the name of the node you want to run it from, and the location of the policy (also attached).

I have used all the ways suggested so far on this thread to delete files in bulk. I still prefer to use this script when I don't want to disturb anything else on the cluster setup, in particular multi-cluster as you appear to have. It gives absolute and fine control over what to delete. You may also use it in test mode, and gradually delete only subsets of directories if you wish.

Traversing the inode database and creating the list of files to delete is what takes most of the time, whether deleting 1M or 50M files.
To that effect, deleting and recreating file systems or filesets still takes a very long time if those areas are populated with files.

Best
Jaime

On 6/30/22 07:31, Tina Friedrich wrote:
> Hello everyone,
>
> this should be a simple question, but we can't quite figure out how to
> best proceed.
>
> We have some file systems that we want to, basically, empty out. As in
> remove all files and directories currently on them. Both contain a
> pretty large number of files/directories (something like 50,000,000,
> with sometimes silly characters in the file names). 'rm -rf' clearly
> isn't the way to go forward.
>
> We've come up with either 'mmapplypolicy' (i.e. a policy to remove all
> files) or removing and re-creating the file systems as options (open to
> other suggestions!).
>
> We want the file systems still; ideally without having to redo the
> authentication and key swaps etc for the 'remote' clusters using them.
>
> This is a Lenovo DSS, but I don't think it makes much of a difference.
>
> So - what's the best way to proceed?
>
> If it is mmapplypolicy - does anyone have a (tested/known working)
> example of a policy to simply remove all files?
>
> Thanks,
> Tina
>

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - www.scinet.utoronto.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477
-------------- next part --------------
#!/bin/bash
echo ""
echo "Command issued: "$0" "$@
echo ""

# only allow this to run on the datamover nodes
if [ "${HOSTNAME:0:12}" != datamover ]; then
   echo "You can only use mmrmdir on the datamovers"
   echo
   exit
fi

if [ "$1" == "" ] || [ "$1" == "-h" ] || [ "$1" == "-help" ] || [ "$1" == "--h" ] || [ "$1" == "--help" ] || [ $# -gt 2 ] || [ "$1" == "-test" ]; then
   echo "Usage: mmrmdir <directory> [-test]"
   echo "       -test to verify what will be deleted"
   echo
   exit
fi

if [ "$2" != "" ] && [ "$2" != "-test" ]; then
   echo "Usage: mmrmdir <directory> [-test]"
   echo "       -test to verify what will be deleted"
   echo
   exit
fi

echo -n "You have 10 seconds to cancel:"
for a in `seq 0 9`; do
   echo -n " $a"
   sleep 1
done
echo " resuming ..."

LOCATION=$1
slash=`echo $LOCATION | grep /`
if [ "$slash" == "" ]; then
   echo "$LOCATION is not an absolute path"
   echo
   exit
fi

if [ "$2" == "-test" ]; then
   # dry run: only list what the delete-all policy would remove
   mmapplypolicy $LOCATION -P /usr/lpp/mmfs/bin/mmpolicyRules-DELETE-ALL -I test -L 2
else
   mmapplypolicy $LOCATION -P /usr/lpp/mmfs/bin/mmpolicyRules-DELETE-ALL -I defer -L 2
   if [ "$?" != 0 ]
   then
      echo "#### there was an error with mmapplypolicy execution ####"
   else
      echo removing empty directories in $LOCATION
      rm -rf $LOCATION
   fi
fi
exit 0
-------------- next part --------------
/* Define deletion rules for aged files in /dev/scratch (system pool by default).
   If the file has not been accessed in 90 days AND not owned by root then delete it. */
RULE 'DelSystem' DELETE FROM POOL 'system' FOR FILESET('root')
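Going by its usage message, the script above is driven something like this (the path is illustrative); the -test pass only runs mmapplypolicy -I test and prints the candidate list, while the second pass deletes and then removes the emptied directory tree:

    ./mmrmdir /gpfs/fs1/dir-to-clear -test    # verify what would be deleted
    ./mmrmdir /gpfs/fs1/dir-to-clear          # actually delete

Note that it expects the policy file at /usr/lpp/mmfs/bin/mmpolicyRules-DELETE-ALL and checks $HOSTNAME against 'datamover', so both will need adjusting for another site, as Jaime says.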
From harr1 at llnl.gov  Thu Jun 30 16:08:54 2022
From: harr1 at llnl.gov (Cameron Harr)
Date: Thu, 30 Jun 2022 08:08:54 -0700
Subject: [gpfsug-discuss] quickest way to delete all files (and directories) in a file system
In-Reply-To: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>
References: <15938fc5-9c06-5024-6f5e-9e3d64129b12@it.ox.ac.uk>
Message-ID:

If you have MPI infrastructure already set up on some clients, 'drm' from the MPI File Utils (mpifileutils) can delete them fairly quickly (e.g. with 256 procs):
https://github.com/hpc/mpifileutils

On 6/30/22 4:31 AM, Tina Friedrich wrote:
> Hello everyone,
>
> this should be a simple question, but we can't quite figure out how to
> best proceed.
>
> We have some file systems that we want to, basically, empty out. As in
> remove all files and directories currently on them. Both contain a
> pretty large number of files/directories (something like 50,000,000,
> with sometimes silly characters in the file names). 'rm -rf' clearly
> isn't the way to go forward.
>
> We've come up with either 'mmapplypolicy' (i.e. a policy to remove all
> files) or removing and re-creating the file systems as options (open
> to other suggestions!).
>
> We want the file systems still; ideally without having to redo the
> authentication and key swaps etc for the 'remote' clusters using them.
>
> This is a Lenovo DSS, but I don't think it makes much of a difference.
>
> So - what's the best way to proceed?
>
> If it is mmapplypolicy - does anyone have a (tested/known working)
> example of a policy to simply remove all files?
>
> Thanks,
> Tina
>
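A hedged sketch of the mpifileutils route, assuming the tools are installed and launched under whatever MPI launcher the site uses; the rank count and path are placeholders:

    # remove a directory tree in parallel across 256 MPI ranks
    mpirun -np 256 drm /gpfs/fs1/dir-to-clear

drm walks the tree in parallel and removes entries from the deepest level upwards, which is what makes it so much faster than a single-threaded rm -rf on tens of millions of files.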