From peter.chase at metoffice.gov.uk Wed Aug 2 10:09:49 2023
From: peter.chase at metoffice.gov.uk (Peter Chase)
Date: Wed, 2 Aug 2023 09:09:49 +0000
Subject: [gpfsug-discuss] Inode size, and system pool subblock
Message-ID:

Good Morning,

I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool?
I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB.
It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so.
I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.

As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated!

Sincerely,

Pete Chase
UKMO
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From olaf.weiser at de.ibm.com Wed Aug 2 12:42:46 2023
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Wed, 2 Aug 2023 11:42:46 +0000
Subject: [gpfsug-discuss] Inode size, and system pool subblock
In-Reply-To: References: Message-ID:

Hallo Peter,

[1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...]
in short - yes.

[2] [...] I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.[...]
you may need to consider snapshots and directories, which all contribute to MD space.

predicting the space requirements of MD for directories is always hard, because the size of a directory depends on the length of the file names the users will create...

furthermore, using a less-than-4k inode size also does not make much sense when taking into account that NVMe and other modern block storage devices come with a hardware block size of 4k (even though GPFS can still deal with 512 bytes per sector).

hope this helps ..

________________________________
From: gpfsug-discuss on behalf of Peter Chase
Sent: Wednesday, 2 August 2023 11:09
To: gpfsug-discuss at gpfsug.org
Subject: [EXTERNAL] [gpfsug-discuss] Inode size, and system pool subblock

Good Morning,

I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool?
I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB.
It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so.
I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.

As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated!

Sincerely,

Pete Chase
UKMO
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
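A quick way to check the numbers under discussion on a live filesystem is mmlsfs and mmdf; a minimal sketch, with gpfs0 standing in for the real device name:

    mmlsfs gpfs0 -i     # inode size in bytes
    mmlsfs gpfs0 -B     # block size (per pool)
    mmlsfs gpfs0 -f     # minimum fragment (subblock) size in bytes
    mmdf gpfs0          # used/free space per disk and per pool, including the system (metadata) pool

Comparing the allocated inode count times the inode size against what mmdf reports for the system pool gives a rough split between inode space and the other metadata (directories, indirect blocks, allocation maps) that lives there.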
From daniel.kidger at hpe.com Wed Aug 2 12:56:32 2023
From: daniel.kidger at hpe.com (Kidger, Daniel)
Date: Wed, 2 Aug 2023 11:56:32 +0000
Subject: [gpfsug-discuss] Inode size, and system pool subblock
In-Reply-To: References: Message-ID:

Peter,

"Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time"

I believe that a 2k inode *does* save space, hence more files in the filesystem for a given size of the system metadata pool. However, with modern 4k disk block sizes, you pay the price of a performance penalty. Hence, unless space constrained, you should always use 4k inodes.

Also remember that GPFS supports Data-on-Metadata (DoM in Lustre-speak), so 4k inodes can store small files (up to c. 3k), and so save significant space in the data pools (where the subblock size is at least 8kB, and in your case probably 128kB).

Daniel Kidger
HPC Storage Solutions Architect, EMEA
daniel.kidger at hpe.com
+44 (0)7818 522266
hpe.com

From: gpfsug-discuss On Behalf Of Peter Chase
Sent: 02 August 2023 10:10
To: gpfsug-discuss at gpfsug.org
Subject: [gpfsug-discuss] Inode size, and system pool subblock

Good Morning,

I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool?
I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB.
It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so.
I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size.
But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes. As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated! Sincerely, Pete Chase UKMO -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From ewahl at osc.edu Wed Aug 2 13:55:29 2023 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 2 Aug 2023 12:55:29 +0000 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: References: Message-ID: >>[2] [...] I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.[...] >you may need to consider snapshots and directories , which all contributes to MD space >predicting the space requirements for MD for directories is always hard, because the size of a directory is depending on the file's name length, the users will create... Unless you enable encryption. In which case NO metadata will be stored on MD disks/devices. Ed Wahl Ohio Supercomputer Center From: gpfsug-discuss On Behalf Of Olaf Weiser Sent: Wednesday, August 2, 2023 7:43 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Inode size, and system pool subblock Hallo Peter, [1] [.?.?.?] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[.?.?.?] in short - yes ? [2] [.?.?.?] I believe I'm correct in saying that inodes are not Hallo Peter, [1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...] in short - yes ? [2] [...] I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.[...] you may need to consider snapshots and directories , which all contributes to MD space predicting the space requirements for MD for directories is always hard, because the size of a directory is depending on the file's name length, the users will create... further more, using a less than 4k inode size makes also not much sense, when taking into account, that NVMEs and other modern block storage devices comes with a hardware block size of 4k (even though GPFS still can deal with 512 Bytes per sector) hope this helps .. ________________________________ Von: gpfsug-discuss > im Auftrag von Peter Chase > Gesendet: Mittwoch, 2. August 2023 11:09 An: gpfsug-discuss at gpfsug.org > Betreff: [EXTERNAL] [gpfsug-discuss] Inode size, and system pool subblock Good Morning, I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool? 
I'm looking at an existing filesystem, Good Morning, I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool? I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB. It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so. I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes. As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated! Sincerely, Pete Chase UKMO -------------- next part -------------- An HTML attachment was scrubbed... URL: From jan.heichler at gmx.net Wed Aug 2 14:09:29 2023 From: jan.heichler at gmx.net (Jan Heichler) Date: Wed, 2 Aug 2023 15:09:29 +0200 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: References: Message-ID: <4FB56CDF-8532-4312-A64E-C4E12D63DEB7@gmx.net> > Am 02.08.2023 um 13:42 schrieb Olaf Weiser : > > Hallo Peter, > > [1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...] > in short - yes ? > The expectation that there is a waste in space seems to come from the idea that inodes are stored as individual files - which then can?t be smaller than a subblock. Referring to: https://www.ibm.com/blogs/digitale-perspektive/wp-content/uploads/2020/03/04-SSSD20-SpectrumScale-Konzepte-Teil2-032020.pdf? 04-SSSD20-SpectrumScale-Konzepte-Teil2-032020 PDF-Dokument ? 1,2 MB Slide 21: ?Held in one invisible inode file?? -> I would understand from that that 2kiB inodes are just ligned up in a single file and worst case you lose 2kiB because you don?t completely match your 4kiB inodes Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Vorschau.png Type: image/png Size: 193197 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Aug 2 14:44:17 2023 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 2 Aug 2023 13:44:17 +0000 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: <4FB56CDF-8532-4312-A64E-C4E12D63DEB7@gmx.net> References: <4FB56CDF-8532-4312-A64E-C4E12D63DEB7@gmx.net> Message-ID: ok.. let me give some more context.. all inodes are in a (single) inode file.. so ... depending on the blocksize .. lets say ... 1MB ... you can have 256 inodes (in case of 4K inode size) ... in one block... in Peter's case .. 1 MB block would hold 512 inodes... the total number of allocated file system blocks of the inode file is a bit more complex ... off topic here to be clear .. the waste of space does not come from having small inode size/or mismatch of subblocksize ... itself.. 
in the worst case (which is negligible) there is an unused fragment. It is more that a small file's data can't (or is less likely to be) written into the inode (data in inode) .. (( please note the good remark from Ed - that is only possible at all if there is no encryption ))

so in Peter's case .. a file that has, let's say, 2.x KB ... can't be written into the inode... and so a file system block needs to be allocated.. if it is a new file.. a full block gets allocated first, and then, on close of the file.. the size will be truncated to the next matching subblock-size boundary.

so .. performance wise.. that adds latency, and space wise.. this could be avoided .. (if the file's data fits into the inode)

to be more accurate and correct, the best answer would have been .. it depends .. on the data structure...

________________________________
From: gpfsug-discuss on behalf of Jan Heichler
Sent: Wednesday, 2 August 2023 15:09
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Inode size, and system pool subblock

On 02.08.2023 at 13:42, Olaf Weiser wrote:

Hallo Peter,

[1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...]
in short - yes

The expectation that there is a waste in space seems to come from the idea that inodes are stored as individual files - which then can't be smaller than a subblock.

Referring to:
[Vorschau.png]
04-SSSD20-SpectrumScale-Konzepte-Teil2-032020
PDF document, 1.2 MB

Slide 21: "Held in one invisible inode file" -> I would understand from that that 2kiB inodes are just lined up in a single file, and worst case you lose 2kiB because you don't completely match your 4kiB inodes.

Jan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Vorschau.png Type: image/png Size: 193197 bytes Desc: Vorschau.png URL: From anacreo at gmail.com Wed Aug 2 17:07:17 2023 From: anacreo at gmail.com (Alec) Date: Wed, 2 Aug 2023 09:07:17 -0700 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: References: Message-ID: I think things are conflated here... The inode size is really just a call on how much functionality you need in an inode. I wouldn't even think about disk block size when setting this. Essentially the smaller the inode the less space I need for metadata but also the less capacity I have in my inode. The default is 4k and if you don't change it then GPFS will put up to a 3.8k file in the inode itself vs going to an indirect disk allocation. Someone mentioned encryption will bypass this feature, but it's actually encryption that perhaps requires larger inode sizes to store all the key meta info (you can have up to 8 keys per inode I believe). So essentially it you've got a smaller inode size your directories max size will max out sooner, your ACLs could be constrained, large file names can exhaust, you may not have enough space for Encryption details. But the upshot is you need to dedicate less space to metadata and can handle more file entries. So if you've got billions of files and are managing replicas then you should consider fine tuning inode size down. You can go from 3.5% of space going to inodes to 1% if you went from 4k to 512 bytes.. but there is a reason GPFS defaults to 4k... And doesn't expand on it too much. If you've guessed wrong you're kind of hosed. None of this has to do with hardware block sizes, subblock allocation and fragment sizes. And further compounded by 4k native block sizes vs emulated 512 block size some disk hardware does. For GPFS you generally will have a very large block size 256kb or 1MB and GPFS will divide those blocks into 32 fragments. So you may have your smallest unit being a 8kb or 32kb fragment. If you have a dedicated MD pool (highly recommended) you'd definitely specify a smaller block size than 1MB (128kb = 4kb fragments). The balance you're trying to strike here is the least amount of commands to retrieve your data efficiently. Think about the roundtrip on the bus being the same for a 4kb read vs a 1mb read so try to maximize this. Generally the goal of the file system is to ensure that the excess data that is read when trying to pull fragments is as useless as possible. I may also be confused but I wouldn't worry so much about inode size to block size.. just worry about getting large blocks working well for regular storage pool if your data is huge and using a smaller block size in MD if dedicate pool which is almost always recommended. Be very careful of specifying a small inode size because it's not just max filenames and max file counts in a directory.. it is much more.. and if you have a lot of small files don't underestimate the advantage of those files being stored directly in the inode. A 512 byte inode could only store about a 380byte file vs a 4k file storing 3800 byte file. These files tend to be shell scripts and config files which you really don't want to be waiting around for and occupying a huge 1mb read for and waisting a potentially larger 64kb fragment allocation on. Alec On Wed, Aug 2, 2023, 4:47 AM Olaf Weiser wrote: > Hallo Peter, > > [1] *[...] having a smaller inode size than the subblock size means* > * there's a big wastage on disk usage, with no performance benefit to > doing so[...] * > in short - yes ? > > > > [2] > *[...] 
I believe I'm correct in saying that inodes are not the only > things to live on the metadata pool, so I assume that some other metadata > might benefit from the larger block/subblock size. But looking at the > number of inodes, the inode size, and the space consumed in the system > pool, it really looks like the majority of space consumed is by > inodes.[...] * > you may need to consider snapshots and directories , which all contributes > to MD space > > predicting the space requirements for MD for directories is always hard, > because the size of a directory is depending on the file's name length, > the users will create... > > > further more, using a less than 4k inode size makes also not much sense, > when taking into account, that NVMEs and other modern block storage devices > comes with a hardware block size of 4k (even though GPFS still can deal > with 512 Bytes per sector) > > > hope this helps .. > > > > > > ------------------------------ > *Von:* gpfsug-discuss im Auftrag von > Peter Chase > *Gesendet:* Mittwoch, 2. August 2023 11:09 > *An:* gpfsug-discuss at gpfsug.org > *Betreff:* [EXTERNAL] [gpfsug-discuss] Inode size, and system pool > subblock > > Good Morning, I have a question about inode size vs subblock size. Can > anyone think of a reason that the chosen inode size of a scale filesystem > should be smaller than the subblock size for the metadata pool? I'm looking > at an existing filesystem, > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > Report Suspicious > > > > ZjQcmQRYFpfptBannerEnd > Good Morning, > > I have a question about inode size vs subblock size. Can anyone think of a > reason that the chosen inode size of a scale filesystem should be smaller > than the subblock size for the metadata pool? > I'm looking at an existing filesystem, the inode size is 2KiB, and the > subblock is 4KiB. > It feels like I'm missing something. If I've understood the docs on > blocks and subblocks correctly, it sounds like the subblock is the smallest > atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the > inode would return its contents and 2K of empty subblock every time. So, in > my head (and maybe only there), having a smaller inode size than the > subblock size means there's a big wastage on disk usage, with no > performance benefit to doing so. > I believe I'm correct in saying that inodes are not the only things to > live on the metadata pool, so I assume that some other metadata might > benefit from the larger block/subblock size. But looking at the number of > inodes, the inode size, and the space consumed in the system pool, it > really looks like the majority of space consumed is by inodes. > > As I said, I feel like I'm missing something, so if anyone can tell me > where I'm wrong it would be greatly appreciated! > > Sincerely, > > Pete Chase > > UKMO > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ewahl at osc.edu Wed Aug 2 17:29:38 2023 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 2 Aug 2023 16:29:38 +0000 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: References: Message-ID: >Someone mentioned encryption will bypass this feature, but it's actually encryption that perhaps requires larger inode sizes to store all the key meta info (you can have up to 8 keys per inode I believe). I believe that is incorrect. If encryption is used, the size of the inode makes no difference. This is due to the fact that Only data, NOT metadata is encrypted on the file system. So storing blocks in MD spaces is out. See the Scale documentation, and older GPFS documentation, for more information. (such as Encryption - IBM Documentation ) Until such time as they start encrypting the metadata, it?s pointless to size MD for small files. Ed Wahl Ohio Supercomputer Center From: gpfsug-discuss On Behalf Of Alec Sent: Wednesday, August 2, 2023 12:07 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Inode size, and system pool subblock I think things are conflated here.?.?. The inode size is really just a call on how much functionality you need in an inode. I wouldn't even think about disk block size when setting this. Essentially the smaller the inode the less space I I think things are conflated here... The inode size is really just a call on how much functionality you need in an inode. I wouldn't even think about disk block size when setting this. Essentially the smaller the inode the less space I need for metadata but also the less capacity I have in my inode. The default is 4k and if you don't change it then GPFS will put up to a 3.8k file in the inode itself vs going to an indirect disk allocation. Someone mentioned encryption will bypass this feature, but it's actually encryption that perhaps requires larger inode sizes to store all the key meta info (you can have up to 8 keys per inode I believe). So essentially it you've got a smaller inode size your directories max size will max out sooner, your ACLs could be constrained, large file names can exhaust, you may not have enough space for Encryption details. But the upshot is you need to dedicate less space to metadata and can handle more file entries. So if you've got billions of files and are managing replicas then you should consider fine tuning inode size down. You can go from 3.5% of space going to inodes to 1% if you went from 4k to 512 bytes.. but there is a reason GPFS defaults to 4k... And doesn't expand on it too much. If you've guessed wrong you're kind of hosed. None of this has to do with hardware block sizes, subblock allocation and fragment sizes. And further compounded by 4k native block sizes vs emulated 512 block size some disk hardware does. For GPFS you generally will have a very large block size 256kb or 1MB and GPFS will divide those blocks into 32 fragments. So you may have your smallest unit being a 8kb or 32kb fragment. If you have a dedicated MD pool (highly recommended) you'd definitely specify a smaller block size than 1MB (128kb = 4kb fragments). The balance you're trying to strike here is the least amount of commands to retrieve your data efficiently. Think about the roundtrip on the bus being the same for a 4kb read vs a 1mb read so try to maximize this. Generally the goal of the file system is to ensure that the excess data that is read when trying to pull fragments is as useless as possible. 
I may also be confused but I wouldn't worry so much about inode size to block size.. just worry about getting large blocks working well for regular storage pool if your data is huge and using a smaller block size in MD if dedicate pool which is almost always recommended. Be very careful of specifying a small inode size because it's not just max filenames and max file counts in a directory.. it is much more.. and if you have a lot of small files don't underestimate the advantage of those files being stored directly in the inode. A 512 byte inode could only store about a 380byte file vs a 4k file storing 3800 byte file. These files tend to be shell scripts and config files which you really don't want to be waiting around for and occupying a huge 1mb read for and waisting a potentially larger 64kb fragment allocation on. Alec On Wed, Aug 2, 2023, 4:47 AM Olaf Weiser > wrote: Hallo Peter, [1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...] in short - yes ? [2] [...] I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.[...] you may need to consider snapshots and directories , which all contributes to MD space predicting the space requirements for MD for directories is always hard, because the size of a directory is depending on the file's name length, the users will create... further more, using a less than 4k inode size makes also not much sense, when taking into account, that NVMEs and other modern block storage devices comes with a hardware block size of 4k (even though GPFS still can deal with 512 Bytes per sector) hope this helps .. ________________________________ Von: gpfsug-discuss > im Auftrag von Peter Chase > Gesendet: Mittwoch, 2. August 2023 11:09 An: gpfsug-discuss at gpfsug.org > Betreff: [EXTERNAL] [gpfsug-discuss] Inode size, and system pool subblock Good Morning, I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool? I'm looking at an existing filesystem, Good Morning, I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool? I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB. It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so. I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes. 
As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated! Sincerely, Pete Chase UKMO _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Aug 2 18:01:03 2023 From: anacreo at gmail.com (Alec) Date: Wed, 2 Aug 2023 10:01:03 -0700 Subject: [gpfsug-discuss] Inode size, and system pool subblock In-Reply-To: References: Message-ID: That part is true however the FEK (File Encryption Key) goes into the inode and they can be very large, and you can have up to 8 of them.. so may be good having one or 2 FEKs but if you go to rotate FEK's, and need an extra 2 to handle the change you could run out of room. In our FPO we don't encrypt .ksh, .sh, and other source type files. We have another policy in place that reencrypts unencrypted files that are larger 1mb by creating them under a different file name extension and then move them back over. That's how we get around this limitation. We explain that if a file is less than 1mb it shouldn't have data we are worried about encryption. Alec On Wed, Aug 2, 2023, 9:34 AM Wahl, Edward wrote: > > > >Someone mentioned encryption will bypass this feature, but it's actually > encryption that perhaps requires larger inode sizes to store all the key > meta info (you can have up to 8 keys per inode I believe). > > > > I believe that is incorrect. If encryption is used, the size of the inode > makes no difference. This is due to the fact that Only data, NOT metadata > is encrypted on the file system. So storing blocks in MD spaces is out. > See the Scale documentation, and older GPFS documentation, for more > information. (such as Encryption - IBM Documentation > > ) Until such time as they start encrypting the metadata, it?s pointless to > size MD for small files. > > > > Ed Wahl > > Ohio Supercomputer Center > > > > *From:* gpfsug-discuss *On Behalf Of * > Alec > *Sent:* Wednesday, August 2, 2023 12:07 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Inode size, and system pool subblock > > > > I think things are conflated here. . . The inode size is really just a > call on how much functionality you need in an inode. I wouldn't even think > about disk block size when setting this. Essentially the smaller the inode > the less space I > > I think things are conflated here... > > > > The inode size is really just a call on how much functionality you need in > an inode. I wouldn't even think about disk block size when setting this. > Essentially the smaller the inode the less space I need for metadata but > also the less capacity I have in my inode. > > > > The default is 4k and if you don't change it then GPFS will put up to a > 3.8k file in the inode itself vs going to an indirect disk allocation. > Someone mentioned encryption will bypass this feature, but it's actually > encryption that perhaps requires larger inode sizes to store all the key > meta info (you can have up to 8 keys per inode I believe). > > > > So essentially it you've got a smaller inode size your directories max > size will max out sooner, your ACLs could be constrained, large file names > can exhaust, you may not have enough space for Encryption details. But the > upshot is you need to dedicate less space to metadata and can handle more > file entries. 
So if you've got billions of files and are managing replicas > then you should consider fine tuning inode size down. > > > > You can go from 3.5% of space going to inodes to 1% if you went from 4k to > 512 bytes.. but there is a reason GPFS defaults to 4k... And doesn't expand > on it too much. If you've guessed wrong you're kind of hosed. > > > > None of this has to do with hardware block sizes, subblock allocation and > fragment sizes. And further compounded by 4k native block sizes vs > emulated 512 block size some disk hardware does. > > > > For GPFS you generally will have a very large block size 256kb or 1MB and > GPFS will divide those blocks into 32 fragments. So you may have your > smallest unit being a 8kb or 32kb fragment. If you have a dedicated MD > pool (highly recommended) you'd definitely specify a smaller block size > than 1MB (128kb = 4kb fragments). > > > > The balance you're trying to strike here is the least amount of commands > to retrieve your data efficiently. Think about the roundtrip on the bus > being the same for a 4kb read vs a 1mb read so try to maximize this. > > > > Generally the goal of the file system is to ensure that the excess data > that is read when trying to pull fragments is as useless as possible. > > > > I may also be confused but I wouldn't worry so much about inode size to > block size.. just worry about getting large blocks working well for regular > storage pool if your data is huge and using a smaller block size in MD if > dedicate pool which is almost always recommended. > > > > Be very careful of specifying a small inode size because it's not just max > filenames and max file counts in a directory.. it is much more.. and if you > have a lot of small files don't underestimate the advantage of those files > being stored directly in the inode. A 512 byte inode could only store > about a 380byte file vs a 4k file storing 3800 byte file. These files tend > to be shell scripts and config files which you really don't want to be > waiting around for and occupying a huge 1mb read for and waisting a > potentially larger 64kb fragment allocation on. > > > > Alec > > > > > > > > On Wed, Aug 2, 2023, 4:47 AM Olaf Weiser wrote: > > Hallo Peter, > > > > [1] *[...] having a smaller inode size than the subblock size means** there's > a big wastage on disk usage, with no performance benefit to doing so[...] * > > in short - yes ? > > > > > > > > [2] *[...] I believe I'm correct in saying that inodes are not the only > things to live on the metadata pool, so I assume that some other metadata > might benefit from the larger block/subblock size. But looking at the > number of inodes, the inode size, and the space consumed in the system > pool, it really looks like the majority of space consumed is by > inodes.[...] * > > you may need to consider snapshots and directories , which all contributes > to MD space > > > > predicting the space requirements for MD for directories is always hard, > because the size of a directory is depending on the file's name length, > the users will create... > > > > > > further more, using a less than 4k inode size makes also not much sense, > when taking into account, that NVMEs and other modern block storage devices > comes with a hardware block size of 4k (even though GPFS still can deal > with 512 Bytes per sector) > > > > > > hope this helps .. > > > > > > > > > ------------------------------ > > *Von:* gpfsug-discuss im Auftrag von > Peter Chase > *Gesendet:* Mittwoch, 2. 
August 2023 11:09 > *An:* gpfsug-discuss at gpfsug.org > *Betreff:* [EXTERNAL] [gpfsug-discuss] Inode size, and system pool > subblock > > > > Good Morning, I have a question about inode size vs subblock size. Can > anyone think of a reason that the chosen inode size of a scale filesystem > should be smaller than the subblock size for the metadata pool? I'm looking > at an existing filesystem, > > Good Morning, > > > > I have a question about inode size vs subblock size. Can anyone think of a > reason that the chosen inode size of a scale filesystem should be smaller > than the subblock size for the metadata pool? > > I'm looking at an existing filesystem, the inode size is 2KiB, and the > subblock is 4KiB. > > It feels like I'm missing something. If I've understood the docs on blocks > and subblocks correctly, it sounds like the subblock is the smallest atomic > access size. Meaning with a 4K subblock, and a 2K inode, reading the inode > would return its contents and 2K of empty subblock every time. So, in my > head (and maybe only there), having a smaller inode size than the > subblock size means there's a big wastage on disk usage, with no > performance benefit to doing so. > > I believe I'm correct in saying that inodes are not the only things to > live on the metadata pool, so I assume that some other metadata might > benefit from the larger block/subblock size. But looking at the number of > inodes, the inode size, and the space consumed in the system pool, it > really looks like the majority of space consumed is by inodes. > > > > As I said, I feel like I'm missing something, so if anyone can tell me > where I'm wrong it would be greatly appreciated! > > > > Sincerely, > > > > Pete Chase > > UKMO > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Aug 14 13:10:08 2023 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 14 Aug 2023 12:10:08 +0000 Subject: [gpfsug-discuss] Save the date - Storage Scale User Meeting @ SC23 Message-ID: Greetings, IBM is organizing a Storage Scale User Meeting at SC23. We have an exciting agenda covering user stories, roadmap updates, insights into potential future product enhancements, plus access to IBM experts and your peers. We look forward to welcoming you to this event. The user meeting is followed by a Get Together to continue the discussion. Sunday, November 12th, 2023 - 12:00-18:00 Westin Denver Downtown Detailed agenda and registration link will be shared later on the event page: https://www.spectrumscaleug.org/event/storage-scale-user-meeting-sc23/ As always we are looking for customer and partner talks to share your experience. Please drop me a mail, if you are interested to speak. Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amjadcsu at gmail.com Mon Aug 14 16:45:56 2023 From: amjadcsu at gmail.com (Amjad Syed) Date: Mon, 14 Aug 2023 16:45:56 +0100 Subject: [gpfsug-discuss] Vmtouch on GPFS is supported? Message-ID: Hi We are using GPFS to store a particular software GUI product that is accessed over VPN and nomachine software. It takes more then 1 min to load this software. We were planning to use vmtouch daemon to see if it can reduce loading time of this software/ https://github.com/hoytech/vmtouch Just wanted to check if any one used this and got some thoughts Amjad -------------- next part -------------- An HTML attachment was scrubbed... URL: From uwe.falke at kit.edu Tue Aug 15 07:25:15 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Tue, 15 Aug 2023 08:25:15 +0200 Subject: [gpfsug-discuss] Vmtouch on GPFS is supported? In-Reply-To: References: Message-ID: Hi, Amjad, vmtouch uses the OS filesystem caches, but GPFS uses its own caching (pagepool). I suppose, vmtouch won't help here. Uwe On 14.08.23 17:45, Amjad Syed wrote: > Hi > > We are using GPFS to store a particular software GUI product that is > accessed over VPN and nomachine software. > > It takes more then 1 min to load this software. We were planning to > use vmtouch daemon to see if it can reduce loading time of this software/ > https://github.com/hoytech/vmtouch > Just wanted to check if any one used this and got some thoughts > > Amjad > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From uwe.falke at kit.edu Tue Aug 15 07:32:06 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Tue, 15 Aug 2023 08:32:06 +0200 Subject: [gpfsug-discuss] Vmtouch on GPFS is supported? In-Reply-To: References: Message-ID: <82878d70-bdd1-eccc-2328-95a801bcaf86@kit.edu> second point: while there is probably no vmtouch4gpfs, you might check and tune your gpfs parameters (pagepool size, maxFilesToCache).? But first you should identify where the bottleneck is. Is your GPFS cluster spanning the VPN? Suppose not. So how do you know that it is really GPFS which is delaying your loading? nomachine is an remote desktop app, how do you load software efficiently through nomachine? Uwe On 14.08.23 17:45, Amjad Syed wrote: > Hi > > We are using GPFS to store a particular software GUI product that is > accessed over VPN and nomachine software. > > It takes more then 1 min to load this software. 
We were planning to > use vmtouch daemon to see if it can reduce loading time of this software/ > https://github.com/hoytech/vmtouch > Just wanted to check if any one used this and got some thoughts > > Amjad > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From jonathan.buzzard at strath.ac.uk Tue Aug 15 08:26:35 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Aug 2023 08:26:35 +0100 Subject: [gpfsug-discuss] Vmtouch on GPFS is supported? In-Reply-To: <82878d70-bdd1-eccc-2328-95a801bcaf86@kit.edu> References: <82878d70-bdd1-eccc-2328-95a801bcaf86@kit.edu> Message-ID: On 15/08/2023 07:32, Uwe Falke wrote: > > second point: > > while there is probably no vmtouch4gpfs, you might check and tune your > gpfs parameters (pagepool size, maxFilesToCache).? But first you should > identify where the bottleneck is. Is your GPFS cluster spanning the VPN? > Suppose not. So how do you know that it is really GPFS which is delaying > your loading? > > nomachine is an remote desktop app, how do you load software efficiently > through nomachine? I would say if it takes longer launching through nomachine than launching locally, then the problem is nomachine. This should IMHO be the first thing you test. If it takes a long time launching locally the application is a steaming pile and you need to resolve that first. We use thinlinc which is a similar Linux remote desktop solution extensively to provide a Linux desktop to our users with through VirtualGL 3D visualization capabilities and we do not have an issue with excessive launch times. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From uwe.falke at kit.edu Tue Aug 15 18:03:23 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Tue, 15 Aug 2023 19:03:23 +0200 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch Message-ID: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Dear all, we had to kill a running mmafmctl prefetch ... including the children on another node. The processes appear all gone # mmdsh -N ALL 'ps -aef | egrep -i "(afm|prefetch)" | grep -v grep | wc -l' | sed -e 's/^[^:]*:/xxx:/' xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 xxx:? 0 But trying to start another prefetch tells me it is locked (Cannot initiate prefetch for fileset root.? Recovery or another instance of prefetch may be in progress.). Any suggestion how to remove that? I had recycled mmfsd on the node i run the command and I moved the SG Mgr for the SG in question, had not helped. 
Thanks in advance Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From olaf.weiser at de.ibm.com Tue Aug 15 18:13:00 2023 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 15 Aug 2023 17:13:00 +0000 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch In-Reply-To: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> References: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Message-ID: Hi Uwe, does this show show smth ? mmcommon showLocks ________________________________ Von: gpfsug-discuss im Auftrag von Uwe Falke Gesendet: Dienstag, 15. August 2023 19:03 An: gpfsug-discuss at gpfsug.org Betreff: [EXTERNAL] [gpfsug-discuss] unlocking mmafmctl prefetch Dear all, we had to kill a running mmafmctl prefetch ... including the children on another node. The processes appear all gone # mmdsh -N ALL 'ps -aef | egrep -i "(afm|prefetch)" | grep -v grep | wc -l' | sed -e 's/^[^:]*:/xxx:/' xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 xxx: 0 But trying to start another prefetch tells me it is locked (Cannot initiate prefetch for fileset root. Recovery or another instance of prefetch may be in progress.). Any suggestion how to remove that? I had recycled mmfsd on the node i run the command and I moved the SG Mgr for the SG in question, had not helped. Thanks in advance Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: From uwe.falke at kit.edu Tue Aug 15 18:23:15 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Tue, 15 Aug 2023 19:23:15 +0200 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch In-Reply-To: References: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Message-ID: Hi, Olaf, nope ... except "No lock found." Thx -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From olaf.weiser at de.ibm.com Tue Aug 15 19:07:32 2023 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 15 Aug 2023 18:07:32 +0000 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch In-Reply-To: References: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Message-ID: if mmfsdm recycle didn't clean it up , then a.) report a SF ticket/open a case I expect, support 'll need some more data to analyze b.) let's meet tomorrow directly on the system and check (if so ..reach out to me directly) ________________________________ Von: gpfsug-discuss im Auftrag von Uwe Falke Gesendet: Dienstag, 15. August 2023 19:23 An: gpfsug-discuss at gpfsug.org Betreff: [EXTERNAL] Re: [gpfsug-discuss] unlocking mmafmctl prefetch Hi, Olaf, nope ... except "No lock found." Thx -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Wed Aug 16 11:22:46 2023 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 16 Aug 2023 10:22:46 +0000 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch In-Reply-To: References: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Message-ID: AFM prefetch is not allowed if the recovery is running. What is the caching mode? Check the fileset cache state using the command below mmafmctl device getState -j fileset You could also try commands mmafmctl stop/start. mmafmctl device stop -j fileset mmafmctl device start -j fileset ~Venkat (vpuvvada at in.ibm.com) ________________________________ From: gpfsug-discuss on behalf of Olaf Weiser Sent: Tuesday, August 15, 2023 11:37 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] unlocking mmafmctl prefetch if mmfsdm recycle didn't clean it up , then a.?) report a SF ticket/open a case I expect, support 'll need some more data to analyze b.?) let's meet tomorrow directly on the system and check (if so ..?reach out to me directly) Von: gpfsug-discuss ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd if mmfsdm recycle didn't clean it up , then a.) report a SF ticket/open a case I expect, support 'll need some more data to analyze b.) let's meet tomorrow directly on the system and check (if so ..reach out to me directly) ________________________________ Von: gpfsug-discuss im Auftrag von Uwe Falke Gesendet: Dienstag, 15. August 2023 19:23 An: gpfsug-discuss at gpfsug.org Betreff: [EXTERNAL] Re: [gpfsug-discuss] unlocking mmafmctl prefetch Hi, Olaf, nope ... except "No lock found." Thx -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From uwe.falke at kit.edu Wed Aug 16 14:46:20 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Wed, 16 Aug 2023 15:46:20 +0200 Subject: [gpfsug-discuss] unlocking mmafmctl prefetch In-Reply-To: References: <64cf409d-9038-dc77-2165-942613e00411@kit.edu> Message-ID: <8a37a76e-bf16-502d-35b4-e7f13564e810@kit.edu> Thx, Venkat, the stop / start obviously cleaned up things, prefetch is running now. Thx again Uwe On 16.08.23 12:22, Venkateswara R Puvvada wrote: > AFM prefetch is not allowed if the recovery is running.? What is the > caching mode? Check the fileset cache state using the command below > > mmafmctl device getState -j fileset > > You could also try commands mmafmctl stop/start. > > mmafmctl device stop -j fileset > mmafmctl device start -j fileset > > > ~Venkat (vpuvvada at in.ibm.com) > ------------------------------------------------------------------------ > *From:* gpfsug-discuss on behalf > of Olaf Weiser > *Sent:* Tuesday, August 15, 2023 11:37 PM > *To:* gpfsug main discussion list > *Subject:* [EXTERNAL] Re: [gpfsug-discuss] unlocking mmafmctl prefetch > if mmfsdm recycle didn't clean it up , then a.?) report a SF > ticket/open a case I expect, support 'll need some more data to > analyze b.?) let's meet tomorrow directly on the system and check (if > so ..?reach out to me directly) Von: gpfsug-discuss > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > Report?Suspicious > > ZjQcmQRYFpfptBannerEnd > if mmfsdm recycle didn't clean it up , then > a.) report a SF ticket/open a case > I expect, support 'll need some more data to analyze > b.) let's meet tomorrow directly on the system and check (if so > ..reach out to me directly) > > ------------------------------------------------------------------------ > *Von:* gpfsug-discuss im Auftrag > von Uwe Falke > *Gesendet:* Dienstag, 15. August 2023 19:23 > *An:* gpfsug-discuss at gpfsug.org > *Betreff:* [EXTERNAL] Re: [gpfsug-discuss] unlocking mmafmctl prefetch > Hi, Olaf, > > nope ... except "No lock found." > > Thx > > -- > Karlsruhe Institute of Technology (KIT) > Steinbuch Centre for Computing (SCC) > Scientific Data Management (SDM) > > Uwe Falke > > Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 > D-76344 Eggenstein-Leopoldshafen > > Tel: +49 721 608 28024 > Email: uwe.falke at kit.edu > www.scc.kit.edu > > Registered office: > Kaiserstra?e 12, 76131 Karlsruhe, Germany > > KIT ? The Research University in the Helmholtz Association > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email:uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From anacreo at gmail.com Wed Aug 16 20:28:29 2023 From: anacreo at gmail.com (Alec) Date: Wed, 16 Aug 2023 12:28:29 -0700 Subject: [gpfsug-discuss] Vmtouch on GPFS is supported? 
In-Reply-To: References: <82878d70-bdd1-eccc-2328-95a801bcaf86@kit.edu> Message-ID: You should keep in mind that GPFS primarily ships out configured for large sequential read/write as it's you know a multimedia file system... Not knowing your application, but assuming it maybe has a ton of little library files and such it's trying to keep track of I'd make sure to do the following optimizations: - Increase page pool maybe to 8gb or more. - Change maxfiles pagepool tracks from 4k to like 10k or 40k. - If you're not doing a separate meta device, i would ensure you are doing that. Then I'd also ensure it was on SSD or similar media and pinned to SSD if over SAN. For data like this on our AIX environment we will keep a jfs2 volume for it.. because it will just stay in RAM because we have 100gb free. And so for large sorts and quick load of random I/O this outperforms GPFS. GPFS secret benefit is that it prevents memory cache thrashing by not caching large data.... But this may be holding your app back. I couldn't find/remember the setting but I believe there is somewhere where it decides anything over like 1mb isn't worth caching, maybe that needs to be tuned for your instance. Alec On Tue, Aug 15, 2023, 12:29 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 15/08/2023 07:32, Uwe Falke wrote: > > > > second point: > > > > while there is probably no vmtouch4gpfs, you might check and tune your > > gpfs parameters (pagepool size, maxFilesToCache). But first you should > > identify where the bottleneck is. Is your GPFS cluster spanning the VPN? > > Suppose not. So how do you know that it is really GPFS which is delaying > > your loading? > > > > nomachine is an remote desktop app, how do you load software efficiently > > through nomachine? > > I would say if it takes longer launching through nomachine than > launching locally, then the problem is nomachine. This should IMHO be > the first thing you test. > > If it takes a long time launching locally the application is a steaming > pile and you need to resolve that first. > > We use thinlinc which is a similar Linux remote desktop solution > extensively to provide a Linux desktop to our users with through > VirtualGL 3D visualization capabilities and we do not have an issue with > excessive launch times. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Aug 16 20:32:34 2023 From: anacreo at gmail.com (Alec) Date: Wed, 16 Aug 2023 12:32:34 -0700 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice Message-ID: Hello we are using a remote key server with GPFS I have two questions: First question: How can we verify that a key server is up and running when there are multiple key servers in an rkm pool serving a single key. The scenario is after maintenance or periodically we want to verify that all member of the pool are in service. Second question is: Is there any documentation or diagram officially from IBM that recommends having 2 keys from independent RKM environments for high availability as best practice that I could refer to? 
Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Aug 16 21:56:53 2023 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 16 Aug 2023 20:56:53 +0000 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: > How can we verify that a key server is up and running when there are multiple key servers in an rkm pool serving a single key. Pretty simple. -Grab a compute node/client (and mark it offline if needed) unmount all encrypted File Systems. -Hack the RKM.conf to point to JUST the server you want to test (and maybe a backup) -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a great command if you need to load new Certificates too) -Attempt to mount the encrypted FS, and then cat a few files. If you?ve not setup a 2nd server in your test you will see quarantine messages in the logs for a bad KMIP server. If it works, you can clear keys again and see how many were retrieved. >Is there any documentation or diagram officially from IBM that recommends having 2 keys from independent RKM environments for high availability as best practice that I could refer to? I am not an IBM-er? but I?m also not 100% sure what you are asking here. Two un-related SKLM setups? How would you sync the keys? How would this be better than multiple replicated servers? Ed Wahl Ohio Supercomputer Center From: gpfsug-discuss On Behalf Of Alec Sent: Wednesday, August 16, 2023 3:33 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] RKM resilience questions testing and best practice Hello we are using a remote key server with GPFS I have two questions: First question: How can we verify that a key server is up and running when there are multiple key servers in an rkm pool serving a single key. The scenario is after maintenance Hello we are using a remote key server with GPFS I have two questions: First question: How can we verify that a key server is up and running when there are multiple key servers in an rkm pool serving a single key. The scenario is after maintenance or periodically we want to verify that all member of the pool are in service. Second question is: Is there any documentation or diagram officially from IBM that recommends having 2 keys from independent RKM environments for high availability as best practice that I could refer to? Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Aug 16 22:22:55 2023 From: anacreo at gmail.com (Alec) Date: Wed, 16 Aug 2023 14:22:55 -0700 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Ed Thanks for the response, I wasn't aware of those two commands. I will see if that unlocks a solution. I kind of need the test to work in a production environment. So can't just be adding spare nodes onto the cluster and forgetting with file systems. Unfortunately the logs don't indicate when a node has returned to health. Only that it's in trouble but as we patch often we see these regularly. For the second question, we would add a 2nd MEK key to each file so that two independent keys from two different RKM pools would be able to unlock any file. This would give us two whole independent paths to encrypt and decrypt a file. So I'm looking for a best practice example from IBM to indicate this so we don't have a dependency on a single RKM environment. 
Alec On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: > > How can we verify that a key server is up and running when there are > multiple key servers in an rkm pool serving a single key. > > > > Pretty simple. > > -Grab a compute node/client (and mark it offline if needed) unmount all > encrypted File Systems. > > -Hack the RKM.conf to point to JUST the server you want to test (and maybe > a backup) > > -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? > > -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a > great command if you need to load new Certificates too) > > -Attempt to mount the encrypted FS, and then cat a few files. > > > > If you?ve not setup a 2nd server in your test you will see quarantine > messages in the logs for a bad KMIP server. If it works, you can clear > keys again and see how many were retrieved. > > > > >Is there any documentation or diagram officially from IBM that recommends > having 2 keys from independent RKM environments for high availability as > best practice that I could refer to? > > > > I am not an IBM-er? but I?m also not 100% sure what you are asking here. > Two un-related SKLM setups? How would you sync the keys? How would this > be better than multiple replicated servers? > > > > Ed Wahl > > Ohio Supercomputer Center > > > > *From:* gpfsug-discuss *On Behalf Of * > Alec > *Sent:* Wednesday, August 16, 2023 3:33 PM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] RKM resilience questions testing and best > practice > > > > Hello we are using a remote key server with GPFS I have two questions: > First question: How can we verify that a key server is up and running when > there are multiple key servers in an rkm pool serving a single key. The > scenario is after maintenance > > Hello we are using a remote key server with GPFS I have two questions: > > > > First question: > > How can we verify that a key server is up and running when there are > multiple key servers in an rkm pool serving a single key. > > > > The scenario is after maintenance or periodically we want to verify that > all member of the pool are in service. > > > > Second question is: > > Is there any documentation or diagram officially from IBM that recommends > having 2 keys from independent RKM environments for high availability as > best practice that I could refer to? > > > > Alec > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.stephenson at imperial.ac.uk Thu Aug 17 13:59:02 2023 From: robert.stephenson at imperial.ac.uk (Stephenson, Robert f) Date: Thu, 17 Aug 2023 12:59:02 +0000 Subject: [gpfsug-discuss] Hello from a new member Message-ID: Hi, my name is Rob Stephenson. We use GPFS for Academic group shares. OS is RHEL 8.7 and backend SAN attached storage is IBM V5000. We use TSM to backup GPFS data. We are currently migrating from and older GPFS instance to a new instance running: 5.1.6.0 We are using RSYNC to transfer the data. Organisation name: Imperial College London Sector: Education City / Country: London ; UK Regards Rob Rob Stephenson ICT Datacentre Services Imperial College London +44 (0)795 4176319 www.imperial.ac.uk/admin-services/ict/ -------------- next part -------------- An HTML attachment was scrubbed... 
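Ed's recipe above, spelled out as a rough shell sequence for one client node. The device name encfs, the test file and the RKM.conf path are placeholders (RKM.conf normally sits under /var/mmfs/etc in a regular setup), and tsctl/tsloadikm are the low-level commands Ed quotes rather than documented administration commands, so treat this as a sketch, not a supported procedure:

# unmount the encrypted file system on the test node and keep a copy of RKM.conf
mmumount encfs
cp -p /var/mmfs/etc/RKM.conf /var/mmfs/etc/RKM.conf.orig

# edit RKM.conf so only the key server under test is listed, then purge and reload
vi /var/mmfs/etc/RKM.conf
/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all
/usr/lpp/mmfs/bin/tsloadikm run

# remount and force a key fetch by reading an encrypted file
mmmount encfs
cat /gpfs/encfs/testfile > /dev/null && echo "key fetched from server under test"

# put the original configuration back and reload it
cp -p /var/mmfs/etc/RKM.conf.orig /var/mmfs/etc/RKM.conf
/usr/lpp/mmfs/bin/tsloadikm run
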
URL: From janfrode at tanso.net Thu Aug 17 16:08:29 2023 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 17 Aug 2023 17:08:29 +0200 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Your second KMIP server don?t need to have an active replication relationship with the first one ? it just needs to contain the same MEK. So you could do a one time replication / copying between them, and they would not have to see each other anymore. I don?t think having them host different keys will work, as you won?t be able to fetch the second key from the one server your client is connected to, and then will be unable to encrypt with that key. >From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial service. It?s just a server that will tell you the key when asked + some access control to make sure no one else gets it. Also MEKs never changes? unless you actively change them in the file system policy, and then you could just post the new key to all/both your independent key servers when you do the change. -jf ons. 16. aug. 2023 kl. 23:25 skrev Alec : > Ed > Thanks for the response, I wasn't aware of those two commands. I will > see if that unlocks a solution. I kind of need the test to work in a > production environment. So can't just be adding spare nodes onto the > cluster and forgetting with file systems. > > Unfortunately the logs don't indicate when a node has returned to health. > Only that it's in trouble but as we patch often we see these regularly. > > > For the second question, we would add a 2nd MEK key to each file so that > two independent keys from two different RKM pools would be able to unlock > any file. This would give us two whole independent paths to encrypt and > decrypt a file. > > So I'm looking for a best practice example from IBM to indicate this so we > don't have a dependency on a single RKM environment. > > Alec > > > > On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: > >> > How can we verify that a key server is up and running when there are >> multiple key servers in an rkm pool serving a single key. >> >> >> >> Pretty simple. >> >> -Grab a compute node/client (and mark it offline if needed) unmount all >> encrypted File Systems. >> >> -Hack the RKM.conf to point to JUST the server you want to test (and >> maybe a backup) >> >> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >> >> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a >> great command if you need to load new Certificates too) >> >> -Attempt to mount the encrypted FS, and then cat a few files. >> >> >> >> If you?ve not setup a 2nd server in your test you will see quarantine >> messages in the logs for a bad KMIP server. If it works, you can clear >> keys again and see how many were retrieved. >> >> >> >> >Is there any documentation or diagram officially from IBM that >> recommends having 2 keys from independent RKM environments for high >> availability as best practice that I could refer to? >> >> >> >> I am not an IBM-er? but I?m also not 100% sure what you are asking here. >> Two un-related SKLM setups? How would you sync the keys? How would this >> be better than multiple replicated servers? 
>> >> >> >> Ed Wahl >> >> Ohio Supercomputer Center >> >> >> >> *From:* gpfsug-discuss *On Behalf Of >> *Alec >> *Sent:* Wednesday, August 16, 2023 3:33 PM >> *To:* gpfsug main discussion list >> *Subject:* [gpfsug-discuss] RKM resilience questions testing and best >> practice >> >> >> >> Hello we are using a remote key server with GPFS I have two questions: >> First question: How can we verify that a key server is up and running when >> there are multiple key servers in an rkm pool serving a single key. The >> scenario is after maintenance >> >> Hello we are using a remote key server with GPFS I have two questions: >> >> >> >> First question: >> >> How can we verify that a key server is up and running when there are >> multiple key servers in an rkm pool serving a single key. >> >> >> >> The scenario is after maintenance or periodically we want to verify that >> all member of the pool are in service. >> >> >> >> Second question is: >> >> Is there any documentation or diagram officially from IBM that recommends >> having 2 keys from independent RKM environments for high availability as >> best practice that I could refer to? >> >> >> >> Alec >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Thu Aug 17 16:52:08 2023 From: anacreo at gmail.com (Alec) Date: Thu, 17 Aug 2023 08:52:08 -0700 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Yesterday I proposed treating the replicated key servers as 2 different sets of servers. And having scale address two of the RKM servers by one rkmid/tenant/devicegrp/client name, and having a second rkmid/tenant/devicegrp/client name for the 2nd set of servers. So define the same cluster of key management servers in two separate stanzas of RKM.conf, an upper and lower half. If we do that and key management team takes one set offline, everything should work but scale would think one set of keys are offline and scream. I think we need an IBM ticket to help vet all that out. Alec On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust wrote: > > Your second KMIP server don?t need to have an active replication > relationship with the first one ? it just needs to contain the same MEK. So > you could do a one time replication / copying between them, and they would > not have to see each other anymore. > > I don?t think having them host different keys will work, as you won?t be > able to fetch the second key from the one server your client is connected > to, and then will be unable to encrypt with that key. > > From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial > service. It?s just a server that will tell you the key when asked + some > access control to make sure no one else gets it. Also MEKs never changes? > unless you actively change them in the file system policy, and then you > could just post the new key to all/both your independent key servers when > you do the change. > > > -jf > > ons. 16. aug. 2023 kl. 23:25 skrev Alec : > >> Ed >> Thanks for the response, I wasn't aware of those two commands. I will >> see if that unlocks a solution. 
I kind of need the test to work in a >> production environment. So can't just be adding spare nodes onto the >> cluster and forgetting with file systems. >> >> Unfortunately the logs don't indicate when a node has returned to >> health. Only that it's in trouble but as we patch often we see these >> regularly. >> >> >> For the second question, we would add a 2nd MEK key to each file so that >> two independent keys from two different RKM pools would be able to unlock >> any file. This would give us two whole independent paths to encrypt and >> decrypt a file. >> >> So I'm looking for a best practice example from IBM to indicate this so >> we don't have a dependency on a single RKM environment. >> >> Alec >> >> >> >> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: >> >>> > How can we verify that a key server is up and running when there are >>> multiple key servers in an rkm pool serving a single key. >>> >>> >>> >>> Pretty simple. >>> >>> -Grab a compute node/client (and mark it offline if needed) unmount all >>> encrypted File Systems. >>> >>> -Hack the RKM.conf to point to JUST the server you want to test (and >>> maybe a backup) >>> >>> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >>> >>> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a >>> great command if you need to load new Certificates too) >>> >>> -Attempt to mount the encrypted FS, and then cat a few files. >>> >>> >>> >>> If you?ve not setup a 2nd server in your test you will see quarantine >>> messages in the logs for a bad KMIP server. If it works, you can clear >>> keys again and see how many were retrieved. >>> >>> >>> >>> >Is there any documentation or diagram officially from IBM that >>> recommends having 2 keys from independent RKM environments for high >>> availability as best practice that I could refer to? >>> >>> >>> >>> I am not an IBM-er? but I?m also not 100% sure what you are asking >>> here. Two un-related SKLM setups? How would you sync the keys? How >>> would this be better than multiple replicated servers? >>> >>> >>> >>> Ed Wahl >>> >>> Ohio Supercomputer Center >>> >>> >>> >>> *From:* gpfsug-discuss *On Behalf >>> Of *Alec >>> *Sent:* Wednesday, August 16, 2023 3:33 PM >>> *To:* gpfsug main discussion list >>> *Subject:* [gpfsug-discuss] RKM resilience questions testing and best >>> practice >>> >>> >>> >>> Hello we are using a remote key server with GPFS I have two questions: >>> First question: How can we verify that a key server is up and running when >>> there are multiple key servers in an rkm pool serving a single key. The >>> scenario is after maintenance >>> >>> Hello we are using a remote key server with GPFS I have two questions: >>> >>> >>> >>> First question: >>> >>> How can we verify that a key server is up and running when there are >>> multiple key servers in an rkm pool serving a single key. >>> >>> >>> >>> The scenario is after maintenance or periodically we want to verify that >>> all member of the pool are in service. >>> >>> >>> >>> Second question is: >>> >>> Is there any documentation or diagram officially from IBM that >>> recommends having 2 keys from independent RKM environments for high >>> availability as best practice that I could refer to? 
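To make the proposal above concrete, a rough illustration of one replicated key-server pool split into two RKM.conf stanzas, plus a policy that wraps every new file with one MEK from each. The host names, key UUIDs, labels and keystore paths are invented, the attribute names are from memory of the encryption regular-setup documentation, and my recollection is that listing two encryption specs in SET ENCRYPTION produces two independent wrappings of the FEK (so either MEK alone can unwrap the file) -- all of this should be verified with IBM before relying on it:

rkmA {
  type = ISKLM
  kmipServerUri  = tls://keysrv1.example.com:5696
  kmipServerUri2 = tls://keysrv2.example.com:5696
  keyStore = /var/mmfs/etc/RKMcerts/keystore.A.p12
  passphrase = changeMe
  clientCertLabel = scaleClientA
  tenantName = GPFS_TENANT
}
rkmB {
  type = ISKLM
  kmipServerUri  = tls://keysrv3.example.com:5696
  kmipServerUri2 = tls://keysrv4.example.com:5696
  keyStore = /var/mmfs/etc/RKMcerts/keystore.B.p12
  passphrase = changeMe
  clientCertLabel = scaleClientB
  tenantName = GPFS_TENANT
}

And the matching encryption rules, which go in a policy file installed with mmchpolicy, not in RKM.conf:

RULE 'encSpecA' ENCRYPTION 'EA' IS ALGO 'DEFAULTNISTSP800131A' KEYS('KEY-1111aaaa:rkmA')
RULE 'encSpecB' ENCRYPTION 'EB' IS ALGO 'DEFAULTNISTSP800131A' KEYS('KEY-2222bbbb:rkmB')
RULE 'encryptAll' SET ENCRYPTION 'EA','EB' WHERE NAME LIKE '%'
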
>>> >>> >>> >>> Alec >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From uwe.falke at kit.edu Thu Aug 17 16:54:29 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Thu, 17 Aug 2023 17:54:29 +0200 Subject: [gpfsug-discuss] GPL compilation failure Message-ID: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email:uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From Renar.Grunenberg at huk-coburg.de Thu Aug 17 17:04:03 2023 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 17 Aug 2023 16:04:03 +0000 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> Message-ID: Hallo Uwe, Kernel-Level 4.18.0-477 is only supported with scale 5.1.8.x. We had the same issue and must updated tot he last level. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. 
Helen Reck, Dr. J?rg Rheinl?nder, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss Im Auftrag von Uwe Falke Gesendet: Donnerstag, 17. August 2023 17:54 An: gpfsug-discuss at gpfsug.org Betreff: [gpfsug-discuss] GPL compilation failure Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.nickell at inl.gov Thu Aug 17 17:16:44 2023 From: ben.nickell at inl.gov (Ben G. Nickell) Date: Thu, 17 Aug 2023 16:16:44 +0000 Subject: [gpfsug-discuss] [EXTERNAL] GPL compilation failure In-Reply-To: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> Message-ID: I had to upgrade to 5.1.8.0 or 5.1.8.1 to get it to build on that kernel, I think. Then it worked. From: gpfsug-discuss on behalf of Uwe Falke Date: Thursday, August 17, 2023 at 9:58 AM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] GPL compilation failure Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? 
&& test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Thu Aug 17 17:38:32 2023 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 17 Aug 2023 16:38:32 +0000 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> Message-ID: <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> Not supported on that GPFS version; probably need a 5.1.8.x ? definitely supported on 5.1.8.1. Always check here before upgrading the kernel: https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Aug 17, 2023, at 11:54, Uwe Falke wrote: Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... 
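For anyone checking their own combination against that FAQ table, the two inputs it needs can be pulled straight from the node; the package name assumes an RPM-based install:

uname -r                             # running kernel, e.g. 4.18.0-477.21.1.el8_8.x86_64
rpm -q gpfs.base                     # installed Scale level
/usr/lpp/mmfs/bin/mmdiag --version   # version the running daemon reports
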
URL: From novosirj at rutgers.edu Thu Aug 17 17:53:41 2023 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 17 Aug 2023 16:53:41 +0000 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> Message-ID: <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> Sorry, one more thing: while you can often upgrade from say 3.10.0-1160.92.1 to 3.10.0-1160.95.1, like the kind of upgrade you?d see within an RHEL point release ? and you?ll often see an older version get updated to list support for those kinds of version increments ? you should definitely expect to potentially need to upgrade GPFS if you?re going between RHEL point releases, like 8.7 to 8.8 ? those ones that change from, say, 4.18.0-425.x to 4.18.0-477.x. Sometimes unsupported versions will work, but often times they will not, and if you have a support contract, that difference is sort of meaningless since you wouldn?t want to run an unsupported version anyway. On Aug 17, 2023, at 12:38, Ryan Novosielski wrote: Not supported on that GPFS version; probably need a 5.1.8.x ? definitely supported on 5.1.8.1. Always check here before upgrading the kernel: https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Aug 17, 2023, at 11:54, Uwe Falke wrote: Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... 
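Once a supported combination is in place (5.1.8.x for this kernel, per the replies above), the portability layer still has to be rebuilt on every node that received the new kernel. A minimal sketch, assuming the GPL layer is built locally rather than shipped as a pre-built gplbin package:

mmshutdown                       # stop GPFS on the node first
# ... install/upgrade the Scale packages and reboot into the new kernel ...
/usr/lpp/mmfs/bin/mmbuildgpl     # rebuild the GPL layer against the running kernel
mmstartup
mmgetstate                       # node should come back as 'active'
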
URL: From uwe.falke at kit.edu Thu Aug 17 18:58:15 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Thu, 17 Aug 2023 19:58:15 +0200 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> Message-ID: thanks, ryan. I would have loved to stay with RHEL 8.7, however RHEL chose to not patch the RHEL8.7 Kernel line it seems ... Uwe On 17.08.23 18:53, Ryan Novosielski wrote: > Sorry, one more thing: while you can often upgrade from say > 3.10.0-1160.92.1 to 3.10.0-1160.95.1, like the kind of upgrade you?d > see within an RHEL point release ? and you?ll often see an older > version get updated to list support for those kinds of version > increments ? you should definitely expect to potentially need to > upgrade GPFS if you?re going between RHEL point releases, like 8.7 to > 8.8 ? those ones that change from, say, 4.18.0-425.x to 4.18.0-477.x. > > Sometimes unsupported versions will work, but often times they will > not, and if you have a support contract, that difference is sort of > meaningless since you wouldn?t want to run an unsupported version anyway. > >> On Aug 17, 2023, at 12:38, Ryan Novosielski wrote: >> >> Not supported on that GPFS version; probably need a 5.1.8.x ? >> definitely supported on 5.1.8.1. >> >> Always check here before upgrading the kernel: >> >> https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html >> >> -- >> #BlackLivesMatter >> ____ >> ||?\\UTGERS, |---------------------------*O*--------------------------- >> ||_// the State?| ? ? ? ? Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist -?973/972.0922 (2x0922) ~*~ >> RBHS?Campus >> || ?\\ ? ?of NJ?| Office of Advanced?Research Computing - MSB >> A555B,?Newark >> ? ? ?`' >> >>> On Aug 17, 2023, at 11:54, Uwe Falke wrote: >>> >>> Hi, just to let you know: >>> >>> building the 5.1.7.1 GPL layer for Linux Kernel >>> 4.18.0-477.21.1.el8_8.x86_64 failed for me: >>> >>> [...] >>> >>> In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, >>> from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: >>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': >>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? >>> && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) >>> ^~~~~~~~~~~~~~~~~~~~~ >>> NFS_INO_ODIRECT >>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in >>> [...] >>> >>> If anyone managed to build that, a message would be nice, else you >>> should expect problems. >>> >>> I opened a case with IBM but it is caught in the call entry ... (SF >>> TS013920636, in case any supporter wants to look after it). >>> >>> Uwe >>> >>> -- >>> Karlsruhe Institute of Technology (KIT) >>> Steinbuch Centre for Computing (SCC) >>> Scientific Data Management (SDM) >>> >>> Uwe Falke >>> >>> Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 >>> D-76344 Eggenstein-Leopoldshafen >>> >>> Tel: +49 721 608 28024 >>> Email:uwe.falke at kit.edu >>> www.scc.kit.edu >>> >>> Registered office: >>> Kaiserstra?e 12, 76131 Karlsruhe, Germany >>> >>> KIT ? 
The Research University in the Helmholtz Association >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email:uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From christof.schmitt at us.ibm.com Thu Aug 17 19:08:22 2023 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 17 Aug 2023 18:08:22 +0000 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> Message-ID: <95a6175d548915d6b1be28056722b0c8288204c0.camel@us.ibm.com> I cannot speak for Redhat, but the RHEL release cycles are documented: https://access.redhat.com/support/policy/updates/errata#RHEL8_and_9_Life_Cycle The rule of thumb for Scale is that a new RHEL minor release will likely be supported in the Scale release or PTF following the RHEL release. There might be exceptions dending on the timing of the release dates, possible problems found in testing, etc. Regards, Christof On Thu, 2023-08-17 at 19:58 +0200, Uwe Falke wrote: thanks, ryan. I would have loved to stay with RHEL 8.7, however RHEL chose to not patch the RHEL8.7 Kernel line it seems ... Uwe On 17.08.23 18:53, Ryan Novosielski wrote: Sorry, one more thing: while you can often upgrade from say 3.10.0-1160.92.1 to 3.10.0-1160.95.1, like the kind of upgrade you?d see within an RHEL point release ? and you?ll often see an older version get updated to list support for those kinds of version increments ? you should definitely expect to potentially need to upgrade GPFS if you?re going between RHEL point releases, like 8.7 to 8.8 ? those ones that change from, say, 4.18.0-425.x to 4.18.0-477.x. Sometimes unsupported versions will work, but often times they will not, and if you have a support contract, that difference is sort of meaningless since you wouldn?t want to run an unsupported version anyway. On Aug 17, 2023, at 12:38, Ryan Novosielski wrote: Not supported on that GPFS version; probably need a 5.1.8.x ? definitely supported on 5.1.8.1. Always check here before upgrading the kernel: https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Aug 17, 2023, at 11:54, Uwe Falke wrote: Hi, just to let you know: building the 5.1.7.1 GPL layer for Linux Kernel 4.18.0-477.21.1.el8_8.x86_64 failed for me: [...] In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) ^~~~~~~~~~~~~~~~~~~~~ NFS_INO_ODIRECT /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in [...] If anyone managed to build that, a message would be nice, else you should expect problems. I opened a case with IBM but it is caught in the call entry ... (SF TS013920636, in case any supporter wants to look after it). Uwe -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From uwe.falke at kit.edu Thu Aug 17 19:38:54 2023 From: uwe.falke at kit.edu (Uwe Falke) Date: Thu, 17 Aug 2023 20:38:54 +0200 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: <95a6175d548915d6b1be28056722b0c8288204c0.camel@us.ibm.com> References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> <95a6175d548915d6b1be28056722b0c8288204c0.camel@us.ibm.com> Message-ID: thanks for that, so it is clear RHEL 8.7 won't get any more updates. Uwe On 17.08.23 20:08, Christof Schmitt wrote: > I cannot speak for Redhat, but the RHEL release cycles are documented: > https://access.redhat.com/support/policy/updates/errata#RHEL8_and_9_Life_Cycle > > The rule of thumb for Scale is that a new RHEL minor release will > likely be supported in the Scale release or PTF following the RHEL > release. There might be exceptions dending on the timing of the > release dates, possible problems found in testing, etc. 
> > Regards, > > Christof > > On Thu, 2023-08-17 at 19:58 +0200, Uwe Falke wrote: >> >> thanks, ryan. >> >> I would have loved to stay with RHEL 8.7, however RHEL chose to not >> patch the RHEL8.7 Kernel line it seems ... >> >> Uwe >> >> On 17.08.23 18:53, Ryan Novosielski wrote: >>> Sorry, one more thing: while you can often upgrade from say >>> 3.10.0-1160.92.1 to 3.10.0-1160.95.1, like the kind of upgrade you?d >>> see within an RHEL point release ? and you?ll often see an older >>> version get updated to list support for those kinds of version >>> increments ? you should definitely expect to potentially need to >>> upgrade GPFS if you?re going between RHEL point releases, like 8.7 >>> to 8.8 ? those ones that change from, say, 4.18.0-425.x to >>> 4.18.0-477.x. >>> >>> Sometimes unsupported versions will work, but often times they will >>> not, and if you have a support contract, that difference is sort of >>> meaningless since you wouldn?t want to run an unsupported version >>> anyway. >>> >>>> On Aug 17, 2023, at 12:38, Ryan Novosielski >>>> wrote: >>>> >>>> Not supported on that GPFS version; probably need a 5.1.8.x ? >>>> definitely supported on 5.1.8.1. >>>> >>>> Always check here before upgrading the kernel: >>>> >>>> https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>> ||?\\UTGERS, |---------------------------*O*--------------------------- >>>> ||_// the State ?| ? ? ? ? Ryan Novosielski - novosirj at rutgers.edu >>>> || \\ University | Sr. Technologist -?973/972.0922 (2x0922) ~*~ >>>> RBHS?Campus >>>> || ?\\ ? ?of NJ ?| Office of Advanced?Research Computing - MSB >>>> A555B,?Newark >>>> ? ? ?`' >>>> >>>>> On Aug 17, 2023, at 11:54, Uwe Falke wrote: >>>>> >>>>> Hi, just to let you know: >>>>> >>>>> building the 5.1.7.1 GPL layer for Linux Kernel >>>>> 4.18.0-477.21.1.el8_8.x86_64 failed for me: >>>>> >>>>> [...] >>>>> >>>>> In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:87, >>>>> from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54: >>>>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'pcache_nfs_have_rdirplus': >>>>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: error: 'NFS_INO_ADVISE_RDPLUS' undeclared (first use in this function); did you mean 'NFS_INO_ODIRECT'? >>>>> && test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inodeP)->flags) >>>>> ^~~~~~~~~~~~~~~~~~~~~ >>>>> NFS_INO_ODIRECT >>>>> /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:4011:31: note: each undeclared identifier is reported only once for each function it appears in >>>>> [...] >>>>> >>>>> If anyone managed to build that, a message would be nice, else you >>>>> should expect problems. >>>>> >>>>> I opened a case with IBM but it is caught in the call entry ... >>>>> (SF TS013920636, in case any supporter wants to look after it). >>>>> >>>>> Uwe >>>>> >>>>> -- >>>>> Karlsruhe Institute of Technology (KIT) >>>>> Steinbuch Centre for Computing (SCC) >>>>> Scientific Data Management (SDM) >>>>> Uwe Falke >>>>> Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 >>>>> D-76344 Eggenstein-Leopoldshafen >>>>> Tel: +49 721 608 28024 >>>>> Email: >>>>> uwe.falke at kit.edu >>>>> www.scc.kit.edu >>>>> Registered office: >>>>> Kaiserstra?e 12, 76131 Karlsruhe, Germany >>>>> KIT ? 
The Research University in the Helmholtz Association >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> -- >> Karlsruhe Institute of Technology (KIT) >> Steinbuch Centre for Computing (SCC) >> Scientific Data Management (SDM) >> Uwe Falke >> Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 >> D-76344 Eggenstein-Leopoldshafen >> Tel: +49 721 608 28024 >> Email: >> uwe.falke at kit.edu >> www.scc.kit.edu >> Registered office: >> Kaiserstra?e 12, 76131 Karlsruhe, Germany >> KIT ? The Research University in the Helmholtz Association >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email:uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From jonathan.buzzard at strath.ac.uk Thu Aug 17 22:24:30 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Aug 2023 22:24:30 +0100 Subject: [gpfsug-discuss] GPL compilation failure In-Reply-To: References: <0eadd571-5863-945f-8148-99186611bd8e@kit.edu> <21B0BB52-C8CA-4A27-9385-84D4DFA5871A@rutgers.edu> <6E40DCAC-4EB1-4414-A300-86F446A934B8@rutgers.edu> <95a6175d548915d6b1be28056722b0c8288204c0.camel@us.ibm.com> Message-ID: <66d72b9d-f1df-95b5-5cd3-a94adf6a1be9@strath.ac.uk> On 17/08/2023 19:38, Uwe Falke wrote: > > thanks for that, so it is clear RHEL 8.7 won't get any more updates. > Sure it's not an Extended Update Support (EUS) release. On the other hand 8.6 and 8.8 are. I believe that Scale is only supported on RHEL releases that are supported by Redhat. So right now in the 8.x version that is 8.6 with EUS and 8.8 which in due course will also be an EUS release. Given that Scale tends to be supported on new RHEL point releases in the Scale release or PTF following the RHEL release then as far as I can see the only way to be fully supported at all times is to stick to EUS releases. I would note that the elephant in the room is unless you have budget for genuine RHEL rather than a rebuild you need a plan to move to an alternative distribution. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From janfrode at tanso.net Fri Aug 18 08:19:11 2023 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 18 Aug 2023 09:19:11 +0200 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: If a key server go offline, scale will just go to the next one in the list -- and give a warning/error about it in mmhealth. Nothing should happen to the file system access. Also, you can tune how often scale needs to refresh the keys from the key server with encryptionKeyCacheExpiration. Setting it to 0 means that your nodes will only need to fetch the key when they mount the file system, or when you change policy. -jf On Thu, Aug 17, 2023 at 5:54?PM Alec wrote: > Yesterday I proposed treating the replicated key servers as 2 different > sets of servers. And having scale address two of the RKM servers by one > rkmid/tenant/devicegrp/client name, and having a second > rkmid/tenant/devicegrp/client name for the 2nd set of servers. > > So define the same cluster of key management servers in two separate > stanzas of RKM.conf, an upper and lower half. > > If we do that and key management team takes one set offline, everything > should work but scale would think one set of keys are offline and scream. > > I think we need an IBM ticket to help vet all that out. > > Alec > > On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust > wrote: > >> >> Your second KMIP server don?t need to have an active replication >> relationship with the first one ? it just needs to contain the same MEK. So >> you could do a one time replication / copying between them, and they would >> not have to see each other anymore. >> >> I don?t think having them host different keys will work, as you won?t be >> able to fetch the second key from the one server your client is connected >> to, and then will be unable to encrypt with that key. >> >> From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial >> service. It?s just a server that will tell you the key when asked + some >> access control to make sure no one else gets it. Also MEKs never changes? >> unless you actively change them in the file system policy, and then you >> could just post the new key to all/both your independent key servers when >> you do the change. >> >> >> -jf >> >> ons. 16. aug. 2023 kl. 23:25 skrev Alec : >> >>> Ed >>> Thanks for the response, I wasn't aware of those two commands. I will >>> see if that unlocks a solution. I kind of need the test to work in a >>> production environment. So can't just be adding spare nodes onto the >>> cluster and forgetting with file systems. >>> >>> Unfortunately the logs don't indicate when a node has returned to >>> health. Only that it's in trouble but as we patch often we see these >>> regularly. >>> >>> >>> For the second question, we would add a 2nd MEK key to each file so that >>> two independent keys from two different RKM pools would be able to unlock >>> any file. This would give us two whole independent paths to encrypt and >>> decrypt a file. >>> >>> So I'm looking for a best practice example from IBM to indicate this so >>> we don't have a dependency on a single RKM environment. >>> >>> Alec >>> >>> >>> >>> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: >>> >>>> > How can we verify that a key server is up and running when there are >>>> multiple key servers in an rkm pool serving a single key. >>>> >>>> >>>> >>>> Pretty simple. 
>>>> >>>> -Grab a compute node/client (and mark it offline if needed) unmount all >>>> encrypted File Systems. >>>> >>>> -Hack the RKM.conf to point to JUST the server you want to test (and >>>> maybe a backup) >>>> >>>> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >>>> >>>> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a >>>> great command if you need to load new Certificates too) >>>> >>>> -Attempt to mount the encrypted FS, and then cat a few files. >>>> >>>> >>>> >>>> If you?ve not setup a 2nd server in your test you will see quarantine >>>> messages in the logs for a bad KMIP server. If it works, you can clear >>>> keys again and see how many were retrieved. >>>> >>>> >>>> >>>> >Is there any documentation or diagram officially from IBM that >>>> recommends having 2 keys from independent RKM environments for high >>>> availability as best practice that I could refer to? >>>> >>>> >>>> >>>> I am not an IBM-er? but I?m also not 100% sure what you are asking >>>> here. Two un-related SKLM setups? How would you sync the keys? How >>>> would this be better than multiple replicated servers? >>>> >>>> >>>> >>>> Ed Wahl >>>> >>>> Ohio Supercomputer Center >>>> >>>> >>>> >>>> *From:* gpfsug-discuss *On Behalf >>>> Of *Alec >>>> *Sent:* Wednesday, August 16, 2023 3:33 PM >>>> *To:* gpfsug main discussion list >>>> *Subject:* [gpfsug-discuss] RKM resilience questions testing and best >>>> practice >>>> >>>> >>>> >>>> Hello we are using a remote key server with GPFS I have two questions: >>>> First question: How can we verify that a key server is up and running when >>>> there are multiple key servers in an rkm pool serving a single key. The >>>> scenario is after maintenance >>>> >>>> Hello we are using a remote key server with GPFS I have two questions: >>>> >>>> >>>> >>>> First question: >>>> >>>> How can we verify that a key server is up and running when there are >>>> multiple key servers in an rkm pool serving a single key. >>>> >>>> >>>> >>>> The scenario is after maintenance or periodically we want to verify >>>> that all member of the pool are in service. >>>> >>>> >>>> >>>> Second question is: >>>> >>>> Is there any documentation or diagram officially from IBM that >>>> recommends having 2 keys from independent RKM environments for high >>>> availability as best practice that I could refer to? >>>> >>>> >>>> >>>> Alec >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... 
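A hedged sketch of both points: pinning the key cache so MEKs are only fetched at mount time or on a policy change, and a crude out-of-band check of each server in the pool. The probe below only proves that the TLS port answers -- it says nothing about whether that server actually holds the MEK -- and the host names and port are placeholders:

# cluster-wide: never expire cached MEKs (fetch only at mount / policy change)
mmchconfig encryptionKeyCacheExpiration=0

# poor man's probe of every KMIP endpoint in the pool
for kms in keysrv1.example.com keysrv2.example.com keysrv3.example.com; do
  if echo | openssl s_client -connect ${kms}:5696 >/dev/null 2>&1; then
    echo "${kms}:5696 answers TLS"
  else
    echo "${kms}:5696 NOT reachable"
  fi
done
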
URL: From anacreo at gmail.com Fri Aug 18 09:51:13 2023 From: anacreo at gmail.com (Alec) Date: Fri, 18 Aug 2023 01:51:13 -0700 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Okay so how do you know the backup key servers are actually functioning until you try to fail to them? We need a way to know they are actually working. Setting encryptionKeyCacheExpiration to 0 would actually help in that we shouldn't go down once we are up. But it would suck if we bounce and then find out none of the key servers are working, then we have the same disaster but just a different time to experience it. Spectrum Scale honestly needs an option to probe and complain about the backup RKM servers. Or if we could run a command to validate that all keys are visible on all key servers that could work as well. Alec On Fri, Aug 18, 2023, 12:22 AM Jan-Frode Myklebust wrote: > If a key server go offline, scale will just go to the next one in the list > -- and give a warning/error about it in mmhealth. Nothing should happen to > the file system access. Also, you can tune how often scale needs to refresh > the keys from the key server with encryptionKeyCacheExpiration. Setting it > to 0 means that your nodes will only need to fetch the key when they mount > the file system, or when you change policy. > > > -jf > > On Thu, Aug 17, 2023 at 5:54?PM Alec wrote: > >> Yesterday I proposed treating the replicated key servers as 2 different >> sets of servers. And having scale address two of the RKM servers by one >> rkmid/tenant/devicegrp/client name, and having a second >> rkmid/tenant/devicegrp/client name for the 2nd set of servers. >> >> So define the same cluster of key management servers in two separate >> stanzas of RKM.conf, an upper and lower half. >> >> If we do that and key management team takes one set offline, everything >> should work but scale would think one set of keys are offline and scream. >> >> I think we need an IBM ticket to help vet all that out. >> >> Alec >> >> On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust >> wrote: >> >>> >>> Your second KMIP server don?t need to have an active replication >>> relationship with the first one ? it just needs to contain the same MEK. So >>> you could do a one time replication / copying between them, and they would >>> not have to see each other anymore. >>> >>> I don?t think having them host different keys will work, as you won?t be >>> able to fetch the second key from the one server your client is connected >>> to, and then will be unable to encrypt with that key. >>> >>> From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial >>> service. It?s just a server that will tell you the key when asked + some >>> access control to make sure no one else gets it. Also MEKs never changes? >>> unless you actively change them in the file system policy, and then you >>> could just post the new key to all/both your independent key servers when >>> you do the change. >>> >>> >>> -jf >>> >>> ons. 16. aug. 2023 kl. 23:25 skrev Alec : >>> >>>> Ed >>>> Thanks for the response, I wasn't aware of those two commands. I >>>> will see if that unlocks a solution. I kind of need the test to work in a >>>> production environment. So can't just be adding spare nodes onto the >>>> cluster and forgetting with file systems. >>>> >>>> Unfortunately the logs don't indicate when a node has returned to >>>> health. Only that it's in trouble but as we patch often we see these >>>> regularly. 
>>>> >>>> >>>> For the second question, we would add a 2nd MEK key to each file so >>>> that two independent keys from two different RKM pools would be able to >>>> unlock any file. This would give us two whole independent paths to encrypt >>>> and decrypt a file. >>>> >>>> So I'm looking for a best practice example from IBM to indicate this so >>>> we don't have a dependency on a single RKM environment. >>>> >>>> Alec >>>> >>>> >>>> >>>> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: >>>> >>>>> > How can we verify that a key server is up and running when there are >>>>> multiple key servers in an rkm pool serving a single key. >>>>> >>>>> >>>>> >>>>> Pretty simple. >>>>> >>>>> -Grab a compute node/client (and mark it offline if needed) unmount >>>>> all encrypted File Systems. >>>>> >>>>> -Hack the RKM.conf to point to JUST the server you want to test (and >>>>> maybe a backup) >>>>> >>>>> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >>>>> >>>>> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a >>>>> great command if you need to load new Certificates too) >>>>> >>>>> -Attempt to mount the encrypted FS, and then cat a few files. >>>>> >>>>> >>>>> >>>>> If you?ve not setup a 2nd server in your test you will see quarantine >>>>> messages in the logs for a bad KMIP server. If it works, you can clear >>>>> keys again and see how many were retrieved. >>>>> >>>>> >>>>> >>>>> >Is there any documentation or diagram officially from IBM that >>>>> recommends having 2 keys from independent RKM environments for high >>>>> availability as best practice that I could refer to? >>>>> >>>>> >>>>> >>>>> I am not an IBM-er? but I?m also not 100% sure what you are asking >>>>> here. Two un-related SKLM setups? How would you sync the keys? How >>>>> would this be better than multiple replicated servers? >>>>> >>>>> >>>>> >>>>> Ed Wahl >>>>> >>>>> Ohio Supercomputer Center >>>>> >>>>> >>>>> >>>>> *From:* gpfsug-discuss *On Behalf >>>>> Of *Alec >>>>> *Sent:* Wednesday, August 16, 2023 3:33 PM >>>>> *To:* gpfsug main discussion list >>>>> *Subject:* [gpfsug-discuss] RKM resilience questions testing and best >>>>> practice >>>>> >>>>> >>>>> >>>>> Hello we are using a remote key server with GPFS I have two questions: >>>>> First question: How can we verify that a key server is up and running when >>>>> there are multiple key servers in an rkm pool serving a single key. The >>>>> scenario is after maintenance >>>>> >>>>> Hello we are using a remote key server with GPFS I have two questions: >>>>> >>>>> >>>>> >>>>> First question: >>>>> >>>>> How can we verify that a key server is up and running when there are >>>>> multiple key servers in an rkm pool serving a single key. >>>>> >>>>> >>>>> >>>>> The scenario is after maintenance or periodically we want to verify >>>>> that all member of the pool are in service. >>>>> >>>>> >>>>> >>>>> Second question is: >>>>> >>>>> Is there any documentation or diagram officially from IBM that >>>>> recommends having 2 keys from independent RKM environments for high >>>>> availability as best practice that I could refer to? 
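As an illustration of the two-MEK idea above, an encryption policy can name more than one MEK in the same rule, so each file's FEK is wrapped by keys coming from two different RKM back ends. The key IDs and RKM stanza names below are placeholders, and the exact rule syntax should be checked against the Scale encryption policy documentation:

    /* Hypothetical fragment: wrap every FEK with one MEK from RKM_PROD
       and one from an independent RKM_DR back end. */
    RULE 'encTwoMEK' ENCRYPTION 'E1' IS
         ALGO 'DEFAULTNISTSP800131A'
         KEYS('KEY-prod-1:RKM_PROD', 'KEY-dr-1:RKM_DR')
    RULE 'encAll' SET ENCRYPTION 'E1'
         WHERE NAME LIKE '%'

With two MEKs per file, losing one whole RKM environment should still leave files readable, which is the availability argument being made here.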
>>>>> >>>>> >>>>> >>>>> Alec >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Fri Aug 18 10:09:38 2023 From: anacreo at gmail.com (Alec) Date: Fri, 18 Aug 2023 02:09:38 -0700 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Hmm.. IBM mentions in 5.1.2 documentation that for performance we could just rotate the order of the keys to load balance the keys.. however because of server maintenance I would imagine all the nodes end up on the same server eventually. But I think I see a solution. If I just define 4 additional RKM configs and each one with one key server and don't do anything else with it. I am guessing that GPFS is going to monitor and complain about them if they go down. And that is easy to test... So RKM.conf with RKM_PROD { kmipServerUri1 = node1 kmipServerUri2 = node2 kmipServerUri3 = node3 kmipServerUri4 = node4 } RKM_PROD_T1 { kmipServerUri = node1 } RKM_PROD_T2 { kmipServerUri = node2 } RKM_PROD_T3 { kmipServerUri = node3 } RKM_PROD_T4 { kmipServerUri = node4 } I could then define 4 files with a key from each test RKM_PROD_T? group to monitor the availability of the individual key servers. Call it Alec's trust but verify HA. On Fri, Aug 18, 2023, 1:51 AM Alec wrote: > Okay so how do you know the backup key servers are actually functioning > until you try to fail to them? We need a way to know they are actually > working. > > Setting encryptionKeyCacheExpiration to 0 would actually help in that we > shouldn't go down once we are up. But it would suck if we bounce and then > find out none of the key servers are working, then we have the same > disaster but just a different time to experience it. > > Spectrum Scale honestly needs an option to probe and complain about the > backup RKM servers. Or if we could run a command to validate that all > keys are visible on all key servers that could work as well. > > Alec > > On Fri, Aug 18, 2023, 12:22 AM Jan-Frode Myklebust > wrote: > >> If a key server go offline, scale will just go to the next one in the >> list -- and give a warning/error about it in mmhealth. Nothing should >> happen to the file system access. Also, you can tune how often scale needs >> to refresh the keys from the key server with encryptionKeyCacheExpiration. >> Setting it to 0 means that your nodes will only need to fetch the key when >> they mount the file system, or when you change policy. 
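For the encryptionKeyCacheExpiration tuning mentioned just above, a minimal sketch (whether a change needs a daemon restart should be checked in the mmchconfig documentation):

    mmlsconfig encryptionKeyCacheExpiration      # current key-cache refresh interval, in seconds
    mmchconfig encryptionKeyCacheExpiration=0    # 0 = fetch MEKs only at mount time or on policy change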
>> >> >> -jf >> >> On Thu, Aug 17, 2023 at 5:54?PM Alec wrote: >> >>> Yesterday I proposed treating the replicated key servers as 2 different >>> sets of servers. And having scale address two of the RKM servers by one >>> rkmid/tenant/devicegrp/client name, and having a second >>> rkmid/tenant/devicegrp/client name for the 2nd set of servers. >>> >>> So define the same cluster of key management servers in two separate >>> stanzas of RKM.conf, an upper and lower half. >>> >>> If we do that and key management team takes one set offline, everything >>> should work but scale would think one set of keys are offline and scream. >>> >>> I think we need an IBM ticket to help vet all that out. >>> >>> Alec >>> >>> On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust >>> wrote: >>> >>>> >>>> Your second KMIP server don?t need to have an active replication >>>> relationship with the first one ? it just needs to contain the same MEK. So >>>> you could do a one time replication / copying between them, and they would >>>> not have to see each other anymore. >>>> >>>> I don?t think having them host different keys will work, as you won?t >>>> be able to fetch the second key from the one server your client is >>>> connected to, and then will be unable to encrypt with that key. >>>> >>>> From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial >>>> service. It?s just a server that will tell you the key when asked + some >>>> access control to make sure no one else gets it. Also MEKs never changes? >>>> unless you actively change them in the file system policy, and then you >>>> could just post the new key to all/both your independent key servers when >>>> you do the change. >>>> >>>> >>>> -jf >>>> >>>> ons. 16. aug. 2023 kl. 23:25 skrev Alec : >>>> >>>>> Ed >>>>> Thanks for the response, I wasn't aware of those two commands. I >>>>> will see if that unlocks a solution. I kind of need the test to work in a >>>>> production environment. So can't just be adding spare nodes onto the >>>>> cluster and forgetting with file systems. >>>>> >>>>> Unfortunately the logs don't indicate when a node has returned to >>>>> health. Only that it's in trouble but as we patch often we see these >>>>> regularly. >>>>> >>>>> >>>>> For the second question, we would add a 2nd MEK key to each file so >>>>> that two independent keys from two different RKM pools would be able to >>>>> unlock any file. This would give us two whole independent paths to encrypt >>>>> and decrypt a file. >>>>> >>>>> So I'm looking for a best practice example from IBM to indicate this >>>>> so we don't have a dependency on a single RKM environment. >>>>> >>>>> Alec >>>>> >>>>> >>>>> >>>>> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: >>>>> >>>>>> > How can we verify that a key server is up and running when there >>>>>> are multiple key servers in an rkm pool serving a single key. >>>>>> >>>>>> >>>>>> >>>>>> Pretty simple. >>>>>> >>>>>> -Grab a compute node/client (and mark it offline if needed) unmount >>>>>> all encrypted File Systems. >>>>>> >>>>>> -Hack the RKM.conf to point to JUST the server you want to test (and >>>>>> maybe a backup) >>>>>> >>>>>> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >>>>>> >>>>>> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is a >>>>>> great command if you need to load new Certificates too) >>>>>> >>>>>> -Attempt to mount the encrypted FS, and then cat a few files. 
>>>>>> >>>>>> >>>>>> >>>>>> If you?ve not setup a 2nd server in your test you will see >>>>>> quarantine messages in the logs for a bad KMIP server. If it works, you >>>>>> can clear keys again and see how many were retrieved. >>>>>> >>>>>> >>>>>> >>>>>> >Is there any documentation or diagram officially from IBM that >>>>>> recommends having 2 keys from independent RKM environments for high >>>>>> availability as best practice that I could refer to? >>>>>> >>>>>> >>>>>> >>>>>> I am not an IBM-er? but I?m also not 100% sure what you are asking >>>>>> here. Two un-related SKLM setups? How would you sync the keys? How >>>>>> would this be better than multiple replicated servers? >>>>>> >>>>>> >>>>>> >>>>>> Ed Wahl >>>>>> >>>>>> Ohio Supercomputer Center >>>>>> >>>>>> >>>>>> >>>>>> *From:* gpfsug-discuss *On >>>>>> Behalf Of *Alec >>>>>> *Sent:* Wednesday, August 16, 2023 3:33 PM >>>>>> *To:* gpfsug main discussion list >>>>>> *Subject:* [gpfsug-discuss] RKM resilience questions testing and >>>>>> best practice >>>>>> >>>>>> >>>>>> >>>>>> Hello we are using a remote key server with GPFS I have two >>>>>> questions: First question: How can we verify that a key server is up and >>>>>> running when there are multiple key servers in an rkm pool serving a single >>>>>> key. The scenario is after maintenance >>>>>> >>>>>> Hello we are using a remote key server with GPFS I have two questions: >>>>>> >>>>>> >>>>>> >>>>>> First question: >>>>>> >>>>>> How can we verify that a key server is up and running when there are >>>>>> multiple key servers in an rkm pool serving a single key. >>>>>> >>>>>> >>>>>> >>>>>> The scenario is after maintenance or periodically we want to verify >>>>>> that all member of the pool are in service. >>>>>> >>>>>> >>>>>> >>>>>> Second question is: >>>>>> >>>>>> Is there any documentation or diagram officially from IBM that >>>>>> recommends having 2 keys from independent RKM environments for high >>>>>> availability as best practice that I could refer to? >>>>>> >>>>>> >>>>>> >>>>>> Alec >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > -------------- next part -------------- An HTML attachment was scrubbed... 
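Laid out as an actual RKM.conf, the "trust but verify" layout sketched above might look roughly like the following. The server names, the tls://...:5696 URIs, and the stanza names are placeholders; each stanza also needs the usual client keystore, certificate, and tenant fields, which are omitted here, and the exact field names should be checked against the RKM.conf documentation.

    RKM_PROD {
        kmipServerUri1 = tls://node1:5696
        kmipServerUri2 = tls://node2:5696
        kmipServerUri3 = tls://node3:5696
        kmipServerUri4 = tls://node4:5696
        ...
    }
    RKM_PROD_T1 { kmipServerUri = tls://node1:5696  ... }
    RKM_PROD_T2 { kmipServerUri = tls://node2:5696  ... }
    RKM_PROD_T3 { kmipServerUri = tls://node3:5696  ... }
    RKM_PROD_T4 { kmipServerUri = tls://node4:5696  ... }

One small canary file encrypted with a key served only through each RKM_PROD_T? stanza then gives mmhealth something to complain about when that individual key server stops answering.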
URL: From janfrode at tanso.net Fri Aug 18 14:01:33 2023 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 18 Aug 2023 15:01:33 +0200 Subject: [gpfsug-discuss] RKM resilience questions testing and best practice In-Reply-To: References: Message-ID: Maybe give a vote for this one: https://ideas.ibm.com/ideas/GPFS-I-652 Encryption - tool to check health status of all configured encryption > servers > > When Encryption is configured on a file system. the key server must be > available to allow user file access. When the key server fails, data access > is lost. We need a tools that can be run to check key server health, check > retrieval of keys, and communication health. This should be independent of > mmfsd. Inclusion in mmhealth would be ideal. > Planned for future release... -jf On Fri, Aug 18, 2023 at 11:11?AM Alec wrote: > Hmm.. IBM mentions in 5.1.2 documentation that for performance we could > just rotate the order of the keys to load balance the keys.. however > because of server maintenance I would imagine all the nodes end up on the > same server eventually. > > But I think I see a solution. If I just define 4 additional RKM configs > and each one with one key server and don't do anything else with it. I am > guessing that GPFS is going to monitor and complain about them if they go > down. And that is easy to test... > > > So RKM.conf with > RKM_PROD { > kmipServerUri1 = node1 > kmipServerUri2 = node2 > kmipServerUri3 = node3 > kmipServerUri4 = node4 > } > RKM_PROD_T1 { > kmipServerUri = node1 > } > RKM_PROD_T2 { > kmipServerUri = node2 > } > RKM_PROD_T3 { > kmipServerUri = node3 > } > RKM_PROD_T4 { > kmipServerUri = node4 > } > > I could then define 4 files with a key from each test RKM_PROD_T? group to > monitor the availability of the individual key servers. > > Call it Alec's trust but verify HA. > > On Fri, Aug 18, 2023, 1:51 AM Alec wrote: > >> Okay so how do you know the backup key servers are actually functioning >> until you try to fail to them? We need a way to know they are actually >> working. >> >> Setting encryptionKeyCacheExpiration to 0 would actually help in that we >> shouldn't go down once we are up. But it would suck if we bounce and then >> find out none of the key servers are working, then we have the same >> disaster but just a different time to experience it. >> >> Spectrum Scale honestly needs an option to probe and complain about the >> backup RKM servers. Or if we could run a command to validate that all >> keys are visible on all key servers that could work as well. >> >> Alec >> >> On Fri, Aug 18, 2023, 12:22 AM Jan-Frode Myklebust >> wrote: >> >>> If a key server go offline, scale will just go to the next one in the >>> list -- and give a warning/error about it in mmhealth. Nothing should >>> happen to the file system access. Also, you can tune how often scale needs >>> to refresh the keys from the key server with encryptionKeyCacheExpiration. >>> Setting it to 0 means that your nodes will only need to fetch the key when >>> they mount the file system, or when you change policy. >>> >>> >>> -jf >>> >>> On Thu, Aug 17, 2023 at 5:54?PM Alec wrote: >>> >>>> Yesterday I proposed treating the replicated key servers as 2 different >>>> sets of servers. And having scale address two of the RKM servers by one >>>> rkmid/tenant/devicegrp/client name, and having a second >>>> rkmid/tenant/devicegrp/client name for the 2nd set of servers. 
>>>> >>>> So define the same cluster of key management servers in two separate >>>> stanzas of RKM.conf, an upper and lower half. >>>> >>>> If we do that and key management team takes one set offline, everything >>>> should work but scale would think one set of keys are offline and scream. >>>> >>>> I think we need an IBM ticket to help vet all that out. >>>> >>>> Alec >>>> >>>> On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust >>>> wrote: >>>> >>>>> >>>>> Your second KMIP server don?t need to have an active replication >>>>> relationship with the first one ? it just needs to contain the same MEK. So >>>>> you could do a one time replication / copying between them, and they would >>>>> not have to see each other anymore. >>>>> >>>>> I don?t think having them host different keys will work, as you won?t >>>>> be able to fetch the second key from the one server your client is >>>>> connected to, and then will be unable to encrypt with that key. >>>>> >>>>> From what I?ve seen of KMIP setups with Scale, it?s a stupidly trivial >>>>> service. It?s just a server that will tell you the key when asked + some >>>>> access control to make sure no one else gets it. Also MEKs never changes? >>>>> unless you actively change them in the file system policy, and then you >>>>> could just post the new key to all/both your independent key servers when >>>>> you do the change. >>>>> >>>>> >>>>> -jf >>>>> >>>>> ons. 16. aug. 2023 kl. 23:25 skrev Alec : >>>>> >>>>>> Ed >>>>>> Thanks for the response, I wasn't aware of those two commands. I >>>>>> will see if that unlocks a solution. I kind of need the test to work in a >>>>>> production environment. So can't just be adding spare nodes onto the >>>>>> cluster and forgetting with file systems. >>>>>> >>>>>> Unfortunately the logs don't indicate when a node has returned to >>>>>> health. Only that it's in trouble but as we patch often we see these >>>>>> regularly. >>>>>> >>>>>> >>>>>> For the second question, we would add a 2nd MEK key to each file so >>>>>> that two independent keys from two different RKM pools would be able to >>>>>> unlock any file. This would give us two whole independent paths to encrypt >>>>>> and decrypt a file. >>>>>> >>>>>> So I'm looking for a best practice example from IBM to indicate this >>>>>> so we don't have a dependency on a single RKM environment. >>>>>> >>>>>> Alec >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward wrote: >>>>>> >>>>>>> > How can we verify that a key server is up and running when there >>>>>>> are multiple key servers in an rkm pool serving a single key. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Pretty simple. >>>>>>> >>>>>>> -Grab a compute node/client (and mark it offline if needed) unmount >>>>>>> all encrypted File Systems. >>>>>>> >>>>>>> -Hack the RKM.conf to point to JUST the server you want to test (and >>>>>>> maybe a backup) >>>>>>> >>>>>>> -Clear all keys: ?/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all ? >>>>>>> >>>>>>> -Reload the RKM.conf: ?/usr/lpp/mmfs/bin/tsloadikm run? (this is >>>>>>> a great command if you need to load new Certificates too) >>>>>>> >>>>>>> -Attempt to mount the encrypted FS, and then cat a few files. >>>>>>> >>>>>>> >>>>>>> >>>>>>> If you?ve not setup a 2nd server in your test you will see >>>>>>> quarantine messages in the logs for a bad KMIP server. If it works, you >>>>>>> can clear keys again and see how many were retrieved. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >Is there any documentation or diagram officially from IBM that >>>>>>> recommends having 2 keys from independent RKM environments for high >>>>>>> availability as best practice that I could refer to? >>>>>>> >>>>>>> >>>>>>> >>>>>>> I am not an IBM-er? but I?m also not 100% sure what you are asking >>>>>>> here. Two un-related SKLM setups? How would you sync the keys? How >>>>>>> would this be better than multiple replicated servers? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Ed Wahl >>>>>>> >>>>>>> Ohio Supercomputer Center >>>>>>> >>>>>>> >>>>>>> >>>>>>> *From:* gpfsug-discuss *On >>>>>>> Behalf Of *Alec >>>>>>> *Sent:* Wednesday, August 16, 2023 3:33 PM >>>>>>> *To:* gpfsug main discussion list >>>>>>> *Subject:* [gpfsug-discuss] RKM resilience questions testing and >>>>>>> best practice >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hello we are using a remote key server with GPFS I have two >>>>>>> questions: First question: How can we verify that a key server is up and >>>>>>> running when there are multiple key servers in an rkm pool serving a single >>>>>>> key. The scenario is after maintenance >>>>>>> >>>>>>> Hello we are using a remote key server with GPFS I have two >>>>>>> questions: >>>>>>> >>>>>>> >>>>>>> >>>>>>> First question: >>>>>>> >>>>>>> How can we verify that a key server is up and running when there are >>>>>>> multiple key servers in an rkm pool serving a single key. >>>>>>> >>>>>>> >>>>>>> >>>>>>> The scenario is after maintenance or periodically we want to verify >>>>>>> that all member of the pool are in service. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Second question is: >>>>>>> >>>>>>> Is there any documentation or diagram officially from IBM that >>>>>>> recommends having 2 keys from independent RKM environments for high >>>>>>> availability as best practice that I could refer to? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Alec >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> gpfsug-discuss mailing list >>>>>>> gpfsug-discuss at gpfsug.org >>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at hpe.com Mon Aug 21 18:50:02 2023 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Mon, 21 Aug 2023 17:50:02 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? Message-ID: I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? 
eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be "routed" to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ... ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com [cid:image001.png at 01D9D45F.FC6CCA30] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From novosirj at rutgers.edu Mon Aug 21 19:07:03 2023 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 21 Aug 2023 18:07:03 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: Message-ID: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> If I understand what you?re asking correctly, we used to have a cluster that did this. GPFS was on Infininiband, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types. Sent from my iPhone On Aug 21, 2023, at 13:55, Kidger, Daniel wrote: ? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From novosirj at rutgers.edu Mon Aug 21 19:07:03 2023 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 21 Aug 2023 18:07:03 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: Message-ID: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> If I understand what you?re asking correctly, we used to have a cluster that did this. GPFS was on Infininiband, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types. Sent from my iPhone On Aug 21, 2023, at 13:55, Kidger, Daniel wrote: ? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? 
Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From daniel.kidger at hpe.com Mon Aug 21 19:43:03 2023 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Mon, 21 Aug 2023 18:43:03 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> Message-ID: Ryan, This sounds very interesting. Do you have more details or references of how they connected together, and what any pain points were? Daniel From: gpfsug-discuss On Behalf Of Ryan Novosielski Sent: 21 August 2023 19:07 To: gpfsug main discussion list Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Joining RDMA over different networks? If I understand what you?re asking correctly, we used to have a cluster that did this. GPFS was on Infininiband, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types. Sent from my iPhone On Aug 21, 2023, at 13:55, Kidger, Daniel > wrote: ? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at hpe.com Mon Aug 21 19:43:03 2023 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Mon, 21 Aug 2023 18:43:03 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> Message-ID: Ryan, This sounds very interesting. Do you have more details or references of how they connected together, and what any pain points were? Daniel From: gpfsug-discuss On Behalf Of Ryan Novosielski Sent: 21 August 2023 19:07 To: gpfsug main discussion list Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Joining RDMA over different networks? If I understand what you?re asking correctly, we used to have a cluster that did this. GPFS was on Infininiband, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types. Sent from my iPhone On Aug 21, 2023, at 13:55, Kidger, Daniel > wrote: ? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? 
eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Aug 21 21:03:31 2023 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 21 Aug 2023 20:03:31 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: Message-ID: I believe the new nVidia name for this type of product for IB->Ethernet is ?skyway?. Older types of this will surely get discussed on the list. Gateway Systems and Routers | NVIDIA Ed Wahl Ohio Supercomputer Center From: gpfsug-discuss On Behalf Of Kidger, Daniel Sent: Monday, August 21, 2023 1:50 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Joining RDMA over different networks? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com [cid:image001.png at 01D9D448.FE55CCF0] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From ewahl at osc.edu Mon Aug 21 21:03:31 2023 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 21 Aug 2023 20:03:31 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: Message-ID: I believe the new nVidia name for this type of product for IB->Ethernet is ?skyway?. Older types of this will surely get discussed on the list. Gateway Systems and Routers | NVIDIA Ed Wahl Ohio Supercomputer Center From: gpfsug-discuss On Behalf Of Kidger, Daniel Sent: Monday, August 21, 2023 1:50 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Joining RDMA over different networks? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? 
to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com [cid:image001.png at 01D9D448.FE55CCF0] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2541 bytes Desc: image001.png URL: From novosirj at rutgers.edu Tue Aug 22 00:27:15 2023 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 21 Aug 2023 23:27:15 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> Message-ID: <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> I still have the guide from that system, and I saved some of the routing scripts and what not. But really, it wasn?t much more complicated than Ethernet routing. The routing nodes, I guess obviously, had both Omnipath and Infiniband interfaces. Compute knifes themselves I believe used a supervisord script, if I?m remembering that name right, to try to balance out which routing nide ione would use as a gateway. There were two as it was configured when I got to it, but a larger number was possible. It seems to me that there was probably a better way to do that, but it did work. The read/write rates were not as fast as our fully Inifniband clusters, but they were fast enough. The cluster was Caliburn, which was in the top 500 for some time, so there may be some papers and whatnot written on it before we inherited it. If there?s something specific you want to know, I could probably dig it up. Sent from my iPhone On Aug 21, 2023, at 14:48, Kidger, Daniel wrote: ? Ryan, This sounds very interesting. Do you have more details or references of how they connected together, and what any pain points were? Daniel From: gpfsug-discuss On Behalf Of Ryan Novosielski Sent: 21 August 2023 19:07 To: gpfsug main discussion list Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Joining RDMA over different networks? If I understand what you?re asking correctly, we used to have a cluster that did this. GPFS was on Infininiband, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types. Sent from my iPhone On Aug 21, 2023, at 13:55, Kidger, Daniel > wrote: ? I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks. Is there an equivalent for Storage Scale? eg if an ESS uses Infiniband to connect directly to Cluster A, could that InfiniBand RDMA fabric be ?routed? to ClusterB that has RoCE connecting all its nodes together and hence the filesystem mounted? ps. The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, ? ? 
Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266 hpe.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... 
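For what it is worth, the "routing nodes" pattern described above boils down to ordinary IP routing of the daemon traffic between the two fabrics; a minimal sketch, with all interface names, subnets, and addresses purely hypothetical:

    # On a dual-homed routing node (one InfiniBand and one Omni-Path interface):
    sysctl -w net.ipv4.ip_forward=1

    # On an Omni-Path-only compute node: reach the InfiniBand data subnet
    # via one of the routing nodes (10.20.0.1 here).
    ip route add 10.10.0.0/16 via 10.20.0.1

Plain IP forwarding of this kind carries the TCP/IP path only; verbs RDMA does not cross the fabric boundary through it.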
URL: From jonathan.buzzard at strath.ac.uk Tue Aug 22 10:28:38 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Aug 2023 10:28:38 +0100 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> Message-ID: <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> On 22/08/2023 00:27, Ryan Novosielski wrote: > I still have the guide from that system, and I saved some of the routing > scripts and what not. But really, it wasn?t much more complicated than > Ethernet routing. > > The routing nodes, I guess obviously, had both Omnipath and Infiniband > interfaces. Compute knifes themselves I believe used a supervisord > script, if I?m remembering that name right, to try to balance out which > routing nide ione would use as a gateway. There were two as it was > configured when I got to it, but a larger number was possible. > Having done it in a limited fashion previously I would recommend that you have two routing nodes and use keepalived on at least the Ethernet side with VRRP to try and maintain some redundancy. Otherwise you get in a situation where you are entirely dependent on a single node which you can't reboot without a GPFS shutdown. Cyber security makes that an untenable position these days. In our situation our DSS-G nodes where both Ethernet and Infiniband connected and we had a bunch of nodes that where using Infiniband for the data traffic and Ethernet for the management interface at 1Gbps. Everything else was on 10Gbps or better Ethernet. We therefore needed the Ethernet only connected nodes to be able to talk to the Infiniband connected nodes data interface. Due to the way routing works on Linux when the Infiniband nodes attempted to connect to the Ethernet connected only nodes it went via the 1Gbps Ethernet interface. So after a while and issues with a single gateway machine we switched to making it redundant. Basically the Ethernet only connected nodes had a custom route to reach the Infiniband network, and the DSS-G nodes where doing the forwarding and then had keepalived running VRRP to move the IP address around on the Ethernet side so there was redundancy in the gateway. The amount of traffic transiting the gateway was actually tiny because all the filesystem data was coming from the DSS-G nodes that were Infiniband connected :-) I have no idea if you can do the equivalent of VRRP on Infiniband and Omnipath. In the end the Infiniband nodes (a bunch of C6220's used to support undergraduate/MSc projects and classes) had to be upgraded to 10Gbps Ethernet as RedHat dropped support for the Intel Truescale Infiniband adapters in RHEL8. We don't let the student's run multinode jobs anyway so the loss of the Infiniband was not an issue. Though with the enforced move away from RHEL means we will get it back JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at hpe.com Tue Aug 22 10:51:07 2023 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Tue, 22 Aug 2023 09:51:07 +0000 Subject: [gpfsug-discuss] Joining RDMA over different networks? 
In-Reply-To: <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> Message-ID: Jonathan, Thank you for the great answer! Just to be clear though - are you talking about TCP/IP mounting of the filesystem(s) rather than RDMA ? I think routing of RDMA is perhaps something only Lustre can do? Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266?? hpe.com -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: 22 August 2023 10:29 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Joining RDMA over different networks? On 22/08/2023 00:27, Ryan Novosielski wrote: > I still have the guide from that system, and I saved some of the > routing scripts and what not. But really, it wasn?t much more > complicated than Ethernet routing. > > The routing nodes, I guess obviously, had both Omnipath and Infiniband > interfaces. Compute knifes themselves I believe used a supervisord > script, if I?m remembering that name right, to try to balance out > which routing nide ione would use as a gateway. There were two as it > was configured when I got to it, but a larger number was possible. > Having done it in a limited fashion previously I would recommend that you have two routing nodes and use keepalived on at least the Ethernet side with VRRP to try and maintain some redundancy. Otherwise you get in a situation where you are entirely dependent on a single node which you can't reboot without a GPFS shutdown. Cyber security makes that an untenable position these days. In our situation our DSS-G nodes where both Ethernet and Infiniband connected and we had a bunch of nodes that where using Infiniband for the data traffic and Ethernet for the management interface at 1Gbps. Everything else was on 10Gbps or better Ethernet. We therefore needed the Ethernet only connected nodes to be able to talk to the Infiniband connected nodes data interface. Due to the way routing works on Linux when the Infiniband nodes attempted to connect to the Ethernet connected only nodes it went via the 1Gbps Ethernet interface. So after a while and issues with a single gateway machine we switched to making it redundant. Basically the Ethernet only connected nodes had a custom route to reach the Infiniband network, and the DSS-G nodes where doing the forwarding and then had keepalived running VRRP to move the IP address around on the Ethernet side so there was redundancy in the gateway. The amount of traffic transiting the gateway was actually tiny because all the filesystem data was coming from the DSS-G nodes that were Infiniband connected :-) I have no idea if you can do the equivalent of VRRP on Infiniband and Omnipath. In the end the Infiniband nodes (a bunch of C6220's used to support undergraduate/MSc projects and classes) had to be upgraded to 10Gbps Ethernet as RedHat dropped support for the Intel Truescale Infiniband adapters in RHEL8. We don't let the student's run multinode jobs anyway so the loss of the Infiniband was not an issue. Though with the enforced move away from RHEL means we will get it back JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From jonathan.buzzard at strath.ac.uk Tue Aug 22 11:20:38 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Aug 2023 11:20:38 +0100 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> Message-ID: On 22/08/2023 10:51, Kidger, Daniel wrote: > > Jonathan, > > Thank you for the great answer! > Just to be clear though - are you talking about TCP/IP mounting of the filesystem(s) rather than RDMA ? > Yes for a few reasons. Firstly a bunch of our Ethernet adaptors don't support RDMA. Second there a lot of ducks to be got in line and kept in line for RDMA to work and that's too much effort IMHO. Thirdly the nodes can peg the 10Gbps interface they have which is a hard QOS that we are happy with. Though if specifying today we would have 25Gbps to the compute nodes and 100 possibly 200Gbps on the DSS-G nodes. Basically we don't want one node to go nuts and monopolize the file system :-) The DSS-G nodes don't have an issue keeping up so I am not sure there is much performance benefit from RDMA to be had. That said you are supposed to be able to do IPoIB over the RDMA hardware's network, and I had presumed that the same could be said of TCP/IP over RDMA on Ethernet. > I think routing of RDMA is perhaps something only Lustre can do? > Possibly, something else is that we have our DSS-G nodes doing MLAG's over a pair of switches. I need to be able to do firmware updates on the network switches the DSS-G nodes are connected to without shutting down the cluster. I don't think you can do that with RDMA reading the switch manuals so another reason not to do it IMHO. In the 2020's the mantra is patch baby patch and everything is focused on making that quick and easy to achieve. Your expensive HPC system is for jack if hackers have taken it over because you didn't path it in a timely fashion. Also I would have a *lot* of explaining to do which I would rather not. Also in our experience storage is rarely the bottle neck and when it is aka Gromacs is creating a ~1TB temp file at 10Gbps (yeah that's a real thing we have observed on a fairly regular basis) that's an intended QOS so everyone else can get work done and I don't get a bunch of tickets from users complaining about the file system performing badly. We have seen enough simultaneous Gromacs that without the 10Gbps hard QOS the filesystem would have been brought to it's knees. We can't do the temp files locally on the node because we only spec'ed them with 1TB local disks and the Gromacs temp files regularly exceed the available local space. Also getting users to do it would be a nightmare :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From anacreo at gmail.com Tue Aug 22 11:52:29 2023 From: anacreo at gmail.com (Alec) Date: Tue, 22 Aug 2023 03:52:29 -0700 Subject: [gpfsug-discuss] Joining RDMA over different networks? 
In-Reply-To: References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> Message-ID: I wouldn't want to use GPFS if I didn't want my nodes to be able to go nuts, why bother to be frank. I had tested a configuration with a single x86 box and 4 x 100Gbe adapters talking to an ESS, that thing did amazing performance in excess of 25 GB/s over Ethernet. If you have a node that needs that performance build to it. Spend more time configuring QoS to fair share your bandwidth than baking bottlenecks into your configuration. The reasoning of holding end nodes to a smaller bandwidth than the backend doesn't make sense. You want to clear "the work" as efficiently as possible, more than keep IT from having any constraints popping up. That's what leads to just endless dithering and diluting of infrastructure until no one can figure out how to get real performance. So yeah 95% of the workloads don't care about their performance and can live on dithered and diluted infrastructure that costs a zillion times more money than what the 5% of workload that does care about bandwidth needs to spend to actually deliver. Build your infrastructure storage as high bandwidth as possible per node because compared to all the other costs it's a drop in the bucket... Don't cheap out on "cables". The real joke is the masses are running what big iron pioneered, can't even fathom how much workload that last 5% of the data center is doing, and then trying to dictate how to "engineer" platforms by not engineering. Just god help you if you have a SharePoint list with 5000+ entries, you'll likely break the internets with that high volume workload. Alec On Tue, Aug 22, 2023, 3:23 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 22/08/2023 10:51, Kidger, Daniel wrote: > > > > Jonathan, > > > > Thank you for the great answer! > > Just to be clear though - are you talking about TCP/IP mounting of the > filesystem(s) rather than RDMA ? > > > > Yes for a few reasons. Firstly a bunch of our Ethernet adaptors don't > support RDMA. Second there a lot of ducks to be got in line and kept in > line for RDMA to work and that's too much effort IMHO. Thirdly the nodes > can peg the 10Gbps interface they have which is a hard QOS that we are > happy with. Though if specifying today we would have 25Gbps to the > compute nodes and 100 possibly 200Gbps on the DSS-G nodes. Basically we > don't want one node to go nuts and monopolize the file system :-) The > DSS-G nodes don't have an issue keeping up so I am not sure there is > much performance benefit from RDMA to be had. > > That said you are supposed to be able to do IPoIB over the RDMA > hardware's network, and I had presumed that the same could be said of > TCP/IP over RDMA on Ethernet. > > > I think routing of RDMA is perhaps something only Lustre can do? > > > > Possibly, something else is that we have our DSS-G nodes doing MLAG's > over a pair of switches. I need to be able to do firmware updates on the > network switches the DSS-G nodes are connected to without shutting down > the cluster. I don't think you can do that with RDMA reading the switch > manuals so another reason not to do it IMHO. In the 2020's the mantra is > patch baby patch and everything is focused on making that quick and easy > to achieve. Your expensive HPC system is for jack if hackers have taken > it over because you didn't path it in a timely fashion. 
Also I would > have a *lot* of explaining to do which I would rather not. > > Also in our experience storage is rarely the bottle neck and when it is > aka Gromacs is creating a ~1TB temp file at 10Gbps (yeah that's a real > thing we have observed on a fairly regular basis) that's an intended QOS > so everyone else can get work done and I don't get a bunch of tickets > from users complaining about the file system performing badly. We have > seen enough simultaneous Gromacs that without the 10Gbps hard QOS the > filesystem would have been brought to it's knees. > > We can't do the temp files locally on the node because we only spec'ed > them with 1TB local disks and the Gromacs temp files regularly exceed > the available local space. Also getting users to do it would be a > nightmare :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Walter.Sklenka at EDV-Design.at  Thu Aug 24 12:37:45 2023
From: Walter.Sklenka at EDV-Design.at (Walter Sklenka)
Date: Thu, 24 Aug 2023 11:37:45 +0000
Subject: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently
In-Reply-To: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia>
References: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia>
Message-ID: <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia>

Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

From: Walter Sklenka
Sent: Donnerstag, 24. August 2023 12:02
To: 'gpfsug-discuss-request at gpfsug.org'
Subject: FW: ESS 3500-C5 : rg has resigned permanently

Hi!
Does someone eventually have experience with ESS 3500 (no hybrid config, only NLSAS with 5 enclosures)?

We have issues with a shared recoverygroup. After creating it we made a test of setting only one node active (maybe not an optimal idea).
But since then the recoverygroup is down.
We have created a PMR but do not get any response until now.

The rg has no vdisks of any filesystem

[gpfsadmin at hgess02-m ~]$ ^C
[gpfsadmin at hgess02-m ~]$ sudo mmvdisk rg change --rg ess3500_hgess02_n1_hs_hgess02_n2_hs --restart
mmvdisk:
mmvdisk:
mmvdisk: Unable to reset server list for recovery group 'ess3500_hgess02_n1_hs_hgess02_n2_hs'.
mmvdisk: Command failed. Examine previous error messages to determine cause.

We also tried
2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid
2023-08-21_16:57:26.201+0200: [I] Recovery group ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently
2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid
2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file system, does not exist.
2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid.
2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid
2023-08-21_16:57:26.207+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures.
2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid
2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid
2023-08-21_16:57:26.213+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures.

For us it is crucial to know what we can do if this happens again (it has no vdisks yet, so it is not critical).

Do you know: is there a non-documented way to "vary on", or activate a recoverygroup again?
The doc:
https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess
tells to mmshutdown and mmstartup, but the RGCM does say nothing.
When trying to execute any vdisk command it only says "rg down", no idea how we could recover from that without deleting the rg (I hope it will never happen when we have vdisks on it).

Have a nice day
Walter

Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From janfrode at tanso.net  Thu Aug 24 13:50:36 2023
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Thu, 24 Aug 2023 14:50:36 +0200
Subject: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently
In-Reply-To: <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia>
References: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia> <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia>
Message-ID: 

It does sound like "mmvdisk rg change --restart" is the "varyon" command you're looking for.. but it's not clear why it's failing. I would start by looking at if there are any lower level issues with your cluster. Are your nodes healthy on a GPFS-level? "mmnetverify -N all" says network is OK ? "mmhealth node show -N all" not indicating any issues ? Check mmfs.log.latest ?

On Thu, Aug 24, 2023 at 1:41 PM Walter Sklenka wrote:

> Hi !
>
> Does someone eventually have experience with ESS 3500 ( no hybrid config,
> only NLSAS with 5 enclosures )
>
> We have issues with a shared recoverygroup. After creating it we made a
> test of setting only one node active (mybe not an optimal idea)
>
> But since then the recoverygroup is down
>
> We have created a PMR but do not get any response until now.
> > > > The rg has no vdisks of any filesystem > > [gpfsadmin at hgess02-m ~]$ ^C > [gpfsadmin at hgess02-m ~]$ sudo mmvdisk rg change --rg > ess3500_hgess02_n1_hs_hgess02_n2_hs --restart > mmvdisk: > mmvdisk: > mmvdisk: Unable to reset server list for recovery group > 'ess3500_hgess02_n1_hs_hgess02_n2_hs'. > mmvdisk: Command failed. Examine previous error messages to determine > cause. > > > > We also tried > > 2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid > 2023-08-21_16:57:26.201+0200: [I] Recovery group > ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently > 2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid > 2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file > system, does not exist. > 2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid. > 2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.207+0200: The current file system manager failed and > no new manager will be appointed. This may cause nodes mounting the file > system to experience mount failures. > 2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.213+0200: The current file system manager failed and > no new manager will be appointed. This may cause nodes mounting the file > system to experience mount failures. > > > > > > For us it is crucial to know what we can do if theis happens again ( it > has no vdisks yet so it is not critical ). > > > > Do you know: is there a non documented way to ?vary on?, or activate a > recoverygroup again? > > The doc : > > > https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess > > tells to mmshutdown and mmstartup, but the RGCM does say nothing > > When trying to execute any vdisk command it only says ?rg down?, no idea > how we could recover from that without deleting the rg ( I hope it will > never happen, when we have vdisks on it > > > > > > > > Have a nice day > > Walter > > > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Aug 24 16:26:34 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 24 Aug 2023 16:26:34 +0100 Subject: [gpfsug-discuss] Joining RDMA over different networks? 
In-Reply-To: 
References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk>
Message-ID: 

On 22/08/2023 11:52, Alec wrote:

> I wouldn't want to use GPFS if I didn't want my nodes to be able to go
> nuts, why bother to be frank.
>

Because there are multiple users of the system. Do you want to be the one explaining to 50 other users that they can't use the system today because John from Chemistry is pounding the filesystem to death for his jobs? Didn't think so.

There is not an infinite amount of money available and it is not possible with a reasonable amount of money to make a file system that all the nodes can max out their network connection at once.

> I had tested a configuration with a single x86 box and 4 x 100Gbe
> adapters talking to an ESS, that thing did amazing performance in excess
> of 25 GB/s over Ethernet. If you have a node that needs that
> performance build to it. Spend more time configuring QoS to fair share
> your bandwidth than baking bottlenecks into your configuration.
>

There are finite budgets and compromises have to be made. The compromises we made back in 2017 when the specification was written and put out to tender have held up really well.

> The reasoning of holding end nodes to a smaller bandwidth than the
> backend doesn't make sense. You want to clear "the work" as efficiently
> as possible, more than keep IT from having any constraints popping up.
> That's what leads to just endless dithering and diluting of
> infrastructure until no one can figure out how to get real performance.
>

It does because a small number of jobs can hold the system to ransom for lots of other users. I have to balance things across a large number of nodes. There is only a finite amount of bandwidth to the storage and it has to be shared out fairly. I could attempt to do it with QOS on the switches, or I could go "sod that for a lark", 10Gbps is all you get, and let's keep it simple. Though like I said today it would be 25Gbps, but this was a specification written six years ago when 25Gbps Ethernet was rather exotic and too expensive.

> So yeah 95% of the workloads don't care about their performance and can
> live on dithered and diluted infrastructure that costs a zillion times
> more money than what the 5% of workload that does care about bandwidth
> needs to spend to actually deliver.
>

They do care about performance, they just don't need to max out the allotted performance per node. However if performance of the file system is bad the performance of their jobs will also be bad and the total FLOPS I get from the system will plummet through the floor.

Note it is more like 0.1% of jobs that peg the 10Gbps network interface for any period of time at all.

> Build your infrastructure storage as high bandwidth as possible per node
> because compared to all the other costs it's a drop in the bucket...
> Don't cheap out on "cables".

No it's not. The Omnipath network (which by the way is reserved deliberately for MPI) cost a *LOT* of money. We are having serious conversations that with current core counts per node an Infiniband/Omnipath network doesn't make sense any more, and that 25Gbps Ethernet will do just fine for a standard compute node.

Around 85% of our jobs run on 40 cores (aka one node) or less. If you go to 128 cores a node it's more like 95% of all jobs. If you go to 192 cores it's about 98% of all jobs. The maximum job size we allow currently is 400 cores.
Better to ditch the expensive interconnect and use the hundreds of thousands of dollars saved to buy more compute nodes is the current thinking. The 2% of users can just have longer runtimes, but hey, there will be a lot more FLOPS available in total and they rarely have just one job in the queue, so it will all balance out in the wash and be positive for most users.

In consultation the users are on board with this direction of travel. From our perspective, if a user absolutely needs more than 192 cores on a modern system it would not be unreasonable to direct them to a national facility that can handle the really huge jobs. We are an institutional HPC facility after all. We don't claim to be able to handle a 1000 core job for example.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From anacreo at gmail.com  Thu Aug 24 17:46:04 2023
From: anacreo at gmail.com (Alec)
Date: Thu, 24 Aug 2023 09:46:04 -0700
Subject: [gpfsug-discuss] Joining RDMA over different networks?
In-Reply-To: 
References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk>
Message-ID: 

So why not use the built-in QoS features of Spectrum Scale to adjust the performance of a particular fileset, that way you can ensure you have appropriate bandwidth?
https://www.ibm.com/docs/en/storage-scale/5.1.1?topic=reference-mmqos-command

What you're saying is that you don't want to build a system to meet John's demands because you're worried about Tom not having bandwidth for his process. When in fact there is a way to guarantee a minimum quality of service for every user and still allow the system to perform exceptionally well for those that need / want it. You can also set hard caps if you want. I haven't tested it but you should also be able to set a maxbps for a node so that it won't exceed a certain limit if you really need to.

Not sure if you're using LSF, but you can even tie LSF queues to Spectrum Scale QOS. I didn't really try it but thought that has some great possibilities.

I would say don't hurt John to keep Tom happy.. make both of them happy. In this scenario you don't have to intimately know the CPU vs IO characteristics of a job. You just need to know that reserving 1GB/s of I/O per filesystem is fair, and letting jobs consume max I/O when available is efficient. In Linux you have other mechanisms such as cgroups to refine workload distribution within the node.

Another way to think about it is that in a system that is trying to get work done any unused capacity is costing someone somewhere something. At the same time if a system can't perform reliably and predictably that is a problem, but QOS is there to solve that problem.
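For what it's worth, a minimal sketch of the long-standing coarse QoS knobs, untested here; the filesystem name "gpfs0" and the IOPS number are placeholders, and the per-fileset / user-defined classes are what the mmqos page linked above adds on top of this:

# enable QoS on the filesystem: throttle the "maintenance" class on all pools,
# leave normal user I/O (the "other" class) unlimited
mmchqos gpfs0 --enable pool=*,maintenance=10000IOPS,other=unlimited

# watch what each class is actually consuming
mmlsqos gpfs0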
Alec

On Thu, Aug 24, 2023, 8:28 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote:

> On 22/08/2023 11:52, Alec wrote:
>
> > I wouldn't want to use GPFS if I didn't want my nodes to be able to go
> > nuts, why bother to be frank.
> >
>
> Because there are multiple users to the system. Do you want to be the
> one explaining to 50 other users that they can't use the system today
> because John from Chemistry is pounding the filesystem to death for his
> jobs? Didn't think so.
>
> There is not an infinite amount of money available and it is not
> possible with a reasonable amount of money to make a file system that
> all the nodes can max out their network connection at once.
>
> > I had tested a configuration with a single x86 box and 4 x 100Gbe
> > adapters talking to an ESS, that thing did amazing performance in excess
> > of 25 GB/s over Ethernet. If you have a node that needs that
> > performance build to it. Spend more time configuring QoS to fair share
> > your bandwidth than baking bottlenecks into your configuration.
> >
>
> There are finite budgets and compromises have to be made. The
> compromises we made back in 2017 when the specification was written and
> put out to tender have held up really well.
>
> > The reasoning of holding end nodes to a smaller bandwidth than the
> > backend doesn't make sense. You want to clear "the work" as efficiently
> > as possible, more than keep IT from having any constraints popping up.
> > That's what leads to just endless dithering and diluting of
> > infrastructure until no one can figure out how to get real performance.
> >
>
> It does because a small number of jobs can hold the system to ransom for
> lots of other users. I have to balance things across a large number of
> nodes. There is only a finite amount of bandwidth to the storage and it
> has to be shared out fairly. I could attempt to do it with QOS on the
> switches or I could go sod that for a lark 10Gbps is all you get and
> lets keep it simple. Though like I said today it would be 25Gbps, but
> this was a specification written six years ago when 25Gbps Ethernet was
> rather exotic and too expensive.
>
> > So yeah 95% of the workloads don't care about their performance and can
> > live on dithered and diluted infrastructure that costs a zillion times
> > more money than what the 5% of workload that does care about bandwidth
> > needs to spend to actually deliver.
> >
>
> They do care about performance, they just don't need to max out the
> allotted performance per node. However if performance of the file system
> is bad the performance of their jobs will also be bad and the total
> FLOPS I get from the system will plummet through the floor.
>
> Note it is more like 0.1% of jobs that peg the 10Gbps network interface
> for any period of time at all.
>
> > Build your infrastructure storage as high bandwidth as possible per node
> > because compared to all the other costs it's a drop in the bucket...
> > Don't cheap out on "cables".
>
> No it's not. The Omnipath network (which by the way is reserved
> deliberately for MPI) cost a *LOT* of money. We are having serious
> conversations that with current core counts per node an
> Infiniband/Omnipath network doesn't make sense any more, and that 25Gbps
> Ethernet will do just fine for a standard compute node.
>
> Around 85% of our jobs run on 40 cores (aka one node) or less. If you go
> to 128 cores a node it's more like 95% of all jobs. If you go to 192
> cores it's about 98% of all jobs. The maximum job size we allow
> currently is 400 cores.
> We are an institutional HPC facility after all. We don't claim to be able
> to handle a 1000 core job for example.
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Walter.Sklenka at EDV-Design.at  Thu Aug 24 19:18:30 2023
From: Walter.Sklenka at EDV-Design.at (Walter Sklenka)
Date: Thu, 24 Aug 2023 18:18:30 +0000
Subject: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently
In-Reply-To: 
References: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia> <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia>
Message-ID: 

Hi Jan-Frode!
We did the "switch" with mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active ess-n2-hs.
Both nodes were up and we did not see any anomalies. And the rg was successfully created with the log groups.
Maybe the method to switch the rg (with --active) is a bad idea? (Because the manual says:
https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=command-mmvdisk-recoverygroup

"For a shared recovery group, the mmvdisk recoverygroup change --active Node command means to make the specified node the server for all four user log groups and the root log group. The specified node therefore temporarily becomes the sole active server for the entire shared recovery group, leaving the other server idle. This should only be done in unusual maintenance situations, since it is normally considered an error condition for one of the servers of a shared recovery group to be idle. If the keyword DEFAULT is used instead of a server name, it restores the normal default balance of log groups, making each of the two servers responsible for two user log groups.")
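So if I read the manual right, the switch and the way back would be roughly this (an untested sketch, with our own recovery group and node names):

# move all log groups to one server, e.g. for maintenance
sudo mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active ess-n2-hs

# afterwards restore the normal balance of two log groups per server
sudo mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active DEFAULT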
August 2023 14:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently It does sound like "mmvdisk rg change --restart" is the "varyon" command you're looking for.. but it's not clear why it's failing. I would start by looking at if there are any lower level issues with your cluster. Are your nodes healthy on a GPFS-level? "mmnetverify -N all" says network is OK ? "mmhealth node show -N all" not indicating any issues ? Check mmfs.log.latest ? On Thu, Aug 24, 2023 at 1:41?PM Walter Sklenka > wrote: Hi ! Does someone eventually have experience with ESS 3500 ( no hybrid config, only NLSAS with 5 enclosures ) We have issues with a shared recoverygroup. After creating it we made a test of setting only one node active (mybe not an optimal idea) But since then the recoverygroup is down We have created a PMR but do not get any response until now. The rg has no vdisks of any filesystem [gpfsadmin at hgess02-m ~]$ ^C [gpfsadmin at hgess02-m ~]$ sudo mmvdisk rg change --rg ess3500_hgess02_n1_hs_hgess02_n2_hs --restart mmvdisk: mmvdisk: mmvdisk: Unable to reset server list for recovery group 'ess3500_hgess02_n1_hs_hgess02_n2_hs'. mmvdisk: Command failed. Examine previous error messages to determine cause. We also tried 2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid 2023-08-21_16:57:26.201+0200: [I] Recovery group ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently 2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid 2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file system, does not exist. 2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid. 2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid 2023-08-21_16:57:26.207+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures. 2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid 2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid 2023-08-21_16:57:26.213+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures. For us it is crucial to know what we can do if theis happens again ( it has no vdisks yet so it is not critical ). Do you know: is there a non documented way to ?vary on?, or activate a recoverygroup again? 
The doc : https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess tells to mmshutdown and mmstartup, but the RGCM does say nothing When trying to execute any vdisk command it only says ?rg down?, no idea how we could recover from that without deleting the rg ( I hope it will never happen, when we have vdisks on it Have a nice day Walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Aug 24 19:42:33 2023 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 24 Aug 2023 19:42:33 +0100 Subject: [gpfsug-discuss] Joining RDMA over different networks? In-Reply-To: References: <9AE82616-B931-478A-92DB-0E484DB93B60@rutgers.edu> <6555765F-3FA5-4EC6-B1D5-1F2E2E023541@rutgers.edu> <7b5a4cd7-943f-e67e-95bc-735c0bfefe3a@strath.ac.uk> Message-ID: On 24/08/2023 17:46, Alec wrote: > So why not use the built in QOS features of Spectrum Scale to adjust the > performance of a particular fileset, that way you can ensure you have > appropriate bandwidth? > Because all the users files are in the same fileset would be the simple answer. Way way to much administration overhead for that to change. There is huge amounts of KISS involved in the cluster design. Also it's only a subset of John's jobs that peg the network. Oh and at tender we didn't know we would get GPFS so we had to account for that in the system design. As a side note is that GPU nodes get 40Gbps network connections, so I am bandwidth limiting by node type. The flip side is that the high speed network (Omnipath in this case) has been reserved for MPI (or similar) traffic. Basically we observed that core counts where growing at faster rate than Infiniband/Omnipath bandwidth. We went from 12 cores a node to 40 cores, but from 40Gbps Infiniband to 100Gbps Omnipath. So rather than mixing both storage and MPI on the same fabric we moved the storage out onto 10Gbps Ethernet which for >99% of users is adequate and freed up capacity on the low latency, high speed network for the MPI traffic. I stand by that design choice 110%. Then because low latency/high speed network is only for MPI traffic we don't need to equip all nodes with Omnipath (as the tender turned out) which saved $$$$ which could be spent otherwise. A login node for example does just fine with plain Ethernet. As does a large memory (3TB RAM) node which doesn't run multinode jobs. Same for GPU nodes, and worked again in our favour when we added a whole bunch of refurb ethernet only connected standard nodes last year as we had capacity problems. Most of our jobs run on a single node so topology aware scheduling in Slurm to the rescue. Cheap addition if your storage is commodity Ethernet, would have been horrendously expensive for Omnipath. There are also other considerations. Running GPFS is already enough of a minority sport that running it over the likes of Omnipath or Infiniband or even with RDMA is just asking for problems and fails the KISS test IMHO. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. 
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From janfrode at tanso.net  Thu Aug 24 20:56:16 2023
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Thu, 24 Aug 2023 21:56:16 +0200
Subject: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently
In-Reply-To: 
References: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia> <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia>
Message-ID: 

mmvdisk rg change --active is a very common operation. It should be perfectly safe. mmvdisk rg change --restart is an option I didn't know about, so likely not something that's commonly used.

I wouldn't be too worried about losing the RGs. I don't think that's something that can happen without support being able to help getting it back online. Once I've had a situation similar to your RG not wanting to become active again during an upgrade (around 5 years ago), and I believe we solved it by rebooting the io-nodes - must have been some stuck process I was unable to understand, or was it a CCR issue caused by some nodes being way back-level..? Don't remember.

-jf

tor. 24. aug. 2023 kl. 20:22 skrev Walter Sklenka < Walter.Sklenka at edv-design.at>:

> Hi Jan-Frode!
>
> We did the "switch" with mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs
> --active ess-n2-hs.
>
> Both nodes were up and we did not see any anomalies. And the rg was
> successfully created with the log groups
>
> Maybe the method to switch the rg (with --active) is a bad idea? (because
> manuals says:
>
> https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=command-mmvdisk-recoverygroup
>
> For a shared recovery group, the mmvdisk recoverygroup change --active
> Node command means to make the specified node the server for all four
> user log groups and the root log group. The specified node therefore
> temporarily becomes the sole active server for the entire shared recovery
> group, leaving the other server idle. This should only be done in unusual
> maintenance situations, since it is normally considered an error condition
> for one of the servers of a shared recovery group to be idle. If the
> keyword DEFAULT is used instead of a server name, it restores the normal
> default balance of log groups, making each of the two servers responsible
> for two user log groups.
>
> this was the state before we tried to restart , no log are seen, we got
> "unable to reset server list"
>
> ~]$ sudo mmvdisk server list --rg ess3500_ess_n1_hs_ess_n2_hs
>
> node
> number server active remarks
> ------ -------------------------------- ------- -------
> 98 ess-n1-hs yes configured
> 99 ess-n2-hs yes configured
>
> ~]$ sudo mmvdisk recoverygroup list --rg ess3500_ess_n1_hs_ess_n2_hs
>
> needs user
> recovery group node class active current or master server service vdisks remarks
> ----------------------------------- ---------- ------- -------------------------------- ------- ------ -------
> ess3500_ess_n1_hs_ess_n2_hs ess3500_mmvdisk_ess_n1_hs_ess_n2_hs no - unknown 0
>
> ~]$ ^C
> ~]$ sudo mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --restart
> mmvdisk:
> mmvdisk:
> mmvdisk: Unable to reset server list for recovery group
> 'ess3500_ess_n1_hs_ess_n2_hs'.
> mmvdisk: Command failed. Examine previous error messages to determine cause.
> > > > > > Well, in the logs we did not find anything > > And finally we had to delete the rg , because we urgently needed new space > > With the new one we tested again and we did mmshutdown -startup , and > also with --active flag, and all went ok. And now we have data on the rg > > But we are in concern that this might happen sometimes again and we might > not be able to reenable the rg leading to a disaster > > > > So if you have any idea I would appreciate very much ? > > > > Best regards > > Walter > > *From:* gpfsug-discuss *On Behalf Of *Jan-Frode > Myklebust > *Sent:* Donnerstag, 24. August 2023 14:51 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned > permanently > > > > It does sound like "mmvdisk rg change --restart" is the "varyon" command > you're looking for.. but it's not clear why it's failing. I would start by > looking at if there are any lower level issues with your cluster. Are your > nodes healthy on a GPFS-level? "mmnetverify -N all" says network is OK ? > "mmhealth node show -N all" not indicating any issues ? Check > mmfs.log.latest ? > > > > On Thu, Aug 24, 2023 at 1:41?PM Walter Sklenka < > Walter.Sklenka at edv-design.at> wrote: > > > > Hi ! > > Does someone eventually have experience with ESS 3500 ( no hybrid config, > only NLSAS with 5 enclosures ) > > > > We have issues with a shared recoverygroup. After creating it we made a > test of setting only one node active (mybe not an optimal idea) > > But since then the recoverygroup is down > > We have created a PMR but do not get any response until now. > > > > The rg has no vdisks of any filesystem > > [gpfsadmin at hgess02-m ~]$ ^C > [gpfsadmin at hgess02-m ~]$ sudo mmvdisk rg change --rg > ess3500_hgess02_n1_hs_hgess02_n2_hs --restart > mmvdisk: > mmvdisk: > mmvdisk: Unable to reset server list for recovery group > 'ess3500_hgess02_n1_hs_hgess02_n2_hs'. > mmvdisk: Command failed. Examine previous error messages to determine > cause. > > > > We also tried > > 2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid > 2023-08-21_16:57:26.201+0200: [I] Recovery group > ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently > 2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid > 2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file > system, does not exist. > 2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid. > 2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.207+0200: The current file system manager failed and > no new manager will be appointed. This may cause nodes mounting the file > system to experience mount failures. > 2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver > ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid > 2023-08-21_16:57:26.213+0200: The current file system manager failed and > no new manager will be appointed. This may cause nodes mounting the file > system to experience mount failures. 
> > > > > > For us it is crucial to know what we can do if theis happens again ( it > has no vdisks yet so it is not critical ). > > > > Do you know: is there a non documented way to ?vary on?, or activate a > recoverygroup again? > > The doc : > > > https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess > > tells to mmshutdown and mmstartup, but the RGCM does say nothing > > When trying to execute any vdisk command it only says ?rg down?, no idea > how we could recover from that without deleting the rg ( I hope it will > never happen, when we have vdisks on it > > > > > > > > Have a nice day > > Walter > > > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6 > /1/2, > A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Fri Aug 25 11:33:45 2023 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Fri, 25 Aug 2023 10:33:45 +0000 Subject: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently In-Reply-To: References: <7204a493499543e1a5a9fa0fa8bf41bb@Mail.EDVDesign.cloudia> <87c3706ea7fa410295dae9a24dd38db6@Mail.EDVDesign.cloudia> Message-ID: Hi! Yes, thank you very much Finally after recreating and yet data on it, we realized we never rebooted the IO nodes !!! This is the answer, or at least a calming, feasible try to explain ? Have a nice weekend From: gpfsug-discuss On Behalf Of Jan-Frode Myklebust Sent: Donnerstag, 24. August 2023 21:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently mmvdisk rg change ?active is a very common operation. It should be perfectly safe. mmvdisk rg change ?restart is an option I didn?t know about, so likely not something that?s commonly used. I wouldn?t be too worried about losing the RGs. I don?t think that?s something that can happen without support being able to help getting it back online. Once I?ve had a situation similar to your RG not wanting to become active again during an upgrade (around 5 years ago), and I believe we solved it by rebooting the io-nodes ? must have been some stuck process I was unable to understand? or was it a CCR issue caused by some nodes being way back-level..? Don?t remember. -jf tor. 24. aug. 2023 kl. 20:22 skrev Walter Sklenka >: Hi Jan-Frode! We did the ?switch? with mmvdisk rg change ?rg ess3500_ess_n1_hs_ess_n2_hs ?active ess-n2-hs ? Both nodes were up and we did not see any anomalies. And the rg was successfully created with the log groups Maybe the method to switch the rg (with ?active) is a bad idea? (because manuals says: https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=command-mmvdisk-recoverygroup For a shared recovery group, the mmvdisk recoverygroup change --active Node command means to make the specified node the server for all four user log groups and the root log group. 
The specified node therefore temporarily becomes the sole active server for the entire shared recovery group, leaving the other server idle. This should only be done in unusual maintenance situations, since it is normally considered an error condition for one of the servers of a shared recovery group to be idle. If the keyword DEFAULT is used instead of a server name, it restores the normal default balance of log groups, making each of the two servers responsible for two user log groups. this was the state before we tried to restart , no log are seen, we got ?unable to reset server list? ~]$ sudo mmvdisk server list --rg ess3500_ess_n1_hs_ess_n2_hs node number server active remarks ------ -------------------------------- ------- ------- 98 ess-n1-hs yes configured 99 ess-n2-hs yes configured ~]$ sudo mmvdisk recoverygroup list --rg ess3500_ess_n1_hs_ess_n2_hs needs user recovery group node class active current or master server service vdisks remarks ----------------------------------- ---------- ------- -------------------------------- ------- ------ ------- ess3500_ess_n1_hs_ess_n2_hs ess3500_mmvdisk_ess_n1_hs_ess_n2_hs no - unknown 0 ~]$ ^C ~]$ sudo mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --restart mmvdisk: mmvdisk: mmvdisk: Unable to reset server list for recovery group 'ess3500_ess_n1_hs_ess_n2_hs'. mmvdisk: Command failed. Examine previous error messages to determine cause. Well, in the logs we did not find anything And finally we had to delete the rg , because we urgently needed new space With the new one we tested again and we did mmshutdown -startup , and also with --active flag, and all went ok. And now we have data on the rg But we are in concern that this might happen sometimes again and we might not be able to reenable the rg leading to a disaster So if you have any idea I would appreciate very much ? Best regards Walter From: gpfsug-discuss > On Behalf Of Jan-Frode Myklebust Sent: Donnerstag, 24. August 2023 14:51 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently It does sound like "mmvdisk rg change --restart" is the "varyon" command you're looking for.. but it's not clear why it's failing. I would start by looking at if there are any lower level issues with your cluster. Are your nodes healthy on a GPFS-level? "mmnetverify -N all" says network is OK ? "mmhealth node show -N all" not indicating any issues ? Check mmfs.log.latest ? On Thu, Aug 24, 2023 at 1:41?PM Walter Sklenka > wrote: Hi ! Does someone eventually have experience with ESS 3500 ( no hybrid config, only NLSAS with 5 enclosures ) We have issues with a shared recoverygroup. After creating it we made a test of setting only one node active (mybe not an optimal idea) But since then the recoverygroup is down We have created a PMR but do not get any response until now. The rg has no vdisks of any filesystem [gpfsadmin at hgess02-m ~]$ ^C [gpfsadmin at hgess02-m ~]$ sudo mmvdisk rg change --rg ess3500_hgess02_n1_hs_hgess02_n2_hs --restart mmvdisk: mmvdisk: mmvdisk: Unable to reset server list for recovery group 'ess3500_hgess02_n1_hs_hgess02_n2_hs'. mmvdisk: Command failed. Examine previous error messages to determine cause. 
We also tried 2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid 2023-08-21_16:57:26.201+0200: [I] Recovery group ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently 2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid 2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file system, does not exist. 2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid. 2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid 2023-08-21_16:57:26.207+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures. 2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid 2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid 2023-08-21_16:57:26.213+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures. For us it is crucial to know what we can do if theis happens again ( it has no vdisks yet so it is not critical ). Do you know: is there a non documented way to ?vary on?, or activate a recoverygroup again? The doc : https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess tells to mmshutdown and mmstartup, but the RGCM does say nothing When trying to execute any vdisk command it only says ?rg down?, no idea how we could recover from that without deleting the rg ( I hope it will never happen, when we have vdisks on it Have a nice day Walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Aug 25 15:45:27 2023 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 25 Aug 2023 16:45:27 +0200 Subject: [gpfsug-discuss] How to properly debug CES / Ganesha? Message-ID: <3a2dbed3-3f88-97a2-e588-a5300f74d32a@psi.ch> Hallo, since some time we do have seemingly random issues with a particular customer accessing data over Ganesha / CES (5.1.8). What happens is that the CES server owning their IP gets a very high cpu load, and every operation on the NFS clients become sluggish. It does seem not related to throughput, and looking at the metrics [*] I do not see a correlation with e.g. increased NFS ops. I see no events in GPFS, and nothing suspicious in the ganesha and gpfs log files. What would be a good procedure to identify the misbehaving client (I suspect NFS, as it seems there is only 1 idle SMB client)? 
I have put now LOGLEVEL=INFO in ganesha to see if I catch anything interesting, but I would be curious on how this kind of apparently random issues could be better debugged and restricted to a client Thanks a lot! regards leo [*] for i in read write; do for j in ops queue lat req err; do mmperfmon query "ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j" 2023-08-25-14:40:00 2023-08-25-15:05:00 -b60; done; done -- Paul Scherrer Institut Dr. Leonardo Sala Group Leader Data Analysis and Research Infrastructure Group Leader Data Curation a.i. Deputy Department Head Science IT Infrastructure and Services department Science IT Infrastructure and Services department (AWI) WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Fri Aug 25 20:26:40 2023 From: jjdoherty at yahoo.com (Jim Doherty) Date: Fri, 25 Aug 2023 19:26:40 +0000 (UTC) Subject: [gpfsug-discuss] How to properly debug CES / Ganesha? In-Reply-To: <3a2dbed3-3f88-97a2-e588-a5300f74d32a@psi.ch> References: <3a2dbed3-3f88-97a2-e588-a5300f74d32a@psi.ch> Message-ID: <1162230369.305181.1692991600343@mail.yahoo.com> See??https://ganltc.github.io/performance-monitoring-of-nfs-ganesha.html On Friday, August 25, 2023 at 10:48:02 AM EDT, Leonardo Sala wrote: Hallo, since some time we do have seemingly random issues with a particular customer accessing data over Ganesha / CES (5.1.8). What happens is that the CES server owning their IP gets a very high cpu load, and every operation on the NFS clients become sluggish. It does seem not related to throughput, and looking at the metrics [*] I do not see a correlation with e.g. increased NFS ops. I see no events in GPFS, and nothing suspicious in the ganesha and gpfs log files. What would be a good procedure to identify the misbehaving client (I suspect NFS, as it seems there is only 1 idle SMB client)? I have put now LOGLEVEL=INFO in ganesha to see if I catch anything interesting, but I would be curious on how this kind of apparently random issues could be better debugged and restricted to a client Thanks a lot! regards leo [*] for i in read write; do for j in ops queue lat req err; do mmperfmon query "ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j" 2023-08-25-14:40:00 2023-08-25-15:05:00 -b60; done; done -- Paul Scherrer Institut Dr. Leonardo Sala Group Leader Data Analysis and Research Infrastructure Group Leader Data Curation a.i. Deputy Department Head Science IT Infrastructure and Services department Science IT Infrastructure and Services department (AWI) WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From helge.hauglin at usit.uio.no Tue Aug 29 10:15:09 2023 From: helge.hauglin at usit.uio.no (Helge Hauglin) Date: Tue, 29 Aug 2023 11:15:09 +0200 Subject: [gpfsug-discuss] How to properly debug CES / Ganesha? 
In-Reply-To: <3a2dbed3-3f88-97a2-e588-a5300f74d32a@psi.ch> (Leonardo Sala's message of "Fri, 25 Aug 2023 16:45:27 +0200")
References: <3a2dbed3-3f88-97a2-e588-a5300f74d32a@psi.ch>
Message-ID: 

Hi,

To identify which address sends most packets to and from a protocol node, I use a variation of this:

| tcpdump -c 20000 -i <interface> 2>/dev/null | grep IP | cut -d' ' -f3 | sort | uniq -c | sort -nr | head -10

(Collect 20,000 packets, pick out sender address and port, sort and count those, make a top 10 list.)

You could limit to only NFS traffic by adding "port nfs" at the end of the "tcpdump" command, but then you would not see e.g. SMB clients with a lot of traffic, if there are any of those.
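With the NFS filter that would look roughly like this (untested as typed here; the interface name is a placeholder, and "port nfs" is just the service name for port 2049 from /etc/services):

| tcpdump -c 20000 -i <interface> port nfs 2>/dev/null | grep IP | cut -d' ' -f3 | sort | uniq -c | sort -nr | head -10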
> Hallo,
>
> since some time we do have seemingly random issues with a particular
> customer accessing data over Ganesha / CES (5.1.8). What happens is
> that the CES server owning their IP gets a very high cpu load, and
> every operation on the NFS clients become sluggish. It does seem not
> related to throughput, and looking at the metrics [*] I do not see a
> correlation with e.g. increased NFS ops. I see no events in GPFS, and
> nothing suspicious in the ganesha and gpfs log files.
>
> What would be a good procedure to identify the misbehaving client (I
> suspect NFS, as it seems there is only 1 idle SMB client)? I have put
> now LOGLEVEL=INFO in ganesha to see if I catch anything interesting,
> but I would be curious on how this kind of apparently random issues
> could be better debugged and restricted to a client
>
> Thanks a lot!
>
> regards
>
> leo
>
> [*]
>
> for i in read write; do for j in ops queue lat req err; do mmperfmon
> query "ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j"
> 2023-08-25-14:40:00 2023-08-25-15:05:00 -b60; done; done

-- 
Regards, Helge Hauglin

----------------------------------------------------------------
Mr. Helge Hauglin, Senior Engineer
System administrator
Center for Information Technology, University of Oslo, Norway