From TROPPENS at de.ibm.com Fri Jan 5 09:06:36 2024 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 5 Jan 2024 09:06:36 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Save_the_date_=E2=80=93_German_User_M?= =?utf-8?q?eeting_2024?= Message-ID: Greetings and Happy New Year! The German User Meeting will be held first week of March 2024 in Sindelfingen, Germany. There will be a New User Day on March 5, 2024, followed by the two-day regular Storage Scale User Meeting. Details on agenda and registration will be provided later. Please join us! Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Jan 8 09:18:56 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Jan 2024 09:18:56 +0000 Subject: [gpfsug-discuss] GUI server hardware requirements Message-ID: <1930cbb8-e033-4b42-9689-323431d496b5@strath.ac.uk> Never deployed these before, well slight lie did deploy a GUI server back in the 3.x days but it didn't do anything useful. I am now looking to deploy a pair of GUI servers for the ReST API support so we can run Kubernetes workloads. The CPU and RAM requirements seem to be quite low. I have a some old spare servers that can be pressed into service for now that more than meet the these requirements with 8 cores and 144GB of RAM. However it just vaguely says "local PV for DB". At the moment the two servers have a RAID6 array of 300GB 10k disks giving ~1TB of usable. Is that sufficient space and will it be performant enough? Should I look to swap them for a RAID 1 of SSD's and what sort of capacity is needed for this local DB? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Jan 8 09:37:17 2024 From: abeattie at au1.ibm.com (ANDREW BEATTIE) Date: Mon, 8 Jan 2024 09:37:17 +0000 Subject: [gpfsug-discuss] GUI server hardware requirements In-Reply-To: <1930cbb8-e033-4b42-9689-323431d496b5@strath.ac.uk> References: <1930cbb8-e033-4b42-9689-323431d496b5@strath.ac.uk> Message-ID: Would be plenty for the gui role as it stands today, The ESS EMS server does not use SSD at all, and it runs the GUI for an ESS cluster. However it?s worth noting that there are changes coming to the GUI / rest API as part of the 5.2.0 / 5.2.1 code in 1H 2024 You might want to reach out to your local IBM resource and ask them to put you in touch with one of the developers for the new experience to explain what?s coming in more detail. Regards, Andrew Beattie Technical Sales Specialist - Storage for Big Data & AI IBM Australia and New Zealand P. +61 421 337 927 E. abeattie at au1.ibm.com Twitter: AndrewJBeattie LinkedIn: ________________________________ From: gpfsug-discuss on behalf of Jonathan Buzzard Sent: Monday, January 8, 2024 7:18:56 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] GUI server hardware requirements Never deployed these before, well slight lie did deploy a GUI server back in the 3.x days but it didn't do anything useful. I am now looking to deploy a pair of GUI servers for the ReST API support so we can run Kubernetes workloads. The CPU and RAM requirements seem to be quite low. I have a some old spare servers that can be pressed into service for now that more than meet the these requirements with 8 cores and 144GB of RAM. However it just vaguely says "local PV for DB". At the moment the two servers have a RAID6 array of 300GB 10k disks giving ~1TB of usable. Is that sufficient space and will it be performant enough? Should I look to swap them for a RAID 1 of SSD's and what sort of capacity is needed for this local DB? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Jan 8 17:31:31 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Jan 2024 17:31:31 +0000 Subject: [gpfsug-discuss] GUI server hardware requirements In-Reply-To: References: <1930cbb8-e033-4b42-9689-323431d496b5@strath.ac.uk> Message-ID: <068bfa5a-bc1a-4411-adb2-b93f96390170@strath.ac.uk> On 08/01/2024 09:37, ANDREW BEATTIE wrote: > Would be plenty for the gui role as it stands today, > The ESS EMS server does not use SSD at all, and it runs the GUI for an > ESS cluster. > I care not about the GUI role, only the RestAPI support. I am old school and take the view that if you need a GUI to manage your storage then step away from the keyboard :-) > However it?s worth noting that there are changes coming to the GUI / > rest API as part of the 5.2.0 / 5.2.1 code in 1H 2024 > > You might want to reach out to your local IBM resource and ask them to > put you in touch with one of the developers for the new experience to > explain what?s coming in more detail. > Here is a question can you replace the GUI/RestAPI servers live without service interruption? Say just spin up a couple more and then retire the old ones. Or replace them one at a time? As it stands the machines are pretty old anyway and are spare because firstly the CPU is not supported in ESXi 7+, and then SAS adaptor is not supported in RHEL8+. Hence they are spare. However they work just fine under Ubuntu LTS 22.04 and we are likely to move away from RHEL after RedHat's stunt last year. However I would want to be able to replace them without downtime if required and as funds allow. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at hpe.com Mon Jan 8 18:50:45 2024 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Mon, 8 Jan 2024 18:50:45 +0000 Subject: [gpfsug-discuss] GUI server hardware requirements In-Reply-To: <068bfa5a-bc1a-4411-adb2-b93f96390170@strath.ac.uk> References: <1930cbb8-e033-4b42-9689-323431d496b5@strath.ac.uk> <068bfa5a-bc1a-4411-adb2-b93f96390170@strath.ac.uk> Message-ID: > Here is a question can you replace the GUI/RestAPI servers live without service interruption? >Say just spin up a couple more and then retire the old ones. Or replace them one at a time? I do not see why not, I have certainly had a system where the GUI was originally running on a storage node, then later a dedicated server was bought and spun up, and the GUI role moved over to it without an outage Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266?? hpe.com -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Monday, January 8, 2024 5:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GUI server hardware requirements On 08/01/2024 09:37, ANDREW BEATTIE wrote: > Would be plenty for the gui role as it stands today, The ESS EMS > server does not use SSD at all, and it runs the GUI for an ESS > cluster. > I care not about the GUI role, only the RestAPI support. I am old school and take the view that if you need a GUI to manage your storage then step away from the keyboard :-) > However it?s worth noting that there are changes coming to the GUI / > rest API as part of the 5.2.0 / 5.2.1 code in 1H 2024 > > You might want to reach out to your local IBM resource and ask them to > put you in touch with one of the developers for the new experience to > explain what?s coming in more detail. > Here is a question can you replace the GUI/RestAPI servers live without service interruption? Say just spin up a couple more and then retire the old ones. Or replace them one at a time? As it stands the machines are pretty old anyway and are spare because firstly the CPU is not supported in ESXi 7+, and then SAS adaptor is not supported in RHEL8+. Hence they are spare. However they work just fine under Ubuntu LTS 22.04 and we are likely to move away from RHEL after RedHat's stunt last year. However I would want to be able to replace them without downtime if required and as funds allow. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From TROPPENS at de.ibm.com Tue Jan 9 18:34:52 2024 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Tue, 9 Jan 2024 18:34:52 +0000 Subject: [gpfsug-discuss] Chart Decks of SC23 Storage Scale User Group Meeting are now available Message-ID: Greetings, the chart decks of the Storage Scale user meeting along SC23 are now available. Many thanks to all speakers and participants for this successful event. This would not be possible without volunteers to present and an active and engaged audience. https://www.spectrumscaleug.org/chart-decks-of-sc23-storage-scale-user-group-meeting-are-now-available/ Please join our 2024 meetings. I am looking forward to meeting many of you in person there. Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From harr1 at llnl.gov Wed Jan 10 19:50:14 2024 From: harr1 at llnl.gov (Cameron Harr) Date: Wed, 10 Jan 2024 11:50:14 -0800 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? Message-ID: Does anyone know if Scale 5.1.8.1 was pulled for some reason? I have a local download of Scale 5.1.8.1 and went to check if there was something newer in my Passport account, but was surprised to see that 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 earlier because I downloaded it in July, so is 5.1.8.1 still safe to use? Is this simply a problem with the IBM Passport site? Thanks, Cameron From gcorneau at us.ibm.com Wed Jan 10 20:29:11 2024 From: gcorneau at us.ibm.com (Glen Corneau) Date: Wed, 10 Jan 2024 20:29:11 +0000 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: I still find 5.1.8.1 in FixCentral: https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc ________________________________ From: gpfsug-discuss on behalf of Cameron Harr Sent: Wednesday, January 10, 2024 13:50 To: gpfsug-discuss Subject: [EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? Does anyone know if Scale 5.1.8.1 was pulled for some reason? I have a local download of Scale 5.1.8.1 and went to check if there was something newer in my Passport account, but was surprised to see that 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 earlier because I downloaded it in July, so is 5.1.8.1 still safe to use? Is this simply a problem with the IBM Passport site? Thanks, Cameron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Jan 10 20:38:13 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Jan 2024 20:38:13 +0000 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: There?s at least a 5.1.8.2 at this point, though, so no reason to run 5.1.8.1 that I?m aware of. On Jan 10, 2024, at 15:29, Glen Corneau wrote: I still find 5.1.8.1 in FixCentral: https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc ________________________________ From: gpfsug-discuss on behalf of Cameron Harr Sent: Wednesday, January 10, 2024 13:50 To: gpfsug-discuss Subject: [EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? Does anyone know if Scale 5.1.8.1 was pulled for some reason? I have a local download of Scale 5.1.8.1 and went to check if there was something newer in my Passport account, but was surprised to see that 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 earlier because I downloaded it in July, so is 5.1.8.1 still safe to use? Is this simply a problem with the IBM Passport site? Thanks, Cameron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From harr1 at llnl.gov Wed Jan 10 20:39:19 2024 From: harr1 at llnl.gov (Cameron Harr) Date: Wed, 10 Jan 2024 12:39:19 -0800 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: Thanks for confirming! When I follow your link, I see it too; however, if I try just searching for it, I still don't get it: I'll assume it's a glitch in my matrix. Thanks again, Cameron On 1/10/24 12:29, Glen Corneau wrote: > I still find 5.1.8.1 in FixCentral: > > https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc > > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss on behalf > of Cameron Harr > *Sent:* Wednesday, January 10, 2024 13:50 > *To:* gpfsug-discuss > *Subject:* [EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? > Does anyone know if Scale 5.1.8.1 was pulled for some reason? > > I have a local download of Scale 5.1.8.1 and went to check if there was > something newer in my Passport account, but was surprised to see that > 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 > earlier because I downloaded it in July, so is 5.1.8.1 still safe to > use? Is this simply a problem with the IBM Passport site? > > Thanks, > Cameron > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > https://urldefense.us/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org__;!!G2kpM7uM-TzIFchu!zqoQj7C1_sC23hviScRf3gLghX0KXMEVMKTynRPVOKW1iSnOk3kovKWlQizGoMnB042O0U58lzJq5QVo$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dDpzdrOVra4BHUGL.png Type: image/png Size: 128973 bytes Desc: not available URL: From harr1 at llnl.gov Wed Jan 10 20:43:51 2024 From: harr1 at llnl.gov (Cameron Harr) Date: Wed, 10 Jan 2024 12:43:51 -0800 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: Yes, I see that from Glen's link as well. It doesn't show for me, AND, even though I can see it with Glen's link, I cannot download .2 or .1, both getting versions of the following: At any rate, I still have my 5.1.8.1 repo from July, so I'll stick to that for now and hope this magically gets fixed on my end. Thanks, Cameron On 1/10/24 12:38, Ryan Novosielski wrote: > There?s at least a 5.1.8.2 at this point, though, so no reason to run > 5.1.8.1 that I?m aware of. > >> On Jan 10, 2024, at 15:29, Glen Corneau wrote: >> >> I still find 5.1.8.1 in FixCentral: >> >> https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc >> >> >> ------------------------------------------------------------------------ >> *From:*gpfsug-discuss on behalf >> of Cameron Harr >> *Sent:*Wednesday, January 10, 2024 13:50 >> *To:*gpfsug-discuss >> *Subject:*[EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? >> Does anyone know if Scale 5.1.8.1 was pulled for some reason? >> >> I have a local download of Scale 5.1.8.1 and went to check if there was >> something newer in my Passport account, but was surprised to see that >> 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 >> earlier because I downloaded it in July, so is 5.1.8.1 still safe to >> use? Is this simply a problem with the IBM Passport site? >> >> Thanks, >> Cameron >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atgpfsug.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > https://urldefense.us/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org__;!!G2kpM7uM-TzIFchu!0CaRV15KAs7TO-Hm6o6_q8eE3E_qaPI1OJIAETl9LFSjkqUg_fY4bbAkZbGJyktUfTbGvHQhW7eOM54Hmww$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: NVDHduCp0tfKpYwN.png Type: image/png Size: 102791 bytes Desc: not available URL: From harr1 at llnl.gov Wed Jan 10 20:56:34 2024 From: harr1 at llnl.gov (Cameron Harr) Date: Wed, 10 Jan 2024 12:56:34 -0800 Subject: [gpfsug-discuss] Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: Michael Taylor @ IBM had the solution. I had typed in "Scale" to see what it would bring up (Spectrum vs Storage) and Spectrum was what showed in the selection box. But, when I scrolled down several more options, there was a separate Storage Scale product I could choose and which gives me the patches. Seems like it would be useful to merge those two products in the search code. Thanks all. On 1/10/24 12:43, Cameron Harr wrote: > > Yes, I see that from Glen's link as well. It doesn't show for me, AND, > even though I can see it with Glen's link, I cannot download .2 or .1, > both getting versions of the following: > > At any rate, I still have my 5.1.8.1 repo from July, so I'll stick to > that for now and hope this magically gets fixed on my end. > > Thanks, > Cameron > > On 1/10/24 12:38, Ryan Novosielski wrote: >> There?s at least a 5.1.8.2 at this point, though, so no reason to run >> 5.1.8.1 that I?m aware of. >> >>> On Jan 10, 2024, at 15:29, Glen Corneau wrote: >>> >>> I still find 5.1.8.1 in FixCentral: >>> >>> https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc >>> >>> >>> ------------------------------------------------------------------------ >>> *From:*gpfsug-discuss on behalf >>> of Cameron Harr >>> *Sent:*Wednesday, January 10, 2024 13:50 >>> *To:*gpfsug-discuss >>> *Subject:*[EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? >>> Does anyone know if Scale 5.1.8.1 was pulled for some reason? >>> >>> I have a local download of Scale 5.1.8.1 and went to check if there was >>> something newer in my Passport account, but was surprised to see that >>> 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 >>> earlier because I downloaded it in July, so is 5.1.8.1 still safe to >>> use? Is this simply a problem with the IBM Passport site? >>> >>> Thanks, >>> Cameron >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> https://urldefense.us/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org__;!!G2kpM7uM-TzIFchu!0CaRV15KAs7TO-Hm6o6_q8eE3E_qaPI1OJIAETl9LFSjkqUg_fY4bbAkZbGJyktUfTbGvHQhW7eOM54Hmww$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: NVDHduCp0tfKpYwN.png Type: image/png Size: 102791 bytes Desc: not available URL: From jamervi at sandia.gov Wed Jan 10 22:04:30 2024 From: jamervi at sandia.gov (Mervini, Joe) Date: Wed, 10 Jan 2024 22:04:30 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Was 5.1.8.1 pulled? In-Reply-To: References: Message-ID: Just wait a while. IBM will change the name at some point. ? -- Joe Mervini Sandia National Laboratories High Performance Computing 505-844-6770 jamervi at sandia.gov From: gpfsug-discuss on behalf of Cameron Harr Date: Wednesday, January 10, 2024 at 1:59 PM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] Re: [gpfsug-discuss] Was 5.1.8.1 pulled? Michael Taylor @ IBM had the solution. I had typed in "Scale" to see what it would bring up (Spectrum vs Storage) and Spectrum was what showed in the selection box. But, when I scrolled down several more options, there was a separate Storage Scale product I could choose and which gives me the patches. Seems like it would be useful to merge those two products in the search code. Thanks all. On 1/10/24 12:43, Cameron Harr wrote: Yes, I see that from Glen's link as well. It doesn't show for me, AND, even though I can see it with Glen's link, I cannot download .2 or .1, both getting versions of the following: [cid:part1.CpW7UVM1.78LR5OGq at llnl.gov] At any rate, I still have my 5.1.8.1 repo from July, so I'll stick to that for now and hope this magically gets fixed on my end. Thanks, Cameron On 1/10/24 12:38, Ryan Novosielski wrote: There?s at least a 5.1.8.2 at this point, though, so no reason to run 5.1.8.1 that I?m aware of. On Jan 10, 2024, at 15:29, Glen Corneau wrote: I still find 5.1.8.1 in FixCentral: https://www.ibm.com/support/fixcentral/swg/quickorder?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.8&platform=All&function=all&source=fc ________________________________ From: gpfsug-discuss on behalf of Cameron Harr Sent: Wednesday, January 10, 2024 13:50 To: gpfsug-discuss Subject: [EXTERNAL] [gpfsug-discuss] Was 5.1.8.1 pulled? Does anyone know if Scale 5.1.8.1 was pulled for some reason? I have a local download of Scale 5.1.8.1 and went to check if there was something newer in my Passport account, but was surprised to see that 5.1.8.0 is the latest that shows up there now. I obviously had 5.1.8.1 earlier because I downloaded it in July, so is 5.1.8.1 still safe to use? Is this simply a problem with the IBM Passport site? Thanks, Cameron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org https://urldefense.us/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org__;!!G2kpM7uM-TzIFchu!0CaRV15KAs7TO-Hm6o6_q8eE3E_qaPI1OJIAETl9LFSjkqUg_fY4bbAkZbGJyktUfTbGvHQhW7eOM54Hmww$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: NVDHduCp0tfKpYwN.png Type: image/png Size: 102791 bytes Desc: NVDHduCp0tfKpYwN.png URL: From TROPPENS at de.ibm.com Thu Jan 11 10:46:30 2024 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Thu, 11 Jan 2024 10:46:30 +0000 Subject: [gpfsug-discuss] Save the date - UK User Meeting 2024 Message-ID: Greetings! The UK User Meeting will be held the second week of June in London, UK. There will be a New User Day on Jun 11, 2024, followed by the two-day regular Storage Scale User Meeting. https://www.spectrumscaleug.org/save-the-date-uk-user-meeting-2024/ Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jan 12 19:10:30 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 12 Jan 2024 19:10:30 +0000 Subject: [gpfsug-discuss] GUI needs GNR? Message-ID: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> Hum I get the following error message when attempting to connect to the GUI Initializing the graphical user interface. This can take several minutes. Please wait ... com.ibm.fscc.cli.CommandException: EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. Except I don't have any gpfs.gnr packages. The is with the 5.1.8.2 download, explicitly the file Storage_Scale_Data_Access-5.1.8.2-x86_64-Linux-install Downloaded from Lenovo. At no time was a gpfs.gnr-* a dependency of the GUI packages during install. Has it somehow been added as a dependency in the GUI binary and not been properly thought through as to how it impacts various editions? I didn't think GNR was needed to run the GUI. Having never bothered with the GUI till now am I doing something obviously wrong? Note that systemctl seems to say everything is fine root at cyber1:~# systemctl status gpfsgui ? gpfsgui.service - IBM_Spectrum_Scale Administration GUI Loaded: loaded (/lib/systemd/system/gpfsgui.service; disabled; vendor preset: enabled) Active: active (running) since Fri 2024-01-12 18:41:28 UTC; 14s ago Process: 21128 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/update-environment (code=exited, status=0/SUCCESS) Process: 21133 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql (code=exited, status=0/SUCCESS) Process: 21326 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables (code=exited, status=0/SUCCESS) Process: 21394 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers (code=exited, status=0/SUCCESS) Process: 21694 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps (code=exited, status=0/SUCCESS) Main PID: 21704 (java) Status: "GSS/GPFS GUI started" Tasks: 112 (limit: 173894) Memory: 534.4M (limit: 2.0G) CPU: 53.517s CGroup: /system.slice/gpfsgui.service ??21704 /usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=/usr/lpp/mmfs/gui/bin/oom.sh -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.libra> Jan 12 18:41:33 cyber1 sudo[22830]: pam_unix(sudo:session): session closed for user root Jan 12 18:41:33 cyber1 java[21704]: (Startup) 193ms Background tasks started. Jan 12 18:41:33 cyber1 java[21704]: Systems Management JVM environment runtime: Jan 12 18:41:33 cyber1 java[21704]: Free memory in the JVM: 65MB Jan 12 18:41:33 cyber1 java[21704]: Total memory in the JVM: 240MB Jan 12 18:41:33 cyber1 java[21704]: Available memory in the JVM: 337MB Jan 12 18:41:33 cyber1 java[21704]: Max memory that the JVM will attempt to use: 512MB Jan 12 18:41:33 cyber1 java[21704]: Number of processors available to JVM: 2 Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0012I: The server installed the following features: [apiDiscovery-1.0, appSecurity-2.0, distributedMap-1.0, federatedRegistry-1.0, > Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0011I: The gpfsgui server is ready to run a smarter planet. The gpfsgui server started in 27.411 seconds. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ncalimet at lenovo.com Fri Jan 12 23:04:47 2024 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Fri, 12 Jan 2024 23:04:47 +0000 Subject: [gpfsug-discuss] [External] GUI needs GNR? In-Reply-To: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> Message-ID: Hi, The DSS-G documentation has a dedicated guide to deploy and setup a GUI server with ressources provided by the DSS-G installation package that are needed to properly handle Lenovo hardware. Starting with the latest DSS-G release (4.5a) automatic deployment is supported with the dssg-gui-install tool leveraging a Confluent management server. HTH ________________________________ From: gpfsug-discuss on behalf of Jonathan Buzzard Sent: Friday, January 12, 2024 8:10 PM To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] GUI needs GNR? Hum I get the following error message when attempting to connect to the GUI Initializing the graphical user interface. This can take several minutes. Please wait ... com.ibm.fscc.cli.CommandException: EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. Except I don't have any gpfs.gnr packages. The is with the 5.1.8.2 download, explicitly the file Storage_Scale_Data_Access-5.1.8.2-x86_64-Linux-install Downloaded from Lenovo. At no time was a gpfs.gnr-* a dependency of the GUI packages during install. Has it somehow been added as a dependency in the GUI binary and not been properly thought through as to how it impacts various editions? I didn't think GNR was needed to run the GUI. Having never bothered with the GUI till now am I doing something obviously wrong? Note that systemctl seems to say everything is fine root at cyber1:~# systemctl status gpfsgui ? gpfsgui.service - IBM_Spectrum_Scale Administration GUI Loaded: loaded (/lib/systemd/system/gpfsgui.service; disabled; vendor preset: enabled) Active: active (running) since Fri 2024-01-12 18:41:28 UTC; 14s ago Process: 21128 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/update-environment (code=exited, status=0/SUCCESS) Process: 21133 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql (code=exited, status=0/SUCCESS) Process: 21326 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables (code=exited, status=0/SUCCESS) Process: 21394 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers (code=exited, status=0/SUCCESS) Process: 21694 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps (code=exited, status=0/SUCCESS) Main PID: 21704 (java) Status: "GSS/GPFS GUI started" Tasks: 112 (limit: 173894) Memory: 534.4M (limit: 2.0G) CPU: 53.517s CGroup: /system.slice/gpfsgui.service ??21704 /usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=/usr/lpp/mmfs/gui/bin/oom.sh -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.libra> Jan 12 18:41:33 cyber1 sudo[22830]: pam_unix(sudo:session): session closed for user root Jan 12 18:41:33 cyber1 java[21704]: (Startup) 193ms Background tasks started. Jan 12 18:41:33 cyber1 java[21704]: Systems Management JVM environment runtime: Jan 12 18:41:33 cyber1 java[21704]: Free memory in the JVM: 65MB Jan 12 18:41:33 cyber1 java[21704]: Total memory in the JVM: 240MB Jan 12 18:41:33 cyber1 java[21704]: Available memory in the JVM: 337MB Jan 12 18:41:33 cyber1 java[21704]: Max memory that the JVM will attempt to use: 512MB Jan 12 18:41:33 cyber1 java[21704]: Number of processors available to JVM: 2 Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0012I: The server installed the following features: [apiDiscovery-1.0, appSecurity-2.0, distributedMap-1.0, federatedRegistry-1.0, > Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0011I: The gpfsgui server is ready to run a smarter planet. The gpfsgui server started in 27.411 seconds. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss_gpfsug.org&data=05%7C02%7Cncalimet%40lenovo.com%7C858e90861a9f494ee5a708dc13a27532%7C5c7d0b28bdf8410caa934df372b16203%7C0%7C0%7C638406835698633790%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=rtoZFGL9xDySA8uVX2C3x6GHUU857go83rhPp%2FMk1fk%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Jan 15 09:26:46 2024 From: alvise.dorigo at psi.ch (Dorigo Alvise) Date: Mon, 15 Jan 2024 09:26:46 +0000 Subject: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Message-ID: Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance storage system) consisting of a Power9 confluent node and two AMD canisters, and 12 NVMe drives - a Lenovo G242 "hybrid" consisting of 4 HDD enclosures, 2 SSD enclosures, 1 Intel support node and 2 Intel storage nodes. The final configuration I would expect is a single building block with 4 IO nodes, 3 declustered array: 1 for HDDs, 1 for SSDs, 1 for NVMe (the last one to be used as a cache pool). First of all, I would like to know if anyone has already tried this solution successfully. Then, below is the description of what I have done. I will preface by saying that I was able to configure the two storage clusters separately without any problem; therefore, I would exclude any inherent problem in each building block (which was installed from scratch). But when I try to have a single cluster, with different node classes, I have problems. The steps I followed (based on documentation I found in IBM pages, https://www.ibm.com/docs/en/ess-p8/5.3.1?topic=command-outline-mmvdisk-use-case) are as follows: 1 access one of the 2 building blocks (that already has a storage cluster configured, with no recoverygroups defined) 2 run "mmaddnode -N " 3 mmchlicense... 3 mmvdisk nodeclass create ... to isolate the two "new" IO nodes in a dedicated nodeclass for the purpose of differentiating configuration parameters, connected drive topology, and then recovery groups 4 perform topology discovery with: mmvdisk server list --node-class ess --disk-topology In the following the cluster and node classes: Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------- 1 sfdssio1.psi.ch 129.129.241.67 sfdssio1.psi.ch quorum-manager 2 sfdssio2.psi.ch 129.129.241.68 sfdssio2.psi.ch quorum-manager 3 sfessio1.psi.ch 129.129.241.27 sfessio1.psi.ch quorum-manager 4 sfessio2.psi.ch 129.129.241.28 sfessio2.psi.ch quorum-manager Node Class Name Members --------------------- ----------------------------------------------------------- ess sfessio1.psi.ch,sfessio2.psi.ch dss sfdssio1.psi.ch,sfdssio2.psi.ch The "mmnodeadd" operation was performed while logged into sfdssio1 (which belongs to the Lenovo G242). Then: [root at sfdssio1 ~]# mmvdisk server list --node-class ess --disk-topology node needs matching number server attention metric disk topology ------ -------------------------------- --------- -------- ------------- 3 sfessio1.psi.ch yes - unmatched server topology 4 sfessio2.psi.ch yes - unmatched server topology mmvdisk: To see what needs attention, use the command: mmvdisk: mmvdisk server list -N sfessio1.psi.ch --disk-topology -L mmvdisk: mmvdisk server list -N sfessio2.psi.ch --disk-topology -L [root at sfdssio1 ~]# mmvdisk server list -N sfessio1.psi.ch --disk-topology -L Unable to find a matching topology specification for topology file '/var/mmfs/tmp/cmdTmpDir.mmvdisk.1468913/pdisk-topology.sfessio1.psi.ch'. Topology component identification is using these CST stanza files: /usr/lpp/mmfs/data/compSpec-1304.stanza /usr/lpp/mmfs/data/compSpec-1400.stanza /usr/lpp/mmfs/data/cst/compSpec-Lenovo.stanza /usr/lpp/mmfs/data/cst/compSpec-topology.stanza Server component: serverType 'ESS3500-5141-FN2' serverArch 'x86_64' serverName 'sfessio1.psi.ch' Enclosure components: 1 found connected to HBAs Enclosure component: serialNumber '78E4395' enclosureClass 'unknown' HBA components: none found connected to enclosures Cabling: enclosure '78E4395' controller '' cabled to HBA slot 'UNKNOWN' port 'unknown' Disks: 12 SSDs 0 HDDs NVRAM: 0 devices/partitions Unable to match these components to a serverTopology specification. mmvdisk: Command failed. Examine previous error messages to determine cause. If I try to do a symmetric operation (I access an IO node of the IBM ESS3500 and try to add Lenovo nodes, trying to discover their drive topology) I get the same error; but, of course, the topology involved this time is that of the Lenovo hardware. Now, I suspect there is a (hidden?) step I would be supposed to know, but unfortunately I don't (this is my first experience with different and etherogenous building blocks merge). So I'd like to receive from you any suggestions, including?a better documentation page (if any) covering this particular use case I have. Hope the description of the context is clear enough, in case it is not I apologize and please just ask for any further details required to understand my environment. Thank you very much, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Jan 15 10:16:15 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 15 Jan 2024 10:16:15 +0000 Subject: [gpfsug-discuss] [External] GUI needs GNR? In-Reply-To: References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> Message-ID: <9ea403ad-5ba2-4bcf-85ca-2781b80c1d8f@strath.ac.uk> On 12/01/2024 23:04, Nicolas CALIMET wrote: > Hi, > > The DSS-G documentation has a dedicated guide to deploy and setup a GUI > server with ressources provided by the DSS-G installation package that > are needed to properly handle Lenovo hardware. Starting with the latest > DSS-G release (4.5a) automatic deployment is? supported with the > dssg-gui-install tool leveraging a Confluent management server. > Does anyone know if they have to be Lenovo servers or can I use other makes? The servers I was going to use don't support RHEL8 due to the SAS RAID card being dropped. I could free up a couple of other servers but they are not going to be Lenovo, though are capable of running RHEL8 and for that matter RHEL9. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From roger_eriksson at se.ibm.com Mon Jan 15 10:27:13 2024 From: roger_eriksson at se.ibm.com (ROGER ERIKSSON) Date: Mon, 15 Jan 2024 10:27:13 +0000 Subject: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover In-Reply-To: References: Message-ID: Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of Dorigo Alvise Date: Monday, 15 January 2024 at 10:29 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance storage system) consisting of a Power9 confluent node and two AMD canisters, and 12 NVMe drives - a Lenovo G242 "hybrid" consisting of 4 HDD enclosures, 2 SSD enclosures, 1 Intel support node and 2 Intel storage nodes. The final configuration I would expect is a single building block with 4 IO nodes, 3 declustered array: 1 for HDDs, 1 for SSDs, 1 for NVMe (the last one to be used as a cache pool). First of all, I would like to know if anyone has already tried this solution successfully. Then, below is the description of what I have done. I will preface by saying that I was able to configure the two storage clusters separately without any problem; therefore, I would exclude any inherent problem in each building block (which was installed from scratch). But when I try to have a single cluster, with different node classes, I have problems. The steps I followed (based on documentation I found in IBM pages, https://www.ibm.com/docs/en/ess-p8/5.3.1?topic=command-outline-mmvdisk-use-case) are as follows: 1 access one of the 2 building blocks (that already has a storage cluster configured, with no recoverygroups defined) 2 run "mmaddnode -N " 3 mmchlicense... 3 mmvdisk nodeclass create ... to isolate the two "new" IO nodes in a dedicated nodeclass for the purpose of differentiating configuration parameters, connected drive topology, and then recovery groups 4 perform topology discovery with: mmvdisk server list --node-class ess --disk-topology In the following the cluster and node classes: Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------- 1 sfdssio1.psi.ch 129.129.241.67 sfdssio1.psi.ch quorum-manager 2 sfdssio2.psi.ch 129.129.241.68 sfdssio2.psi.ch quorum-manager 3 sfessio1.psi.ch 129.129.241.27 sfessio1.psi.ch quorum-manager 4 sfessio2.psi.ch 129.129.241.28 sfessio2.psi.ch quorum-manager Node Class Name Members --------------------- ----------------------------------------------------------- ess sfessio1.psi.ch,sfessio2.psi.ch dss sfdssio1.psi.ch,sfdssio2.psi.ch The "mmnodeadd" operation was performed while logged into sfdssio1 (which belongs to the Lenovo G242). Then: [root at sfdssio1 ~]# mmvdisk server list --node-class ess --disk-topology node needs matching number server attention metric disk topology ------ -------------------------------- --------- -------- ------------- 3 sfessio1.psi.ch yes - unmatched server topology 4 sfessio2.psi.ch yes - unmatched server topology mmvdisk: To see what needs attention, use the command: mmvdisk: mmvdisk server list -N sfessio1.psi.ch --disk-topology -L mmvdisk: mmvdisk server list -N sfessio2.psi.ch --disk-topology -L [root at sfdssio1 ~]# mmvdisk server list -N sfessio1.psi.ch --disk-topology -L Unable to find a matching topology specification for topology file '/var/mmfs/tmp/cmdTmpDir.mmvdisk.1468913/pdisk-topology.sfessio1.psi.ch'. Topology component identification is using these CST stanza files: /usr/lpp/mmfs/data/compSpec-1304.stanza /usr/lpp/mmfs/data/compSpec-1400.stanza /usr/lpp/mmfs/data/cst/compSpec-Lenovo.stanza /usr/lpp/mmfs/data/cst/compSpec-topology.stanza Server component: serverType 'ESS3500-5141-FN2' serverArch 'x86_64' serverName 'sfessio1.psi.ch' Enclosure components: 1 found connected to HBAs Enclosure component: serialNumber '78E4395' enclosureClass 'unknown' HBA components: none found connected to enclosures Cabling: enclosure '78E4395' controller '' cabled to HBA slot 'UNKNOWN' port 'unknown' Disks: 12 SSDs 0 HDDs NVRAM: 0 devices/partitions Unable to match these components to a serverTopology specification. mmvdisk: Command failed. Examine previous error messages to determine cause. If I try to do a symmetric operation (I access an IO node of the IBM ESS3500 and try to add Lenovo nodes, trying to discover their drive topology) I get the same error; but, of course, the topology involved this time is that of the Lenovo hardware. Now, I suspect there is a (hidden?) step I would be supposed to know, but unfortunately I don't (this is my first experience with different and etherogenous building blocks merge). So I'd like to receive from you any suggestions, including?a better documentation page (if any) covering this particular use case I have. Hope the description of the context is clear enough, in case it is not I apologize and please just ask for any further details required to understand my environment. Thank you very much, Alvise Dorigo Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm -------------- next part -------------- An HTML attachment was scrubbed... URL: From jwhite at ocf.co.uk Mon Jan 15 10:32:49 2024 From: jwhite at ocf.co.uk (John White) Date: Mon, 15 Jan 2024 10:32:49 +0000 Subject: [gpfsug-discuss] [External] GUI needs GNR? In-Reply-To: <9ea403ad-5ba2-4bcf-85ca-2781b80c1d8f@strath.ac.uk> References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> <9ea403ad-5ba2-4bcf-85ca-2781b80c1d8f@strath.ac.uk> Message-ID: Hi Jonathan The servers do not have to be Lenovo Hardware but need to be able to support an OS capable of running the GUI packages required for the DSS-G document installation. Kind Regards John White Lead Storage Consultant OCF Limited Annual leave dates: 20-22/03/2024 -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Monday, January 15, 2024 10:16 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] [External] GUI needs GNR? On 12/01/2024 23:04, Nicolas CALIMET wrote: > Hi, > > The DSS-G documentation has a dedicated guide to deploy and setup a > GUI server with ressources provided by the DSS-G installation package > that are needed to properly handle Lenovo hardware. Starting with the > latest DSS-G release (4.5a) automatic deployment is supported with > the dssg-gui-install tool leveraging a Confluent management server. > Does anyone know if they have to be Lenovo servers or can I use other makes? The servers I was going to use don't support RHEL8 due to the SAS RAID card being dropped. I could free up a couple of other servers but they are not going to be Lenovo, though are capable of running RHEL8 and for that matter RHEL9. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From roger_eriksson at se.ibm.com Mon Jan 15 10:38:37 2024 From: roger_eriksson at se.ibm.com (ROGER ERIKSSON) Date: Mon, 15 Jan 2024 10:38:37 +0000 Subject: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover In-Reply-To: References: Message-ID: HI, I now been told that its also a Scale license violation to mix Lenovo DSS with IBM ESS in same cluster Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of ROGER ERIKSSON Date: Monday, 15 January 2024 at 11:32 To: gpfsug main discussion list , Dorigo Alvise Subject: [EXTERNAL] Re: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of Dorigo Alvise Date: Monday, 15 January 2024 at 10:29 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance storage system) consisting of a Power9 confluent node and two AMD canisters, and 12 NVMe drives - a Lenovo G242 "hybrid" consisting of 4 HDD enclosures, 2 SSD enclosures, 1 Intel support node and 2 Intel storage nodes. The final configuration I would expect is a single building block with 4 IO nodes, 3 declustered array: 1 for HDDs, 1 for SSDs, 1 for NVMe (the last one to be used as a cache pool). First of all, I would like to know if anyone has already tried this solution successfully. Then, below is the description of what I have done. I will preface by saying that I was able to configure the two storage clusters separately without any problem; therefore, I would exclude any inherent problem in each building block (which was installed from scratch). But when I try to have a single cluster, with different node classes, I have problems. The steps I followed (based on documentation I found in IBM pages, https://www.ibm.com/docs/en/ess-p8/5.3.1?topic=command-outline-mmvdisk-use-case) are as follows: 1 access one of the 2 building blocks (that already has a storage cluster configured, with no recoverygroups defined) 2 run "mmaddnode -N " 3 mmchlicense... 3 mmvdisk nodeclass create ... to isolate the two "new" IO nodes in a dedicated nodeclass for the purpose of differentiating configuration parameters, connected drive topology, and then recovery groups 4 perform topology discovery with: mmvdisk server list --node-class ess --disk-topology In the following the cluster and node classes: Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------- 1 sfdssio1.psi.ch 129.129.241.67 sfdssio1.psi.ch quorum-manager 2 sfdssio2.psi.ch 129.129.241.68 sfdssio2.psi.ch quorum-manager 3 sfessio1.psi.ch 129.129.241.27 sfessio1.psi.ch quorum-manager 4 sfessio2.psi.ch 129.129.241.28 sfessio2.psi.ch quorum-manager Node Class Name Members --------------------- ----------------------------------------------------------- ess sfessio1.psi.ch,sfessio2.psi.ch dss sfdssio1.psi.ch,sfdssio2.psi.ch The "mmnodeadd" operation was performed while logged into sfdssio1 (which belongs to the Lenovo G242). Then: [root at sfdssio1 ~]# mmvdisk server list --node-class ess --disk-topology node needs matching number server attention metric disk topology ------ -------------------------------- --------- -------- ------------- 3 sfessio1.psi.ch yes - unmatched server topology 4 sfessio2.psi.ch yes - unmatched server topology mmvdisk: To see what needs attention, use the command: mmvdisk: mmvdisk server list -N sfessio1.psi.ch --disk-topology -L mmvdisk: mmvdisk server list -N sfessio2.psi.ch --disk-topology -L [root at sfdssio1 ~]# mmvdisk server list -N sfessio1.psi.ch --disk-topology -L Unable to find a matching topology specification for topology file '/var/mmfs/tmp/cmdTmpDir.mmvdisk.1468913/pdisk-topology.sfessio1.psi.ch'. Topology component identification is using these CST stanza files: /usr/lpp/mmfs/data/compSpec-1304.stanza /usr/lpp/mmfs/data/compSpec-1400.stanza /usr/lpp/mmfs/data/cst/compSpec-Lenovo.stanza /usr/lpp/mmfs/data/cst/compSpec-topology.stanza Server component: serverType 'ESS3500-5141-FN2' serverArch 'x86_64' serverName 'sfessio1.psi.ch' Enclosure components: 1 found connected to HBAs Enclosure component: serialNumber '78E4395' enclosureClass 'unknown' HBA components: none found connected to enclosures Cabling: enclosure '78E4395' controller '' cabled to HBA slot 'UNKNOWN' port 'unknown' Disks: 12 SSDs 0 HDDs NVRAM: 0 devices/partitions Unable to match these components to a serverTopology specification. mmvdisk: Command failed. Examine previous error messages to determine cause. If I try to do a symmetric operation (I access an IO node of the IBM ESS3500 and try to add Lenovo nodes, trying to discover their drive topology) I get the same error; but, of course, the topology involved this time is that of the Lenovo hardware. Now, I suspect there is a (hidden?) step I would be supposed to know, but unfortunately I don't (this is my first experience with different and etherogenous building blocks merge). So I'd like to receive from you any suggestions, including?a better documentation page (if any) covering this particular use case I have. Hope the description of the context is clear enough, in case it is not I apologize and please just ask for any further details required to understand my environment. Thank you very much, Alvise Dorigo Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Jan 15 10:52:13 2024 From: abeattie at au1.ibm.com (ANDREW BEATTIE) Date: Mon, 15 Jan 2024 10:52:13 +0000 Subject: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover In-Reply-To: References: Message-ID: You will face two issues. 1) IBM does not support mixed vendor clusters (See frequently asked questions) 2) The GNR code knows the specifications of the building blocks, and you can mix IBM building blocks with Lenovo building blocks. So you will never be able to build a single cluster of both building blocks. Regards Andrew Sent from my iPhone On 15 Jan 2024, at 20:43, ROGER ERIKSSON wrote: ? HI, I now been told that its also a Scale license violation to mix Lenovo DSS with IBM ESS in same cluster Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd HI, I now been told that its also a Scale license violation to mix Lenovo DSS with IBM ESS in same cluster Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of ROGER ERIKSSON Date: Monday, 15 January 2024 at 11:32 To: gpfsug main discussion list , Dorigo Alvise Subject: [EXTERNAL] Re: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of Dorigo Alvise Date: Monday, 15 January 2024 at 10:29 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance storage system) consisting of a Power9 confluent node and two AMD canisters, and 12 NVMe drives - a Lenovo G242 "hybrid" consisting of 4 HDD enclosures, 2 SSD enclosures, 1 Intel support node and 2 Intel storage nodes. The final configuration I would expect is a single building block with 4 IO nodes, 3 declustered array: 1 for HDDs, 1 for SSDs, 1 for NVMe (the last one to be used as a cache pool). First of all, I would like to know if anyone has already tried this solution successfully. Then, below is the description of what I have done. I will preface by saying that I was able to configure the two storage clusters separately without any problem; therefore, I would exclude any inherent problem in each building block (which was installed from scratch). But when I try to have a single cluster, with different node classes, I have problems. The steps I followed (based on documentation I found in IBM pages, https://www.ibm.com/docs/en/ess-p8/5.3.1?topic=command-outline-mmvdisk-use-case) are as follows: 1 access one of the 2 building blocks (that already has a storage cluster configured, with no recoverygroups defined) 2 run "mmaddnode -N " 3 mmchlicense... 3 mmvdisk nodeclass create ... to isolate the two "new" IO nodes in a dedicated nodeclass for the purpose of differentiating configuration parameters, connected drive topology, and then recovery groups 4 perform topology discovery with: mmvdisk server list --node-class ess --disk-topology In the following the cluster and node classes: Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------- 1 sfdssio1.psi.ch 129.129.241.67 sfdssio1.psi.ch quorum-manager 2 sfdssio2.psi.ch 129.129.241.68 sfdssio2.psi.ch quorum-manager 3 sfessio1.psi.ch 129.129.241.27 sfessio1.psi.ch quorum-manager 4 sfessio2.psi.ch 129.129.241.28 sfessio2.psi.ch quorum-manager Node Class Name Members --------------------- ----------------------------------------------------------- ess sfessio1.psi.ch,sfessio2.psi.ch dss sfdssio1.psi.ch,sfdssio2.psi.ch The "mmnodeadd" operation was performed while logged into sfdssio1 (which belongs to the Lenovo G242). Then: [root at sfdssio1 ~]# mmvdisk server list --node-class ess --disk-topology node needs matching number server attention metric disk topology ------ -------------------------------- --------- -------- ------------- 3 sfessio1.psi.ch yes - unmatched server topology 4 sfessio2.psi.ch yes - unmatched server topology mmvdisk: To see what needs attention, use the command: mmvdisk: mmvdisk server list -N sfessio1.psi.ch --disk-topology -L mmvdisk: mmvdisk server list -N sfessio2.psi.ch --disk-topology -L [root at sfdssio1 ~]# mmvdisk server list -N sfessio1.psi.ch --disk-topology -L Unable to find a matching topology specification for topology file '/var/mmfs/tmp/cmdTmpDir.mmvdisk.1468913/pdisk-topology.sfessio1.psi.ch'. Topology component identification is using these CST stanza files: /usr/lpp/mmfs/data/compSpec-1304.stanza /usr/lpp/mmfs/data/compSpec-1400.stanza /usr/lpp/mmfs/data/cst/compSpec-Lenovo.stanza /usr/lpp/mmfs/data/cst/compSpec-topology.stanza Server component: serverType 'ESS3500-5141-FN2' serverArch 'x86_64' serverName 'sfessio1.psi.ch' Enclosure components: 1 found connected to HBAs Enclosure component: serialNumber '78E4395' enclosureClass 'unknown' HBA components: none found connected to enclosures Cabling: enclosure '78E4395' controller '' cabled to HBA slot 'UNKNOWN' port 'unknown' Disks: 12 SSDs 0 HDDs NVRAM: 0 devices/partitions Unable to match these components to a serverTopology specification. mmvdisk: Command failed. Examine previous error messages to determine cause. If I try to do a symmetric operation (I access an IO node of the IBM ESS3500 and try to add Lenovo nodes, trying to discover their drive topology) I get the same error; but, of course, the topology involved this time is that of the Lenovo hardware. Now, I suspect there is a (hidden?) step I would be supposed to know, but unfortunately I don't (this is my first experience with different and etherogenous building blocks merge). So I'd like to receive from you any suggestions, including?a better documentation page (if any) covering this particular use case I have. Hope the description of the context is clear enough, in case it is not I apologize and please just ask for any further details required to understand my environment. Thank you very much, Alvise Dorigo Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Jan 15 13:24:58 2024 From: alvise.dorigo at psi.ch (Dorigo Alvise) Date: Mon, 15 Jan 2024 13:24:58 +0000 Subject: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover In-Reply-To: References: Message-ID: Thank you very much Andrew for the clarification. So, we will not proceed in this way, and will try to use multicluster. Thank you. Regards, Alvise Dorigo ________________________________ From: ANDREW BEATTIE Sent: Monday, January 15, 2024 11:52 AM To: gpfsug main discussion list Cc: Dorigo Alvise Subject: RE: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover You will face two issues. 1) IBM does not support mixed vendor clusters (See frequently asked questions) 2) The GNR code knows the specifications of the building blocks, and you can mix IBM building blocks with Lenovo building blocks. So you will never be able to build a single cluster of both building blocks. Regards Andrew Sent from my iPhone On 15 Jan 2024, at 20:43, ROGER ERIKSSON wrote: ? HI, I now been told that its also a Scale license violation to mix Lenovo DSS with IBM ESS in same cluster Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd HI, I now been told that its also a Scale license violation to mix Lenovo DSS with IBM ESS in same cluster Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of ROGER ERIKSSON Date: Monday, 15 January 2024 at 11:32 To: gpfsug main discussion list , Dorigo Alvise Subject: [EXTERNAL] Re: [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Hi, Even if this would work technically, as far as Im aware its not supported to have Lenovo GSS and IBM ESS in same cluster, for support reasons. If you go ahead anyway, you might get into problems if you ever need to call for support. Mvh Roger Eriksson -------------------------------------------------------- IBM Partner Technical Specialist Storage Phone: +46-70-7933518 E-mail: roger_eriksson at se.ibm.com IBM Storage User Group Sweden next live meeting 29-30 May 2024 @IBM Innovation Studio Kista, Sweden Registration links and agenda to be avail in April 24 From: gpfsug-discuss on behalf of Dorigo Alvise Date: Monday, 15 January 2024 at 10:29 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] Merge of IBM and Lenovo building blocks: issue with topology discover Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance Dear All, Happy new year to everyone! My goal here at the Paul Scherrer Institut is to merge two different GPFS building blocks. In partucular, these are not the same technology or not even the same brand: - an IBM ESS-3500 (a NVMe/Performance storage system) consisting of a Power9 confluent node and two AMD canisters, and 12 NVMe drives - a Lenovo G242 "hybrid" consisting of 4 HDD enclosures, 2 SSD enclosures, 1 Intel support node and 2 Intel storage nodes. The final configuration I would expect is a single building block with 4 IO nodes, 3 declustered array: 1 for HDDs, 1 for SSDs, 1 for NVMe (the last one to be used as a cache pool). First of all, I would like to know if anyone has already tried this solution successfully. Then, below is the description of what I have done. I will preface by saying that I was able to configure the two storage clusters separately without any problem; therefore, I would exclude any inherent problem in each building block (which was installed from scratch). But when I try to have a single cluster, with different node classes, I have problems. The steps I followed (based on documentation I found in IBM pages, https://www.ibm.com/docs/en/ess-p8/5.3.1?topic=command-outline-mmvdisk-use-case) are as follows: 1 access one of the 2 building blocks (that already has a storage cluster configured, with no recoverygroups defined) 2 run "mmaddnode -N " 3 mmchlicense... 3 mmvdisk nodeclass create ... to isolate the two "new" IO nodes in a dedicated nodeclass for the purpose of differentiating configuration parameters, connected drive topology, and then recovery groups 4 perform topology discovery with: mmvdisk server list --node-class ess --disk-topology In the following the cluster and node classes: Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------- 1 sfdssio1.psi.ch 129.129.241.67 sfdssio1.psi.ch quorum-manager 2 sfdssio2.psi.ch 129.129.241.68 sfdssio2.psi.ch quorum-manager 3 sfessio1.psi.ch 129.129.241.27 sfessio1.psi.ch quorum-manager 4 sfessio2.psi.ch 129.129.241.28 sfessio2.psi.ch quorum-manager Node Class Name Members --------------------- ----------------------------------------------------------- ess sfessio1.psi.ch,sfessio2.psi.ch dss sfdssio1.psi.ch,sfdssio2.psi.ch The "mmnodeadd" operation was performed while logged into sfdssio1 (which belongs to the Lenovo G242). Then: [root at sfdssio1 ~]# mmvdisk server list --node-class ess --disk-topology node needs matching number server attention metric disk topology ------ -------------------------------- --------- -------- ------------- 3 sfessio1.psi.ch yes - unmatched server topology 4 sfessio2.psi.ch yes - unmatched server topology mmvdisk: To see what needs attention, use the command: mmvdisk: mmvdisk server list -N sfessio1.psi.ch --disk-topology -L mmvdisk: mmvdisk server list -N sfessio2.psi.ch --disk-topology -L [root at sfdssio1 ~]# mmvdisk server list -N sfessio1.psi.ch --disk-topology -L Unable to find a matching topology specification for topology file '/var/mmfs/tmp/cmdTmpDir.mmvdisk.1468913/pdisk-topology.sfessio1.psi.ch'. Topology component identification is using these CST stanza files: /usr/lpp/mmfs/data/compSpec-1304.stanza /usr/lpp/mmfs/data/compSpec-1400.stanza /usr/lpp/mmfs/data/cst/compSpec-Lenovo.stanza /usr/lpp/mmfs/data/cst/compSpec-topology.stanza Server component: serverType 'ESS3500-5141-FN2' serverArch 'x86_64' serverName 'sfessio1.psi.ch' Enclosure components: 1 found connected to HBAs Enclosure component: serialNumber '78E4395' enclosureClass 'unknown' HBA components: none found connected to enclosures Cabling: enclosure '78E4395' controller '' cabled to HBA slot 'UNKNOWN' port 'unknown' Disks: 12 SSDs 0 HDDs NVRAM: 0 devices/partitions Unable to match these components to a serverTopology specification. mmvdisk: Command failed. Examine previous error messages to determine cause. If I try to do a symmetric operation (I access an IO node of the IBM ESS3500 and try to add Lenovo nodes, trying to discover their drive topology) I get the same error; but, of course, the topology involved this time is that of the Lenovo hardware. Now, I suspect there is a (hidden?) step I would be supposed to know, but unfortunately I don't (this is my first experience with different and etherogenous building blocks merge). So I'd like to receive from you any suggestions, including?a better documentation page (if any) covering this particular use case I have. Hope the description of the context is clear enough, in case it is not I apologize and please just ask for any further details required to understand my environment. Thank you very much, Alvise Dorigo Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm Unless otherwise stated above: IBM Svenska AB Organisationsnummer: 556026-6883 Address: 164 92 Stockholm _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at hpe.com Mon Jan 15 21:32:26 2024 From: daniel.kidger at hpe.com (Kidger, Daniel) Date: Mon, 15 Jan 2024 21:32:26 +0000 Subject: [gpfsug-discuss] GUI needs GNR? In-Reply-To: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> Message-ID: I have met this too. If the filesystem uses GNR (in my case ECE), then the GUI node needs this RPM too. Remember this isn?t though a simple RPM dependency, because if the filesystem is plain NSDs, then you won?t need gpfs.gnr RPM In my case I just grabbed this RPM from one of my storage nodes. Daniel Daniel Kidger HPC Storage Solutions Architect, EMEA daniel.kidger at hpe.com +44 (0)7818 522266?? hpe.com -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Friday, January 12, 2024 7:11 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] GUI needs GNR? Hum I get the following error message when attempting to connect to the GUI Initializing the graphical user interface. This can take several minutes. Please wait ... com.ibm.fscc.cli.CommandException: EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. Except I don't have any gpfs.gnr packages. The is with the 5.1.8.2 download, explicitly the file Storage_Scale_Data_Access-5.1.8.2-x86_64-Linux-install Downloaded from Lenovo. At no time was a gpfs.gnr-* a dependency of the GUI packages during install. Has it somehow been added as a dependency in the GUI binary and not been properly thought through as to how it impacts various editions? I didn't think GNR was needed to run the GUI. Having never bothered with the GUI till now am I doing something obviously wrong? Note that systemctl seems to say everything is fine root at cyber1:~# systemctl status gpfsgui ? gpfsgui.service - IBM_Spectrum_Scale Administration GUI Loaded: loaded (/lib/systemd/system/gpfsgui.service; disabled; vendor preset: enabled) Active: active (running) since Fri 2024-01-12 18:41:28 UTC; 14s ago Process: 21128 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/update-environment (code=exited, status=0/SUCCESS) Process: 21133 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql (code=exited, status=0/SUCCESS) Process: 21326 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables (code=exited, status=0/SUCCESS) Process: 21394 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers (code=exited, status=0/SUCCESS) Process: 21694 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps (code=exited, status=0/SUCCESS) Main PID: 21704 (java) Status: "GSS/GPFS GUI started" Tasks: 112 (limit: 173894) Memory: 534.4M (limit: 2.0G) CPU: 53.517s CGroup: /system.slice/gpfsgui.service ??21704 /usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=/usr/lpp/mmfs/gui/bin/oom.sh -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.libra> Jan 12 18:41:33 cyber1 sudo[22830]: pam_unix(sudo:session): session closed for user root Jan 12 18:41:33 cyber1 java[21704]: (Startup) 193ms Background tasks started. Jan 12 18:41:33 cyber1 java[21704]: Systems Management JVM environment runtime: Jan 12 18:41:33 cyber1 java[21704]: Free memory in the JVM: 65MB Jan 12 18:41:33 cyber1 java[21704]: Total memory in the JVM: 240MB Jan 12 18:41:33 cyber1 java[21704]: Available memory in the JVM: 337MB Jan 12 18:41:33 cyber1 java[21704]: Max memory that the JVM will attempt to use: 512MB Jan 12 18:41:33 cyber1 java[21704]: Number of processors available to JVM: 2 Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0012I: The server installed the following features: [apiDiscovery-1.0, appSecurity-2.0, distributedMap-1.0, federatedRegistry-1.0, > Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0011I: The gpfsgui server is ready to run a smarter planet. The gpfsgui server started in 27.411 seconds. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From nperez at giux.com Mon Jan 15 22:08:32 2024 From: nperez at giux.com (Nicolas Perez de Arenaza) Date: Mon, 15 Jan 2024 19:08:32 -0300 Subject: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? Message-ID: Hello, Working on offering IBM Storage Scale solution leveraging on ILM to Tier to Tape. My concerns are: - reliability - complexity ( I mean keep it simple ). - future roadmap and lifecycle (this will be used for many years) What to choose for Tape Management? IBM Storage Space Management for Linux + IBM Storage Protect. or IBM Storage Archive Any opinions are welcome, Look forward to receive some, Thanks Nicol?s. Nicol?s P?rez de Arenaza Gerente de Consultor?a | GIUX S.A. Tel Dir: (5411) 5218-0099 | Ofi: (5411) 5218-0037 x 201 | Cel: (54911) 4428-1795 nperez at giux.com | Skype ID: nperezdearenaza | http://www.giux.com -- This email has been checked for viruses by Avast antivirus software. www.avast.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Jan 15 22:10:15 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 15 Jan 2024 22:10:15 +0000 Subject: [gpfsug-discuss] GUI needs GNR? In-Reply-To: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> Message-ID: <38B3D373-CBC3-47E4-A1CA-FE2ECC83895B@rutgers.edu> I don't have any real information about this, but I can say that there are things in that RPM that might be required by the GUI that don't actually provide GNR-related services. I installed that on something this past week for one of the other utilities it contains. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Jan 12, 2024, at 14:10, Jonathan Buzzard wrote: Hum I get the following error message when attempting to connect to the GUI Initializing the graphical user interface. This can take several minutes. Please wait ... com.ibm.fscc.cli.CommandException: EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. Except I don't have any gpfs.gnr packages. The is with the 5.1.8.2 download, explicitly the file Storage_Scale_Data_Access-5.1.8.2-x86_64-Linux-install Downloaded from Lenovo. At no time was a gpfs.gnr-* a dependency of the GUI packages during install. Has it somehow been added as a dependency in the GUI binary and not been properly thought through as to how it impacts various editions? I didn't think GNR was needed to run the GUI. Having never bothered with the GUI till now am I doing something obviously wrong? Note that systemctl seems to say everything is fine root at cyber1:~# systemctl status gpfsgui ? gpfsgui.service - IBM_Spectrum_Scale Administration GUI Loaded: loaded (/lib/systemd/system/gpfsgui.service; disabled; vendor preset: enabled) Active: active (running) since Fri 2024-01-12 18:41:28 UTC; 14s ago Process: 21128 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/update-environment (code=exited, status=0/SUCCESS) Process: 21133 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql (code=exited, status=0/SUCCESS) Process: 21326 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables (code=exited, status=0/SUCCESS) Process: 21394 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers (code=exited, status=0/SUCCESS) Process: 21694 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps (code=exited, status=0/SUCCESS) Main PID: 21704 (java) Status: "GSS/GPFS GUI started" Tasks: 112 (limit: 173894) Memory: 534.4M (limit: 2.0G) CPU: 53.517s CGroup: /system.slice/gpfsgui.service ??21704 /usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=/usr/lpp/mmfs/gui/bin/oom.sh -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.libra> Jan 12 18:41:33 cyber1 sudo[22830]: pam_unix(sudo:session): session closed for user root Jan 12 18:41:33 cyber1 java[21704]: (Startup) 193ms Background tasks started. Jan 12 18:41:33 cyber1 java[21704]: Systems Management JVM environment runtime: Jan 12 18:41:33 cyber1 java[21704]: Free memory in the JVM: 65MB Jan 12 18:41:33 cyber1 java[21704]: Total memory in the JVM: 240MB Jan 12 18:41:33 cyber1 java[21704]: Available memory in the JVM: 337MB Jan 12 18:41:33 cyber1 java[21704]: Max memory that the JVM will attempt to use: 512MB Jan 12 18:41:33 cyber1 java[21704]: Number of processors available to JVM: 2 Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0012I: The server installed the following features: [apiDiscovery-1.0, appSecurity-2.0, distributedMap-1.0, federatedRegistry-1.0, > Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0011I: The gpfsgui server is ready to run a smarter planet. The gpfsgui server started in 27.411 seconds. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Jan 15 23:33:13 2024 From: abeattie at au1.ibm.com (ANDREW BEATTIE) Date: Mon, 15 Jan 2024 23:33:13 +0000 Subject: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? In-Reply-To: References: Message-ID: So full disclosure I work for IBM ? IBM Storage Scale supports multiple DMAPI aware Hierarchical Storage Management (HSM) offerings There are 3 Offerings available with an IBM logo associated. IBM Storage Archive (LTFS) - Simple light weight, - does not support tape spanning (can be an issue for LTO, not an issue for Enterprise tape with TS1170 - 50TB media), ????????????????????????can be slower for tape reclamation processes depending on how the environment is configured. ????????????????????????Does have specific tape library support - not all libraries have been validated. ????????????????????????Robust architecture, easy to scale out by adding additional Archive nodes as required, plenty of client references ????????????????????????Licensed on nodes deployed IBM Storage Protect Extended Edition + IBM Storage Protect Space Management - Solid robust architecture has been successfully deployed delivering HSM capability ????????????????????????for 15+ years. Potentially best integrated with IBM Storage scale with the mmbackup / SOBAR integration for both backup / archive for ????????????????????????same data sets. Single DB2 Database server can be seen as a performance bottleneck at very large scale. ????????????????????????2 licensing options - Capacity based (per TB) or by Processor Value Unit (PVU) IBM High Performance Storage System - (HPSS) Offered as a Managed Service (options for Partners to provide Lvl 1 support) - Stand alone fully featured archive ????????????????????????platform that does have a DMAPI integration connector for Storage Scale, Very useful for very large scale out environments as the Managed Services costs ??????????????????????? are a fixed cost per annum regardless of capacity. 20PB / 200PB the "license" is the same. HPe Logo HPE DMF7 - HPE have done extensive work to build support for DMF 7 to natively integrate with IBM Storage Scale - at least 2 referenceable clients in APAC region Kalray Logo PixitMedia Ngenea - I haven't done anything with this platform but I'm aware that it exists and has an extensive following in the film & TV verticals. In many ways your decision will come down to how the clients want their users to experience data management, Do they want the users to be responsible for the management of their data, or do they want an automated experience where data management simply happens and users don't have to worry about or think about archival policy / process / requirements. Regards, Andrew Beattie Technical Sales Specialist - Storage for Big Data & AI IBM Australia and New Zealand P. +61 421 337 927 E. abeattie at au1.ibm.com Twitter: AndrewJBeattie LinkedIn: https://www.linkedin.com/in/ajbeattie/ ________________________________ From: gpfsug-discuss on behalf of Nicolas Perez de Arenaza Sent: Tuesday, 16 January 2024 8:08 AM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? This Message Is From an External Sender This message came from outside your organization. Report Suspicious Hello, Working on offering IBM Storage Scale solution leveraging on ILM to Tier to Tape. My concerns are: - reliability - complexity ( I mean keep it simple ). - future roadmap and lifecycle (this will be used for many years) What to choose for Tape Management? IBM Storage Space Management for Linux + IBM Storage Protect. or IBM Storage Archive Any opinions are welcome, Look forward to receive some, Thanks Nicol?s. Nicol?s P?rez de Arenaza Gerente de Consultor?a | GIUX S.A. Tel Dir: (5411) 5218-0099 | Ofi: (5411) 5218-0037 x 201 | Cel: (54911) 4428-1795 nperez at giux.com | Skype ID: nperezdearenaza | http://www.giux.com [https://s-install.avcdn.net/ipm/preview/icons/icon-envelope-tick-round-orange-animated-no-repeat-v1.gif] Virus-free.www.avast.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Tue Jan 16 09:18:21 2024 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Tue, 16 Jan 2024 09:18:21 +0000 Subject: [gpfsug-discuss] Save the date - User Meetings along SupercomputingAsia 2024 Message-ID: Greetings, IBM is organizing a User Meeting along SupercomputingAsia 2024 in Sydney, Australia. This will be a whole day event. Details on agenda and registration will be provided later. Please join us! https://www.spectrumscaleug.org/event/storage-scale-user-meeting-sca-24/ Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgerry at us.ibm.com Tue Jan 16 15:54:10 2024 From: jgerry at us.ibm.com (Jim Gerry) Date: Tue, 16 Jan 2024 15:54:10 +0000 Subject: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? In-Reply-To: References: Message-ID: You can find details on HPSS for IBM Storage Scale for ILM space management, and Scale-Out Backup And Restore (SOBAR): https://hpss-collaboration.org/hpss-for-ibm-storage-scale/ Some features were mentioned earlier for other solutions; HPSS supports: * Tape spanning to manage files landing on the end of a tape or to manage files larger than one tape. * Tape stripes to achieve single file transfers faster than one tape drive. * Rotating parity on tape (RAIT) for scaled transfer performance, and to cut redundant tape cost, , or to improve tape library reliability (striping across tape libraries ? RAIL). * The HPSS partitioned Db2 database runs on one or more servers to scale database transactions. * Use HPSS with or without IBM Storage Scale. If you have any questions, let me know. Jim Gerry Architect, System Engineer, Consultant IBM Consulting, Federal Mobile: (713) 256-8516 | LinkedIn: Connect High Performance Storage System ? HPSS www.hpss-collaboration.org From: gpfsug-discuss on behalf of ANDREW BEATTIE Date: Monday, 15 January 2024 at 17:37 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] Re: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? So full disclosure I work for IBM ? IBM Storage Scale supports multiple DMAPI aware Hierarchical Storage Management (HSM) offerings There are 3 Offerings available with an IBM logo associated. IBM Storage Archive (LTFS) - Simple light weight, ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd So full disclosure I work for IBM ? IBM Storage Scale supports multiple DMAPI aware Hierarchical Storage Management (HSM) offerings There are 3 Offerings available with an IBM logo associated. IBM Storage Archive (LTFS) - Simple light weight, - does not support tape spanning (can be an issue for LTO, not an issue for Enterprise tape with TS1170 - 50TB media), ????????????????????????can be slower for tape reclamation processes depending on how the environment is configured. ????????????????????????Does have specific tape library support - not all libraries have been validated. ????????????????????????Robust architecture, easy to scale out by adding additional Archive nodes as required, plenty of client references ????????????????????????Licensed on nodes deployed IBM Storage Protect Extended Edition + IBM Storage Protect Space Management - Solid robust architecture has been successfully deployed delivering HSM capability ????????????????????????for 15+ years. Potentially best integrated with IBM Storage scale with the mmbackup / SOBAR integration for both backup / archive for ????????????????????????same data sets. Single DB2 Database server can be seen as a performance bottleneck at very large scale. ????????????????????????2 licensing options - Capacity based (per TB) or by Processor Value Unit (PVU) IBM High Performance Storage System - (HPSS) Offered as a Managed Service (options for Partners to provide Lvl 1 support) - Stand alone fully featured archive ????????????????????????platform that does have a DMAPI integration connector for Storage Scale, Very useful for very large scale out environments as the Managed Services costs ??????????????????????? are a fixed cost per annum regardless of capacity. 20PB / 200PB the "license" is the same. HPe Logo HPE DMF7 - HPE have done extensive work to build support for DMF 7 to natively integrate with IBM Storage Scale - at least 2 referenceable clients in APAC region Kalray Logo PixitMedia Ngenea - I haven't done anything with this platform but I'm aware that it exists and has an extensive following in the film & TV verticals. In many ways your decision will come down to how the clients want their users to experience data management, Do they want the users to be responsible for the management of their data, or do they want an automated experience where data management simply happens and users don't have to worry about or think about archival policy / process / requirements. Regards, Andrew Beattie Technical Sales Specialist - Storage for Big Data & AI IBM Australia and New Zealand P. +61 421 337 927 E. abeattie at au1.ibm.com Twitter: AndrewJBeattie LinkedIn: https://www.linkedin.com/in/ajbeattie/ ________________________________ From: gpfsug-discuss on behalf of Nicolas Perez de Arenaza Sent: Tuesday, 16 January 2024 8:08 AM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? This Message Is From an External Sender This message came from outside your organization. Report Suspicious Hello, Working on offering IBM Storage Scale solution leveraging on ILM to Tier to Tape. My concerns are: - reliability - complexity ( I mean keep it simple ). - future roadmap and lifecycle (this will be used for many years) What to choose for Tape Management? IBM Storage Space Management for Linux + IBM Storage Protect. or IBM Storage Archive Any opinions are welcome, Look forward to receive some, Thanks Nicol?s. Nicol?s P?rez de Arenaza Gerente de Consultor?a | GIUX S.A. Tel Dir: (5411) 5218-0099 | Ofi: (5411) 5218-0037 x 201 | Cel: (54911) 4428-1795 nperez at giux.com | Skype ID: nperezdearenaza | http://www.giux.com [Image removed by sender.] Virus-free.www.avast.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jan 16 17:24:00 2024 From: scale at us.ibm.com (scale) Date: Tue, 16 Jan 2024 17:24:00 +0000 Subject: [gpfsug-discuss] GUI needs GNR? In-Reply-To: <38B3D373-CBC3-47E4-A1CA-FE2ECC83895B@rutgers.edu> References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> <38B3D373-CBC3-47E4-A1CA-FE2ECC83895B@rutgers.edu> Message-ID: Is this a new install or an upgrade? Usually, if the GUI ?sees? a recovery group or the setting "nsdRAIDTracks" it determines that it is running on a GNR system and only then it will require the GNR package. Serban Maerean GPFS Security From: gpfsug-discuss on behalf of Ryan Novosielski Date: Monday, January 15, 2024 at 4:12?PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI needs GNR? I don't have any real information about this, but I can say that there are things in that RPM that might be required by the GUI that don't actually provide GNR-related services. I installed that on something this past week for one of the other ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd I don't have any real information about this, but I can say that there are things in that RPM that might be required by the GUI that don't actually provide GNR-related services. I installed that on something this past week for one of the other utilities it contains. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Jan 12, 2024, at 14:10, Jonathan Buzzard wrote: Hum I get the following error message when attempting to connect to the GUI Initializing the graphical user interface. This can take several minutes. Please wait ... com.ibm.fscc.cli.CommandException: EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. EFSSG1900I The required GPFS GNR package (gpfs.gnr-*) is not installed on the GUI node. Please install it and initialize the GUI again. Except I don't have any gpfs.gnr packages. The is with the 5.1.8.2 download, explicitly the file Storage_Scale_Data_Access-5.1.8.2-x86_64-Linux-install Downloaded from Lenovo. At no time was a gpfs.gnr-* a dependency of the GUI packages during install. Has it somehow been added as a dependency in the GUI binary and not been properly thought through as to how it impacts various editions? I didn't think GNR was needed to run the GUI. Having never bothered with the GUI till now am I doing something obviously wrong? Note that systemctl seems to say everything is fine root at cyber1:~# systemctl status gpfsgui ? gpfsgui.service - IBM_Spectrum_Scale Administration GUI Loaded: loaded (/lib/systemd/system/gpfsgui.service; disabled; vendor preset: enabled) Active: active (running) since Fri 2024-01-12 18:41:28 UTC; 14s ago Process: 21128 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/update-environment (code=exited, status=0/SUCCESS) Process: 21133 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql (code=exited, status=0/SUCCESS) Process: 21326 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables (code=exited, status=0/SUCCESS) Process: 21394 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers (code=exited, status=0/SUCCESS) Process: 21694 ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps (code=exited, status=0/SUCCESS) Main PID: 21704 (java) Status: "GSS/GPFS GUI started" Tasks: 112 (limit: 173894) Memory: 534.4M (limit: 2.0G) CPU: 53.517s CGroup: /system.slice/gpfsgui.service ??21704 /usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=/usr/lpp/mmfs/gui/bin/oom.sh -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.libra> Jan 12 18:41:33 cyber1 sudo[22830]: pam_unix(sudo:session): session closed for user root Jan 12 18:41:33 cyber1 java[21704]: (Startup) 193ms Background tasks started. Jan 12 18:41:33 cyber1 java[21704]: Systems Management JVM environment runtime: Jan 12 18:41:33 cyber1 java[21704]: Free memory in the JVM: 65MB Jan 12 18:41:33 cyber1 java[21704]: Total memory in the JVM: 240MB Jan 12 18:41:33 cyber1 java[21704]: Available memory in the JVM: 337MB Jan 12 18:41:33 cyber1 java[21704]: Max memory that the JVM will attempt to use: 512MB Jan 12 18:41:33 cyber1 java[21704]: Number of processors available to JVM: 2 Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0012I: The server installed the following features: [apiDiscovery-1.0, appSecurity-2.0, distributedMap-1.0, federatedRegistry-1.0, > Jan 12 18:41:33 cyber1 java[21704]: [AUDIT ] CWWKF0011I: The gpfsgui server is ready to run a smarter planet. The gpfsgui server started in 27.411 seconds. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jan 16 23:23:55 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 16 Jan 2024 23:23:55 +0000 Subject: [gpfsug-discuss] GUI needs GNR? In-Reply-To: References: <6e5b347c-2a84-498f-93af-94a7a5c5676f@strath.ac.uk> <38B3D373-CBC3-47E4-A1CA-FE2ECC83895B@rutgers.edu> Message-ID: <984a5d81-f389-4d2f-9e6b-7239a33d4a73@strath.ac.uk> On 16/01/2024 17:24, scale wrote: > > Is this a new install or an upgrade? > > Usually, if the GUI ?sees? a recovery group or the setting > "nsdRAIDTracks" it determines that it is running on a GNR system and > only then it will require the GNR package. > It was a "new" install on a traditional DSS-G system with SR-650's and attached storage enclosures. However it turns out that Lenovo only support the GUI on Lenovo hardware running RHEL and deployed via xCAT/confluent. I might have been able to bodge it had the servers not being running Ubuntu, but they where because RHEL8 had dropped support for the RAID card. As such the gpfs.gui RPM's where as much use to me as a chocolate teapot. Well I guess I could have had tried using alien but that is getting a long way off the beaten path. The result is we have ordered a pair of refurb 1U X3550 M5's to run it on that are supported and quite a bit newer too. Hopefully with a fully supported configuration it will be much simpler. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Wed Jan 17 13:28:22 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 17 Jan 2024 13:28:22 +0000 Subject: [gpfsug-discuss] RDMA question Message-ID: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> I can't seem to find a straight answer to this. If you wanted to used RDMA with GPFS does every node need to be using RDMA or could it be that just a subset use RDMA? Obviously the nodes with actual storage attached need to be running RDMA, but would I need to upgrade things like protocol or GUI nodes etc. to use RDMA as well? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Wed Jan 17 13:34:42 2024 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Wed, 17 Jan 2024 13:34:42 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: <4806ff9c5a04cade549f0713078f6f59e403e2e9.camel@fmi.ch> You can enable RDMA only on a subset of nodes by enabling the verbsRdma config for a node class (I usually create a custom RDMA node class and add / remove nodes from the nodeclass as needed) or per single nodes. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Informatics Maulbeerstrasse 66 4058 Basel Switzerland On Wed, 2024-01-17 at 13:28 +0000, Jonathan Buzzard wrote: > I can't seem to find a straight answer to this. If you wanted to used > RDMA with GPFS does every node need to be using RDMA or could it be that > just a subset use RDMA? > > Obviously the nodes with actual storage attached need to be running > RDMA, but would I need to upgrade things like protocol or GUI nodes etc. > to use RDMA as well? > > > JAB. > From Ward.Poelmans at vub.be Wed Jan 17 13:35:54 2024 From: Ward.Poelmans at vub.be (Ward POELMANS) Date: Wed, 17 Jan 2024 13:35:54 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: Hi Jonathan, Short answer: no. We have a cluster which is mixed infiniband (with RDMA) and plain ethernet. You can pick with: mmchconfig verbsRdma=enable -N nodes,... depending if RDMA is on or off by default. Ward ________________________________ From: gpfsug-discuss on behalf of Jonathan Buzzard Sent: Wednesday, 17 January 2024 14:28 To: gpfsug main discussion list Subject: [gpfsug-discuss] RDMA question I can't seem to find a straight answer to this. If you wanted to used RDMA with GPFS does every node need to be using RDMA or could it be that just a subset use RDMA? Obviously the nodes with actual storage attached need to be running RDMA, but would I need to upgrade things like protocol or GUI nodes etc. to use RDMA as well? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 17 13:34:45 2024 From: abeattie at au1.ibm.com (ANDREW BEATTIE) Date: Wed, 17 Jan 2024 13:34:45 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: <8D7BC845-532C-4F52-A4B0-DB2C6016FB25@au1.ibm.com> No you don?t have to have every node using RDMA You can split your data traffic out over RDMA and have you control traffic on straight tcp/ip Sent from my iPhone > On 17 Jan 2024, at 23:32, Jonathan Buzzard wrote: > > ? > I can't seem to find a straight answer to this. If you wanted to used RDMA with GPFS does every node need to be using RDMA or could it be that just a subset use RDMA? > > Obviously the nodes with actual storage attached need to be running RDMA, but would I need to upgrade things like protocol or GUI nodes etc. to use RDMA as well? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From novosirj at rutgers.edu Wed Jan 17 15:11:58 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 17 Jan 2024 15:11:58 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: You just wanna be careful that whatever the non-RDMA nodes are doing, the traffic is acceptable. We have a various points ran into nodes not using RDMA, just because of a minor misconfiguration, and suddenly hundreds of megabytes a second of storage traffic we?re going over a net network designed for administration. Sent from my iPhone > On Jan 17, 2024, at 08:30, Jonathan Buzzard wrote: > > ? > I can't seem to find a straight answer to this. If you wanted to used RDMA with GPFS does every node need to be using RDMA or could it be that just a subset use RDMA? > > Obviously the nodes with actual storage attached need to be running RDMA, but would I need to upgrade things like protocol or GUI nodes etc. to use RDMA as well? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From ward.poelmans at vub.be Wed Jan 17 15:21:45 2024 From: ward.poelmans at vub.be (Ward Poelmans) Date: Wed, 17 Jan 2024 16:21:45 +0100 Subject: [gpfsug-discuss] RDMA question In-Reply-To: References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: On 17/01/2024 16:11, Ryan Novosielski wrote: > We have a various points ran into nodes not using RDMA, just because of a > minor misconfiguration, and suddenly hundreds of megabytes a second of > storage traffic we?re going over a net network designed for > administration. You can use verbsRdmaFailBackTCPIfNotAvailable=no for that. If RDMA is not working on a node configured for it, GPFS will refuse to start. Ward -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4745 bytes Desc: S/MIME Cryptographic Signature URL: From jonathan.buzzard at strath.ac.uk Wed Jan 17 15:37:48 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 17 Jan 2024 15:37:48 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: On 17/01/2024 15:21, Ward Poelmans wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > On 17/01/2024 16:11, Ryan Novosielski wrote: >> We have a various points ran into nodes not using RDMA, just because of a >> ? minor misconfiguration, and suddenly hundreds of megabytes a second of >> storage traffic we?re going over a net network designed for >> administration. > > You can use verbsRdmaFailBackTCPIfNotAvailable=no for that. If RDMA is > not working on a node configured for it, GPFS will refuse to start. > Interesting. Noting we run GPFS exclusively over Ethernet and the idea was still to run it over Ethernet but with RDMA. We took the decision a long time ago now to make use of the fact that we have fancy pants Ethernet switches and put the admin traffic over the same physical Ethernet link but on a separate VLAN which we then prioritise with QoS. Consequently if something where to go wrong with the RDMA and it fell back to TCP it would still be going over the same physical link :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Wed Jan 17 16:13:30 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 17 Jan 2024 16:13:30 +0000 Subject: [gpfsug-discuss] RDMA question In-Reply-To: References: <849ad155-e31a-47cc-a284-659623e3f03b@strath.ac.uk> Message-ID: <067DE2EB-4FC9-43A1-B1C2-585C79C9353C@rutgers.edu> On Jan 17, 2024, at 10:21, Ward Poelmans wrote: On 17/01/2024 16:11, Ryan Novosielski wrote: We have a various points ran into nodes not using RDMA, just because of a minor misconfiguration, and suddenly hundreds of megabytes a second of storage traffic we?re going over a net network designed for administration. You can use verbsRdmaFailBackTCPIfNotAvailable=no for that. If RDMA is not working on a node configured for it, GPFS will refuse to start. Thank you ? that?s a neat trick. Newer than our original install (5.0.4) and I didn?t catch it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at mbari.org Mon Jan 22 19:22:50 2024 From: neil at mbari.org (Neil Conner) Date: Mon, 22 Jan 2024 11:22:50 -0800 Subject: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? In-Reply-To: References: Message-ID: <81CDF308-59CA-4C6C-ACB9-78519ED21706@mbari.org> We use Storage Archive and are happy with it. Policies are very customizable using SQL-like queries. We have 4.5 PB archived onto LTO8 tape. I haven?t had any issues with tape spanning. We create two copies and send full tapes offsite. Stub files create a seamless experience for users. We wrote a simple self-service tool to recall files. No complaints. If you happen to use Veritas NetBackup, NetBackup can selectively skip premigrated and migrated files so you can backup a directory with a mix of file states and the migrated files won?t be rehydrated. (Migrated = file is on tape only, premigrated = both tape and resident, resident = disk only) I?m happy to answer any questions you might have. Cheers, Neil Conner Storage, Backup & Database Administrator P (831) 775-1989 F (831) 775-1620 Monterey Bay Aquarium Research Institute 7700 Sandholdt Road, Moss Landing CA 95039 www.mbari.org Advancing marine science and engineering to understand our changing ocean. > On Jan 15, 2024, at 3:33 PM, ANDREW BEATTIE wrote: > > So full disclosure I work for IBM ? > > IBM Storage Scale supports multiple DMAPI aware Hierarchical Storage Management (HSM) offerings > > There are 3 Offerings available with an IBM logo associated. > > IBM Storage Archive (LTFS) - Simple light weight, - does not support tape spanning (can be an issue for LTO, not an issue for Enterprise tape with TS1170 - 50TB media), > ????????????????????????can be slower for tape reclamation processes depending on how the environment is configured. > ????????????????????????Does have specific tape library support - not all libraries have been validated. > ????????????????????????Robust architecture, easy to scale out by adding additional Archive nodes as required, plenty of client references > ????????????????????????Licensed on nodes deployed > > IBM Storage Protect Extended Edition + IBM Storage Protect Space Management - Solid robust architecture has been successfully deployed delivering HSM capability ????????????????????????for 15+ years. Potentially best integrated with IBM Storage scale with the mmbackup / SOBAR integration for both backup / archive for > ????????????????????????same data sets. Single DB2 Database server can be seen as a performance bottleneck at very large scale. > ????????????????????????2 licensing options - Capacity based (per TB) or by Processor Value Unit (PVU) > > IBM High Performance Storage System - (HPSS) Offered as a Managed Service (options for Partners to provide Lvl 1 support) - Stand alone fully featured archive ????????????????????????platform that does have a DMAPI integration connector for Storage Scale, Very useful for very large scale out environments as the Managed Services costs ??????????????????????? are a fixed cost per annum regardless of capacity. 20PB / 200PB the "license" is the same. > > HPe Logo > > HPE DMF7 - HPE have done extensive work to build support for DMF 7 to natively integrate with IBM Storage Scale - at least 2 referenceable clients in APAC region > > Kalray Logo > PixitMedia Ngenea - I haven't done anything with this platform but I'm aware that it exists and has an extensive following in the film & TV verticals. > > In many ways your decision will come down to how the clients want their users to experience data management, Do they want the users to be responsible for the management of their data, or do they want an automated experience where data management simply happens and users don't have to worry about or think about archival policy / process / requirements. > > > > > > > Regards, > > Andrew Beattie > Technical Sales Specialist - Storage for Big Data & AI > IBM Australia and New Zealand > P. +61 421 337 927 > E. abeattie at au1.ibm.com > Twitter: AndrewJBeattie > LinkedIn: https://www.linkedin.com/in/ajbeattie/ > > From: gpfsug-discuss > on behalf of Nicolas Perez de Arenaza > > Sent: Tuesday, 16 January 2024 8:08 AM > To: gpfsug-discuss at gpfsug.org > > Subject: [EXTERNAL] [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? > > This Message Is From an External Sender > This message came from outside your organization. > Report?Suspicious > > Hello, > > Working on offering IBM Storage Scale solution leveraging on ILM to Tier to Tape. > > My concerns are: > - reliability > - complexity ( I mean keep it simple ). > - future roadmap and lifecycle (this will be used for many years) > > What to choose for Tape Management? > IBM Storage Space Management for Linux + IBM Storage Protect. > or > IBM Storage Archive > > Any opinions are welcome, > > Look forward to receive some, > > Thanks > > Nicol?s. > > Nicol?s P?rez de Arenaza > Gerente de Consultor?a | GIUX S.A. > Tel Dir: (5411) 5218-0099 | Ofi: (5411) 5218-0037 x 201 | Cel: (54911) 4428-1795 > nperez at giux.com | Skype ID: nperezdearenaza | http://www.giux.com > > > Virus-free.www.avast.com _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From petr.plodik at mcomputers.cz Tue Jan 23 14:04:19 2024 From: petr.plodik at mcomputers.cz (=?utf-8?B?UGV0ciBQbG9kw61r?=) Date: Tue, 23 Jan 2024 14:04:19 +0000 Subject: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue Message-ID: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> Hi, we have GPFS cluster with two IBM FlashSystem 7300 systems with HD expansion and 80x 12TB HDD each (in DRAID 8+P+Q), 3 GPFS servers connected via 32G FC. We are doing performance tuning on sequential writes to HDDs and seeing suboptimal performance. After several tests, it turns out, that the bottleneck seems to be the single HDD write performance, which is below 40MB/s and one would expect at least 100MB/s. Does anyone have experiences with IBM flashsystem sequential write performance tuning or has these arrays in the infrastructure? We would really appreciate any help/explanation. Thank you! Petr Plodik M Computers s.r.o. petr.plodik at mcomputers.cz From janfrode at tanso.net Tue Jan 23 19:30:01 2024 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 23 Jan 2024 20:30:01 +0100 Subject: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue In-Reply-To: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> References: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> Message-ID: First thing I would check is that the GPFS block size is a multiple of a full RAID stripe. It?s been a while since I worked with SVC/FlashSystem performance, but this has been my main issue. So, 8+2p with the default 128KB ?chunk size? would work with 1 MB or larger block size. The other thing was that it?s important to disable prefetching (chsystem -cacheprefetch off), as it will always be prefetching the wrong data because of how GPFS scatters the blocks. And.. on linux side there?s some max device transfersize setting that has had huge impact on some systems.. But the exact setting escapes me right now.. HTH -jf tir. 23. jan. 2024 kl. 15:05 skrev Petr Plod?k : > Hi, > > we have GPFS cluster with two IBM FlashSystem 7300 systems with HD > expansion and 80x 12TB HDD each (in DRAID 8+P+Q), 3 GPFS servers connected > via 32G FC. We are doing performance tuning on sequential writes to HDDs > and seeing suboptimal performance. After several tests, it turns out, that > the bottleneck seems to be the single HDD write performance, which is below > 40MB/s and one would expect at least 100MB/s. > > Does anyone have experiences with IBM flashsystem sequential write > performance tuning or has these arrays in the infrastructure? We would > really appreciate any help/explanation. > > Thank you! > > Petr Plodik > M Computers s.r.o. > petr.plodik at mcomputers.cz > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Tue Jan 23 19:39:30 2024 From: YARD at il.ibm.com (YARON DANIEL) Date: Tue, 23 Jan 2024 19:39:30 +0000 Subject: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue In-Reply-To: References: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> Message-ID: Hi Please review: https://www.ibm.com/docs/en/storage-scale/5.0.3?topic=recommendations-operating-system-configuration-tuning Regards Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.png at 01DA4E44.A419AE20] Storage and Cloud Consultant Petach Tiqva, 49527 Technology Services IBM Technology Lifecycle Service Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: gpfsug-discuss On Behalf Of Jan-Frode Myklebust Sent: Tuesday, 23 January 2024 21:30 To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue First thing I would check is that the GPFS block size is a multiple of a full RAID stripe. It?s been a while since I worked with SVC/FlashSystem performance, but this has been my main issue. So, 8+2p with the default 128KB ?chunk size? would ZjQcmQRYFpfptBannerStart This Message Is From an Untrusted Sender You have not previously corresponded with this sender. Report Suspicious ? ZjQcmQRYFpfptBannerEnd First thing I would check is that the GPFS block size is a multiple of a full RAID stripe. It?s been a while since I worked with SVC/FlashSystem performance, but this has been my main issue. So, 8+2p with the default 128KB ?chunk size? would work with 1 MB or larger block size. The other thing was that it?s important to disable prefetching (chsystem -cacheprefetch off), as it will always be prefetching the wrong data because of how GPFS scatters the blocks. And.. on linux side there?s some max device transfersize setting that has had huge impact on some systems.. But the exact setting escapes me right now.. HTH -jf tir. 23. jan. 2024 kl. 15:05 skrev Petr Plod?k >: Hi, we have GPFS cluster with two IBM FlashSystem 7300 systems with HD expansion and 80x 12TB HDD each (in DRAID 8+P+Q), 3 GPFS servers connected via 32G FC. We are doing performance tuning on sequential writes to HDDs and seeing suboptimal performance. After several tests, it turns out, that the bottleneck seems to be the single HDD write performance, which is below 40MB/s and one would expect at least 100MB/s. Does anyone have experiences with IBM flashsystem sequential write performance tuning or has these arrays in the infrastructure? We would really appreciate any help/explanation. Thank you! Petr Plodik M Computers s.r.o. petr.plodik at mcomputers.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1049 bytes Desc: image001.png URL: From cblack at nygenome.org Tue Jan 23 22:15:29 2024 From: cblack at nygenome.org (Christopher Black) Date: Tue, 23 Jan 2024 17:15:29 -0500 Subject: [gpfsug-discuss] ILM tiering to Tape - What to use Protect Space Management or Spectrum Archive? In-Reply-To: <81CDF308-59CA-4C6C-ACB9-78519ED21706@mbari.org> References: <81CDF308-59CA-4C6C-ACB9-78519ED21706@mbari.org> Message-ID: New York Genome Center runs IBM Storage Archive (eeadm/ltfsee/spectrumarchive). We have 35 PiB+ in duplicated tape pools, we only ever eject SET B tapes to send offsite, SET A remains available and via an IBM TS4500 tape library using TS1160 drives and a mix of JD and JE media. We had some scaling issues early on and now have separate ilm policy tasks for large vs small files, and the most recent very small files stay on both staging disk and tape. Persistent filesystem view makes querying what exists quick and easy. We've only had an issue with tape spanning once when someone tried to create and send a 20TB+ file to archive, but we were able to reorganize the data into smaller files. We have some hooks to "pre recall" batches of files via our overall production workflow automation. We don't have a direct way for research users to recall files but I'm sure they'd be interested. Best, Chris On Mon, Jan 22, 2024 at 2:24?PM Neil Conner wrote: > We use Storage Archive and are happy with it. Policies are very > customizable using SQL-like queries. > We have 4.5 PB archived onto LTO8 tape. I haven?t had any issues with > tape spanning. > We create two copies and send full tapes offsite. > Stub files create a seamless experience for users. We wrote a simple > self-service tool to recall files. No complaints. > If you happen to use Veritas NetBackup, NetBackup can selectively skip > premigrated and migrated files so you can backup a directory with a mix of > file states and the migrated files won?t be rehydrated. > (Migrated = file is on tape only, premigrated = both tape and resident, > resident = disk only) > > I?m happy to answer any questions you might have. > > Cheers, > > *Neil Conner* > Storage, Backup & Database Administrator > P (831) 775-1989 F (831) 775-1620 > *Monterey Bay Aquarium Research Institute* > 7700 Sandholdt Road, Moss Landing CA 95039 > www.mbari.org > Advancing marine science and engineering to understand our changing ocean. > > > On Jan 15, 2024, at 3:33 PM, ANDREW BEATTIE wrote: > > So full disclosure I work for IBM ? > > IBM Storage Scale supports multiple DMAPI aware Hierarchical Storage > Management (HSM) offerings > > There are 3 Offerings available with an IBM logo associated. > > IBM Storage Archive (LTFS) - Simple light weight, - does not support tape > spanning (can be an issue for LTO, not an issue for Enterprise tape with > TS1170 - 50TB media), > can be slower for tape reclamation processes depending on how the > environment is configured. > Does have specific tape library support - not all libraries have been > validated. > Robust architecture, easy to scale out by adding additional Archive nodes > as required, plenty of client references > Licensed on nodes deployed > > IBM Storage Protect Extended Edition + IBM Storage Protect Space > Management - Solid robust architecture has been successfully deployed > delivering HSM capability for 15+ years. Potentially best integrated with > IBM Storage scale with the mmbackup / SOBAR integration for both backup / > archive for > same data sets. Single DB2 Database server can be seen as a performance > bottleneck at very large scale. > 2 licensing options - Capacity based (per TB) or by Processor Value Unit > (PVU) > > IBM High Performance Storage System - (HPSS) Offered as a Managed Service > (options for Partners to provide Lvl 1 support) - Stand alone fully > featured archive platform that does have a DMAPI integration connector for > Storage Scale, Very useful for very large scale out environments as the > Managed Services costs are a fixed cost per annum regardless of > capacity. 20PB / 200PB the "license" is the same. > > HPe Logo > > HPE DMF7 - HPE have done extensive work to build support for DMF 7 to > natively integrate with IBM Storage Scale - at least 2 referenceable > clients in APAC region > > Kalray Logo > PixitMedia Ngenea - I haven't done anything with this platform but I'm > aware that it exists and has an extensive following in the film & TV > verticals. > > In many ways your decision will come down to how the clients want their > users to experience data management, Do they want the users to be > responsible for the management of their data, or do they want an automated > experience where data management simply happens and users don't have to > worry about or think about archival policy / process / requirements. > > > > > > > Regards, > > Andrew Beattie > Technical Sales Specialist - Storage for Big Data & AI > IBM Australia and New Zealand > P. +61 421 337 927 > E. abeattie at au1.ibm.com > Twitter: AndrewJBeattie > LinkedIn: https://www.linkedin.com/in/ajbeattie/ > > ------------------------------ > *From:* gpfsug-discuss on behalf of > Nicolas Perez de Arenaza > *Sent:* Tuesday, 16 January 2024 8:08 AM > *To:* gpfsug-discuss at gpfsug.org > *Subject:* [EXTERNAL] [gpfsug-discuss] ILM tiering to Tape - What to use > Protect Space Management or Spectrum Archive? > > This Message Is From an External Sender > This message came from outside your organization. > Report Suspicious > > > Hello, > > Working on offering IBM Storage Scale solution leveraging on ILM to Tier > to Tape. > > My concerns are: > - reliability > - complexity ( I mean keep it simple ). > - future roadmap and lifecycle (this will be used for many years) > > What to choose for Tape Management? > IBM Storage Space Management for Linux + IBM Storage Protect. > or > IBM Storage Archive > > Any opinions are welcome, > > Look forward to receive some, > > Thanks > > Nicol?s. > > Nicol?s P?rez de Arenaza > Gerente de Consultor?a | GIUX S.A. > Tel Dir: (5411) 5218-0099 | Ofi: (5411) 5218-0037 x 201 | Cel: (54911) > 4428-1795 > nperez at giux.com | Skype ID: nperezdearenaza | http://www.giux.com > > > > Virus-free.www.avast.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -- This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Tue Jan 23 22:17:12 2024 From: abeattie at au1.ibm.com (ANDREW BEATTIE) Date: Tue, 23 Jan 2024 22:17:12 +0000 Subject: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue In-Reply-To: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> References: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> Message-ID: Suggest you reach out to your local IBm team and ask them to put you in touch with the flash system performance testing / development team @BARRY WHYTE, @andrew Martin, @evelin Perez Barry is in Europe for TechXChange roadshow atm so not sure what his response times will be like, But for the record, there are reasons why IBM won't commit to performance benchmarks for Scale filesystems on anything other than Scale Storage System / Elastic Storage System building blocks. At a high level I suspect your probably bumping into Draid overheads as well as the bandwidth limitations of the SAS storage adapters for the expansion shelves. Just because the drives have a raw performance number does not mean that it's 100% usable. Flash system performance team will be able to advise more accurately. Regards, AJ Regards, Andrew Beattie Technical Sales Specialist - Storage for Big Data & AI IBM Australia and New Zealand P. +61 421 337 927 E. abeattie at au1.ibm.com Twitter: AndrewJBeattie LinkedIn: ________________________________ From: gpfsug-discuss on behalf of Petr Plod?k Sent: Wednesday, January 24, 2024 12:04:19 AM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue Hi, we have GPFS cluster with two IBM FlashSystem 7300 systems with HD expansion and 80x 12TB HDD each (in DRAID 8+P+Q), 3 GPFS servers connected via 32G FC. We are doing performance tuning on sequential writes to HDDs and seeing suboptimal performance. After several tests, it turns out, that the bottleneck seems to be the single HDD write performance, which is below 40MB/s and one would expect at least 100MB/s. Does anyone have experiences with IBM flashsystem sequential write performance tuning or has these arrays in the infrastructure? We would really appreciate any help/explanation. Thank you! Petr Plodik M Computers s.r.o. petr.plodik at mcomputers.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Tue Jan 23 22:32:30 2024 From: anacreo at gmail.com (Alec) Date: Tue, 23 Jan 2024 14:32:30 -0800 Subject: [gpfsug-discuss] IBM Flashsystem 7300 HDD sequential write performance issue In-Reply-To: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> References: <8B681D6B-3958-419E-8A4B-DE74ACA799DC@mcomputers.cz> Message-ID: I would want to understand what your test was and how you determined it's single drive performance. If you're just taking your aggregate throughout and dividing by number of drives, you're probably missing entirely the most restrictive part of the chain. You can not pour water through a funnel then have tablespoons below it and complain about the tablespoon performance. Map out the actual bandwidth all the way through your chain, and every choke point along the way and then make sure each point isn't constrained. Starting from the test mechanism itself. You can really rule out some things easily. Go from single thread to multiple threads to rule out CPU bottlenecks. Take a path out of the mix to see if the underlying connection is the constraint, make a less wide raid config or a more wide raid config to see if your performance changes. Some of these changes will have no impact to your top throughout and you can help to eliminate the variables that way. Also are you saying that 32G is your aggregate throughout across multiple FCs? That's only 4GB/s. Check out the fiber hardware and make sure you divided your work evenly across port groups and have clear paths to the storage through each port group, or ensure all the workload is in one portgroup and make sure you're not exceeding that port groups speed. Alec On Tue, Jan 23, 2024, 6:06?AM Petr Plod?k wrote: > Hi, > > we have GPFS cluster with two IBM FlashSystem 7300 systems with HD > expansion and 80x 12TB HDD each (in DRAID 8+P+Q), 3 GPFS servers connected > via 32G FC. We are doing performance tuning on sequential writes to HDDs > and seeing suboptimal performance. After several tests, it turns out, that > the bottleneck seems to be the single HDD write performance, which is below > 40MB/s and one would expect at least 100MB/s. > > Does anyone have experiences with IBM flashsystem sequential write > performance tuning or has these arrays in the infrastructure? We would > really appreciate any help/explanation. > > Thank you! > > Petr Plodik > M Computers s.r.o. > petr.plodik at mcomputers.cz > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From klbuter at sandia.gov Wed Jan 24 17:08:47 2024 From: klbuter at sandia.gov (Buterbaugh, Kevin Lynn) Date: Wed, 24 Jan 2024 17:08:47 +0000 Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? Message-ID: Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? default configuration, which also matches the recommended configuration in every last piece of IBM documentation that I?ve ever come across, you WILL NOT be notified of filesystem corruption?!? Do you think IBM should fix this ASAP? If so, please up vote https://ideas.ibm.com/ideas/ESS-I-61. If you, like me, consider this a bug in the existing product and not a ?feature enhancement? to maybe be included in some future release if we?re lucky, then please keep reading. Here?s the gory details to the best of my understanding? Your IBM ESS can and will detect filesystem corruption (FS_STRUCT errors). But it currently will NOT, and cannot, let you know that it?s happened. The reason is that FS_STRUCT errors are detected only on the filesystem manager node, which makes sense. But if you?re running in the default and recommended configuration your filesystem manager node is one of the I/O nodes, not the EMS node. The I/O nodes have no way to communicate anything out to you unless IBM decides to configure them to do so ? like they ALREADY DO with other things like hardware events ? by routing the error thru the EMS node which can send it on to you. You could fix this problem yourself by writing a custom callback script to send you an e-mail (or a text) whenever an FS_STRUCT error is detected by the filesystem manager node ? EXCEPT that you?d need mailx / postfix or something like that and IBM doesn?t provide you with a way to install them on the I/O nodes. As an aside, if you?re NOT on an ESS (i.e. running GPFS on some sort of commodity hardware) you can and should do this! There is a workaround for this issue, which is to run your filesystem manager(s) on the EMS node. However, 1) this goes against IBM?s recommendations (and defaults), and 2) is not possible for larger ESS systems as the EMS node doesn?t have enough RAM to handle the filesystem manager function. Personally, I think it?s absolutely crazy that an I/O node can tell you that you?ve got a pdisk failure but can?t tell you that you?ve got filesystem corruption! If you agree, then please up vote the RFE above. Even if you don?t agree, let me ask you to consider up voting the RFE anyway. Why? To send a message to IBM that you consider it unacceptable for them to allow a customer (me, obviously) to open up a support ticket for this very issue (again, I consider this a very serious bug, not a feature enhancement) in July of 2023, work with the customer for 6 months, and then blow the customer off by telling them, and I quote: ?As per the dev team, this feature has been in this way since really old versions and has not changed which means that is not going to change soon. You can request an RFE with your idea for the development team to take it into account. Below I share the link where you can share your idea (RFE):? ?Not going to change soon.? Thanks for nothing, IBM ? well, I do appreciate your honesty. I?ve got one other RFE out there - submitted in August of 2022 - and its? status is still ?Future Consideration.? I guess I?ll just keep my fingers crossed that I never have filesystem corruption on an ESS. But if I do, let me highly recommend to you that you not assign me one of your support personnel who does not understand that 1 plus 4 does not equal 6 ? or that October comes before November on the calendar (both of which I have actually had happen to me in the last 6 months; no, sadly, I am not joking or exaggerating in the least). To all the IBMers reading this I want you to know that I personally consider the ESS and GPFS to be the best storage solution out there from a technical perspective ? I truly do. But that is rapidly becoming irrelevant when you are also doing things like the above, especially when you are overly proud (I think you know what I mean) of your support even if it was good, which it used to be but sadly no longer is. IBMers, I?m sure you don?t like this bit of public shaming. Guess what? I don?t like doing it. But I have complained directly to IBM about these things for quite some time now (ask my sales rep if you don?t believe me) and it?s done no good whatsoever. Not only did I count to 100 before composing this e-mail, I slept on it. I don?t know what else to do when things aren?t changing. But I promise you this, if you?ll stop doing stuff like this I will absolutely be more than glad to never have to send another e-mail like this one again. Deal? Thank you, all? Kevin B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 24 18:27:27 2024 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 24 Jan 2024 18:27:27 +0000 Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? In-Reply-To: References: Message-ID: <985360cce7184f8ba7df64226da68f88@deshaw.com> I?m not going to wade into the non-technical aspects of this, but from a technical point-of-view I can share what some of have done about things like this? I would consider important cluster management roles like the ?manager? node role which dictates token server layout and where the filesystem manager roles are assigned to be things you want much more control over than an appliance will provide. Using a storage appliance like the ESS or DSS makes sense to me for bandwidth and parallel I/O, but I don?t think it?s a good choice for these other functions. We?ve kept those roles on systems where we manage things with our standard tools, including logging/alerting/etc for years, even before we went the commodity route which you mentioned. This approach has proven to be a long-term good choice in terms of maintaining visibility and management of our environment. I do understand that the licensing space with Scale has been a moving target over the years. But with the advent of capacity-based licensing, at least one only needs to purchase the server hardware and not also Scale PVUs in order to run your own managers, so I do think that the story has improved in terms of not discouraging customers from opting into better overall architectures using a software-defined storage approach. Thx Paul From: gpfsug-discuss On Behalf Of Buterbaugh, Kevin Lynn Sent: Wednesday, January 24, 2024 12:09 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? This message was sent by an external party. Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? default configuration, which also matches the recommended configuration in every last piece of IBM documentation that I?ve ever come across, you WILL NOT be notified of filesystem corruption?!? Do you think IBM should fix this ASAP? If so, please up vote https://ideas.ibm.com/ideas/ESS-I-61. If you, like me, consider this a bug in the existing product and not a ?feature enhancement? to maybe be included in some future release if we?re lucky, then please keep reading. Here?s the gory details to the best of my understanding? Your IBM ESS can and will detect filesystem corruption (FS_STRUCT errors). But it currently will NOT, and cannot, let you know that it?s happened. The reason is that FS_STRUCT errors are detected only on the filesystem manager node, which makes sense. But if you?re running in the default and recommended configuration your filesystem manager node is one of the I/O nodes, not the EMS node. The I/O nodes have no way to communicate anything out to you unless IBM decides to configure them to do so ? like they ALREADY DO with other things like hardware events ? by routing the error thru the EMS node which can send it on to you. You could fix this problem yourself by writing a custom callback script to send you an e-mail (or a text) whenever an FS_STRUCT error is detected by the filesystem manager node ? EXCEPT that you?d need mailx / postfix or something like that and IBM doesn?t provide you with a way to install them on the I/O nodes. As an aside, if you?re NOT on an ESS (i.e. running GPFS on some sort of commodity hardware) you can and should do this! There is a workaround for this issue, which is to run your filesystem manager(s) on the EMS node. However, 1) this goes against IBM?s recommendations (and defaults), and 2) is not possible for larger ESS systems as the EMS node doesn?t have enough RAM to handle the filesystem manager function. Personally, I think it?s absolutely crazy that an I/O node can tell you that you?ve got a pdisk failure but can?t tell you that you?ve got filesystem corruption! If you agree, then please up vote the RFE above. Even if you don?t agree, let me ask you to consider up voting the RFE anyway. Why? To send a message to IBM that you consider it unacceptable for them to allow a customer (me, obviously) to open up a support ticket for this very issue (again, I consider this a very serious bug, not a feature enhancement) in July of 2023, work with the customer for 6 months, and then blow the customer off by telling them, and I quote: ?As per the dev team, this feature has been in this way since really old versions and has not changed which means that is not going to change soon. You can request an RFE with your idea for the development team to take it into account. Below I share the link where you can share your idea (RFE):? ?Not going to change soon.? Thanks for nothing, IBM ? well, I do appreciate your honesty. I?ve got one other RFE out there - submitted in August of 2022 - and its? status is still ?Future Consideration.? I guess I?ll just keep my fingers crossed that I never have filesystem corruption on an ESS. But if I do, let me highly recommend to you that you not assign me one of your support personnel who does not understand that 1 plus 4 does not equal 6 ? or that October comes before November on the calendar (both of which I have actually had happen to me in the last 6 months; no, sadly, I am not joking or exaggerating in the least). To all the IBMers reading this I want you to know that I personally consider the ESS and GPFS to be the best storage solution out there from a technical perspective ? I truly do. But that is rapidly becoming irrelevant when you are also doing things like the above, especially when you are overly proud (I think you know what I mean) of your support even if it was good, which it used to be but sadly no longer is. IBMers, I?m sure you don?t like this bit of public shaming. Guess what? I don?t like doing it. But I have complained directly to IBM about these things for quite some time now (ask my sales rep if you don?t believe me) and it?s done no good whatsoever. Not only did I count to 100 before composing this e-mail, I slept on it. I don?t know what else to do when things aren?t changing. But I promise you this, if you?ll stop doing stuff like this I will absolutely be more than glad to never have to send another e-mail like this one again. Deal? Thank you, all? Kevin B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 24 18:27:27 2024 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 24 Jan 2024 18:27:27 +0000 Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? In-Reply-To: References: Message-ID: <985360cce7184f8ba7df64226da68f88@deshaw.com> I?m not going to wade into the non-technical aspects of this, but from a technical point-of-view I can share what some of have done about things like this? I would consider important cluster management roles like the ?manager? node role which dictates token server layout and where the filesystem manager roles are assigned to be things you want much more control over than an appliance will provide. Using a storage appliance like the ESS or DSS makes sense to me for bandwidth and parallel I/O, but I don?t think it?s a good choice for these other functions. We?ve kept those roles on systems where we manage things with our standard tools, including logging/alerting/etc for years, even before we went the commodity route which you mentioned. This approach has proven to be a long-term good choice in terms of maintaining visibility and management of our environment. I do understand that the licensing space with Scale has been a moving target over the years. But with the advent of capacity-based licensing, at least one only needs to purchase the server hardware and not also Scale PVUs in order to run your own managers, so I do think that the story has improved in terms of not discouraging customers from opting into better overall architectures using a software-defined storage approach. Thx Paul From: gpfsug-discuss On Behalf Of Buterbaugh, Kevin Lynn Sent: Wednesday, January 24, 2024 12:09 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? This message was sent by an external party. Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? default configuration, which also matches the recommended configuration in every last piece of IBM documentation that I?ve ever come across, you WILL NOT be notified of filesystem corruption?!? Do you think IBM should fix this ASAP? If so, please up vote https://ideas.ibm.com/ideas/ESS-I-61. If you, like me, consider this a bug in the existing product and not a ?feature enhancement? to maybe be included in some future release if we?re lucky, then please keep reading. Here?s the gory details to the best of my understanding? Your IBM ESS can and will detect filesystem corruption (FS_STRUCT errors). But it currently will NOT, and cannot, let you know that it?s happened. The reason is that FS_STRUCT errors are detected only on the filesystem manager node, which makes sense. But if you?re running in the default and recommended configuration your filesystem manager node is one of the I/O nodes, not the EMS node. The I/O nodes have no way to communicate anything out to you unless IBM decides to configure them to do so ? like they ALREADY DO with other things like hardware events ? by routing the error thru the EMS node which can send it on to you. You could fix this problem yourself by writing a custom callback script to send you an e-mail (or a text) whenever an FS_STRUCT error is detected by the filesystem manager node ? EXCEPT that you?d need mailx / postfix or something like that and IBM doesn?t provide you with a way to install them on the I/O nodes. As an aside, if you?re NOT on an ESS (i.e. running GPFS on some sort of commodity hardware) you can and should do this! There is a workaround for this issue, which is to run your filesystem manager(s) on the EMS node. However, 1) this goes against IBM?s recommendations (and defaults), and 2) is not possible for larger ESS systems as the EMS node doesn?t have enough RAM to handle the filesystem manager function. Personally, I think it?s absolutely crazy that an I/O node can tell you that you?ve got a pdisk failure but can?t tell you that you?ve got filesystem corruption! If you agree, then please up vote the RFE above. Even if you don?t agree, let me ask you to consider up voting the RFE anyway. Why? To send a message to IBM that you consider it unacceptable for them to allow a customer (me, obviously) to open up a support ticket for this very issue (again, I consider this a very serious bug, not a feature enhancement) in July of 2023, work with the customer for 6 months, and then blow the customer off by telling them, and I quote: ?As per the dev team, this feature has been in this way since really old versions and has not changed which means that is not going to change soon. You can request an RFE with your idea for the development team to take it into account. Below I share the link where you can share your idea (RFE):? ?Not going to change soon.? Thanks for nothing, IBM ? well, I do appreciate your honesty. I?ve got one other RFE out there - submitted in August of 2022 - and its? status is still ?Future Consideration.? I guess I?ll just keep my fingers crossed that I never have filesystem corruption on an ESS. But if I do, let me highly recommend to you that you not assign me one of your support personnel who does not understand that 1 plus 4 does not equal 6 ? or that October comes before November on the calendar (both of which I have actually had happen to me in the last 6 months; no, sadly, I am not joking or exaggerating in the least). To all the IBMers reading this I want you to know that I personally consider the ESS and GPFS to be the best storage solution out there from a technical perspective ? I truly do. But that is rapidly becoming irrelevant when you are also doing things like the above, especially when you are overly proud (I think you know what I mean) of your support even if it was good, which it used to be but sadly no longer is. IBMers, I?m sure you don?t like this bit of public shaming. Guess what? I don?t like doing it. But I have complained directly to IBM about these things for quite some time now (ask my sales rep if you don?t believe me) and it?s done no good whatsoever. Not only did I count to 100 before composing this e-mail, I slept on it. I don?t know what else to do when things aren?t changing. But I promise you this, if you?ll stop doing stuff like this I will absolutely be more than glad to never have to send another e-mail like this one again. Deal? Thank you, all? Kevin B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahnasr at taltech.ee Thu Jan 25 08:44:53 2024 From: ahnasr at taltech.ee (Ahmed Nasr) Date: Thu, 25 Jan 2024 08:44:53 +0000 Subject: [gpfsug-discuss] mmbackup against snapshot backup failing In-Reply-To: References: Message-ID: Good Day, this is my first time posting. it is great to talk to everyone here. so i have an issue where mmbackup is not working properly and i don't know why. Problem: mmbackup doesn't find any files that are eligible for backup. i used to work with dsmc and it is working fine but i want to employ the snapshot option within mmbackup. here are my steps: 1. backup to tsm with mmbackup using the full backup option . that took around one month 2. tried to do an incremental backup with mmbackup against a snapshot (Failed) 3. tried to rebuild shadow database then backup but that failed as well 4. tried dsmc incremental which found eligible files and worked i want to use snapshots which seems to be problematic please let me know if you have any recommendations or ideas command used: mmbackup /gpfs/mariana/home/ -S home_snapshot_2024-01-25-09-23 -t incremental -L 2 -d --scope inodespace -v logs: DEBUGtsbackup33:oldShadow found, validShadows: 1 DEBUGtsbackup33:shadow DB /gpfs/mariana/home/.mmbackupShadow.1.gv-backup.fileset format version 1400 .. .. .. DEBUGtsbackup33: policy0: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule DEBUGtsbackup33: policy0: 0 86091506 16736 0 0 0 RULE 'Exclude Rule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE(.) DEBUGtsbackup33: policy0: 1 0 0 0 0 0 RULE 'BackupR ule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(.) WHERE(.) DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: [I] Filesystem objects with no applicable rules: 0. DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: Predicted Data Pool Utilization in KB and %: DEBUGtsbackup33: policy0: Pool_Name KB_Occupied KB_Total Percent_Occupied DEBUGtsbackup33: policy0: system 623453700096 1595564097536 39.074187058% Thu Jan 25 10:01:09 2024 mmbackup:New list file /gpfs/mariana/home/.mmbackupCfg/prepFiles/list.mmbackup.1.gv-backup is missing or emp ty. Do the TSM include/exclude rules exclude all contents of /gpfs/mariana/home for TSM server gv-backup ? Thu Jan 25 10:01:09 2024 mmbackup:No changed or deleted files for mariana.home since mmbackup was last invoked. DEBUGtsbackup33: Changed:0 Backed:0 Excl:0 DSMC:0 anyFail:1 Audit:0 Severe:0 FailuresFound:0 Thu Jan 25 10:01:09 2024 mmbackup:Incremental backup completely failed. TSM had 0 severe errors and returned 0. See the TSM log file for more information. 0 files had errors, TSM exit status: exit 12 ---------------------------------------------------------- mmbackup: Backup of /gpfs/mariana/home completed with errors at Thu Jan 25 10:01:09 EET 2024. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. ----------------------------------------------------------------------------------------------- i will also add the policy file RULE 'ExcludeRule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE ((PATH_NAME LIKE '%/.TsmCacheDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.TsmCacheDir/%')) OR ((PATH_NAME LIKE '%/.pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.pip%%%/%')) OR ((PATH_NAME LIKE '%/site-packages%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/site-packages%%%/%')) OR ((PATH_NAME LIKE '%/pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/pip%%%/%')) OR ((PATH_NAME LIKE '%/.local%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.local%%%/%')) OR ((PATH_NAME LIKE '%/.cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.cache%%%/%')) OR ((PATH_NAME LIKE '%/cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/cache%%%/%')) OR ((PATH_NAME LIKE '%/.tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.tmp%%%/%')) OR ((PATH_NAME LIKE '%/tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/tmp%%%/%')) OR ((PATH_NAME LIKE '%/.miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.miniconda%%%/%')) OR ((PATH_NAME LIKE '%/miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/miniconda%%%/%')) OR ((PATH_NAME LIKE '%/.snapshots' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.snapshots/%')) OR ((PATH_NAME LIKE '%/.conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.conda%%%/%')) OR ((PATH_NAME LIKE '%/conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/conda%%%/%')) OR ((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%/%')) OR (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (MODE LIKE 's%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir/%') OR (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') RULE 'BackupRule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || 'resdnt' ) WHERE (MISC_ATTRIBUTES LIKE '%u%') AND (NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' )) AND (NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' )) OR (((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/%') AND (MISC_ATTRIBUTES LIKE '%u%') ) AND ((NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' ))) AND ((NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' ))) ) Kind Regards, Ahmed Nasr Administrator of cloud service -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jan 25 10:38:35 2024 From: scale at us.ibm.com (scale) Date: Thu, 25 Jan 2024 10:38:35 +0000 Subject: [gpfsug-discuss] mmbackup against snapshot backup failing In-Reply-To: References: Message-ID: Please contact Scale support team and open a case. From: gpfsug-discuss on behalf of Ahmed Nasr Date: Thursday, January 25, 2024 at 3:47?AM To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] mmbackup against snapshot backup failing Good Day, this is my first time posting. it is great to talk to everyone here. so i have an issue where mmbackup is not working properly and i don't know why. Problem: mmbackup doesn't find any files that are eligible for backup. i used to ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd Good Day, this is my first time posting. it is great to talk to everyone here. so i have an issue where mmbackup is not working properly and i don't know why. Problem: mmbackup doesn't find any files that are eligible for backup. i used to work with dsmc and it is working fine but i want to employ the snapshot option within mmbackup. here are my steps: 1. backup to tsm with mmbackup using the full backup option . that took around one month 2. tried to do an incremental backup with mmbackup against a snapshot (Failed) 3. tried to rebuild shadow database then backup but that failed as well 4. tried dsmc incremental which found eligible files and worked i want to use snapshots which seems to be problematic please let me know if you have any recommendations or ideas command used: mmbackup /gpfs/mariana/home/ -S home_snapshot_2024-01-25-09-23 -t incremental -L 2 -d --scope inodespace -v logs: DEBUGtsbackup33:oldShadow found, validShadows: 1 DEBUGtsbackup33:shadow DB /gpfs/mariana/home/.mmbackupShadow.1.gv-backup.fileset format version 1400 .. .. .. DEBUGtsbackup33: policy0: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule DEBUGtsbackup33: policy0: 0 86091506 16736 0 0 0 RULE 'Exclude Rule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE(.) DEBUGtsbackup33: policy0: 1 0 0 0 0 0 RULE 'BackupR ule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(.) WHERE(.) DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: [I] Filesystem objects with no applicable rules: 0. DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: Predicted Data Pool Utilization in KB and %: DEBUGtsbackup33: policy0: Pool_Name KB_Occupied KB_Total Percent_Occupied DEBUGtsbackup33: policy0: system 623453700096 1595564097536 39.074187058% Thu Jan 25 10:01:09 2024 mmbackup:New list file /gpfs/mariana/home/.mmbackupCfg/prepFiles/list.mmbackup.1.gv-backup is missing or emp ty. Do the TSM include/exclude rules exclude all contents of /gpfs/mariana/home for TSM server gv-backup ? Thu Jan 25 10:01:09 2024 mmbackup:No changed or deleted files for mariana.home since mmbackup was last invoked. DEBUGtsbackup33: Changed:0 Backed:0 Excl:0 DSMC:0 anyFail:1 Audit:0 Severe:0 FailuresFound:0 Thu Jan 25 10:01:09 2024 mmbackup:Incremental backup completely failed. TSM had 0 severe errors and returned 0. See the TSM log file for more information. 0 files had errors, TSM exit status: exit 12 ---------------------------------------------------------- mmbackup: Backup of /gpfs/mariana/home completed with errors at Thu Jan 25 10:01:09 EET 2024. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. ----------------------------------------------------------------------------------------------- i will also add the policy file RULE 'ExcludeRule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE ((PATH_NAME LIKE '%/.TsmCacheDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.TsmCacheDir/%')) OR ((PATH_NAME LIKE '%/.pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.pip%%%/%')) OR ((PATH_NAME LIKE '%/site-packages%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/site-packages%%%/%')) OR ((PATH_NAME LIKE '%/pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/pip%%%/%')) OR ((PATH_NAME LIKE '%/.local%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.local%%%/%')) OR ((PATH_NAME LIKE '%/.cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.cache%%%/%')) OR ((PATH_NAME LIKE '%/cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/cache%%%/%')) OR ((PATH_NAME LIKE '%/.tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.tmp%%%/%')) OR ((PATH_NAME LIKE '%/tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/tmp%%%/%')) OR ((PATH_NAME LIKE '%/.miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.miniconda%%%/%')) OR ((PATH_NAME LIKE '%/miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/miniconda%%%/%')) OR ((PATH_NAME LIKE '%/.snapshots' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.snapshots/%')) OR ((PATH_NAME LIKE '%/.conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.conda%%%/%')) OR ((PATH_NAME LIKE '%/conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/conda%%%/%')) OR ((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%/%')) OR (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (MODE LIKE 's%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir/%') OR (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') RULE 'BackupRule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || 'resdnt' ) WHERE (MISC_ATTRIBUTES LIKE '%u%') AND (NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' )) AND (NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' )) OR (((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/%') AND (MISC_ATTRIBUTES LIKE '%u%') ) AND ((NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' ))) AND ((NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' ))) ) Kind Regards, Ahmed Nasr Administrator of cloud service -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Jan 25 17:45:42 2024 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 25 Jan 2024 17:45:42 +0000 Subject: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? In-Reply-To: References: Message-ID: Hi Kevin, I think there is some misconception about how FSStruct errors are detected and handled. All nodes in a Storage Scale cluster have a health monitoring daemon running (backend for mmhealth cmd) which monitors the individual components and listens to callbacks to detect issues like FSStruct errors. As you correctly mentioned, the FSStruct callbacks will be fired on the Filesystem-Manager nodes only and therefore raise a new mmhealth event on that node. You can see those events running mmhealth node show on that node. Irrespective of the fact if this is an EMS node or an IO node, mmhealth will forward any event to the cluster manager to provide a consolidated cluster wide state view (mmhealth cluster show) In addition, all events will be forwarded to the GUI, which will show those events as alerts. Since many customers have their own monitoring system we provide multiple ways to get notified about new events: * Scale GUI allows to configure Email notifications or SNMP traps https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=gui-event-notifications * mmhealth offers a modern webhook interface https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=command-configuring-webhook-by-using-mmhealth * mmhealth can call user defined scripts to trigger any custom notification tool ????? https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=mhn-running-user-defined-script-when-event-is-raised * 3rd party monitoring tools can use the REST API or mmhealth CLIs to poll the system status https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=endpoints-nodesnamehealthstates-get Depending on which option you choose and where your external monitoring system is running you need to ensure that there is a network route to the system. (e.g. GUI Email & SNMP need the EMS node to talk to the server, webhook/custom script will need any node to talk to the server) ESS IO nodes are not necessarily restricted to an internal network. We have many customers who attach their ESS to their campus network for central management and monitoring. If you have further questions or want to hear more about monitoring & notifications, I can offer to schedule a webex session with you. best regards Mathias Dietz Storage Scale RAS Architect IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ________________________________ From: gpfsug-discuss on behalf of Buterbaugh, Kevin Lynn Sent: Wednesday, January 24, 2024 6:08 PM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ZjQcmQRYFpfptBannerEnd Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? default configuration, which also matches the recommended configuration in every last piece of IBM documentation that I?ve ever come across, you WILL NOT be notified of filesystem corruption?!? Do you think IBM should fix this ASAP? If so, please up vote https://ideas.ibm.com/ideas/ESS-I-61. If you, like me, consider this a bug in the existing product and not a ?feature enhancement? to maybe be included in some future release if we?re lucky, then please keep reading. Here?s the gory details to the best of my understanding? Your IBM ESS can and will detect filesystem corruption (FS_STRUCT errors). But it currently will NOT, and cannot, let you know that it?s happened. The reason is that FS_STRUCT errors are detected only on the filesystem manager node, which makes sense. But if you?re running in the default and recommended configuration your filesystem manager node is one of the I/O nodes, not the EMS node. The I/O nodes have no way to communicate anything out to you unless IBM decides to configure them to do so ? like they ALREADY DO with other things like hardware events ? by routing the error thru the EMS node which can send it on to you. You could fix this problem yourself by writing a custom callback script to send you an e-mail (or a text) whenever an FS_STRUCT error is detected by the filesystem manager node ? EXCEPT that you?d need mailx / postfix or something like that and IBM doesn?t provide you with a way to install them on the I/O nodes. As an aside, if you?re NOT on an ESS (i.e. running GPFS on some sort of commodity hardware) you can and should do this! There is a workaround for this issue, which is to run your filesystem manager(s) on the EMS node. However, 1) this goes against IBM?s recommendations (and defaults), and 2) is not possible for larger ESS systems as the EMS node doesn?t have enough RAM to handle the filesystem manager function. Personally, I think it?s absolutely crazy that an I/O node can tell you that you?ve got a pdisk failure but can?t tell you that you?ve got filesystem corruption! If you agree, then please up vote the RFE above. Even if you don?t agree, let me ask you to consider up voting the RFE anyway. Why? To send a message to IBM that you consider it unacceptable for them to allow a customer (me, obviously) to open up a support ticket for this very issue (again, I consider this a very serious bug, not a feature enhancement) in July of 2023, work with the customer for 6 months, and then blow the customer off by telling them, and I quote: ?As per the dev team, this feature has been in this way since really old versions and has not changed which means that is not going to change soon. You can request an RFE with your idea for the development team to take it into account. Below I share the link where you can share your idea (RFE):? ?Not going to change soon.? Thanks for nothing, IBM ? well, I do appreciate your honesty. I?ve got one other RFE out there - submitted in August of 2022 - and its? status is still ?Future Consideration.? I guess I?ll just keep my fingers crossed that I never have filesystem corruption on an ESS. But if I do, let me highly recommend to you that you not assign me one of your support personnel who does not understand that 1 plus 4 does not equal 6 ? or that October comes before November on the calendar (both of which I have actually had happen to me in the last 6 months; no, sadly, I am not joking or exaggerating in the least). To all the IBMers reading this I want you to know that I personally consider the ESS and GPFS to be the best storage solution out there from a technical perspective ? I truly do. But that is rapidly becoming irrelevant when you are also doing things like the above, especially when you are overly proud (I think you know what I mean) of your support even if it was good, which it used to be but sadly no longer is. IBMers, I?m sure you don?t like this bit of public shaming. Guess what? I don?t like doing it. But I have complained directly to IBM about these things for quite some time now (ask my sales rep if you don?t believe me) and it?s done no good whatsoever. Not only did I count to 100 before composing this e-mail, I slept on it. I don?t know what else to do when things aren?t changing. But I promise you this, if you?ll stop doing stuff like this I will absolutely be more than glad to never have to send another e-mail like this one again. Deal? Thank you, all? Kevin B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jan 30 15:54:07 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 30 Jan 2024 15:54:07 +0000 Subject: [gpfsug-discuss] GUI followup Message-ID: <70acbe2f-5160-4473-a932-4f7bb33126b4@strath.ac.uk> We acquired a couple of refurb x3550 M5's and I have been following the instructions to install but I have hit a stumbling block. Basically the following command fails /usr/lpp/mmfs/gui/cli/initgui --xcat giving [root at gui1 ~]# /usr/lpp/mmfs/gui/cli/initgui --xcat xcat EFSSG4309C Host "xcat" provided as xCAT host is probably not an xCAT host, because /opt/xcat/bin/lsxcatd -v cannot be executed. However that's not accurate [root at xcat ~]# /opt/xcat/bin/lsxcatd -v Version 2.16.3 (git commit d1cdf8b35aa5b0bd736796acafd8f051eb07b5bf, built Tue Dec 7 15:03:13 GMT 2021) My guess is that the GUI node needs to be able to do passwordless SSH onto the xcat/confluent server. That is not setup and I can't find any documentation to say it should be. Noting that passwordless SSH onto the xcat server has never been configured on our system either for the DSS-G nodes or any of the compute nodes. Am I missing something? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Tue Jan 30 16:19:06 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 30 Jan 2024 16:19:06 +0000 Subject: [gpfsug-discuss] GUI followup In-Reply-To: <70acbe2f-5160-4473-a932-4f7bb33126b4@strath.ac.uk> References: <70acbe2f-5160-4473-a932-4f7bb33126b4@strath.ac.uk> Message-ID: <9914D56F-78D0-4A80-A252-366BD931BECA@rutgers.edu> Are you intending to use this with xCAT and not Confluent? -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' On Jan 30, 2024, at 10:54, Jonathan Buzzard wrote: We acquired a couple of refurb x3550 M5's and I have been following the instructions to install but I have hit a stumbling block. Basically the following command fails /usr/lpp/mmfs/gui/cli/initgui --xcat giving [root at gui1 ~]# /usr/lpp/mmfs/gui/cli/initgui --xcat xcat EFSSG4309C Host "xcat" provided as xCAT host is probably not an xCAT host, because /opt/xcat/bin/lsxcatd -v cannot be executed. However that's not accurate [root at xcat ~]# /opt/xcat/bin/lsxcatd -v Version 2.16.3 (git commit d1cdf8b35aa5b0bd736796acafd8f051eb07b5bf, built Tue Dec 7 15:03:13 GMT 2021) My guess is that the GUI node needs to be able to do passwordless SSH onto the xcat/confluent server. That is not setup and I can't find any documentation to say it should be. Noting that passwordless SSH onto the xcat server has never been configured on our system either for the DSS-G nodes or any of the compute nodes. Am I missing something? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Tue Jan 30 18:06:57 2024 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Tue, 30 Jan 2024 18:06:57 +0000 Subject: [gpfsug-discuss] [External] GUI followup In-Reply-To: <70acbe2f-5160-4473-a932-4f7bb33126b4@strath.ac.uk> References: <70acbe2f-5160-4473-a932-4f7bb33126b4@strath.ac.uk> Message-ID: Hi, Try the following command on the Confluent management server: osdeploy initialize -l The last option is dash-lowercase-L. See meaning of the multiple initialize options with "osdeploy initialize -h". HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Tuesday, January 30, 2024 16:54 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] GUI followup We acquired a couple of refurb x3550 M5's and I have been following the instructions to install but I have hit a stumbling block. Basically the following command fails /usr/lpp/mmfs/gui/cli/initgui --xcat giving [root at gui1 ~]# /usr/lpp/mmfs/gui/cli/initgui --xcat xcat EFSSG4309C Host "xcat" provided as xCAT host is probably not an xCAT host, because /opt/xcat/bin/lsxcatd -v cannot be executed. However that's not accurate [root at xcat ~]# /opt/xcat/bin/lsxcatd -v Version 2.16.3 (git commit d1cdf8b35aa5b0bd736796acafd8f051eb07b5bf, built Tue Dec 7 15:03:13 GMT 2021) My guess is that the GUI node needs to be able to do passwordless SSH onto the xcat/confluent server. That is not setup and I can't find any documentation to say it should be. Noting that passwordless SSH onto the xcat server has never been configured on our system either for the DSS-G nodes or any of the compute nodes. Am I missing something? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From Renar.Grunenberg at huk-coburg.de Wed Jan 31 09:29:10 2024 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 31 Jan 2024 09:29:10 +0000 Subject: [gpfsug-discuss] mmbackup against snapshot backup failing In-Reply-To: References: Message-ID: <887b879d777d49a78b250e23ffb21803@huk-coburg.de> Hallo Ahmend, at first to understand mmbackup: you should know that mmbackup use a fs or independend fileset snapshots (--scope) to compare the content of shadow db with the generated snapshot. This compare generated filelist that are then used to make a selective backup over dsmc. To generate these mentioned snapshot you must paramized mmbackup with -S. If you had trouble you should run at first a rebuild oft he shadowdb with --rebuild. I think a backup over a snapshot itself doesnt work. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. Helen Reck, Dr. J?rg Rheinl?nder, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss Im Auftrag von Ahmed Nasr Gesendet: Donnerstag, 25. Januar 2024 09:45 An: gpfsug-discuss at gpfsug.org Betreff: [gpfsug-discuss] mmbackup against snapshot backup failing Good Day, this is my first time posting. it is great to talk to everyone here. so i have an issue where mmbackup is not working properly and i don't know why. Problem: mmbackup doesn't find any files that are eligible for backup. i used to work with dsmc and it is working fine but i want to employ the snapshot option within mmbackup. here are my steps: 1. backup to tsm with mmbackup using the full backup option . that took around one month 2. tried to do an incremental backup with mmbackup against a snapshot (Failed) 3. tried to rebuild shadow database then backup but that failed as well 4. tried dsmc incremental which found eligible files and worked i want to use snapshots which seems to be problematic please let me know if you have any recommendations or ideas command used: mmbackup /gpfs/mariana/home/ -S home_snapshot_2024-01-25-09-23 -t incremental -L 2 -d --scope inodespace -v logs: DEBUGtsbackup33:oldShadow found, validShadows: 1 DEBUGtsbackup33:shadow DB /gpfs/mariana/home/.mmbackupShadow.1.gv-backup.fileset format version 1400 .. .. .. DEBUGtsbackup33: policy0: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule DEBUGtsbackup33: policy0: 0 86091506 16736 0 0 0 RULE 'Exclude Rule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE(.) DEBUGtsbackup33: policy0: 1 0 0 0 0 0 RULE 'BackupR ule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(.) WHERE(.) DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: [I] Filesystem objects with no applicable rules: 0. DEBUGtsbackup33: policy0: DEBUGtsbackup33: policy0: Predicted Data Pool Utilization in KB and %: DEBUGtsbackup33: policy0: Pool_Name KB_Occupied KB_Total Percent_Occupied DEBUGtsbackup33: policy0: system 623453700096 1595564097536 39.074187058% Thu Jan 25 10:01:09 2024 mmbackup:New list file /gpfs/mariana/home/.mmbackupCfg/prepFiles/list.mmbackup.1.gv-backup is missing or emp ty. Do the TSM include/exclude rules exclude all contents of /gpfs/mariana/home for TSM server gv-backup ? Thu Jan 25 10:01:09 2024 mmbackup:No changed or deleted files for mariana.home since mmbackup was last invoked. DEBUGtsbackup33: Changed:0 Backed:0 Excl:0 DSMC:0 anyFail:1 Audit:0 Severe:0 FailuresFound:0 Thu Jan 25 10:01:09 2024 mmbackup:Incremental backup completely failed. TSM had 0 severe errors and returned 0. See the TSM log file for more information. 0 files had errors, TSM exit status: exit 12 ---------------------------------------------------------- mmbackup: Backup of /gpfs/mariana/home completed with errors at Thu Jan 25 10:01:09 EET 2024. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. ----------------------------------------------------------------------------------------------- i will also add the policy file RULE 'ExcludeRule' LIST 'mmbackup.1.gv-backup' EXCLUDE DIRECTORIES_PLUS WHERE ((PATH_NAME LIKE '%/.TsmCacheDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.TsmCacheDir/%')) OR ((PATH_NAME LIKE '%/.pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.pip%%%/%')) OR ((PATH_NAME LIKE '%/site-packages%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/site-packages%%%/%')) OR ((PATH_NAME LIKE '%/pip%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/pip%%%/%')) OR ((PATH_NAME LIKE '%/.local%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.local%%%/%')) OR ((PATH_NAME LIKE '%/.cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.cache%%%/%')) OR ((PATH_NAME LIKE '%/cache%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/cache%%%/%')) OR ((PATH_NAME LIKE '%/.tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.tmp%%%/%')) OR ((PATH_NAME LIKE '%/tmp%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/tmp%%%/%')) OR ((PATH_NAME LIKE '%/.miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.miniconda%%%/%')) OR ((PATH_NAME LIKE '%/miniconda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/miniconda%%%/%')) OR ((PATH_NAME LIKE '%/.snapshots' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.snapshots/%')) OR ((PATH_NAME LIKE '%/.conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/.conda%%%/%')) OR ((PATH_NAME LIKE '%/conda%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '%/conda%%%/%')) OR ((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/nagyim/%%%/%')) OR (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (MODE LIKE 's%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/gpfs/mariana/.mmSharedTmpDir/%') OR (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') RULE 'BackupRule' LIST 'mmbackup.1.gv-backup' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || 'resdnt' ) WHERE (MISC_ATTRIBUTES LIKE '%u%') AND (NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' )) AND (NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' )) OR (((PATH_NAME LIKE '/gpfs/mariana/home/.snapshots/home_snapshot_2024-01-25-09-23/%') AND (MISC_ATTRIBUTES LIKE '%u%') ) AND ((NOT (PATH_NAME LIKE '/%.swp' AND NOT MODE LIKE 'd%' ))) AND ((NOT (PATH_NAME LIKE '/%.tmp' AND NOT MODE LIKE 'd%' ))) ) Kind Regards, Ahmed Nasr Administrator of cloud service -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Jan 31 13:06:00 2024 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 31 Jan 2024 13:06:00 +0000 Subject: [gpfsug-discuss] Registration for German Storage Scale User Meeting 2024 In-Reply-To: References: Message-ID: Greetings! The registration for the German User Meeting 2024 is open. Please see the event page for registration link and draft agenda. Please join us! https://www.spectrumscaleug.org/event/german-user-meeting-2024/ I am looking forward to seeing you there. Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: gpfsug-discuss On Behalf Of Ulf Troppens Sent: Friday, 5 January 2024 10:07 To: gpfsug-discuss Subject: [EXTERNAL] [gpfsug-discuss] Save the date ? German User Meeting 2024 Greetings and Happy New Year! The German User Meeting will be held first week of March 2024 in Sindelfingen, Germany. There will be a New User Day on March 5, 2024, followed by the two-day regular Storage Scale User Meeting. Details on agenda ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. Report Suspicious ? ZjQcmQRYFpfptBannerEnd Greetings and Happy New Year! The German User Meeting will be held first week of March 2024 in Sindelfingen, Germany. There will be a New User Day on March 5, 2024, followed by the two-day regular Storage Scale User Meeting. Details on agenda and registration will be provided later. Please join us! Best, Ulf Ulf Troppens Product Manager - IBM Storage for Data and AI, Data-Intensive Workflows IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen / Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From klbuter at sandia.gov Wed Jan 31 17:37:05 2024 From: klbuter at sandia.gov (Buterbaugh, Kevin Lynn) Date: Wed, 31 Jan 2024 17:37:05 +0000 Subject: [gpfsug-discuss] ***SPAM*** Re: Wouldn't you like to know if you had filesystem corruption? In-Reply-To: References: Message-ID: Hi All, Mathias is exactly correct ? there was some misconception both on my part _and_ on the part of IBM support in regards to fsstruct errors! Mathias and I discussed this outside of this mailing list and he was able to clear up my misconceptions, for which I owe him many thanks. For those who may not read thru the rest of this e-mail I want to apologize up front for my misconceptions. As I?ll detail below, the truth is out there (), just not obvious nor easy to find. My rant against the capabilities of GPFS was misinformed and misguided and for that I apologize. My rant against IBM Support, unfortunately, was not. It turns out that with the way I had my ESS configured (default configuration plus e-mail notifications enabled and configured) we _will_ be notified of any fsstruct errors (we did a test which confirmed this) ? and many other things as well that are most definitely _not_ obvious if you just run ?mmlscallback? and see what it lists out. For a list of events that are monitored Mathias pointed me to this page in the IBM documentation: https://www.ibm.com/docs/en/storage-scale/5.1.8?topic=references-events (you can adjust the GPFS version appropriately for your site). My RFE will therefore be closed with ?feature already implemented.? We did agree that the lack of a clear linkage between the callbacks listed and the events that are actually monitored for is an area of potential confusion ? as evidently I?m not the only customer who has had a similar misunderstanding of the way things actually work. The frustrating thing for me is that I opened up a ticket, worked with IBM Support for 6 months, and then was blown off and told to file an RFE ? when what I was wanting _was in GPFS all along!_ Sigh. Maybe IBM could consider spending a little less money on marketers who change the name of the product every few years and take the money saved and apply it to improving the support organization? To be fair to the support personnel, as the quote from the ticket below indicates, they were in communication with at least one ?dev team? member who had the same lack of knowledge. I?m thankful to have had my misconceptions corrected by Mathias, but that also just makes it even more clear that there are some severe issues with the IBM Support organization. I hope that IBM can take steps to turn that ship around. Thanks all? Kevin B. From: gpfsug-discuss on behalf of Mathias Dietz Date: Thursday, January 25, 2024 at 10:52?AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? You don't often get email from mdietz at de.ibm.com. Learn why this is important Hi Kevin, I think there is some misconception about how FSStruct errors are detected and handled. All nodes in a Storage Scale cluster have a health monitoring daemon running (backend for mmhealth cmd) which monitors the individual components and listens to callbacks to detect issues like FSStruct errors. As you correctly mentioned, the FSStruct callbacks will be fired on the Filesystem-Manager nodes only and therefore raise a new mmhealth event on that node. You can see those events running mmhealth node show on that node. Irrespective of the fact if this is an EMS node or an IO node, mmhealth will forward any event to the cluster manager to provide a consolidated cluster wide state view (mmhealth cluster show) In addition, all events will be forwarded to the GUI, which will show those events as alerts. Since many customers have their own monitoring system we provide multiple ways to get notified about new events: * Scale GUI allows to configure Email notifications or SNMP traps https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=gui-event-notifications * mmhealth offers a modern webhook interface https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=command-configuring-webhook-by-using-mmhealth * mmhealth can call user defined scripts to trigger any custom notification tool ????? https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=mhn-running-user-defined-script-when-event-is-raised * 3rd party monitoring tools can use the REST API or mmhealth CLIs to poll the system status https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=endpoints-nodesnamehealthstates-get Depending on which option you choose and where your external monitoring system is running you need to ensure that there is a network route to the system. (e.g. GUI Email & SNMP need the EMS node to talk to the server, webhook/custom script will need any node to talk to the server) ESS IO nodes are not necessarily restricted to an internal network. We have many customers who attach their ESS to their campus network for central management and monitoring. If you have further questions or want to hear more about monitoring & notifications, I can offer to schedule a webex session with you. best regards Mathias Dietz Storage Scale RAS Architect IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Gesch?ftsf?hrung: David Faller Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ________________________________ From: gpfsug-discuss on behalf of Buterbaugh, Kevin Lynn Sent: Wednesday, January 24, 2024 6:08 PM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Wouldn't you like to know if you had filesystem corruption? Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? Hi All, Wouldn?t you like to know if your IBM ESS had filesystem corruption? If you answered ?no? my guess is that you?ve never experienced undetected filesystem corruption! ? Did you know that if you?ve got an IBM ESS set up in its? default configuration, which also matches the recommended configuration in every last piece of IBM documentation that I?ve ever come across, you WILL NOT be notified of filesystem corruption?!? Do you think IBM should fix this ASAP? If so, please up vote https://ideas.ibm.com/ideas/ESS-I-61. If you, like me, consider this a bug in the existing product and not a ?feature enhancement? to maybe be included in some future release if we?re lucky, then please keep reading. Here?s the gory details to the best of my understanding? Your IBM ESS can and will detect filesystem corruption (FS_STRUCT errors). But it currently will NOT, and cannot, let you know that it?s happened. The reason is that FS_STRUCT errors are detected only on the filesystem manager node, which makes sense. But if you?re running in the default and recommended configuration your filesystem manager node is one of the I/O nodes, not the EMS node. The I/O nodes have no way to communicate anything out to you unless IBM decides to configure them to do so ? like they ALREADY DO with other things like hardware events ? by routing the error thru the EMS node which can send it on to you. You could fix this problem yourself by writing a custom callback script to send you an e-mail (or a text) whenever an FS_STRUCT error is detected by the filesystem manager node ? EXCEPT that you?d need mailx / postfix or something like that and IBM doesn?t provide you with a way to install them on the I/O nodes. As an aside, if you?re NOT on an ESS (i.e. running GPFS on some sort of commodity hardware) you can and should do this! There is a workaround for this issue, which is to run your filesystem manager(s) on the EMS node. However, 1) this goes against IBM?s recommendations (and defaults), and 2) is not possible for larger ESS systems as the EMS node doesn?t have enough RAM to handle the filesystem manager function. Personally, I think it?s absolutely crazy that an I/O node can tell you that you?ve got a pdisk failure but can?t tell you that you?ve got filesystem corruption! If you agree, then please up vote the RFE above. Even if you don?t agree, let me ask you to consider up voting the RFE anyway. Why? To send a message to IBM that you consider it unacceptable for them to allow a customer (me, obviously) to open up a support ticket for this very issue (again, I consider this a very serious bug, not a feature enhancement) in July of 2023, work with the customer for 6 months, and then blow the customer off by telling them, and I quote: ?As per the dev team, this feature has been in this way since really old versions and has not changed which means that is not going to change soon. You can request an RFE with your idea for the development team to take it into account. Below I share the link where you can share your idea (RFE):? ?Not going to change soon.? Thanks for nothing, IBM ? well, I do appreciate your honesty. I?ve got one other RFE out there - submitted in August of 2022 - and its? status is still ?Future Consideration.? I guess I?ll just keep my fingers crossed that I never have filesystem corruption on an ESS. But if I do, let me highly recommend to you that you not assign me one of your support personnel who does not understand that 1 plus 4 does not equal 6 ? or that October comes before November on the calendar (both of which I have actually had happen to me in the last 6 months; no, sadly, I am not joking or exaggerating in the least). To all the IBMers reading this I want you to know that I personally consider the ESS and GPFS to be the best storage solution out there from a technical perspective ? I truly do. But that is rapidly becoming irrelevant when you are also doing things like the above, especially when you are overly proud (I think you know what I mean) of your support even if it was good, which it used to be but sadly no longer is. IBMers, I?m sure you don?t like this bit of public shaming. Guess what? I don?t like doing it. But I have complained directly to IBM about these things for quite some time now (ask my sales rep if you don?t believe me) and it?s done no good whatsoever. Not only did I count to 100 before composing this e-mail, I slept on it. I don?t know what else to do when things aren?t changing. But I promise you this, if you?ll stop doing stuff like this I will absolutely be more than glad to never have to send another e-mail like this one again. Deal? Thank you, all? Kevin B. -------------- next part -------------- An HTML attachment was scrubbed... URL: