From cdmaestas at us.ibm.com Wed Jul 3 17:47:32 2024 From: cdmaestas at us.ibm.com (CHRIS MAESTAS) Date: Wed, 3 Jul 2024 16:47:32 +0000 Subject: [gpfsug-discuss] Summer 2024 US Based Scale User Group Meetings! Message-ID: Hello everyone! There are a few US based Scale meetings coming soon in July and August. Registration and details for these events are posted at https://gpfsug.org/events/ and on the main page. 30-Jul - Scale User Group Meeting in Miami, FL, USA 6-7-August - US West Coast Scale User Group Meeting in San Jose, CA, USA -- Chris Maestas, Chief Troublemaking Officer. 8) -------------- next part -------------- An HTML attachment was scrubbed... URL:

From amjadcsu at gmail.com Fri Jul 5 15:17:56 2024 From: amjadcsu at gmail.com (Amjad Syed) Date: Fri, 5 Jul 2024 15:17:56 +0100 Subject: [gpfsug-discuss] changing directory linked to fileset Message-ID: Hello all We are using an old GPFS version 5.0.5 and are in the process of upgrading. I have created a fileset xyz and linked it to the directory /gpfs/xyoz by executing the following command mmlinkfileset gpfs xyz -J /gpfs/xyoz But now I want to change the linked directory, as I realized there is a typo; it should be xyz. How should I proceed? Majid -------------- next part -------------- An HTML attachment was scrubbed... URL:

From Ivan.Patrick.Lambert at ibm.com Fri Jul 5 15:29:51 2024 From: Ivan.Patrick.Lambert at ibm.com (Ivan Patrick Lambert) Date: Fri, 5 Jul 2024 14:29:51 +0000 Subject: [gpfsug-discuss] changing directory linked to fileset In-Reply-To: References: Message-ID: Hello Majid, basically you can try to unmount the fileset with: mmunlinkfileset xyz -J /gpfs/xyoz and link it back with the proper path. Kind regards, Ivan Patrick Lambert EMEA Storage Scale / Storage Scale System Engineer From: gpfsug-discuss on behalf of Amjad Syed Date: Friday, 5 July 2024 at 16:20 To: gpfsug-discuss at gpfsug.org Subject: [EXTERNAL] [gpfsug-discuss] changing directory linked to fileset Hello all We are using an old GPFS version 5.0.5 and are in the process of upgrading. I have created a fileset xyz and linked it to the directory /gpfs/xyoz by executing the following command mmlinkfileset gpfs xyz -J /gpfs/xyoz But now I want to change the linked directory, as I realized there is a typo; it should be xyz. How should I proceed? Majid -------------- next part -------------- An HTML attachment was scrubbed... URL:
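The sequence Ivan suggests boils down to an unlink followed by a relink -- a minimal sketch, assuming the file system device is gpfs and the fileset is xyz; check the mmunlinkfileset/mmlinkfileset syntax on your own release (5.0.5 here), and note that the fileset needs to be quiescent (no files in use) when it is unlinked:

  # detach the fileset from the misspelled junction
  mmunlinkfileset gpfs xyz
  # re-attach it at the intended path
  mmlinkfileset gpfs xyz -J /gpfs/xyz
  # confirm the new junction path
  mmlsfileset gpfs xyz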
From Ivan.Patrick.Lambert at ibm.com Fri Jul 5 15:33:47 2024 From: Ivan.Patrick.Lambert at ibm.com (Ivan Patrick Lambert) Date: Fri, 5 Jul 2024 14:33:47 +0000 Subject: [gpfsug-discuss] changing directory linked to fileset In-Reply-To: References: Message-ID: Correction: basically you can try to unmount the fileset with: mmunlinkfileset gpfs -J /gpfs/xyoz and link back with mmlinkfileset gpfs xyz -J /gpfs/xyz That should work Kind regards, Ivan Patrick Lambert EMEA Storage Scale / Storage Scale System Engineer From: gpfsug-discuss on behalf of Ivan Patrick Lambert Date: Friday, 5 July 2024 at 16:32 To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] changing directory linked to fileset Hello Majid, basically you can try to unmount the fileset with: mmunlinkfileset xyz -J /gpfs/xyoz and link it back with the proper path. Kind regards, Ivan Patrick Lambert EMEA Storage Scale / Storage Scale System Engineer -------------- next part -------------- An HTML attachment was scrubbed... URL:

From jonathan.buzzard at strath.ac.uk Fri Jul 5 16:58:46 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Jul 2024 16:58:46 +0100 Subject: [gpfsug-discuss] changing directory linked to fileset In-Reply-To: References: Message-ID: <713a0ddb-284b-4c01-8818-967bd2f79ae3@strath.ac.uk> On 05/07/2024 15:17, Amjad Syed wrote: > CAUTION: This email originated outside the University.
Check before > clicking links or attachments. > Hello all > > We are using an old GPFS version 5.0.5 and in process of? upgrading > > I have created a fileset xyz? and linked it to directory /gpfs/xyoz my > executing the following command > mmlinkfileset gpfs? xyz -J /gpfs/xyoz > > But now? i want to change the linked directory as i realized there is a > typo there should be xyz , how should i proceed ? > Use mmunlinkfileset to unlink the fileset and then link it in the new location. It of course requires that there are no open files in the fileset. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From l.r.sudbery at bham.ac.uk Fri Jul 5 18:31:09 2024 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Fri, 5 Jul 2024 17:31:09 +0000 Subject: [gpfsug-discuss] Bad disk but not failed in DSS-G In-Reply-To: References: <82308b7e-9872-42f6-902f-850596840a4a@strath.ac.uk> <6103c18f3945d39c1950a9be759408ad15b0014b.camel@de.ibm.com> Message-ID: Have you opened a ticket with Lenovo and/or IBM about this? If there is a genuine bug here (and it seems there might be), that's the way to get it fixed. Generally find the disk hospital very reliable and it takes disks out of "rotation" (pun intended) for slow performance long before they cause any problems, but have yet to see it the other way round - although if it's not reporting things we could be missing them... Cheers, Luke -- Luke Sudbery Principal Engineer (HPC and Storage). Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don?t work on Monday. -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Monday, June 24, 2024 1:52 PM To: Achim Rehor ; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Bad disk but not failed in DSS-G CAUTION: This email originated from outside the organisation. Do not click links or open attachments unless you recognise the sender and know the content is safe. On 24/06/2024 13:16, Achim Rehor wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > well ... not necessarily ? > but on the disk ... just as i expected ... taking it out helps a lot. > > Now on taking it out automatically when raising too many errors was a > discussion i had several times with the GNR development. > The issue really is .. I/O errors on disks (as seen in the > mmlsrecoverygroupevent logs) can be due to several issues (the disk > itself, > the expander, the IOM, the adapter, the cable ... ) > in case of a more general part serving like 5 or more pdisks, that would > risk the FT , if we took them out automatically. > Thus ... we dont do that .. > When smartctl for the disk says Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 0 33839 32 0 0 137434.705 32 write: 0 36 0 0 0 178408.893 0 Non-medium error count: 0 A disk with 32 read errors in smartctl is fubar, no ifs no buts. Whatever the balance in ejecting bad disks is, IMHO currently it's in the wrong place because it failed to eject an actual bad disk. At an absolute bare minimum mmhealth should be not be saying everything is fine and dandy because clearly it was not. That's the bigger issue. 
I can live with them not been taken out automatically, it is unacceptable that mmhealth was giving false and inaccurate information about the state of the filesystem. Had it even just changed something to a "degraded" state the problems could have been picked up much much sooner. Presumably the disk category was still good because the vdisk's where theoretically good. I suggest renaming that to VDISK to more accurately reflect what it is about and add a PDISK category. Then when a pdisk starts showing IO errors you can increment the number of disks in a degraded state and it can be picked up without end users having to roll their own monitoring. > The idea is to improve the disk hospital more and more, so that the > decision to switch a disk back to OK is more accurate, over time. > > Until then .. it might always be a good idea to scan the event log for > pdisk errors ... > That is my conclusion, that mmhealth is as useful as a chocolate teapot because you can't rely on it to provide correct information and I need to do my own health monitoring of the system. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From novosirj at rutgers.edu Fri Jul 5 23:27:59 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 5 Jul 2024 22:27:59 +0000 Subject: [gpfsug-discuss] Bad disk but not failed in DSS-G In-Reply-To: References: <82308b7e-9872-42f6-902f-850596840a4a@strath.ac.uk> <6103c18f3945d39c1950a9be759408ad15b0014b.camel@de.ibm.com> Message-ID: This is not common in my experience; I wouldn?t worry about it that much. We have 5 of these things currently between GSS and DSS-G, from GPFS 4.1.0-8 to 5.1.8-2 and have only seen a similar situation once. Ours fail disks all the time before we even notice anything is wrong. But do report it as a bug. What?s the hardware on this thing? Sent from my iPhone > On Jun 24, 2024, at 08:53, Jonathan Buzzard wrote: > > ?On 24/06/2024 13:16, Achim Rehor wrote: >> CAUTION: This email originated outside the University. Check before clicking links or attachments. >> well ... not necessarily ? >> but on the disk ... just as i expected ... taking it out helps a lot. >> Now on taking it out automatically when raising too many errors was a discussion i had several times with the GNR development. >> The issue really is .. I/O errors on disks (as seen in the mmlsrecoverygroupevent logs) can be due to several issues (the disk itself, >> the expander, the IOM, the adapter, the cable ... ) >> in case of a more general part serving like 5 or more pdisks, that would risk the FT , if we took them out automatically. >> Thus ... we dont do that .. > > When smartctl for the disk says > > Error counter log: > Errors Corrected by Total Correction Gigabytes Total > ECC rereads/ errors algorithm processed uncorrected > fast | delayed rewrites corrected invocations [10^9 bytes] errors > read: 0 33839 32 0 0 137434.705 32 > write: 0 36 0 0 0 178408.893 0 > > Non-medium error count: 0 > > > A disk with 32 read errors in smartctl is fubar, no ifs no buts. Whatever the balance in ejecting bad disks is, IMHO currently it's in the wrong place because it failed to eject an actual bad disk. > > At an absolute bare minimum mmhealth should be not be saying everything is fine and dandy because clearly it was not. 
That's the bigger issue. I can live with them not been taken out automatically, it is unacceptable that mmhealth was giving false and inaccurate information about the state of the filesystem. Had it even just changed something to a "degraded" state the problems could have been picked up much much sooner. > > Presumably the disk category was still good because the vdisk's where theoretically good. I suggest renaming that to VDISK to more accurately reflect what it is about and add a PDISK category. Then when a pdisk starts showing IO errors you can increment the number of disks in a degraded state and it can be picked up without end users having to roll their own monitoring. > >> The idea is to improve the disk hospital more and more, so that the decision to switch a disk back to OK is more accurate, over time. >> Until then .. it might always be a good idea to scan the event log for pdisk errors ... > > That is my conclusion, that mmhealth is as useful as a chocolate teapot because you can't rely on it to provide correct information and I need to do my own health monitoring of the system. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org From novosirj at rutgers.edu Sat Jul 6 05:50:22 2024 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 6 Jul 2024 04:50:22 +0000 Subject: [gpfsug-discuss] Bad disk but not failed in DSS-G In-Reply-To: <82308b7e-9872-42f6-902f-850596840a4a@strath.ac.uk> References: <82308b7e-9872-42f6-902f-850596840a4a@strath.ac.uk> Message-ID: On Jun 24, 2024, at 05:41, Jonathan Buzzard wrote: On 20/06/2024 23:32, Achim Rehor wrote: [SNIP] Fred is most probably correct here. the two errors are not necessarily the same. Turns out Fred was incorrect and having pushed the bad disk out the file system the backups magically started working again. Not that, that should come as the slightest surprise to anyone. Not saying it?s impossible but I strongly suspect that that is a coincidence, or your drive was causing enough problems that somehow it was also affecting the storage server?s ability to respond to a client or something. I?ve never once seen any sort of media errors making it all the way to a filesystem issue. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark `' -------------- next part -------------- An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Thu Jul 11 11:16:31 2024 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Thu, 11 Jul 2024 12:16:31 +0200 Subject: [gpfsug-discuss] Job opportunity at PSI - HPC SysAdmin Message-ID: Dear all, sorry to profit from this channel, but we are looking for a sysadmin to help us supporting the photon facilities at PSI, where we do maintain multiple Storage Scale clusters. Info and application are available here: https://www.psi.ch/en/hr/job-opportunities/63794-devops-systems-engineer-80-100 Feel free to forward this to any interest party :) Thanks! Leo -- Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader Data Analysis and Research Infrastructure Group Leader Data Curation a.i. Deputy Department Head Science IT Infrastructure and Services department Science IT Infrastructure and Services department (AWI) OBBA/230 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL:

From juergen.hannappel at desy.de Fri Jul 12 15:31:18 2024 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 12 Jul 2024 16:31:18 +0200 (CEST) Subject: [gpfsug-discuss] RFE: mmled Message-ID: <1739364124.19745840.1720794678204.JavaMail.zimbra@desy.de> Hi, with 5 different ESS headnode hardware models (ok, in a reasonable time scale 4) it's always annoying if an orange LED indicates a problem and you have no idea why; finding out what the problem is differs between models. It would be nice to have an mmled command, which would, according to the hardware it's run on, say e.g.: mmled show ... gives output like: No LEDs on. ... or: Led1 is lit; that probably means that foo is broken in bar... mmled clear ... and all LEDs which are left on from some transient past problem are switched off. -- Dr. Jürgen Hannappel DESY/IT Tel. : +49 40 8998-4616

From Marcy.D.Cortes at wellsfargo.com Fri Jul 12 21:54:32 2024 From: Marcy.D.Cortes at wellsfargo.com (Cortes, Marcy D.) Date: Fri, 12 Jul 2024 20:54:32 +0000 Subject: [gpfsug-discuss] Remove a client node from one cluster and add to a different one Message-ID: Is it as simple as:
stop gpfs on client with mmshutdown
mmdelnode from cluster A
go to new cluster B
mmaddnode and then mmstart it?
This is on Linux. Wondering if anything needs to be cleaned up in /var/mmfs or /tmp/mmfs Marcy -------------- next part -------------- An HTML attachment was scrubbed... URL:

From knop at us.ibm.com Fri Jul 12 23:49:17 2024 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 12 Jul 2024 22:49:17 +0000 Subject: [gpfsug-discuss] Remove a client node from one cluster and add to a different one In-Reply-To: References: Message-ID: Marcy, I believe this should work. The mmshutdown is crucial in ensuring the operation is successful, even though mmdelnode checks to ensure the node is no longer up. I'd probably wait a few mins between the delete and the add to the other cluster, to ensure the cluster membership and config files get properly propagated. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 From: gpfsug-discuss on behalf of Cortes, Marcy D. Date: Friday, July 12, 2024 at 4:56 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Remove a client node from one cluster and add to a different one Is it as simple as:
stop gpfs on client with mmshutdown
mmdelnode from cluster A
go to new cluster B
mmaddnode and then mmstart it?
This is on Linux. Wondering if anything needs to be cleaned up in /var/mmfs or /tmp/mmfs Marcy -------------- next part -------------- An HTML attachment was scrubbed...
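A minimal sketch of the move Marcy describes and Felipe confirms, assuming the client is named node1 and that the mmdelnode/mmaddnode steps are run from a node that stays in the respective cluster; the start command is mmstartup, and the license-designation line is an assumption to check against your own licensing setup:

  # on cluster A: stop GPFS on the client and remove it from the cluster
  mmshutdown -N node1
  mmdelnode -N node1
  # on cluster B: add the node, designate its license, and start it
  mmaddnode -N node1
  mmchlicense client --accept -N node1
  mmstartup -N node1
  # verify membership and daemon state
  mmlscluster
  mmgetstate -N node1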
URL: From uwe.falke at kit.edu Mon Jul 15 09:16:48 2024 From: uwe.falke at kit.edu (Uwe Falke) Date: Mon, 15 Jul 2024 10:16:48 +0200 Subject: [gpfsug-discuss] RFE: mmled In-Reply-To: <1739364124.19745840.1720794678204.JavaMail.zimbra@desy.de> References: <1739364124.19745840.1720794678204.JavaMail.zimbra@desy.de> Message-ID: <5e47b56d-d6e4-4ca0-b417-08bdcd737c25@kit.edu> Hi, Juergen, have you proposed that at https://ideas.ibm.com/ ? I think it gets more attention by the relevant people there than here. Cheers Uwe On 12.07.24 16:31, Hannappel, Juergen wrote: > Hi, > with 5 different ESS headnode hardware models (ok, in a reasonable time scale 4) > it's always annoying if an orange LED indicates a prpblem and you have no idea why, > finding out what the problem is is different between different models.... > > It would be nice to have an mmled command, wich would accorning to the hardware > it's run on say eg: > > mmled show > ... gives an outpu like > No LEDs on. > ... or: > Led1 is lit; that probably means that foo is broken in bar... > > > mmled clear > ... and all LEDs which are left on from some transient past problem are switched of.... > -- Karlsruhe Institute of Technology (KIT) Scientific Computing Centre (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From Achim.Rehor at de.ibm.com Mon Jul 15 13:03:54 2024 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 15 Jul 2024 12:03:54 +0000 Subject: [gpfsug-discuss] Remove a client node from one cluster and add to a different one In-Reply-To: References: Message-ID: <1bcb733bce5640c0377134b42fa68147e46c7ffe.camel@de.ibm.com> I would tend to think the mmaddnode would fail, as GPFS recognizes that this node is already part of an existing cluster (due to the content of /var/mmfs) So following the steps in the admin guide, how to permanently remove GPFS from a node (related to the /tmp/mmfs and /var/mmfs dir) would be recommended -- Mit freundlichen Gr??en / Kind regards Achim Rehor Technical Support Specialist S?pectrum Scale and ESS (SME) Advisory Product Services Professional IBM Systems Storage Support - EMEA Achim.Rehor at de.ibm.com +49-170-4521194 IBM Deutschland GmbH -----Original Message----- From: Felipe Knop > Reply-To: gpfsug main discussion list > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Remove a client node from one cluster and add to a different one Date: Fri, 12 Jul 2024 22:49:17 +0000 Marcy, I believe this should work. The mmshutdown is crucial in ensuring the operation is successful, even though mmdelnode checks to ensure the node is no longer up. I?d probably wait a few mins between the delete and the add to the other Marcy, I believe this should work. Themmshutdown is crucial in ensuring the operation is successful, even thoughmmdelnode checks to ensure the node is no longer up. I?d probably wait a few mins between the delete and the add to the other cluster, to ensure the cluster membership and config files get properly propagated. 
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 From:gpfsug-discuss on behalf of Cortes, Marcy D. Date: Friday, July 12, 2024 at 4:56 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Remove a client node from one cluster and add to a different one Is it as simple as ? stop gpfs on client with mmshutdown mmdelnode from cluster A go to new cluster B mmaddnode and then mmstart it? This is on Linux. Wondering if anything needs to be cleaned up in /var/mmfs or /tmp/mmfs Marcy ? ? ? ? ? ? ? ? ? ? ? Is it as simple as ? stop gpfs on client with mmshutdown mmdelnode from cluster A go to new cluster B mmaddnode and then mmstart it? This is on Linux. Wondering if anything needs to be cleaned up in /var/mmfs or /tmp/mmfs Marcy _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From uwe.falke at kit.edu Thu Jul 18 08:05:28 2024 From: uwe.falke at kit.edu (Uwe Falke) Date: Thu, 18 Jul 2024 09:05:28 +0200 Subject: [gpfsug-discuss] MLNX OFED 23.10? Message-ID: <748b80fd-b031-4927-9cb4-ceb6172b6ebf@kit.edu> Dear all, the MLNX OFED comes in version branches 5.x and 23.x Currently, RHEL 8.10 which we want to move to is only supported by 23.x (not by any 5.x). Are there any concerns using a 23.x OFED ( here:? 23.10-3.2.2.0) ? Thanks Uwe -- Karlsruhe Institute of Technology (KIT) Scientific Computing Centre (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5814 bytes Desc: S/MIME Cryptographic Signature URL: From olaf.weiser at de.ibm.com Thu Jul 18 10:01:54 2024 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 18 Jul 2024 09:01:54 +0000 Subject: [gpfsug-discuss] MLNX OFED 23.10? In-Reply-To: <748b80fd-b031-4927-9cb4-ceb6172b6ebf@kit.edu> References: <748b80fd-b031-4927-9cb4-ceb6172b6ebf@kit.edu> Message-ID: Hallo Uwe, we have multiple customers and installations out there with a mix of , 5.x , 23.x, 24.x and Distro OFED... it may not by fully supported and tested in all combinations from NVIDIA or from us, but as said.. it works fine in many many scenarios .. no obstacles , which 'ld be based on the (mixed) release levels of OFED more important... please take care about adapter's firmware ..here we see issues from time to time in the field Olaf Weiser IBM Research&Development - client adoption ________________________________ Von: gpfsug-discuss im Auftrag von Uwe Falke Gesendet: Donnerstag, 18. Juli 2024 09:05 An: 'gpfsug main discussion list' Betreff: [EXTERNAL] [gpfsug-discuss] MLNX OFED 23.10? Dear all, the MLNX OFED comes in version branches 5.x and 23.x Currently, RHEL 8.10 which we want to move to is only supported by 23.x (not by any 5.x). Are there any concerns using a 23.x OFED ( here: 23.10-3.2.2.0) ? 
Thanks Uwe -- Karlsruhe Institute of Technology (KIT) Scientific Computing Centre (SCC) Scientific Data Management (SDM) Uwe Falke Hermann-von-Helmholtz-Platz 1, Building 442, Room 187 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 28024 Email: uwe.falke at kit.edu www.scc.kit.edu Registered office: Kaiserstra?e 12, 76131 Karlsruhe, Germany KIT ? The Research University in the Helmholtz Association -------------- next part -------------- An HTML attachment was scrubbed... URL: From ivano.talamo at psi.ch Mon Jul 22 14:53:20 2024 From: ivano.talamo at psi.ch (Talamo Ivano Giuseppe) Date: Mon, 22 Jul 2024 13:53:20 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes Message-ID: Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, since we can't login with our personal/central accounts and then sudo. Neither we can use winbind, since samba-winbind-modules package (that provides the necessary PAM module) conflicts with the gpfs.smb package. We will probably end up creating one or more local accounts and using ssh keys for access. But I wonder if someone with a similar problem found a better workaround. Thanks, Ivano [1] https://www.ibm.com/docs/en/storage-scale/5.2.0?topic=authentication-limitations __________________________________________ Paul Scherrer Institut Ivano Talamo OBBA/230 Forschungsstrasse 111 5232 Villigen PSI Schweiz Phone: +41 56 310 47 11 E-Mail: ivano.talamo at psi.ch Available: Monday - Wednesday -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjarsulic at bsd.uchicago.edu Mon Jul 22 15:17:00 2024 From: mjarsulic at bsd.uchicago.edu (Jarsulic, Michael [BSD]) Date: Mon, 22 Jul 2024 14:17:00 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: Ivano, I am running SSSD on the CES nodes (we need it for file authorization for NFS and SMB, but rely on AD for authentication). IBM set this up for us, had no issues doing it, and there were no library conflicts. -- Mike Jarsulic Associate Director, Scientific Computing Center for Research Informatics | Biological Sciences Division University of Chicago 5454 South Shore Drive, Chicago, IL 60615 | (773) 702-2066 From: gpfsug-discuss on behalf of Talamo Ivano Giuseppe Date: Monday, July 22, 2024 at 8:55?AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] ssh authentication on CES nodes Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, ZjQcmQRYFpfptBannerStart External: Use caution with links, attachments, and providing information. Report Suspicious ZjQcmQRYFpfptBannerEnd Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, since we can't login with our personal/central accounts and then sudo. Neither we can use winbind, since samba-winbind-modules package (that provides the necessary PAM module) conflicts with the gpfs.smb package. We will probably end up creating one or more local accounts and using ssh keys for access. 
But I wonder if someone with a similar problem found a better workaround. Thanks, Ivano [1] https://www.ibm.com/docs/en/storage-scale/5.2.0?topic=authentication-limitations __________________________________________ Paul Scherrer Institut Ivano Talamo OBBA/230 Forschungsstrasse 111 5232 Villigen PSI Schweiz Phone: +41 56 310 47 11 E-Mail: ivano.talamo at psi.ch Available: Monday - Wednesday -------------- next part -------------- An HTML attachment was scrubbed...
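For anyone comparing setups at this point in the thread, the file-authentication configuration being discussed can be inspected on a protocol node with something like the following -- a read-only sketch; option spelling may differ between Scale releases, so treat it as a starting point rather than gospel:

  # show how file access (SMB/NFS) authentication is configured, e.g. AD plus the ID-mapping scheme
  mmuserauth service list --data-access-method file
  # check the health of the CES/protocol stack on this node
  mmhealth node show CES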
URL: From LJHenson at mdanderson.org Mon Jul 22 15:43:49 2024 From: LJHenson at mdanderson.org (Henson Jr.,Larry J) Date: Mon, 22 Jul 2024 14:43:49 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: As far as I know, you cannot run SSSD and GPFS-Winbind on the same CES server there will be conflicts. Since GPFS uses Winbind, the SSSD has to be disabled and point everything to use GPFS-Winbind (PAM files included). This is what we do on our CES nodes and they work just fine, users can authenticate using AD credentials, etc. The GPFS-Winbind works just like SSSD if configured correctly. The only issue is once GPFS is shutdown then GPFS-Winbind no longer works but can still passwordless ssh from another cluster server if access is needed, but GPFS is always up so this is not really a deal breaker. Best Regards, Larry Henson IT Engineering Storage Team Cell (713) 702-4896 [cid:image001.png at 01DADC1B.A6836E70] From: gpfsug-discuss On Behalf Of Jarsulic, Michael [BSD] Sent: Monday, July 22, 2024 9:17 AM To: gpfsug main discussion list ; gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] Re: [gpfsug-discuss] ssh authentication on CES nodes SLOW DOWN! - EXTERNAL SENDER: gpfsug-discuss-bounces at gpfsug.org Be suspicious of tone, urgency, and formatting. Do not click/open links or attachments on a mobile device. Wait until you are at a computer to confirm you are absolutely certain it is a trusted source. If you are at all uncertain use the Report Phish button and our Cybersecurity team will investigate. Ivano, I am running SSSD on the CES nodes (we need it for file authorization for NFS and SMB, but rely on AD for authentication). IBM set this up for us, had no issues doing it, and there were no library conflicts. -- Mike Jarsulic Associate Director, Scientific Computing Center for Research Informatics | Biological Sciences Division University of Chicago 5454 South Shore Drive, Chicago, IL 60615 | (773) 702-2066 From: gpfsug-discuss > on behalf of Talamo Ivano Giuseppe > Date: Monday, July 22, 2024 at 8:55?AM To: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] [gpfsug-discuss] ssh authentication on CES nodes Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, ZjQcmQRYFpfptBannerStart External: Use caution with links, attachments, and providing information. Report Suspicious ZjQcmQRYFpfptBannerEnd Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, since we can't login with our personal/central accounts and then sudo. Neither we can use winbind, since samba-winbind-modules package (that provides the necessary PAM module) conflicts with the gpfs.smb package. We will probably end up creating one or more local accounts and using ssh keys for access. But I wonder if someone with a similar problem found a better workaround. 
Thanks, Ivano [1] https://www.ibm.com/docs/en/storage-scale/5.2.0?topic=authentication-limitations __________________________________________ Paul Scherrer Institut Ivano Talamo OBBA/230 Forschungsstrasse 111 5232 Villigen PSI Schweiz Phone: +41 56 310 47 11 E-Mail: ivano.talamo at psi.ch Available: Monday - Wednesday -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 19425 bytes Desc: image001.png URL:
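If the local-account-plus-ssh-key fallback Ivano mentions is the route taken, the mechanics are plain Linux rather than anything Scale-specific -- a minimal sketch, where the account name hpcadmin and the key location are purely illustrative and the sudoers policy should follow local rules:

  # create a local administrative account and install a public key for it
  useradd -m hpcadmin
  install -d -m 700 -o hpcadmin -g hpcadmin /home/hpcadmin/.ssh
  install -m 600 -o hpcadmin -g hpcadmin /tmp/hpcadmin.pub /home/hpcadmin/.ssh/authorized_keys   # hypothetical key file
  # grant sudo via a drop-in file rather than editing /etc/sudoers directly
  echo 'hpcadmin ALL=(ALL) ALL' > /etc/sudoers.d/hpcadmin
  chmod 440 /etc/sudoers.d/hpcadmin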
From jonathan.buzzard at strath.ac.uk Mon Jul 22 23:41:32 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 22 Jul 2024 23:41:32 +0100 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: <21eb2608-42d7-4267-9446-695bb07a7524 at strath.ac.uk> On 22/07/2024 14:53, Talamo Ivano Giuseppe wrote: > Dear all, > > I have a question regarding the CES service, aka protocol nodes.
> Our CES cluster is configured with the AD authentication and, > accordingly to the documentation [1], SSSD should not be running on the > CES nodes. For us that's quite annoying, since we can't login with our > personal/central accounts and then sudo. > Neither we can use winbind, since samba-winbind-modules package (that > provides the necessary PAM module) conflicts with the gpfs.smb package. > We will probably end up creating one or more local accounts and using > ssh keys for access. > But I wonder if someone with a similar problem found a better workaround. > Install on Ubuntu and use local accounts with libpam-krb5? Use local accounts and pam_krb5 from EPEL on RHEL8/9? From what I can make out with experimentation you don't actually have to use SSSD on RHEL8+. Wish I had known that three years ago because frankly SSSD as shipped with RHEL8 is not ready to take over from pam_krb5 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From p.ward at nhm.ac.uk Tue Jul 23 11:11:08 2024 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 23 Jul 2024 10:11:08 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: Hi Ivano, I am curious about this line of your message: ?For us that's quite annoying, since we can't login with our personal/central accounts and then sudo.? We only allow administrator access to the GPFS cluster via the EMS nodes. We will be restricting them to MFA based access. We then navigate to all other nodes from one of them. End users can access an area shared to the HPC cluster via ssh, and we have an internal FTP server mounting various areas via NFS, but no direct ssh access to the whole GPFS cluster. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk [cid:image001.png at 01DADCF1.0446AA60] From: gpfsug-discuss On Behalf Of Jarsulic, Michael [BSD] Sent: Monday, July 22, 2024 3:17 PM To: gpfsug main discussion list ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] ssh authentication on CES nodes Ivano, I am running SSSD on the CES nodes (we need it for file authorization for NFS and SMB, but rely on AD for authentication). IBM set this up for us, had no issues doing it, and there were no library conflicts. -- Mike Jarsulic Associate Director, Scientific Computing Center for Research Informatics | Biological Sciences Division University of Chicago 5454 South Shore Drive, Chicago, IL 60615 | (773) 702-2066 From: gpfsug-discuss > on behalf of Talamo Ivano Giuseppe > Date: Monday, July 22, 2024 at 8:55?AM To: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] [gpfsug-discuss] ssh authentication on CES nodes Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, ZjQcmQRYFpfptBannerStart External: Use caution with links, attachments, and providing information. Report Suspicious ZjQcmQRYFpfptBannerEnd Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, since we can't login with our personal/central accounts and then sudo. 
Neither we can use winbind, since samba-winbind-modules package (that provides the necessary PAM module) conflicts with the gpfs.smb package. We will probably end up creating one or more local accounts and using ssh keys for access. But I wonder if someone with a similar problem found a better workaround. Thanks, Ivano [1] https://www.ibm.com/docs/en/storage-scale/5.2.0?topic=authentication-limitations __________________________________________ Paul Scherrer Institut Ivano Talamo OBBA/230 Forschungsstrasse 111 5232 Villigen PSI Schweiz Phone: +41 56 310 47 11 E-Mail: ivano.talamo at psi.ch Available: Monday - Wednesday ________________________________ ?This message was received from outside of the organization. Please pay special attention and practice care when clicking on any links, or providing any information to the sender. Cyber attacks commonly attempt to trick you in to thinking the sender is a reputable individual who you can trust.? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 12974 bytes Desc: image001.png URL: From p.ward at nhm.ac.uk Tue Jul 23 11:11:08 2024 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 23 Jul 2024 10:11:08 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: Hi Ivano, I am curious about this line of your message: ?For us that's quite annoying, since we can't login with our personal/central accounts and then sudo.? We only allow administrator access to the GPFS cluster via the EMS nodes. We will be restricting them to MFA based access. We then navigate to all other nodes from one of them. End users can access an area shared to the HPC cluster via ssh, and we have an internal FTP server mounting various areas via NFS, but no direct ssh access to the whole GPFS cluster. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk [cid:image001.png at 01DADCF1.0446AA60] From: gpfsug-discuss On Behalf Of Jarsulic, Michael [BSD] Sent: Monday, July 22, 2024 3:17 PM To: gpfsug main discussion list ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] ssh authentication on CES nodes Ivano, I am running SSSD on the CES nodes (we need it for file authorization for NFS and SMB, but rely on AD for authentication). IBM set this up for us, had no issues doing it, and there were no library conflicts. -- Mike Jarsulic Associate Director, Scientific Computing Center for Research Informatics | Biological Sciences Division University of Chicago 5454 South Shore Drive, Chicago, IL 60615 | (773) 702-2066 From: gpfsug-discuss > on behalf of Talamo Ivano Giuseppe > Date: Monday, July 22, 2024 at 8:55?AM To: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] [gpfsug-discuss] ssh authentication on CES nodes Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. For us that's quite annoying, ZjQcmQRYFpfptBannerStart External: Use caution with links, attachments, and providing information. Report Suspicious ZjQcmQRYFpfptBannerEnd Dear all, I have a question regarding the CES service, aka protocol nodes. Our CES cluster is configured with the AD authentication and, accordingly to the documentation [1], SSSD should not be running on the CES nodes. 
For us that's quite annoying, since we can't login with our personal/central accounts and then sudo. Neither we can use winbind, since samba-winbind-modules package (that provides the necessary PAM module) conflicts with the gpfs.smb package. We will probably end up creating one or more local accounts and using ssh keys for access. But I wonder if someone with a similar problem found a better workaround. Thanks, Ivano [1] https://www.ibm.com/docs/en/storage-scale/5.2.0?topic=authentication-limitations __________________________________________ Paul Scherrer Institut Ivano Talamo OBBA/230 Forschungsstrasse 111 5232 Villigen PSI Schweiz Phone: +41 56 310 47 11 E-Mail: ivano.talamo at psi.ch Available: Monday - Wednesday ________________________________ ?This message was received from outside of the organization. Please pay special attention and practice care when clicking on any links, or providing any information to the sender. Cyber attacks commonly attempt to trick you in to thinking the sender is a reputable individual who you can trust.? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 12974 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Tue Jul 23 12:29:56 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 23 Jul 2024 11:29:56 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote: > Hi Ivano, > > I am curious about this line of your message: > ?For us that's quite annoying, since we can't login with our > personal/central accounts and then sudo.? > > We only allow administrator access to the GPFS cluster via the EMS > nodes. We will be restricting them to MFA based access. > We then navigate to all other nodes from one of them. > > My guess would be that administrators log onto the cluster using their personal/central accounts and then use sudo to issue administrative commands. This creates a log of who issued what commands at what time. Useful when you have more than one administrator and provides a level of tracking. Though personally I think using your "personal" everyday account for this is suboptimal. Best practice would suggest have a separate personal administrator account. So for example in a previous life my normal everyday account was njab14 no different than anyone else's account, but my I had a separate account administrator account was sjab14. That could do things like sudo had rights in the AD etc. etc. You can also do things like create groups of users that can log onto things that normal users cant. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Tue Jul 23 12:29:56 2024 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 23 Jul 2024 11:29:56 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote: > Hi Ivano, > > I am curious about this line of your message: > ?For us that's quite annoying, since we can't login with our > personal/central accounts and then sudo.? > > We only allow administrator access to the GPFS cluster via the EMS > nodes. We will be restricting them to MFA based access. 
> We then navigate to all other nodes from one of them. > > My guess would be that administrators log onto the cluster using their personal/central accounts and then use sudo to issue administrative commands. This creates a log of who issued what commands at what time. Useful when you have more than one administrator and provides a level of tracking. Though personally I think using your "personal" everyday account for this is suboptimal. Best practice would suggest have a separate personal administrator account. So for example in a previous life my normal everyday account was njab14 no different than anyone else's account, but my I had a separate account administrator account was sjab14. That could do things like sudo had rights in the AD etc. etc. You can also do things like create groups of users that can log onto things that normal users cant. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From p.ward at nhm.ac.uk Tue Jul 23 13:55:15 2024 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 23 Jul 2024 12:55:15 +0000 Subject: [gpfsug-discuss] ssh authentication on CES nodes In-Reply-To: References: Message-ID: Yes we follow that principle too. With access to GPFS administration, soon to be restricted to allow access only from specific 'bastions' with mfa implemented on them, to specific management nodes only, not protocol nodes. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss On Behalf Of Jonathan Buzzard Sent: Tuesday, July 23, 2024 12:30 PM To: gpfsug-discuss at gpfsug.org; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] ssh authentication on CES nodes On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote: > Hi Ivano, > > I am curious about this line of your message: > "For us that's quite annoying, since we can't login with our > personal/central accounts and then sudo." > > We only allow administrator access to the GPFS cluster via the EMS > nodes. We will be restricting them to MFA based access. > We then navigate to all other nodes from one of them. > > My guess would be that administrators log onto the cluster using their personal/central accounts and then use sudo to issue administrative commands. This creates a log of who issued what commands at what time. Useful when you have more than one administrator and provides a level of tracking. Though personally I think using your "personal" everyday account for this is suboptimal. Best practice would suggest have a separate personal administrator account. So for example in a previous life my normal everyday account was njab14 no different than anyone else's account, but my I had a separate account administrator account was sjab14. That could do things like sudo had rights in the AD etc. etc. You can also do things like create groups of users that can log onto things that normal users cant. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
From p.ward at nhm.ac.uk Tue Jul 23 13:55:15 2024
From: p.ward at nhm.ac.uk (Paul Ward)
Date: Tue, 23 Jul 2024 12:55:15 +0000
Subject: [gpfsug-discuss] ssh authentication on CES nodes
In-Reply-To:
References:
Message-ID:

Yes, we follow that principle too. Access to GPFS administration will soon be restricted to specific 'bastions' with MFA implemented on them, and from there to specific management nodes only, not protocol nodes.

Kindest regards,
Paul

Paul Ward
TS Infrastructure Architect
Natural History Museum
T: 02079426450
E: p.ward at nhm.ac.uk

-----Original Message-----
From: gpfsug-discuss On Behalf Of Jonathan Buzzard
Sent: Tuesday, July 23, 2024 12:30 PM
To: gpfsug-discuss at gpfsug.org; gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] ssh authentication on CES nodes

On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote:
> Hi Ivano,
>
> I am curious about this line of your message:
> "For us that's quite annoying, since we can't login with our
> personal/central accounts and then sudo."
>
> We only allow administrator access to the GPFS cluster via the EMS
> nodes. We will be restricting them to MFA based access.
> We then navigate to all other nodes from one of them.
>

My guess would be that administrators log onto the cluster using their personal/central accounts and then use sudo to issue administrative commands. This creates a log of who issued what commands at what time. Useful when you have more than one administrator and provides a level of tracking.

Though personally I think using your "personal" everyday account for this is suboptimal. Best practice would suggest having a separate personal administrator account. So for example in a previous life my normal everyday account was njab14, no different than anyone else's account, but I had a separate administrator account, sjab14. That could do things like sudo, had rights in the AD etc. etc.

You can also do things like create groups of users that can log onto things that normal users can't.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

From ott.oopkaup at ut.ee Tue Jul 23 14:23:11 2024
From: ott.oopkaup at ut.ee (Ott Oopkaup)
Date: Tue, 23 Jul 2024 16:23:11 +0300
Subject: [gpfsug-discuss] ssh authentication on CES nodes
In-Reply-To:
References:
Message-ID: <267755a3-8f64-4d3a-8e35-649b43a23670@ut.ee>

Hi all,

While not strictly by the book, we have run SSSD alongside gpfs-winbind for ~7 years now. It might be a case of running the protocol nodes on RHEL exclusively, but we have never had any issues. It might also help that our LDAP and AD are kept in sync as well as possible, so any conflicts still resolve to the same values.

Even further, thanks to some old legacy documentation, I recently moved from regular gpfs-winbind (that was basically connected using net ads join) to actual mmuserauth and AD. In my mind, having even SSSD installed would already cause the library conflicts.

Obviously, there are more than two ways to skin this particular cat, and for a really dirty fix you could map the admin users locally with the same UIDs etc.
Best,

Ott Oopkaup
University of Tartu, High Performance Computing Centre
Systems Administrator

On 7/23/24 2:29 PM, Jonathan Buzzard wrote:
> On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote:
>> Hi Ivano,
>>
>> I am curious about this line of your message:
>> "For us that's quite annoying, since we can't login with our
>> personal/central accounts and then sudo."
>>
>> We only allow administrator access to the GPFS cluster via the EMS
>> nodes. We will be restricting them to MFA based access.
>> We then navigate to all other nodes from one of them.
>>
> My guess would be that administrators log onto the cluster using their
> personal/central accounts and then use sudo to issue administrative
> commands. This creates a log of who issued what commands at what time.
> Useful when you have more than one administrator and provides a level
> of tracking.
>
> Though personally I think using your "personal" everyday account for
> this is suboptimal. Best practice would suggest having a separate
> personal administrator account. So for example in a previous life my
> normal everyday account was njab14, no different than anyone else's
> account, but I had a separate administrator account, sjab14. That
> could do things like sudo, had rights in the AD etc. etc.
>
> You can also do things like create groups of users that can log onto
> things that normal users can't.
>
> JAB.
>
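For anyone curious what such an SSSD-alongside-winbind setup roughly involves, a minimal sssd.conf sketch follows; the domain name and option values are purely illustrative assumptions, and nothing here implies that the combination is officially supported:

    [sssd]
    services = nss, pam
    domains = example.com

    [domain/example.com]
    # resolve users/groups and authenticate against AD via SSSD
    id_provider = ad
    access_provider = ad
    ad_domain = example.com
    krb5_realm = EXAMPLE.COM
    cache_credentials = True
    # make sure the UIDs/GIDs SSSD hands out match what gpfs-winbind uses,
    # e.g. by using the POSIX attributes stored in AD instead of ID mapping
    ldap_id_mapping = False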
From cdmaestas at us.ibm.com Tue Jul 23 16:00:01 2024
From: cdmaestas at us.ibm.com (CHRIS MAESTAS)
Date: Tue, 23 Jul 2024 15:00:01 +0000
Subject: [gpfsug-discuss] Summer 2024 US Based Scale User Group Meetings!
In-Reply-To:
References:
Message-ID:

Hello everyone! I hope you are well.

Sorry to cause some trouble, but we are currently postponing the Miami event on July 30th. Stay tuned for a new date to be announced!

For the West Coast Scale User Group Meeting in San Jose, CA, USA on August 6-7, we have updated the agenda to include stories from IBM Research regarding the Vela and Blue Vela projects! If you haven't read the paper yet, check out: https://arxiv.org/pdf/2407.05467

--
Chris Maestas, Chief Troublemaking Officer

From: gpfsug-discuss on behalf of CHRIS MAESTAS
Date: Wednesday, July 3, 2024 at 10:50 AM
To: gpfsug-discuss at gpfsug.org
Subject: [EXTERNAL] [gpfsug-discuss] Summer 2024 US Based Scale User Group Meetings!

Hello everyone! There are a few US based Scale Meetings that are coming soon in July and August. Registration, details for these events are posted here: https://gpfsug.org/events/ and the main page.
30-Jul - Scale User Group Meeting in Miami, FL, USA
6-7-August - US West Coast Scale User Group Meeting in San Jose, CA, USA
--
Chris Maestas, Chief Troublemaking Officer. 8)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From uwe.falke at kit.edu Fri Jul 26 20:11:17 2024
From: uwe.falke at kit.edu (Uwe Falke)
Date: Fri, 26 Jul 2024 21:11:17 +0200
Subject: [gpfsug-discuss] preventing / delaying deletion of scheduled snapshots?
Message-ID:

Dear all,

we have configured scheduled creation of snapshots for several filesets in a file system.

Now, a couple of users discovered they'd been erroneously deleting files since the end of June (some script misbehaving). The last snapshot from before that dates from Jun 1st and is due to be deleted on Aug 1st. The time might be too short for the users to check and fix their deleted data (as several might not have a chance to do so before Aug 1st).

I could not see how I could change the schedule for an existing snapshot (i.e. delay or prevent its deletion) in the GUI, despite investigating the GUI and studying the GPFS documentation / admin guide. The GUI commands include something like /usr/lpp/mmfs/gui/cli/chsnapassoc, but I do not know whether that applies to existing snapshots as well or just to future ones (as the rules cover both creation and lifetime of the snapshots).

How could I prevent the scheduled deletion of a single snapshot (created by the GUI as a scheduled snapshot)? Is that possible at all? (OK, I might stop the GUI, but there are several filesets with their scheduled snapshots, so that is not what I actually want.)
Thanks

Uwe

--
Karlsruhe Institute of Technology (KIT)
Scientific Computing Centre (SCC)
Scientific Data Management (SDM)

Uwe Falke

Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen

Tel: +49 721 608 28024
Email: uwe.falke at kit.edu
www.scc.kit.edu

Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany

KIT - The Research University in the Helmholtz Association
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5814 bytes
Desc: S/MIME Cryptographic Signature
URL:

From uwe.falke at kit.edu Fri Jul 26 20:33:43 2024
From: uwe.falke at kit.edu (Uwe Falke)
Date: Fri, 26 Jul 2024 21:33:43 +0200
Subject: [gpfsug-discuss] preventing / delaying deletion of scheduled snapshots?
In-Reply-To:
References:
Message-ID:

Just thought it over: maybe the schedule applied is not fixed at snapshot creation time, but always re-evaluated according to the associated rules.

If "monthly" is our current rule and "radarmonthly" is a monthly creation rule with a (sufficiently) longer retention, could I just change the rule for the fileset by "chsnapassoc -n radarmonthly -o monthly -j ..."? If that prevents the deletion of the snapshot from June 1st, I am done :-) Would that work?

Thanks

Uwe

On 26.07.24 21:11, Uwe Falke wrote:
> Dear all,
>
> we have configured scheduled creation of snapshots for several
> filesets in a file system.
>
> Now, a couple of users discovered they'd been erroneously deleting
> files since the end of June (some script misbehaving).
>
> The last snapshot from before that dates from Jun 1st and is due to be
> deleted Aug 1st. The time for the users might be too short to check
> and fix their deleted data (as several might not have a chance to do
> so before Aug 1st).
>
> I could not see how I could change the schedule for an existing
> snapshot (i.e. delay / prevent its deletion) in the GUI (investigating
> the GUI and studying the GPFS documentation / admin guide).
> The GUI commands include something like
> /usr/lpp/mmfs/gui/cli/chsnapassoc but I do not know whether that
> applies to existing snapshots as well or just for future ones (as the
> rules cover both creation and lifetime of the snaps).
> How could I prevent the scheduled deletion of a single snapshot
> (created by the GUI as a scheduled snapshot)? Is that possible at all?
> (OK, I might stop the GUI, but as there are several filesets with
> their scheduled snapshots, that is not what I want actually.)
>
> Thanks
>
> Uwe
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

--
Karlsruhe Institute of Technology (KIT)
Scientific Computing Centre (SCC)
Scientific Data Management (SDM)

Uwe Falke

Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen

Tel: +49 721 608 28024
Email: uwe.falke at kit.edu
www.scc.kit.edu

Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany

KIT - The Research University in the Helmholtz Association
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5814 bytes
Desc: S/MIME Cryptographic Signature
URL:
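As a rough sketch of what that chsnapassoc call might look like written out in full: the fileset name below is a placeholder, and the exact arguments (for instance whether a file system also has to be named) should be checked against the command's help output on the GUI node before running it:

    # associate the fileset with the longer-retention rule "radarmonthly",
    # replacing the previous association with the "monthly" rule
    /usr/lpp/mmfs/gui/cli/chsnapassoc -n radarmonthly -o monthly -j <fileset>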
From ivano.talamo at psi.ch Mon Jul 29 07:30:24 2024
From: ivano.talamo at psi.ch (Talamo Ivano Giuseppe)
Date: Mon, 29 Jul 2024 06:30:24 +0000
Subject: [gpfsug-discuss] ssh authentication on CES nodes
In-Reply-To:
References:
Message-ID:

Hi Jonathan,

Yes, we have dedicated personal admin accounts. But they're also centrally configured in AD, so the problem stays the same.

We don't have an EMS for those nodes. Our CES nodes are in a storage-less cluster (the storage is accessed via a remote-cluster mount) and we install them via our puppet-based infrastructure.

Thanks for the suggestion of using pam_krb5. I'm not a big fan since RHEL discontinued it in favour of SSSD, but I'll check it out.

Regards,
Ivano

__________________________________________
Paul Scherrer Institut
Ivano Talamo
OBBA/230
Forschungsstrasse 111
5232 Villigen PSI
Schweiz

Phone: +41 56 310 47 11
E-Mail: ivano.talamo at psi.ch
Available: Monday - Wednesday

________________________________
From: gpfsug-discuss on behalf of Jonathan Buzzard
Sent: 23 July 2024 13:29
To: gpfsug-discuss at gpfsug.org; gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] ssh authentication on CES nodes

On Tue, 2024-07-23 at 10:11 +0000, Paul Ward wrote:
> Hi Ivano,
>
> I am curious about this line of your message:
> "For us that's quite annoying, since we can't login with our
> personal/central accounts and then sudo."
>
> We only allow administrator access to the GPFS cluster via the EMS
> nodes. We will be restricting them to MFA based access.
> We then navigate to all other nodes from one of them.
>

My guess would be that administrators log onto the cluster using their personal/central accounts and then use sudo to issue administrative commands. This creates a log of who issued what commands at what time. Useful when you have more than one administrator and provides a level of tracking.

Though personally I think using your "personal" everyday account for this is suboptimal. Best practice would suggest having a separate personal administrator account. So for example in a previous life my normal everyday account was njab14, no different than anyone else's account, but I had a separate administrator account, sjab14. That could do things like sudo, had rights in the AD etc. etc.

You can also do things like create groups of users that can log onto things that normal users can't.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
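If anyone does try the pam_krb5 route, a hedged sketch of the relevant /etc/pam.d/sshd fragment is below; the module options are illustrative assumptions, the pam_krb5 package is no longer shipped in recent RHEL releases, and the accounts still have to be resolvable via NSS (e.g. local passwd entries):

    # authenticate central accounts against Kerberos, fall back to local passwords
    auth       sufficient   pam_krb5.so minimum_uid=1000
    auth       required     pam_unix.so try_first_pass
    # account handling; users unknown to Kerberos are left to pam_unix
    account    [default=bad success=ok user_unknown=ignore] pam_krb5.so
    account    required     pam_unix.so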
From uwe.falke at kit.edu Tue Jul 30 12:11:23 2024
From: uwe.falke at kit.edu (Uwe Falke)
Date: Tue, 30 Jul 2024 13:11:23 +0200
Subject: [gpfsug-discuss] preventing / delaying deletion of scheduled snapshots?
In-Reply-To:
References:
Message-ID: <16e98261-9ba0-4713-bddd-7650914e6941@kit.edu>

If anybody is interested: I tested the chsnapassoc command on another fileset, associating a new daily rule with maxDays=10 instead of the old daily rule with maxDays=3. The snapshot due to be deleted today by the old rule was not deleted. Hence it seems that assigning a rule with a longer maximum retention time also affects already existing snapshots.

I initially assumed the expiration rule was bound to the snapshot, but it is more like a policy applied periodically to the existing snapshots, regardless of what the rule was when they were created.

Consequently, I associated the fileset holding the to-be-preserved monthly snapshot with a new rule having a larger maxMonth value.

Cheers

Uwe

On 26.07.24 21:33, Uwe Falke wrote:
> Just thought it over: maybe the schedule applied is not fixed at
> snapshot creation time, but always re-evaluated according to the
> associated rules.
>
> If "monthly" is our current rule and "radarmonthly" is a monthly
> creation rule with a (sufficiently) longer retention, could I just
> change the rule for the fileset by "chsnapassoc -n radarmonthly -o
> monthly -j ..."? If that prevents the deletion of the snapshot from
> June 1st, I am done :-) Would that work?
>
> Thanks
>
> Uwe
>
> On 26.07.24 21:11, Uwe Falke wrote:
>> Dear all,
>>
>> we have configured scheduled creation of snapshots for several
>> filesets in a file system.
>>
>> Now, a couple of users discovered they'd been erroneously deleting
>> files since the end of June (some script misbehaving).
>>
>> The last snapshot from before that dates from Jun 1st and is due to
>> be deleted Aug 1st. The time for the users might be too short to
>> check and fix their deleted data (as several might not have a chance
>> to do so before Aug 1st).
>>
>> I could not see how I could change the schedule for an existing
>> snapshot (i.e. delay / prevent its deletion) in the GUI (investigating
>> the GUI and studying the GPFS documentation / admin guide).
>> The GUI commands include something like
>> /usr/lpp/mmfs/gui/cli/chsnapassoc but I do not know whether that
>> applies to existing snapshots as well or just for future ones (as the
>> rules cover both creation and lifetime of the snaps).
>> How could I prevent the scheduled deletion of a single snapshot
>> (created by the GUI as a scheduled snapshot)? Is that possible at all?
>> (OK, I might stop the GUI, but as there are several filesets with
>> their scheduled snapshots, that is not what I want actually.)
>>
>> Thanks
>>
>> Uwe
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

--
Karlsruhe Institute of Technology (KIT)
Scientific Computing Centre (SCC)
Scientific Data Management (SDM)

Uwe Falke

Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen

Tel: +49 721 608 28024
Email: uwe.falke at kit.edu
www.scc.kit.edu

Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany

KIT - The Research University in the Helmholtz Association
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5814 bytes
Desc: S/MIME Cryptographic Signature
URL:
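For completeness, one way to double-check that the June 1st snapshot really survives the rule change is the regular Scale CLI; the device name below is a placeholder:

    # list all snapshots of the file system, including fileset snapshots,
    # together with their creation timestamps and status
    mmlssnapshot <device>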