[gpfsug-discuss] Replicated cluster - failure groups
scale
scale at us.ibm.com
Tue Mar 18 17:21:26 GMT 2025
There is no advantage to having more failure groups than the maximum number of replicas supported by a file system, plus 1 for tie breaker disks. In a multi-site setup, you will want 1 failure group per site to ensure that 1 replica is placed at each site, as GPFS places replicas using round-robin among the failure groups.
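The round-robin point can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not GPFS's real allocator (which also weighs free space and disk state): each block's replicas go to consecutive distinct failure groups, starting from a position that rotates per block. The FG names and the `<site>-<n>` naming convention are hypothetical.

```python
# Toy model of replica placement across failure groups (FGs).
# Assumption: plain round-robin, NOT the real GPFS allocator.

def place_replicas(fgs, n_replicas, block):
    # Rotate the starting FG per block, then take consecutive distinct FGs.
    if n_replicas > len(fgs):
        raise ValueError("need at least as many failure groups as replicas")
    start = block % len(fgs)
    return [fgs[(start + i) % len(fgs)] for i in range(n_replicas)]

def site_of(fg):
    # Hypothetical naming convention: "<site>-<suffix>".
    return fg.split("-")[0]

def blocks_with_both_copies_on_one_site(fgs, n_blocks=12):
    # Count blocks whose 2 replicas both land at the same site.
    bad = 0
    for b in range(n_blocks):
        sites = {site_of(fg) for fg in place_replicas(fgs, 2, b)}
        if len(sites) < 2:
            bad += 1
    return bad

# One FG per site: every block gets one replica at each site.
one_per_site = ["A-1", "B-1"]
# Uneven: two FGs at site A, one at site B, in the same pool.
uneven = ["A-1", "A-2", "B-1"]

print(blocks_with_both_copies_on_one_site(one_per_site))  # → 0
print(blocks_with_both_copies_on_one_site(uneven))        # → 4
```

With one FG per site no block ends up with both replicas at one site; with two FGs at one site and one at the other, a third of the blocks (4 of 12 here) keep both copies at site A.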
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Luke Sudbery <l.r.sudbery at bham.ac.uk>
Date: Tuesday, March 18, 2025 at 9:28 AM
To: gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Replicated cluster - failure groups
We are planning a replicated cluster. Due to a combination of purchasing cycles, floor loading and VAT-exemption status for half the equipment/data, this will be built over time using a total of 8 Lenovo DSS building blocks: 2 main pools, in 2 data centres, with 2 DSSG per pool, and a quorum/manager node with a local tie breaker disk in a 3rd physical location.
My main question is about failure groups - so far, with 2 DSS and 1 tiebreaker, we would have had 1 failure group per DSS and 1 for the tie breaker disk, giving us a total of 3. But if we did that now we would have 9 failure groups in 1 filesystem, which is more than the maximum number of replicas of the file system descriptor and not desirable, as I understand it.
So we could have either:
* 1 FG per physical site, with all 4 DSS at each site assigned to that site's FG, and a 3rd FG for the tiebreaker
* 1 FG per pool per site, with 2 DSS in each FG. This makes sense as the pairs of DSSG will both always need to be up for all the data in the pool to be accessible.
The second option would give us 5 failure groups, but what would be the advantage and disadvantages of more failure groups?
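One way to compare the two options is to count file system descriptor replicas that survive the loss of one location. The sketch below is hypothetical and rests on assumptions to verify against the Storage Scale documentation for your release: with 5 or more failure groups there are 5 descriptor replicas, otherwise 3 (given enough disks), and the model places them one per FG in listed order, whereas the real choice is made by GPFS. The FG names are illustrative only.

```python
# Hypothetical check: after losing one physical location, does a majority
# of file system descriptor replicas survive?
# Assumptions (verify against the Storage Scale docs for your release):
#  - with 5 or more failure groups (FGs) there are 5 descriptor replicas,
#    otherwise 3 (enough disks assumed);
#  - replicas go one per FG, in the listed order (the real choice is GPFS's).

def descriptor_replica_fgs(fgs):
    if len(fgs) < 3:
        raise ValueError("this model only covers 3 or more failure groups")
    n = 5 if len(fgs) >= 5 else 3
    return fgs[:n]

def replicas_left_after_losing(location, site_of):
    # site_of maps each FG name to the physical location that holds it.
    replicas = descriptor_replica_fgs(list(site_of))
    alive = sum(1 for fg in replicas if site_of[fg] != location)
    return alive, len(replicas)

# Option 1: one FG per site plus the tiebreaker (3 FGs).
option1 = {"FG-A": "A", "FG-B": "B", "FG-TB": "TB"}
# Option 2: one FG per pool per site plus the tiebreaker (5 FGs).
option2 = {"A-pool1": "A", "A-pool2": "A",
           "B-pool1": "B", "B-pool2": "B", "FG-TB": "TB"}

for name, layout in (("option 1", option1), ("option 2", option2)):
    for loc in ("A", "B", "TB"):
        alive, total = replicas_left_after_losing(loc, layout)
        verdict = "majority kept" if alive > total // 2 else "majority lost"
        print(f"{name}: lose {loc} -> {alive}/{total} {verdict}")
```

Under these assumptions both layouts keep a majority of descriptor replicas after losing any single location (2/3 for option 1, 3/5 or 4/5 for option 2). The model says nothing about node quorum or data availability.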
Many thanks,
Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road
Please note I don’t work on Monday.