- After reading over most of this thread... The requirements are vague, but I'll take a stab at an interpretation of them and a solution to fulfill them.
- Sidenote: in the following stream of thinking, I realized I am using decimal and binary measurements interchangeably (GB/GiB, TB/TiB, PB/PiB, etc). If this triggers your inner pedant, you will get over it...
- Requirements:
- 1PB +
- Two systems - data replicated between them
- Ability to grow the filesystem without rebuilding
- Standard hybrid performance
- Backup solution that keeps all changes for 1 year
- To give you anything better than that, the following information would be helpful:
- Current system specs
- IOPS and throughput metrics during normal use
- Network utilization metrics during normal use
- The output from the following commands
- lsblk
- lsblk -d -o VENDOR,MODEL,NAME,LOG-SEC,PHY-SEC,MIN-IO,SIZE,HCTL,ROTA,TRAN,TYPE
- zpool status
- zpool list -o health,capacity,size,free,allocated,fragmentation,dedupratio,dedup_table_size,ashift
- sudo zfs list -o type,volsize,used,available,referenced,usedbysnapshots,usedbydataset,usedbychildren,dedup,logicalused,logicalreferenced,recordsize,volblocksize,compression,compressratio,atime,special_small_blocks
- Replacement Systems Spec:
- If I were in your shoes, with the information we have about your situation, I'd do the following.
- Get two of the following systems. One for the primary storage and the other as your replica target. (A quick replication sketch follows the hardware list below.)
- Dell R750/R760/R770 (or similar, any brand will do)
- 24 x 2.5" NVME
- NVME is key here.
- 2 x Xeon Gold (or AMD equiv. I'm just not as well versed in AMD server CPUs)
- 12+ core / CPU
- Fewer fast cores are better than many slow cores, but it's a balance
- IMHO, I'm open to others thoughts on this.
- It's a bit difficult to know how much CPU overhead will be required, so better to spec too much than not enough.
- 512GB+ memory
- More if possible, your ARC will thank you.
- Recent Xeon CPUs have 8 memory channels each
- 8 x 2 = 16 sticks of mem
- 16 x 32GB = 512GB
- 16 x 64GB = 1TB
- Dell BOSS card
- or any RAID1 boot device
- Multiple 10/25GbE NIC ports
- or 40/50/100GbE if your usage justifies it
- SAS HBA with external ports
- JBOD Expansion Disk Shelf(s)
- SAS connected
- 3.5" Drive Slots
- Enough drive slots to hit the space requirements + redundancy and spares
- Multiple options for this part.
- Let's go with the Dell ME484 (for the sake of discussion...)
- SAS JBOD
- 84 x 3.5" SAS Drive Slots
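- Since the second box is the replica target, I'd assume replication would be handled with plain zfs send/receive over SSH (or a wrapper like syncoid). A rough, hedged sketch; POOL-NAME, DATASET, REPLICA-HOST, and the snapshot names are all placeholders:
- zfs snapshot -r POOL-NAME/DATASET@snap1
- zfs send -R POOL-NAME/DATASET@snap1 | ssh REPLICA-HOST zfs receive -uF POOL-NAME/DATASET
- Later runs only send the changes since the last common snapshot:
- zfs send -R -I @snap1 POOL-NAME/DATASET@snap2 | ssh REPLICA-HOST zfs receive -u POOL-NAME/DATASET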
- Storage Setup:
- Let's assume we have all of our hardware except the storage drives.
- Our hardware is racked, connected, powered on, and has the OS installed. (I'll ramble about OS selection later)
- We now need to select the drives and pool configuration for our new storage server.
- What we have to work with:
- 24 x 2.5" NVME drive slots
- 84 x 3.5" SAS drive slots
- Assumptions:
- 3.5" Capacity Drives
- Intended use: Primary storage
- 84 x 20TiB SAS
- 2.5" NVME Drives
- Intended Use:
- Special vdev
- SLOG
- L2ARC
- Multiple possibilities here
- Option 1 - Easy Setup/Good Performance
- 3 x 3.2TiB NVME mixed-use SSD
- Special
- This could be a single mirror if your risk tolerance allows it
- 2 x 400/800GiB NVME write-intensive/mixed-use SSD
- SLOG
- This could be a single disk if your risk tolerance allows it
- 400GiB+ is way overkill for an SLOG. But the best-performing NVME drives don't come in 10-20GiB sizes...
- 1 x 3TiB+ NVME/SAS mixed-use/read-intensive SSD
- L2ARC
- Option 2 - More challenging setup/Better Performance
- 6 x 3.2TiB or 6.4TiB NVME mixed-use SSD
- Special/SLOG/L2ARC
- For a general use workload, I'd build out something like this...
- zPool Structure:
- 8 x RAIDZ2 vdevs
- Each vdev = 10 x 3.5" 20TiB
- Usable Space = ~1.28PB (8 vdevs x 8 data disks x 20TiB = 1,280TiB, before RAIDZ padding and filesystem overhead)
- Support VDEVs
- Option 1 (Easy setup/Slower/Boring)
- Special VDEV
- triple mirror - 3.2TiB
- SLOG
- mirror - 400/800GiB
- Depending on your risk tolerance, this could be a stripe
- L2ARC
- Single 3TiB+
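- If you go this route, attaching the support vdevs is only a few commands. A hedged sketch; POOL-NAME and the nvme-diskN names are placeholders (use /dev/disk/by-id paths in practice):
- zpool add POOL-NAME special mirror nvme-disk1 nvme-disk2 nvme-disk3
- zpool add POOL-NAME log mirror nvme-disk4 nvme-disk5
- zpool add POOL-NAME cache nvme-disk6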
- Option 2 - (Significantly better performance/challenging setup)
- 6 x 3.2TiB+ mixed-use
- Split each NVME disk into three separate namespaces
- NS1 - slog - 10GiB
- This will likely never need to be larger than 10GiB; the SLOG only has to hold a few seconds of incoming sync writes between transaction group commits
- NS2 - L2ARC - 1TiB
- ~30% remaining space (loose guideline I made up just now)
- NS3 - Special - 2 TiB
- The rest of the remaining free space
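- Carving out the namespaces is done with nvme-cli. A hedged sketch for one drive; /dev/nvme0 and CTRL-ID are placeholders, sizes are in logical blocks, and the counts below assume a 4KiB LBA format (check nvme id-ns for your drive's formats, and pull CTRL-ID from the cntlid field of nvme id-ctrl):
- nvme delete-ns /dev/nvme0 -n 1 (wipes the factory namespace - this destroys data)
- nvme create-ns /dev/nvme0 --nsze=2621440 --ncap=2621440 (NS1 - ~10GiB slog)
- nvme create-ns /dev/nvme0 --nsze=268435456 --ncap=268435456 (NS2 - ~1TiB L2ARC)
- nvme create-ns /dev/nvme0 --nsze=536870912 --ncap=536870912 (NS3 - ~2TiB special)
- nvme attach-ns /dev/nvme0 -n 1 -c CTRL-ID (repeat the attach for namespace IDs 2 and 3)
- Repeat for each of the six drives. Note that not every NVME SSD supports namespace management, so verify that before buying.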
- Config
- SLOG - NS1
- 3 x mirrors (Safe option)
- 1 x 6 disk stripe (Double performance/slightly higher risk)
- Likely, either option will be bottlenecked by the spinning disks.
- L2ARC - NS2
- 1 x 6 disk stripe
- 6TiB Total size of L2ARC
- Special VDEV - NS3
- 2 x triple mirror (Safe option)
- 4TiB for metadata
- 3 x mirror (50% faster/slightly higher risk)
- 6TiB for metadata
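- Pulling the Option 2 layout together, pool creation might look roughly like this. A hedged sketch; POOL-NAME, the diskN names, and the nvmeXnY namespace names are placeholders (use /dev/disk/by-id paths in practice), and ashift=12 assumes 4KiB-sector drives:
- zpool create -o ashift=12 POOL-NAME raidz2 disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8 disk9 disk10 (repeat the "raidz2 + 10 disks" group seven more times for the other data vdevs)
- zpool add POOL-NAME special mirror nvme0n3 nvme1n3 mirror nvme2n3 nvme3n3 mirror nvme4n3 nvme5n3 (the 3 x mirror option)
- zpool add POOL-NAME log nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 (the 6 disk stripe option)
- zpool add POOL-NAME cache nvme0n2 nvme1n2 nvme2n2 nvme3n2 nvme4n2 nvme5n2
- Then set special_small_blocks on your datasets if you also want small file blocks landing on the special vdev.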
- Storage Summary:
- 1.28 Petabytes = Total Usable Space
- 4/6 Terabytes = NVME SSD storage for metadata
- 6 Terabytes = NVME SSD storage for L2ARC (Read cache)
- 30/60 Gigabytes = NVME SSD storage for SLOG (Write log), depending on mirror vs stripe
- Future Expansion:
- Primary storage:
- Add another disk shelf that is populated with a minimum of 10 disks.
- zpool add POOL-NAME raidz2 new-disk1..10
- Boom! You just added 160TiB to your pool.
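- Spelled out with a dry run first (another sketch; the new-diskN names are placeholders):
- zpool add -n POOL-NAME raidz2 new-disk1 new-disk2 new-disk3 new-disk4 new-disk5 new-disk6 new-disk7 new-disk8 new-disk9 new-disk10 (-n prints what would be added without touching the pool)
- zpool add POOL-NAME raidz2 new-disk1 new-disk2 new-disk3 new-disk4 new-disk5 new-disk6 new-disk7 new-disk8 new-disk9 new-disk10
- zpool list POOL-NAME (SIZE and FREE should jump by the new vdev's capacity)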
- Support vdevs:
- This gets a bit more complicated, since it varies based on which support vdev config you picked. The minimum number of disks needed to expand the SSD vdevs equals the width of the widest mirror: a triple mirror needs 3 new disks to expand, while a plain two-way mirror only needs 2.
- Let's assume you went with the better performing and more complex config.
- Now, since all three support vdevs occupy part of each NVME disk, when we expand one, for simplicity's sake, we expand all of them.
- SLOG and L2ARC are both single-disk-wide stripes (assuming the faster sub-options above), so each can be expanded with only a single new disk. But the Special vdev is made of multiple 2-disk mirrors, so to expand it we need 2 new disks.
- So, pop two new matching NVME disks into the available slots. Create your three namespaces on each. Then...
- zpool add POOL-NAME log new-disk1-ns1 new-disk2-ns1
- zpool add POOL-NAME special mirror new-disk1-ns3 new-disk2-ns3
- zpool add POOL-NAME cache new-disk1-ns2 new-disk2-ns2
- I have thoughts on your backups too. But that will need to wait for another time.