Introduction
My original storage setup grew organically and is neither reliable, performant, nor scalable. All storage is built directly on the Proxmox hosts in my home cluster. The goal is to rework this setup to give me better storage availability across the cluster, with better redundancy.
Original setup
I originally started out with just host1, a custom SuperMicro build. This is used as the main fileserver, for which I use a ZFS striped-mirror (RAID10) pool. To host the original set of VMs I had a mirror of two 500GB SSDs. None of this storage was set up to be shared. Theoretically I could have set up the 500GB mirror as ZFS over iSCSI, but the pool was already pretty much full. (A sketch of how such a striped-mirror pool is put together follows the drive listing below.)
host1 (SuperMicro)
|-240GB Boot SSD
|-500GB Scratch HDD
|-500GB SSD ZFS tank-spool
|-500GB SSD ZFS tank-spool
|-4TB HDD raid10-pool
|-4TB HDD raid10-pool
|-4TB HDD raid10-pool
\-4TB HDD raid10-pool
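For reference, a striped-mirror (RAID10) pool like raid10-pool is just a zpool built from multiple mirror vdevs. A minimal sketch, assuming the four 4TB drives show up as /dev/sdb through /dev/sde (hypothetical device names):

```bash
# Create a RAID10-style pool: two mirror vdevs, striped together.
# Device names here are placeholders; in practice use /dev/disk/by-id/ paths.
zpool create raid10-pool \
    mirror /dev/sdb /dev/sdc \
    mirror /dev/sdd /dev/sde

# Verify the layout: two mirrors, with data striped across them.
zpool status raid10-pool
```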
Over time I also added two R320s, both of which came with four 60GB SSDs. These hosts were mainly used as compute for projects I had at university, and so were not set up for any shared storage.
host2 (R320)
|-500GB SSD Boot
|-500GB SSD ZFS Pool
|-500GB SSD ZFS Pool
|-500GB SSD ZFS Pool
\-500GB SSD ZFS Pool
host3 (R320)
|-60GB SSD Boot zfs-mirror
\-60GB SSD Boot zfs-mirror
Finally, none of my storage is properly or automatically backed up. Backups are done manually by copying files to a set of external drives, which serve as an offline copy of the most important files.
New storage considerations
With the idea of rebuilding the storage, there are four points to think about.
- I require a boot drive for each of the three hosts, likely based on SSDs
- I need bulk storage for my backups/media/etc, likely based on HDDs
- I need VM/Guest storage, ideally based on SSDs.
- I should build some way to back up most of this data
A classic architecture would be to deploy FreeNAS/TrueNAS on a dedicated storage host. However, I would rather build my system hyper-converged, and Proxmox gives me enough configuration options to do so.
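As a rough illustration of what hyper-converged means here: the same Proxmox nodes that run the guests also expose their local ZFS pools and (later) Ceph as cluster storage. A sketch using pvesm, with hypothetical storage IDs:

```bash
# Register a node-local ZFS pool as VM/container storage.
pvesm add zfspool tank-spool --pool tank-spool --content images,rootdir

# Later, a Proxmox-managed Ceph pool can be added as RBD storage in the same way.
pvesm add rbd ceph-vm --pool ceph-vm --content images,rootdir

# List all storage the cluster currently knows about.
pvesm status
```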
Boot disks
The boot disks for each of the hosts do not need to be that reliable, since Proxmox makes sure the important files are synced across the cluster. I could potentially put them on more reliable ZFS mirrors, but that takes up more drives. If I use ZFS pools, these could also be shared over iSCSI, but that would be highly localized storage, potentially not close to the compute. It would also create a bunch of extra storage items in Proxmox, making administration more confusing.
Bulk storage
This is pretty much settled already, as it will reuse the 4x4TB pool. I could potentially expand this pool with another 4TB mirror. It could be shared via ZFS over iSCSI for bulk storage and for certain VMs that need large amounts of storage. A key point is backing up this storage.
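For the backup side of the bulk pool, my current thinking is snapshot-based replication. A minimal sketch, assuming a hypothetical backup host called backup-box with a pool named backuppool:

```bash
# Take a recursive, dated snapshot of the bulk pool (snapshot names are placeholders).
zfs snapshot -r raid10-pool@backup-2020-08-01

# Send it incrementally (after an initial full send) to the backup box.
zfs send -R -I raid10-pool@backup-2020-07-01 raid10-pool@backup-2020-08-01 \
    | ssh backup-box zfs receive -F backuppool/raid10-pool
```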
VM/Guest storage
For most VMs, this needs to be based on SSDs for good performance. It should be shared to all three hosts to allow easy migration of VMs and containers. This could be centralized and shared over iSCSI, but I would rather have the storage distributed, giving me more resilience when I take some of the nodes down (which happens quite a bit on my home cluster, to allow for tinkering).
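Distributed here means Ceph, managed directly from Proxmox. A rough sketch of the per-node steps, assuming a hypothetical Ceph network of 10.0.0.0/24 and placeholder device names (exact subcommand names vary a bit between Proxmox versions):

```bash
# On each node: install the Ceph packages; initialise the cluster network once.
pveceph install
pveceph init --network 10.0.0.0/24   # run once, on the first node

# Create a monitor per node, then turn the spare SSDs into OSDs.
pveceph mon create
pveceph osd create /dev/sdf
pveceph osd create /dev/sdg

# Finally, create a replicated pool for VM disks (3 replicas by default).
pveceph pool create ceph-vm
```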
Backups
This point I am still not sure about. I currently back up my VMs/containers from their various SSDs to the raid10-pool, but that is not a real 3-2-1 backup setup. Ideally I want an offline host on my network as well as an offsite, or 'far', location in case the house burns down. The one advantage I have here is that on the farm where I live I could fairly easily put a box with a bunch of drives at the other end of the property, so that even in case of fire it would likely remain untouched. However, it might just be cheaper to encrypt all the data and upload it somewhere. To further complicate matters, Proxmox just released their [Backup Server beta](https://forum.proxmox.com/threads/proxmox-backup-server-beta.72676/), which would work nicely with this setup. For now, this remains unsettled and I just use an offline box for some backup storage with a weekly backup cadence.
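Whatever the final offsite answer becomes, the on-cluster part is just vzdump. A minimal sketch of what a weekly job to the offline box could look like, with a hypothetical guest ID and storage IDs:

```bash
# Snapshot-mode backup of guest 101 to the offline backup storage.
vzdump 101 --storage offline-box --mode snapshot --compress zstd

# If I move to Proxmox Backup Server, the datastore would be added roughly like this
# (server name, datastore and user are placeholders).
pvesm add pbs pbs-store --server pbs.example.lan --datastore homelab \
    --username backup@pbs --fingerprint <server-fingerprint>
```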
The final new architecture
For now, the architecture I decided upon is as follows:
host1 (SuperMicro build)
|-240GB Boot SSD
|-500GB Scratch HDD
|-500GB SSD ZFS tank-spool
|-500GB SSD ZFS tank-spool
|-500GB SSD Ceph
|-240GB SSD Ceph
|-4TB HDD raid10-pool
|-4TB HDD raid10-pool
|-4TB HDD raid10-pool
\-4TB HDD raid10-pool
host2 (R320)
|-60GB SSD Boot zfs-mirror
|-60GB SSD Boot zfs-mirror
|-500GB SSD Ceph
\-500GB SSD Ceph
host3 (R320)
|-60GB SSD Boot zfs-mirror
|-60GB SSD Boot zfs-mirror
|-500GB SSD Ceph
\-500GB SSD Ceph
Key observations to make:
The R320 compute hosts both have ZFS mirrors as boot drives, since I had plenty of 60GB SSDs. Having two OSDs in each of these compute hosts gives me an okay placement group spread.
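To sanity-check that spread, Ceph can show how data and placement groups land on each OSD. A couple of read-only commands I use for this (standard Ceph tooling, nothing specific to this setup):

```bash
# Per-OSD utilisation, weight and PG count, grouped by host.
ceph osd df tree

# Overall placement group and cluster health at a glance.
ceph pg stat
ceph -s
```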
Finally, the main SuperMicro build is the weakest part: it has two unevenly sized Ceph OSDs and no redundancy in its boot drive. The main 16TB storage pool is also not yet exported via iSCSI or NFS.
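Exporting the bulk pool is mostly a matter of flipping the ZFS share properties once I get around to it. A minimal sketch, assuming a hypothetical dataset raid10-pool/media, a home subnet of 192.168.1.0/24, and ZFS's built-in NFS sharing:

```bash
# Share a dataset read/write to the home subnet via the host's NFS server
# (requires the NFS server packages to be installed).
zfs set sharenfs="rw=@192.168.1.0/24" raid10-pool/media

# Check what is currently exported.
showmount -e localhost
```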
Conclusion & Problems
With all of this deployed I have a good amount of Ceph storage, but I've noticed a key problem. All my servers are connected through two-link LACP bonds, giving each host 2 Gbit/s of theoretical bandwidth. However, LACP does not help Ceph much here, since a single connection still only uses one 1 Gbit link. I get stuck at rather low read and write speeds, which is annoying when deploying new VMs: around 30 MB/s writes and 80 MB/s reads at most. I'm not entirely sure how much of this is down to the 1 Gbit links and how much is the fairly low-quality SSDs I have. The next step will likely be upgrading to 10 Gbit and potentially expanding the main 16TB storage pool.
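To figure out where the bottleneck actually sits, the plan is to measure the network and the Ceph pool separately. A sketch using standard tools (the pool name is the hypothetical ceph-vm from earlier):

```bash
# Raw network throughput between two nodes (run `iperf3 -s` on the other host first).
iperf3 -c host2

# Ceph-level write and sequential-read benchmarks against the VM pool.
rados bench -p ceph-vm 60 write --no-cleanup
rados bench -p ceph-vm 60 seq
rados -p ceph-vm cleanup
```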