This implements the second step of full dynamic wear-leveling: block
allocation randomization. This is the key part that uniformly distributes
wear across the filesystem, even through reboots.
The entropy actually comes from the filesystem itself, by xoring
together all of the CRCs in the metadata-pairs on the filesystem. While
this sounds like a ridiculous operation, it's easy to do when we already
scan the metadata-pairs at mount time.
This gives us a random number we can use for block allocation.
Unfortunately it's not a great general-purpose random generator, as the
output only changes once per filesystem write. Fortunately that's exactly
when we need our allocator.
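Roughly, the idea looks something like this (a minimal sketch with
hypothetical names like fs_t and fs_seed_with_crc, not the actual littlefs
internals): we xor each metadata-pair's CRC into a seed during the mount
scan, then use the seed to pick where the allocator starts looking.

    #include <stdint.h>

    // Hypothetical filesystem state for illustration only.
    typedef struct {
        uint32_t seed;        // xor of all metadata-pair CRCs seen during mount
        uint32_t block_count; // total number of blocks on the device
        uint32_t alloc_off;   // block where the allocator starts looking
    } fs_t;

    // Called once per metadata-pair during the mount-time scan; the CRC is
    // already being computed to validate the pair, so this is nearly free.
    static void fs_seed_with_crc(fs_t *fs, uint32_t pair_crc) {
        fs->seed ^= pair_crc;
    }

    // After the scan, derive a starting offset for block allocation. The seed
    // is deterministic for a given filesystem state, but changes on every write.
    static void fs_init_alloc(fs_t *fs) {
        fs->alloc_off = fs->seed % fs->block_count;
    }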
---
Additionally, the randomization created a mess for the testing
framework. Fortunately, this method of randomization is deterministic,
which is a very useful property for reproducing bugs.
The main thing to consider was how lfs_dir_fetchwith reacts to
corruption it finds, and making sure that falling back to old values
works correctly.
Some of the tricky bits involved making sure we could fall back to both old
commits and old metadata blocks while still handling things like
synthetic moves correctly.
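A rough sketch of the fallback behaviour (hypothetical names, not the real
lfs_dir_fetchwith code): walk the commits appended to a metadata block in
order, check each commit's CRC, and keep only the state from the newest
commit that validates, so a corrupted tail falls back to the last good
commit.

    #include <stdint.h>

    // One incrementally appended commit in a metadata block (illustrative layout).
    typedef struct {
        const uint8_t *data; // commit payload
        uint32_t size;       // payload size in bytes
        uint32_t crc;        // CRC stored with the commit
    } commit_t;

    // Bitwise CRC32 (polynomial 0xedb88320), enough for the example.
    static uint32_t crc32_calc(const uint8_t *data, uint32_t size) {
        uint32_t crc = 0xffffffff;
        for (uint32_t i = 0; i < size; i++) {
            crc ^= data[i];
            for (int b = 0; b < 8; b++) {
                crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
            }
        }
        return ~crc;
    }

    // Returns the index of the newest commit whose CRC validates, or -1 if
    // none do. Everything after the first bad CRC is untrusted, so we stop
    // there and fall back to the last good state.
    static int fetch_latest_valid(const commit_t *commits, int ncommits) {
        int latest = -1;
        for (int i = 0; i < ncommits; i++) {
            if (crc32_calc(commits[i].data, commits[i].size) != commits[i].crc) {
                break;
            }
            latest = i;
        }
        return latest;
    }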
The main issue here was that the old orphan test relied on deleting the
block that contained the most recent update. In the new design this
doesn't really work since updates get appended to metadata-pairs
incrementally.
This is fixed by instead using the truncate command on the appropriate
block. We're now passing orphan tests.
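A small sketch of the test-side trick, assuming an emulated block device
that keeps each block in a file under blocks/ (the path and helper name are
made up): rather than deleting the whole block, we truncate it so only the
newest appended commit is lost and the older commits survive.

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    // Cut block `n` back to `keep` bytes, discarding the most recently
    // appended commit while leaving the older commits in the metadata-pair
    // intact.
    static int cut_newest_commit(unsigned n, off_t keep) {
        char path[32];
        snprintf(path, sizeof(path), "blocks/%u", n);
        return truncate(path, keep); // 0 on success, -1 on error (see errno)
    }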
Making the superblock look like "just another entry" lets us reuse a
decent amount of logic that would otherwise only be used at format and
mount time. In this case we can use append to write out the superblock as
though it were creating a new entry on the filesystem.
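As a rough sketch of what that looks like (hypothetical struct and function
names, not the actual littlefs API): the superblock gets serialized and
pushed through the same append path as any other entry at format time.

    #include <stdint.h>
    #include <string.h>

    // Illustrative on-disk superblock fields.
    typedef struct {
        char     magic[8];    // "littlefs"
        uint32_t version;
        uint32_t block_size;
        uint32_t block_count;
    } fs_superblock_t;

    // Stand-in for the directory layer's append; a real implementation
    // would commit the entry into the root metadata-pair.
    static int fs_dir_append(void *dir, const void *entry, uint32_t size) {
        (void)dir; (void)entry; (void)size;
        return 0;
    }

    // Writing the superblock reuses the exact same path as creating any
    // other entry, instead of having dedicated format-only write logic.
    static int fs_write_superblock(void *root_dir,
                                   uint32_t block_size, uint32_t block_count) {
        fs_superblock_t sb = {
            .version = 0x00010001,
            .block_size = block_size,
            .block_count = block_count,
        };
        memcpy(sb.magic, "littlefs", sizeof(sb.magic));
        return fs_dir_append(root_dir, &sb, sizeof(sb));
    }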
This is only an issue in the weird case where a worn-down block is
left in the odd state of not being able to change the data that resides
on it. That being said, this does pop up often when simulating
wear on block devices.
Directory commits check if the write succeeded by crcing the block,
which avoids the additional RAM cost of another buffer. However,
before this commit, directory commits just checked that the block's crc
was valid, rather than comparing it to the expected crc. This would
usually work, unless the block was stuck in a state with a valid crc.
The fix is to simply compare with the expected crc to find errors.
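A rough sketch of the fix (bd_read, crc32_update, and commit_verify are
made-up names, and a RAM-backed block stands in for the real block device):
re-read the committed region in small chunks so we don't need a second
full-size buffer, accumulate a CRC over it, and compare against the CRC we
expected to write instead of just checking that whatever is there is
internally valid.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    // Stand-in block device: a single 512-byte block held in RAM.
    static uint8_t bd_block[512];

    static int bd_read(uint32_t off, uint8_t *buf, uint32_t size) {
        if (off + size > sizeof(bd_block)) {
            return -1;
        }
        memcpy(buf, &bd_block[off], size);
        return 0;
    }

    // Incremental CRC32 (polynomial 0xedb88320), fed a chunk at a time.
    static uint32_t crc32_update(uint32_t crc, const uint8_t *data, uint32_t size) {
        for (uint32_t i = 0; i < size; i++) {
            crc ^= data[i];
            for (int b = 0; b < 8; b++) {
                crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
            }
        }
        return crc;
    }

    // Verify a commit of `size` bytes by re-reading it in small chunks and
    // comparing the accumulated CRC against the CRC we expected to write.
    // A block stuck with stale but internally valid contents fails this
    // check and is treated as corrupt.
    static bool commit_verify(uint32_t size, uint32_t expected_crc) {
        uint8_t buf[16]; // small scratch buffer instead of a full copy
        uint32_t crc = 0xffffffff;
        for (uint32_t off = 0; off < size; off += sizeof(buf)) {
            uint32_t chunk = (size - off < sizeof(buf)) ? size - off : (uint32_t)sizeof(buf);
            if (bd_read(off, buf, chunk) != 0) {
                return false;
            }
            crc = crc32_update(crc, buf, chunk);
        }
        return (crc ^ 0xffffffff) == expected_crc;
    }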
This provides a limited form of wear leveling. While wear is
not actually balanced across blocks, the filesystem can recover
from corrupted blocks and extend the lifetime of a device nearly
as much as dynamic wear leveling.
For use-cases where wear is important, it would be better to use
a full form of dynamic wear-leveling at the block level (or to
consider a logging filesystem).
Corrupted block handling was simply added on top of the filesystem's
existing logic, so it's a bit more noodly than it needs to be, but it
gets the job done.