Currently littlefs uses a separate mutable state struct and immutable
config struct. This lets users place the config struct in ROM where
possible.
However the recent addition of LFS_STATICCFG raises the question of if
this split is still valuable.
If the config is copied into the mutable struct at runtime, this allows
a couple things:
1. Easier user interface, config can be stack allocated, no need to store
the config struct for the lifetime of littlefs in OSs.
2. Avoids duplication when littlefs would need to change config based on
observed superblocks, such as LFS_NAME_MAX limits
3. In theory, access to a single struct is faster/smaller as it avoids
an additional load instruction.
Unfortunately, inlining the dynamic config runs into several issues:
1. The code size actually increases with this change! From digging into
this it's for a couple reasons:
- Copying the config over takes code.
- C has notorious problems with pointer aliasing, accessing
constants from a const struct actually allows C to assume the
values aren't going to change in more situations.
This suggests it may be possible to reduce the code size more by
loading the config pointer into a variable, but I haven't explored
this probably not-worth-it optimization.
- Even assuming deduplication of superblock-dependent configuration,
the config struct is significantly larger than the mutable struct,
and it turns out combining these two exceeds the limits of
immediate-relative-loads, discarding the possible code size
improvement from avoiding a second dereference.
2. The implementation of dynamic configuration differs significantly from
the static configuration. This adds mess into the compile-time #ifdefs
needed to support both options.
This is a style change to make littlefs's callbacks consistent with most
callback declarations found in C. That is, taking in a user-provided
`void*`.
Previously, these callbacks took a pointer to the config struct itself,
which indirectly contained a user provided context, and this gets the
job done, but taking in a callback with a `void*` is arguably more
expected, has a better chance of integrating with C++/OS-specific code,
and is more likely to be optimized out by a clever compiler.
---
As a part of these changes, the geometry for the test bds needed to be
moved into bd specific configuration objects. This is a good change as
it also allows for testing situations where littlefs's geometry does not
match the underlying bd.
Since this is already going to be a breaking API change, this renames
structs/variables named _config -> _cfg. This is in order to be
consistent with functions such as lfs_file_opencfg.
As an embedded library, littlefs's configuration straddles two worlds.
In most cases the configuration is usually constant at build time, but
when integrated into OSs, the configuration needs to be dynamically
configurable.
To help with this, littlefs has a separate lfs_config struct that can be
placed into ROM when possible.
But you know what's better than ROM configuration? Truely inlinable
static configuration known at compile-time. In addition to avoiding the
RAM cost, compile-time configuration allows for additional compiler
optimizations, such as constexpr-elimination and removal of unused
code-paths.
So how to enable static configuration?
1. define LFS_STATICCFG
2. implement callbacks as global functions:
- lfs_read
- lfs_prog
- lfs_erase
- lfs_sync
2. define the now-required constants that configure littlefs:
- LFS_READ_SIZE
- LFS_PROG_SIZE
- LFS_BLOCK_SIZE
- LFS_BLOCK_COUNT
- LFS_BLOCK_CYCLES
- LFS_CACHE_SIZE
- LFS_LOOKAHEAD_SIZE
- LFS_READ_BUFFER (optional)
- LFS_PROG_BUFFER (optional)
- LFS_LOOKAHEAD_BUFFER (optional)
- LFS_NAME_MAX (optional)
- LFS_FILE_MAX (optional)
- LFS_ATTR_MAX (optional)
Note, there is a separate configuration for the file configuration, this
can be enabled/disabled independently of LFS_STATICCFG. You will likely
want to define this as well if you are looking for the smallest code
size.
In order to avoid a mess of #ifdefs, the internals of littlefs use a
simple macro that redirects to either the dynamic or static config at
compile time:
#ifdef LFS_STATICCFG
#define LFS_CFG_READ_SIZE(lfs) ((void)lfs, LFS_READ_SIZE)
#else
#define LFS_CFG_READ_SIZE(lfs) lfs->cfg->read_size
#endif
Unfortunately it does look like there still may be a lot of issues
related to warnings of comparisons against constants... If only C had
a way to ignore warnings on individual statements...
Original idea by apmorton
- Changed the name of the LFS_CONFIG macro to LFS_UTIL to avoid
confusion with the lfs_config struct. This also hints that LFS_UTIL
is related to lfs_util.h.
LFS_UTIL allows the user to override lfs_util.h so they can provide
their own system-level dependencies such as malloc, tracing, builtins,
stdint definitions, string.h, and others.
- Removed stdlib includes from lfs.h, these should all go through
lfs_util.h to let users override these definitions if stdlib is
unavailable on their system.
- Moved error code definitions to lfs_util.h. This lets users override
the error codes to replace them with their own error codes and avoid
a translation layer in some situations. Note the error codes must
still be in the range of a negative int.
- Used proper stdint definitions in lfs_scmp.
__VA_ARGS__ are frustrating in C. Even for their main purpose (printf),
they fall short in that they don't have a _portable_ way to have zero
arguments after the format string in a printf call.
Even if we detect compilers and use ##__VA_ARGS__ where available, GCC
emits a warning with -pedantic that is _impossible_ to explicitly
disable.
This commit contains the best solution we can think of. A bit of
indirection that adds a hidden "%s" % "" to the end of the format
string. This solution does not work everywhere as it has a runtime
cost, but it is hopefully ok for debug statements.
- Standardized littlefs debug statements to use hex prefixes and
brackets for printing pairs.
- Removed the entry behavior for readtree and made -t the default.
This is because 1. the CTZ skip-list parsing was broken, which is not
surprising, and 2. the entry parsing was more complicated than useful.
This functionality may be better implemented as a proper filesystem
read script, complete with directory tree dumping.
- Changed test.py's --gdb argument to take [init, main, assert],
this matches the names of the stages in C's startup.
- Added printing of tail to all mdir dumps in readtree/readmdir.
- Added a print for if any mdirs are corrupted in readtree.
- Added debug script side-effects to .gitignore.
Block device tracing has a lot of potential uses, of course debugging,
but it can also be used for profiling and externally tracking littlefs's
usage of the block device. However, block device tracing emits a massive
amount of output. So keeping block device tracing on by default limits
the usefulness of the filesystem tracing.
So, instead, I've moved the block device tracing into a separate
LFS_TESTBD_YES_TRACE define which switches on the LFS_TESTBD_TRACE
macro. Note that this means in order to get block device tracing, you
need to define both LFS_YES_TRACE and LFS_TESTBD_YES_TRACE. This is
needed as the LFS_TRACE definition is gated by LFS_YES_TRACE in
lfs_util.h.
These two features have been much requested by users, and have even had
several PRs proposed to fix these in several cases. Before this, these
error conditions usually were caught by internal asserts, however
asserts prevented users from implementing their own workarounds.
It's taken me a while to provide/accept a useful recovery mechanism
(returning LFS_ERR_CORRUPT instead of asserting) because my original thinking
was that these error conditions only occur due to bugs in the filesystem, and
these bugs should be fixed properly.
While I still think this is mostly true, the point has been made clear
that being able to recover from these conditions is definitely worth the
code cost. Hopefully this new behaviour helps the longevity of devices
even if the storage code fails.
Another, less important, reason I didn't want to accept fixes for these
situations was the lack of tests that prove the code's value. This has
been fixed with the new testing framework thanks to the additional of
"internal tests" which can call C static functions and really take
advantage of the internal information of the filesystem.
- Changed readmdir.py to print the metadata pair and revision count,
which is useful when debugging commit issues.
- Added truncated data view to readtree.py by default. This does mean
readtree.py must read all files on the filesystem to show the
truncated data, hopefully this does not end up being a problem.
- Made overall representation hopefully more readable, including moving
superblock under the root dir, userattrs under files, fixing a gstate
rendering issue.
- Added rendering of soft-tails as dotted-arrows, hopefully this isn't
too noisy.
- Fixed explode_asserts.py off-by-1 in #line mapping caused by a strip
call in the assert generation eating newlines. The script matches
line numbers between the original+modified files by emitting assert
statements that use the same number of lines. An off-by-1 here causes
the entire file to map lines incorrectly, which can be very annoying.
With the superblock expansion stuff, the test_format tests have grown
to test more advanced superblock-related features. This is fine but
deserves a rename so it's more clear.
Also fixed a typo that meant tests never ran with block cycles.
Byte-level writes are expensive and not suggested (caches >= 4 bytes
make much more sense), however there are many corner cases with
byte-level writes that can be easy to miss (power-loss leaving single
bytes written to disk).
Unfortunately, byte-level writes mixed with power-loss testing, the
Travis infrastructure, and Arm Thumb instruction set simulation
exceeds the 50-minute budget Travis allocates for jobs.
For now I'm disabling the byte-level tests under Qemu, with the hope that
performance improvements in littlefs will let us turn these tests back
on in the future.
- Added caching to Travis install dirs, because otherwise
pip3 install fails randomly
- Increased size of littlefs-fuse disk because test script has
a larger footprint now
- Skip a couple of reentrant tests under byte-level writes because
the tests just take too long and cause Travis to bail due to no
output for 10m
- Fixed various Valgrind errors
- Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1.
In this case rambd goes uninitialized, which is fine for rambd's
purposes. Note I couldn't figure out how to limit this suppression
to only the malloc in rambd, this doesn't seem possible with Valgrind.
- Fixed memory leaks in exhaustion tests
- Fixed off-by-1 string null-terminator issue in paths tests
- Fixed lfs_file_sync issue caused by revealed by fixing memory leaks
in exhaustion tests. Getting ENOSPC during a file write puts the file
in a bad state where littlefs doesn't know how to write it out safely.
In this case, lfs_file_sync and lfs_file_close return 0 without
writing out state so that device-side resources can still be cleaned
up. To recover from ENOSPC, the file needs to be reopened and the
writes recreated. Not sure if there is a better way to handle this.
- Added some quality-of-life improvements to Valgrind testing
- Fit Valgrind messages into truncated output when not in verbose mode
- Turned on origin tracking
The core of littlefs's CI testing is the full test suite, `make test`, run
under a number of configurations:
- Processor architecture:
- x86 (native)
- Arm Thumb
- MIPS
- PowerPC
- Storage geometry:
- rs=16 ps=16 cs=64 bs=512 (default)
- rs=1 ps=1 cs=64 bs=4KiB (NOR flash)
- rs=512 ps=512 cs=512 bs=512 (eMMC)
- rs=4KiB ps=4KiB cs=4KiB bs=32KiB (NAND flash)
- Other corner cases:
- no intrinsics
- no inline
- byte-level read/writes
- single block-cycles
- odd block counts
- odd block sizes
The number of different configurations we need to test quickly exceeds the
50 minute time limit Travis has on jobs. Fortunately, we can split these
tests out into multiple jobs. This seems to be the intended course of
action for large CI "builds" in Travis, as this gives Travis a finer
grain of control over limiting builds.
Unfortunately, this created a couple issues:
1. The Travis configuration isn't actually that flexible. It allows a
single "matrix expansion" which can be generated from top-level lists
of different configurations. But it doesn't let you generate a matrix
from two seperate environment variable lists (for arch + geometry).
Without multiple matrix expansions, we're stuck writing out each test
permutation by hand.
On the bright-side, this was a good chance to really learn how YAML
anchors work. I'm torn because on one hand anchors add what feels
like unnecessary complexity to a config language, on the other hand,
they did help quite a bit in working around Travis's limitations.
2. Now that we have 47 jobs instead of 7, reporting a separate status
for each job stops making sense.
What I've opted for here is to use a special NAME variable to
deduplicate jobs, and used a few state-less rules to hopefully have
the reported status make sense most of the time.
- Overwrite "pending" statuses so that the last job to start owns the
most recent "pending" status
- Don't overwrite "failure" statuses unless the job number matches
our own (in the case of CI restarts)
- Don't write "success" statuses unless the job number matches our
own, this should delay a green check-mark until the last-to-start
job finishes
- Always overwrite non-failures with "failure" statuses
This does mean a temporary "success" may appear if the last job
terminates before earlier jobs. But this is the simpliest solution
I can think of without storing some complex state somewhere.
Note we can only report the size this way because it's cheap to
calculate in every job.
RAM-backed testing is faster than file-backed testing. This is why
test.py uses rambd by default.
So why add support for tmpfs-backed disks if we can already run tests in
RAM? For reentrant testing.
Under reentrant testing we simulate power-loss by forcefully exiting the
test program at specific times. To make this power-loss meaningful, we need to
persist the disk across these power-losses. However, it's interesting to
note this persistence doesn't need to be actually backed by the
filesystem.
It may be possible to rearchitecture the tests to simulate power-loss a
different way, by say, using coroutines or setjmp/longjmp to leave
behind ongoing filesystem operations without terminating the program
completely. But at this point, I think it's best to work with what we
have.
And simply putting the test disks into a tmpfs mount-point seems to
work just fine.
Note this does force serialization of the tests, which isn't required
otherwise. Currently they are only serialized due to limitations in
test.py. If a future change wants to perallelize the tests, it may need
to rework RAM-backed reentrant tests.
Moved .travis.yml over to use the new test framework. A part of this
involved testing all of the configurations ran on the old framework
and deciding which to carry over. The new framework duplicates some of
the cases tested by the configurations so some configurations could be
dropped.
The .travis.yml includes some extreme ones, such as no inline files,
relocations every cycle, no intrinsics, power-loss every byte, unaligned
block_count and lookahead, and odd read_sizes.
There were several configurations were some tests failed because of
limitations in the tests themselves, so many conditions were added
to make sure the configurations can run on as many tests as possible.
This is a bit of a strange case that can be caused by storage with
very large prog sizes, such as NAND flash. We only have 10 bits to store
the size of our padding, so when the prog_size gets larger than 1024
bytes, we have to use multiple padding tags to commit to the next
prog_size boundary.
This causes some complication for the new logic that checks CRCs in case
our block becomes "readonly" and contains existing commits that just happen
to match our new commit size.
Here we just check the CRC of the first commit. This isn't perfect but
does protect against pure "readonly" blocks.
These should probably have been cleaned up in each commit to allow
cherry-picking, but due to time I haven't been able to.
- Went with creating an mdir copy in lfs_dir_commit. This handles a
number of related cleanup issues in lfs_dir_compact and it does so
more robustly. As a plus we can use the copy to update dependencies
in the mlist.
- Eliminated code left by the ENOSPC file outlining
- Cleaned up TODOs and lingering comments
- Changed the reentrant many directory create/rename/remove test to use
a smaller set of directories because of space issues when
READ/PROG_SIZE=512
This was caused by the previous fix for allocations during
lfs_fs_deorphan in this branch. To catch half-orphans during block
allocations we needed to duplicate all metadata-pairs reported to
lfs_fs_traverse. Unfortunately this causes lfs_fs_size to report 2x the
number of metadata-pairs, which would undoubtably confuse users.
The fix here is inelegantly simple, just do a different traversale for
allocations and size measurements. It reuses the same code but touches
slightly different sets of blocks.
Unfortunately, this causes the public lfs_fs_traverse and lfs_fs_size
functions to split in how they report blocks. This is technically
allowed, since lfs_fs_traverse may report blocks multiple times due to
CoW behavior, however it's undesirable and I'm sure there will be some
confusion.
But I don't have a better solution, so from this point lfs_fs_traverse
will be reporting 2x metadata-blocks and shouldn't be used for finding
the number of available blocks on the filesystem.
This was an interesting issue found during a GitHub discussion with
rmollway and thrasher8390.
Blocks in the metadata-pair are relocated every "block_cycles", or, more
mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
But there's a problem, rev isn't += 1 every block write. There are two
blocks in a metadata-pair, so looking at it from each blocks
perspective, rev += 2 every block write.
This leads to a sort of aliasing issue, where, if block_cycles is
divisible by 2, one block in the metadata-pair is always relocated, and
the other block is _never_ relocated. Causing a complete failure of
block-level wear-leveling.
Fortunately, because of a previous workaround to avoid block_cycles = 1
(since this will cause the relocation algorithm to never terminate), the
actual math is rev % (block_cycles+1) == 0. This means the bug only
shows its head in the much less likely case where block_cycles is a
multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
some n.
To workaround this we can bitwise or our block_cycles with 1 to force it
to never be a multiple of 2n.
(Maybe we should do this during initialization? But then block_cycles
would need to be mutable.)
---
There's a few unrelated changes mixed into this commit that shouldn't be
there since I added this as part of a branch of bug fixes I'm putting
together rather hastily, so unfortunately this is not easily cherry-pickable.
Added indention so there was a more clear separation between the tag
description and tag data.
Also took the best parts of readmdir.py and added it to readtree.py.
Initially I was thinking it was best for these to have completely
independent data representations, since you could always call readtree
to get more info, but this becomes tedius when needed to look at
low-level tag info across multiple directories on the filesystem.
This was initially added as protection against the case where a file
grew to no longer fit in a metadata-pair. While in most cases this
should be caught by the math in lfs_file_write, it doesn't handle a
problem that can happen if the files metadata is large enough that even
small inline files can't fit. This can happen if you combine a small
block size with large file names and many custom attributes.
But trying to outline on ENOSPC creates creates a lot of problems.
If we are actually low on space, this is one of the worst things we can
do. Inline files take up less space than CTZ skip-lists, but inline
files are rendered useless if we outline inline files as soon as we run
low on space.
On top of this, the outlining logic tries multiple mdir commits if it
gets ENOSPC, which can hide errors if ENOSPC is returned for other
reasons.
In a perfect world, we would be using a different error code for
no-room-in-metadata-pair, and no-blocks-on-disk.
For now I've removed the outlining logic and we will need to figure out
how to handle this situation more robustly.
It's interesting how many ways block devices can show failed writes:
1. prog can error
2. erase can error
3. read can error after writing (ECC failure)
4. prog doesn't error but doesn't write the data correctly
5. erase doesn't error but doesn't erase correctly
Can read fail without an error? Yes, though this appears the same as
prog and erase failing.
These weren't all simulated by testbd since I unintentionally assumed
the block device could always error. Fixed by added additional bad-black
behaviors to testbd.
Note: This also includes a small fix where we can miss bad writes if the
underlying block device contains a valid commit with the exact same
size in the exact same offset.
Fixes:
- Fixed reproducability issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducability issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring whats on disk if the read
fails, but end up using unitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducability, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count was different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occured because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actually CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there's still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
- Removed old tests and test scripts
- Reorganize the block devices to live under one directory
- Plugged new test framework into Makefile
renamed:
- scripts/test_.py -> scripts/test.py
- tests_ -> tests
- {file,ram,test}bd/* -> bd/*
It took a surprising amount of effort to make the Makefile behave since
it turns out the "test_%" rule could override "tests/test_%.toml.test"
which is generated as part of test.py.
The root of the problem was the notorious Python quirk with mutable
default parameters. The default defines for the TestSuite class ended
up being mutated as the class determined the permutations to test,
corrupting other test's defines.
However, the only define that was mutated this way was the CACHE_SIZE
config in test_entries.
The crazy thing was how this small innocuous change would cause
"./scripts/test.py -nr test_relocations" and "./scripts/test.py -nr"
to drift out of sync only after a commit spanning the different cache
sizes would be written out with a different number of prog calls. This
offset the power-cycle counter enough to cause one case to make it to
an erase, and the other to not.
Normally, the difference between a successful/unsuccessful erase
wouldn't change the result of a test, but in this case it offset the
revision count used for wear-leveling, causing one run run expand the
superblock and the other to not.
This change to the filesystem would then propogate through the rest of
the test, making it difficult to reproduce test failures.
Fortunately the fix was to just make a copy of the default define
dictionary. This should also prevent accidently mutating of dicts
belonging to our caller.
Oh, also fixed a buffer overflow in test_files.
Normally I wouldn't consider optimizing this sort of script, but
explode_asserts.py proved to be terribly inefficient and dominated
the build time for running tests. It was slow enough to be distracting
when attempting to test patches while debugging. Just running
explode_asserts.py was ~10x slower than the rest of the compilation
process.
After implementing a proper tokenizer and switching to a handwritten
recursive descent parser, I was able to speed up explode_asserts.py
by ~5x and make test compilation much more tolerable.
I don't think this was a limitaiton of parsy, but rather switching
to a recursive descent parser made it much easier to find the hotspots
where parsing was wasting cycles (string slicing for one).
It's interesting to note that while the assert patterns can be parsed
with a LL(1) parser (by dumping seen tokens if a pattern fails),
I didn't bother as it's much easier to write the patterns with LL(k)
and parsing asserts is predicated by the "assert" string.
A few other tweaks:
- allowed combining different test modes in one run
- added a --no-internal option
- changed test_.py to start counting cases from 1
- added assert(memcmp(a, b) == 0) matching
- added better handling of string escapes in assert messages
time to run tests:
before: 1m31.122s
after: 0m41.447s
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
were both the updated parent and child can exist in threaded
linked-list and duplicate the child's gstate delta.
.--------.
->| parent |-.
| gstate | |
.-| a |-'
| '--------'
| X <- child is orphaned
| .--------.
'>| child |->
| gstate |
| a |
'--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break the in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.