State stealing is a tricky part of managing the xored-globals. When
removing a metadata-pair from the metadata chain, whichever
metadata-pair does the removing is also responsible for stealing the
removed metadata-pair's global delta and incorporating it into it's own
global delta. Otherwise the global state would become corrupted.
Result of testing on zero-granularity blocks, where the prog size and
read size equals the block size. This represents SD cards and other
traditional forms of block storage where we don't really get a benefit
from the metadata logging.
Unfortunately, since updates in both are tested by the same script,
we can't really use simple bash commands. Added a more complex
script to simulate corruption. Fortunately this should be more robust
than the previous solutions.
The main fixes were around corner cases where the commit logic fell
apart when it didn't have room to complete commits, but these were
fixable in the current design.
I've been trying to keep tag types organized with an encoding that hints
if a tag uses its id field for file ids. However this seem to have been
a mistake. Using a null id of 0x3ff greatly simplified quite a bit of
the logic around managing file related tags.
The downside is one less id we can use, but if we look at the encoding
cost, donating one full bit costs us 2^9 id permutations vs 1 id
permutation. So even if we had a perfect encoding it's in our favor to
use a null id. The cost of null ids is code size, but with the
complexity around figuring out if a type used it's id or not it just
works out better to use a null id.
The issue lies in the reuse of the id field for globals. Before globals,
the only tags with a non-null (0x3ff) id field were names, structs, and
other file-specific metadata. But globals are also using this field for
the indirect delete, since otherwise the globals structure would be very
unaligned (74-bits long).
To make matters worse, the id field for globals contains the delta used
to reconstruct the globals at mount time. Which means the id field could
take on very absurd values and break the dir fetch logic if we're not
careful.
Solution is to use the scope portion of the type field where necessary,
although unforunately this does add some code cost.
Originally I tried to reuse the indirect delete to accomplish truely
atomic directory removes, however this fell apart when it came to
implementing directory removes as a side-effect of renames.
A single indirect-delete simply can't handle renames with removes as
a side effects. When copying an entry to its destination, we need to
atomically delete both the old entry, and the source of our copy. We
can't delete both with only a single indirect-delete. It is possible to
accomplish this with two indirect-deletes, but this is such an uncommon
case that it's really not worth supporting efficiently due to how
expensive globals are.
I also dropped indirect-deletes for normal directory removes. I may add
it back later, but at the moment it's extra code cost for that's not
traveled very often.
As a result, restructured the indirect delete handling to be a bit more
generic, now with a multipurpose lfs_globals_t struct instead of the
delete specific lfs_entry_t struct.
Also worked on integrating xored-globals, now with several primitive
global operations to manage fetching/updating globals on disk.
lfs_dir_fetchwith did not recover from failed dir fetches correctly,
added a temporary dir variable to hold dir contents while being
populated, allowing us to fall back to a known good dir state if a
commit is corrupted.
There is a RAM cost, but the upside is that our lfs_dir_fetchwith
actually works.
Also added better handling of move ids during some get functions.
The "move problem" has been present in littlefs for a while, but I haven't
come across a solution worth implementing for various reasons.
The problem is simple: how do we move directory entries across
directories atomically? Since multiple directory entries are involved,
we can't rely entirely on the atomic block updates. It ends up being
a bit of a puzzle.
To make the problem more complicated, any directory block update can
fail due to wear, and cause the directory block to need to be relocated.
This happens rarely, but brings a large number of corner cases.
---
The solution in this patch is simple:
1. Mark source as "moving"
2. Copy source to destination
3. Remove source
If littlefs ever runs into a "moving" entry, that means a power loss
occured during a move. Either the destination entry exists or it
doesn't. In this case we just search the entire filesystem for the
destination entry.
This is expensive, however the chance of a power loss during a move
is relatively low.