The root of the problem was some assumptions about what tags could be
sent to lfs_dir_commit.
- The first assumption is that there could be only one splice (create/delete)
tag at a time, which is trivially broken by the core commit in lfs_rename.
- The second assumption is that there is at most one create and one delete in
a single commit. This is less obvious but turns out to not be true in
the case that we rename a file such that it overwrites another file in
the same directory (1 delete for source file, 1 delete for destination).
- The third assumption was that there was an ordering to the
delete/creates passed to lfs_dir_commit. It may be possible to force all
deletes to follow creates by rearranging the tags in lfs_rename, but
this risks overflowing tag ids.
The way the lfs_dir_commit first collected the "deletetag" and "createtag"
broke all three of these assumptions. And because we lose the ordering
information we can no longer apply the directory changes to open files
correctly. The file ids may be shifted in a way that doesn't reflect the
actual operations on disk.
These problems were made worst by lfs_dir_commit cleaning up moves
implicitly, which also creates deletes implicitly. While cleaning up moves
in lfs_dir_commit may save some code size, it makes the commit logic much more
difficult to implement correctly.
This bug turned into pulling out a dead tree stump, roots and all.
I ended up reworking how lfs_dir_commit updates open files so that it
has less assumptions, now it just traverses the commit tags multiple
times in order to update file ids after a successful commit in the
correct order.
This also got rid of the dir copy by carefully updating split dirs
after all files have an up-to-date copy of the original dir.
I also just removed the implicit move cleanup. It turns out the only
commits that can occur before we have cleaned up the move is in
lfs_fs_relocate, so it was simple enough to explicitly handle this case
when we update our parent and pred during a relocate.
Cases where we may need to fix moves:
- In lfs_rename when we move a file/dir
- In lfs_demove if we lose power
- In lfs_fs_relocate if we have to relocate our parent and we find it
had a pending move (or else the move will be outdated)
- In lfs_fs_relocate if we have to relocate our predecessor and we find it
had a pending move (or else the move will be outdated)
Note the two cases in lfs_fs_relocate may be recursive. But
lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not
possible for pending moves to spill out into other filesystem commits
And of couse, I added several tests to cover these situations. Hopefully
the rename-with-open-files logic should be fairly locked down now.
found with initial fix by eastmoutain
Also fixed a bug in dir splitting when there's a large number of open
files, which was the main reason I was trying to make it easier to debug
disk images.
One part of the recent test changes was to move away from the
file-per-block emubd and instead simulate storage with a single
contiguous file. The file-per-block format was marginally useful
at the beginning, but as the remaining bugs get more subtle, it
becomes more useful to inspect littlefs through scripts that
make the underlying metadata more human-readable.
The key benefit of switching to a contiguous file is these same
scripts can be reused for real disk images and can even read through
/dev/sdb or similar.
- ./scripts/readblock.py disk block_size block
off data
00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73 q.......littlefs
00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00 /...............
00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19 ...............
00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00 a....b 0......d.
...
readblock.py prints a hex dump of a given block on disk. It's basically
just "dd if=disk bs=block_size count=1 skip=block | xxd -g1 -" but with
less typing.
- ./scripts/readmdir.py disk block_size block1 block2
off tag type id len data (truncated)
0000003b: 0020000a dir 0 10 63 6f 6c 64 63 6f 66 66 coldcoff
00000049: 20000008 dirstruct 0 8 02 02 00 00 03 02 00 00 ........
00000008: 00200409 dir 1 9 68 6f 74 63 6f 66 66 65 hotcoffe
00000015: 20000408 dirstruct 1 8 fe 01 00 00 ff 01 00 00 ........
readmdir.py prints info about the tags in a metadata pair on disk. It
can print the currently active tags as well as the raw log of the
metadata pair.
- ./scripts/readtree.py disk block_size
superblock "littlefs"
version v2.0
block_size 512
block_count 1024
name_max 255
file_max 2147483647
attr_max 1022
gstate 0x000000000000000000000000
dir "/"
mdir {0x0, 0x1} rev 3
v id 0 superblock "littlefs" inline size 24
mdir {0x77, 0x78} rev 1
id 0 dir "coffee" dir {0x1fc, 0x1fd}
dir "/coffee"
mdir {0x1fd, 0x1fc} rev 2
id 0 dir "coldcoffee" dir {0x202, 0x203}
id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
dir "/coffee/coldcoffee"
mdir {0x202, 0x203} rev 1
dir "/coffee/warmcoffee"
mdir {0x200, 0x201} rev 1
readtree.py parses the littlefs tree and prints info about the
semantics of what's on disk. This includes the superblock,
global-state, and directories/metadata-pairs. It doesn't print
the filesystem tree though, that could be a different tool.
Also finished migrating tests with test_relocations and test_exhaustion.
The issue I was running into when migrating these tests was a lack of
flexibility with what you could do with the block devices. It was
possible to hack in some hooks for things like bad blocks and power
loss, but it wasn't clean or easily extendable.
The solution here was to just put all of these test extensions into a
third block device, testbd, that uses the other two example block
devices internally.
testbd has several useful features for testing. Note this makes it a
pretty terrible block device _example_ since these hooks look more
complicated than a block device needs to be.
- testbd can simulate different erase values, supporting 1s, 0s, other byte
patterns, or no erases at all (which can cause surprising bugs). This
actually depends on the simulated erase values in ramdb and filebd.
I did try to move this out of rambd/filebd, but it's not possible to
simulate erases in testbd without buffering entire blocks and creating
an excessive amount of extra write operations.
- testbd also helps simulate power-loss by containing a "power cycles"
counter that is decremented every write operation until it calls exit.
This is notably faster than the previous gdb approach, which is
valuable since the reentrant tests tend to take a while to resolve.
- testbd also tracks wear, which can be manually set and read. This is
very useful for testing things like bad block handling, wear leveling,
or even changing the effective size of the block device at runtime.
Sometimes small, single line code change hides behind it a complicated
story. This is one of those times.
If you look at this diff, you may note that this is a case of
lfs_dir_fetchmatch not correctly handling a tag that invalidates a
callback used to search for some condition, in this case a search for a
parent, which is invalidated by a later dir tag overwritting the
previous dir pair.
But how can this happen? Dir-pair-tags are only overwritten during
relocations (when a block goes bad or exceeds the block_cycles config
option for dynamic wear-leveling). Other dir operations create new
directory entries. And the only lfs_dir_fetchmatch condition that relies
on overwrites (as opposed to proper deletes) is when we need to find a
directory's parent, an operation that only occurs during a _different_
relocation. And a false _positive_, can only happen if we don't have a
parent. Which is really unlikely when we search for directory parents!
This bug and minimal test case was found by Matthew Renzelmann. In a
unfortunate series of events, first a file creation causes a directory
split to occur. This creates a new, orphaned metadata-pair containing
our new file. However, the revision count on this metadata-pair
indicates the pair is due for relocation as a part of wear-leveling.
Normally, this is fine, even though this metadata-pair has no parent,
the lfs_dir_find should return ENOENT and continue without error.
However, here we get hit by our fetchmatch bug. A previous, unrelated
relocation overwrites a pair which just happens to contain the block
allocated for a new metadata-pair. When we search for a parent,
lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair
and incorrectly tells our orphan it's found its parent.
As you can imagine the orphan's dissapointment must be immense.
So an unfortunately timed dir split triggers a relocation which
incorrectly finds a previously written parent that has been outdated
by another relocation.
As a solution we can outdate our found tag if it is overwritten by
an exact match during lfs_dir_fetchmatch.
As a part of this I started adding a new set of tests: tests/test_relocations,
for aggressive relocations tests. This is already by appended to by
another PR. I suspect relocations is relatively under-tested and is
becoming more important due to recent improvements in wear-leveling.
The superblock entry takes up id 0 in the root directory (not all
entries are files, though currently the superblock is the only
exception). Normally, reading a directory correctly skips the
superblock and only reports non-superblock files.
However, this doesn't work perfectly for lfs_dir_seek, which tries
to be clever to not touch the disk.
Fortunately, we can fix this by adding an offset for the superblock.
This will only work while the superblock is the only non-file entry,
otherwise we would need to touch the disk to properly seek in a
directory (though we already touch the disk a bit to get dir-tails
during seeks).
Found by jhartika
Stop proactively relocate blocks during migrations, this can cause a number of
failure states such: clobbering the v1 superblock if we relocate root, and
invalidating directory pointers if we relocate the head of a directory. On top
of this, relocations increase the overall complexity of lfs_migration, which is
already a delicate operation.
This is caused by dir->head not being updated when dir->m.pair may be.
This causes the two to fall out of sync and later dir rewinds to fail.
This bug stems all the way back from the first commits of littlefs, so
it's surprising it has avoided detection for this long. Perhaps because
lfs_dir_rewind is not used often.
To ensure 16 bit devices do not invalidly truncate lfs_file_write return codes, change
the return variable to be lfs_ssize_t which is the lfs_file_write return code and
cast to int if it is a negative error code.
lfs_dir_find returns either a negative return code or a tag.
For 32 bit machines with int as 32 bits this co-incides, but for smaller
bit processors, we need to ensure a 32 bit value is returned so change
the return type to lfs_stag_t.
Build warnings exist on a gcc based 16 bit compiler. Cast relevant types
to fix.
littlefs/lfs.c: In function 'lfs_gstate_xororphans':
littlefs/lfs.c:355:5: warning: left shift count >= width of type
littlefs/lfs.c: In function 'lfs_dir_fetchmatch':
littlefs/lfs.c:849:17: warning: left shift count >= width of type
littlefs/lfs.c: In function 'lfs_dir_commitcrc':
littlefs/lfs.c:1278:9: warning: left shift count >= width of type
This reverts commit fdd239fe21.
Bypassing cache turned out to be a mistake which causes more problems
than it solves. Device driver should deal with alignment if this is
required - trying to do that in a file system is not a viable solution
anyway.
When using lfs_file_truncate() to make a file shorter the file block and
off were incorrectly positioned at the new end, resulting in invalid
data accessed when reading. Lift the seek pointer restoration to apply
to both increasing and reducing truncates.
Signed-off-by: Peter A. Bigot <pab@pabigot.com>
The difference between 0xffffffff and 0xfffffffe is too subtle. Use
names that reflect what the value represents.
Signed-off-by: Peter A. Bigot <pab@pabigot.com>
Found during testing, the issue was with lfs_migrate in combination with
wear leveling.
Normally, we can expect lfs_migrate to be able to respect the user-configured
block_cycles. It already has allocation information on which blocks are
used by both v1 and v2, so it should be safe to relocate blocks as
needed.
However, this fell apart when root was relocated. If lfs_migrate found a
root that needed migration, it would happily relocate the root. This
would normally be fine, except relocating the root has a side-effect of
needed to update the superblock. Which, during migration, is in a
delicate state of containing both v1's and v2's superblocks in the same
metadata pair. If the superblock ends up needing to compact, this would
clobber the v1 superblock and corrupt the filesystem during migration.
The best fix I could come up with is to specifically dissallow migrating the
root directory during migration. Fortunately this is behind the
LFS_MIGRATE macro, so the code cost for this check is not normally paid.
Due to the logging nature of metadata pairs, switching from inline files
(type3 = 0x201) to CTZ skip-lists (type3 = 0x202) does not explicitly
erase inline files, but instead leaves them up to compaction to omit.
To save code size, this is handled by the same logic that deduplicates
tags.
Unfortunately, this wasn't working. Due to a relatively late change in v2
the struct's type field was changed to no longer be a part of determining a
tag's "uniqueness". A part of this should have been the modification of
directory traversal filtering to respect type-dependent uniqueness, but
I missed this.
The fix is to add in correct type-dependent filtering. Also there was
some clean up necessary around removing delete tags during compaction
and outlining files.
Note that while this appears to conflict with the possibility of
combining inline + ctz files, we still have the device-side-only
LFS_TYPE_FROM tag that can be repurposed for 256 additional inline
"chunks".
Found by Johnxjj
Previously these returned LFS_ERR_BADF. But attempting to modify a file
opened read-only, or reading a write-only flie, is a user error and
should not occur in normal use.
Changing this to an assert allows the logic to be omitted if the user
disables asserts to reduce the code footprint (not suggested unless the
user really really knows what they're doing).
This is a minor quality of life change to help debugging, specifically
when debugging test failures.
Before, the test framework used hex, while the log output used decimal.
This was slightly annoying to convert between.
Why not output lengths/offset in hex? I don't have a big reason. I find
it easier to reason about lengths in decimal and ids (such as addresses
or block numbers) in hex. But this may just be me.
A current limitation of the lfs tag is the 10-bit (1024) length field.
This field is used to indicate padding for commits and effectively
limits the size of commits to 1KiB. Because commits must be prog size
aligned, this is a problem on devices with prog size > 1024.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ]
This can be increased to 12-bit (4096), but for NAND devices this is
still to small to completely solve the issue.
The previous workaround was to just create unaligned commits. This can
occur naturally if littlefs is used on portable media as the prog size
does not have to be consistent on different drivers. If littlefs sees
an unaligned commit, it treats the dir as unerased and must compact the
dir if it creates any new commits.
Unfortunately this isn't great. It effectively means that every small
commit forced an erase on devices with prog size > 1024. This is pretty
terrible.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit |------------------- wasted ---------------------]
A different solution, implemented here, is to use multiple crc tags
to pad the commit until the remaining space fits in the padding. This
effectively looks like multiple empty commits and has a small runtime
cost to parse these tags, but otherwise does no harm.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ]
It was a bit tricky to implement, but now we can effectively support
unlimited prog sizes since there's no limit to the number of commits
in a block.
found by kazink and joicetm
Introduced in 0b76635, the workaround for erases sizes >1024 is to
commit with an unaligned CRC tag. Upon reading an unaligned CRC,
littlefs should treat the metadata pair as "requires erased". While
necessary for portability, this also lets us workaround the lack of
handling of erases sizes >1024.
Unfortunately, this workaround wasn't implemented correctly (by me)
in the case that the metadata-pair does not immediately compact. This
is solved here by added the erase check to lfs_dir_commit.
Note this is still only a part of a workaround which should be replaced.
One potential solution is to pad the commit with multiple smaller CRC
tags until we reach the next prog_size boundary.
found by kazink
As it is now, block_cycles = 0 disables wear leveling. This was a
mistake as 0 is the "default" value for several other config options.
It's even worse when migrating from v1 as it's easy to miss the addition
of block_cycles and end up with a filesystem that is not actually
wear-leveling.
Clearly, block_cycles = 0 should do anything but disable wear-leveling.
Here, I've changed block_cycles = 0 to assert. Forcing users to set a
value for block_cycles (500 is suggested). block_cycles can be set to -1
to explicitly disable wear leveling if desired.
This has been a large source of porting errors, partially due to my
fault in not having enough porting documentation, which is also
planned.
In the short term, asserts should at least help catch these types of
errors instead of just letting the filesystem collapse after recieving
an odd error code.
To use, compile and run with LFS_YES_TRACE defined:
make CFLAGS+=-DLFS_YES_TRACE=1 test_format
The name LFS_YES_TRACE was chosen to match the LFS_NO_DEBUG and
LFS_NO_WARN defines for the similar levels of output. The YES is
necessary to avoid a conflict with the actual LFS_TRACE macro that
gets emitting. LFS_TRACE can also be defined directly to provide
a custom trace formatter.
Hopefully having trace statements at the littlefs C API helps
debugging and reproducing issues.