thirdparty-littlefs

mirror of https://github.com/eledio-devices/thirdparty-littlefs.git synced 2025-10-31 16:14:16 +01:00

Author	SHA1	Message	Date
Christopher Haster	a5d614fbfb	Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->\| parent \|-. \| gstate \| \| .-\| a \|-' \| '--------' \| X <- child is orphaned \| .--------. '>\| child \|-> \| gstate \| \| a \| '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.	2020-01-26 23:45:54 -06:00
Christopher Haster	f4b6a6b328	Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain	2020-01-20 19:27:27 -06:00
Christopher Haster	9453ebd15d	Added/improved disk-reading debug scripts Also fixed a bug in dir splitting when there's a large number of open files, which was the main reason I was trying to make it easier to debug disk images. One part of the recent test changes was to move away from the file-per-block emubd and instead simulate storage with a single contiguous file. The file-per-block format was marginally useful at the beginning, but as the remaining bugs get more subtle, it becomes more useful to inspect littlefs through scripts that make the underlying metadata more human-readable. The key benefit of switching to a contiguous file is these same scripts can be reused for real disk images and can even read through /dev/sdb or similar. - ./scripts/readblock.py disk block_size block off data 00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73 q.......littlefs 00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00 /............... 00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19 ............... 00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00 a....b 0......d. ... readblock.py prints a hex dump of a given block on disk. It's basically just "dd if=disk bs=block_size count=1 skip=block \| xxd -g1 -" but with less typing. - ./scripts/readmdir.py disk block_size block1 block2 off tag type id len data (truncated) 0000003b: 0020000a dir 0 10 63 6f 6c 64 63 6f 66 66 coldcoff 00000049: 20000008 dirstruct 0 8 02 02 00 00 03 02 00 00 ........ 00000008: 00200409 dir 1 9 68 6f 74 63 6f 66 66 65 hotcoffe 00000015: 20000408 dirstruct 1 8 fe 01 00 00 ff 01 00 00 ........ readmdir.py prints info about the tags in a metadata pair on disk. It can print the currently active tags as well as the raw log of the metadata pair. - ./scripts/readtree.py disk block_size superblock "littlefs" version v2.0 block_size 512 block_count 1024 name_max 255 file_max 2147483647 attr_max 1022 gstate 0x000000000000000000000000 dir "/" mdir {0x0, 0x1} rev 3 v id 0 superblock "littlefs" inline size 24 mdir {0x77, 0x78} rev 1 id 0 dir "coffee" dir {0x1fc, 0x1fd} dir "/coffee" mdir {0x1fd, 0x1fc} rev 2 id 0 dir "coldcoffee" dir {0x202, 0x203} id 1 dir "hotcoffee" dir {0x1fe, 0x1ff} dir "/coffee/coldcoffee" mdir {0x202, 0x203} rev 1 dir "/coffee/warmcoffee" mdir {0x200, 0x201} rev 1 readtree.py parses the littlefs tree and prints info about the semantics of what's on disk. This includes the superblock, global-state, and directories/metadata-pairs. It doesn't print the filesystem tree though, that could be a different tool.	2020-01-20 19:27:27 -06:00
Christopher Haster	fb65057a3c	Restructured block devices again for better test exploitation Also finished migrating tests with test_relocations and test_exhaustion. The issue I was running into when migrating these tests was a lack of flexibility with what you could do with the block devices. It was possible to hack in some hooks for things like bad blocks and power loss, but it wasn't clean or easily extendable. The solution here was to just put all of these test extensions into a third block device, testbd, that uses the other two example block devices internally. testbd has several useful features for testing. Note this makes it a pretty terrible block device _example_ since these hooks look more complicated than a block device needs to be. - testbd can simulate different erase values, supporting 1s, 0s, other byte patterns, or no erases at all (which can cause surprising bugs). This actually depends on the simulated erase values in ramdb and filebd. I did try to move this out of rambd/filebd, but it's not possible to simulate erases in testbd without buffering entire blocks and creating an excessive amount of extra write operations. - testbd also helps simulate power-loss by containing a "power cycles" counter that is decremented every write operation until it calls exit. This is notably faster than the previous gdb approach, which is valuable since the reentrant tests tend to take a while to resolve. - testbd also tracks wear, which can be manually set and read. This is very useful for testing things like bad block handling, wear leveling, or even changing the effective size of the block device at runtime.	2020-01-20 19:27:24 -06:00
Christopher Haster	ecc2857c0e	Migrated bad-block tests Even with adding better reentrance testing, the bad-block tests are still very useful at isolating the block eviction logic. This also required rewriting a bit of the internal testing wirework to allow custom block devices which opens up quite a bit more straegies for testing.	2020-01-14 12:04:20 -06:00
Christopher Haster	5181ce66cd	Migrated the first of the tests with internal knowledge Both test_move and test_orphan needed internal knowledge which comes with the addition of the "in" attribute. This was in the plan for the test-revamp from the beginning as it really opens up the ability to write more unit-style-tests using internal knowledge of how littlefs works. More unit-style-tests should help _fix_ bugs by limiting the scope of the test and where the bug could be hiding. The "in" attribute effectively runs tests _inside_ the .c file specified, giving the test access to all static members without needed to change their visibility.	2020-01-14 09:14:01 -06:00
Christopher Haster	b06ce54279	Migrated the bulk of the feature-specific tests This involved some minor tweaks for the various types of tests, added predicates to the test framework (necessary for test_entries and test_alloc), and cleaned up some of the testing semantics such as reporting how many tests are filtered, showing permutation config on the result screen, and properly inheriting suite config in cases.	2020-01-12 22:21:09 -06:00
Christopher Haster	1d2688a771	Migrated test_files, test_dirs, test_format suites to new framework Also some tweaks to test_.py to capture Makefile warnings and print test locations a bit better.	2020-01-11 15:58:17 -06:00
Christopher Haster	53d2b02f2a	Added reentrant and gdb testing mechanisms to test framework Aside from reworking the internals of test_.py to work well with inherited TestCase classes, this also provides the two main features that were the main reason for revamping the test framework 1. ./scripts/test_.py --reentrant Runs reentrant tests (tests with reentrant=true in the .toml configuration) under gdb such that the program is killed on every call to lfs_emubd_prog or lfs_emubd_erase. Currently this just increments a number of prog/erases to skip, which means it doesn't necessarily check every possible branch of the test, but this should still provide a good coverage of power-loss tests. 2. ./scripts/test_.py --gdb Run the tests and if a failure is hit, drop into GDB. In theory this will be very useful for reproducing and debugging test failures. Note this can be combined with --reentrant to drop into GDB on the exact cycle of power-loss where the tests fail.	2019-12-31 11:51:52 -06:00
Christopher Haster	f42e007709	Created initial implementation of revamped test.py This is the start of reworking littlefs's testing framework based on lessons learned from the initial testing framework. 1. The testing framework needs to be _flexible_. It was hacky, which by itself isn't a downside, but it wasn't _flexible_. This limited what could be done with the tests and there ended up being many workarounds just to reproduce bugs. The idea behind this revamped framework is to separate the description of tests (tests/test_dirs.toml) and the running of tests (scripts/test.py). Now, with the logic moved entirely to python, it's possible to run the test under varying environments. In addition to the "just don't assert" run, I'm also looking to run the tests in valgrind for memory checking, and an environment with simulated power-loss. The test description can also contain abstract attributes that help control how tests can be ran, such as "leaky" to identify tests where memory leaks are expected. This keeps test limitations at a minimum without limiting how the tests can be ran. 2. Multi-stage-process tests didn't really add value and limited what the testing environment. Unmounting + mounting can be done in a single process to test the same logic. It would be really difficult to make this fail only when memory is zeroed, though that can still be caught by power-resilient tests. Requiring every test to be a single process adds several options for test execution, such as using a RAM-backed block device for speed, or even running the tests on a device. 3. Added fancy assert interception. This wasn't really a requirement, but something I've been wanting to experiment with for a while. During testing, scripts/explode_asserts.py is added to the build process. This is a custom C-preprocessor that parses out assert statements and replaces them with _very_ verbose asserts that wouldn't normally be possible with just C macros. It even goes as far as to report the arguments to strcmp, since the lack of visibility here was very annoying. tests_/test_dirs.toml:186:assert: assert failed with "..", expected eq "..." assert(strcmp(info.name, "...") == 0); One downside is that simply parsing C in python is slower than the entire rest of the compilation, but fortunately this can be alleviated by parallelizing the test builds through make. Other neat bits: - All generated files are a suffix of the test description, this helps cleanup and means it's (theoretically) possible to parallelize the tests. - The generated test.c is shoved base64 into an ad-hoc Makefile, this means it doesn't force a rebuild of tests all the time. - Test parameterizing is now easier. - Hopefully this framework can be repurposed also for benchmarks in the future.	2019-12-28 23:43:02 -06:00

10 Commits