The root of the problem was the notorious Python quirk with mutable
default parameters. The default defines for the TestSuite class ended
up being mutated as the class determined the permutations to test,
corrupting other tests' defines.
However, the only define that was mutated this way was the CACHE_SIZE
config in test_entries.
The crazy thing was how this small, innocuous change would cause
"./scripts/test.py -nr test_relocations" and "./scripts/test.py -nr"
to drift out of sync. The drift only appeared after a commit spanning
the different cache sizes was written out with a different number of
prog calls, which offset the power-cycle counter just enough for one
case to reach an erase and the other not to.
Normally, the difference between a successful/unsuccessful erase
wouldn't change the result of a test, but in this case it offset the
revision count used for wear-leveling, causing one run to expand the
superblock and the other not to.
This change to the filesystem would then propagate through the rest of
the test, making it difficult to reproduce test failures.
Fortunately, the fix was to just make a copy of the default define
dictionary. This should also prevent accidental mutation of dicts
belonging to our callers.
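For reference, a minimal sketch of the quirk (the class here is
illustrative, not the actual test.py code):

    # the default dict is created once at function definition time and
    # shared by every call that doesn't pass its own defines
    class TestSuite:
        def __init__(self, defines={}):
            self.defines = defines

    a = TestSuite()
    a.defines['CACHE_SIZE'] = 512   # mutates the shared default!
    b = TestSuite()
    print(b.defines)                # {'CACHE_SIZE': 512}, not {}

    # the fix: copy the default, which also protects dicts passed in
    # by callers
    class FixedTestSuite:
        def __init__(self, defines={}):
            self.defines = dict(defines)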
Oh, also fixed a buffer overflow in test_files.
Normally I wouldn't consider optimizing this sort of script, but
explode_asserts.py proved to be terribly inefficient and dominated
the build time for running tests. It was slow enough to be distracting
when attempting to test patches while debugging. Just running
explode_asserts.py was ~10x slower than the rest of the compilation
process.
After implementing a proper tokenizer and switching to a handwritten
recursive descent parser, I was able to speed up explode_asserts.py
by ~5x and make test compilation much more tolerable.
I don't think this was a limitation of parsy, but rather that switching
to a recursive descent parser made it much easier to find the hotspots
where parsing was wasting cycles (string slicing for one).
It's interesting to note that while the assert patterns can be parsed
with an LL(1) parser (by dumping seen tokens if a pattern fails),
I didn't bother, as it's much easier to write the patterns with LL(k),
and parsing asserts is predicated on the "assert" string.
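As a rough sketch of the shape this takes (illustrative, not the
actual explode_asserts.py internals), tokenizing once and moving an
index through the token list avoids the repeated string slicing:

    import re

    def tokenize(s):
        # one pass over the input; multi-char operators come before
        # the catch-all single-character pattern
        return re.findall(r'\w+|==|!=|<=|>=|\S', s)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.off = 0                  # a cursor, no string slicing

        def match(self, want):
            if self.off < len(self.tokens) and self.tokens[self.off] == want:
                self.off += 1
                return True
            return False

        def attempt(self, rule):
            # cheap LL(k): try a pattern, rewind the cursor on failure
            off = self.off
            result = rule()
            if result is None:
                self.off = off
            return result

    p = Parser(tokenize('assert(a == b);'))
    assert p.match('assert') and p.match('(') and p.match('a')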
A few other tweaks:
- allowed combining different test modes in one run
- added a --no-internal option
- changed test_.py to start counting cases from 1
- added assert(memcmp(a, b) == 0) matching
- added better handling of string escapes in assert messages
time to run tests:
before: 1m31.122s
after: 0m41.447s
Also fixed a bug in dir splitting when there's a large number of open
files, which was the main reason I was trying to make it easier to debug
disk images.
One part of the recent test changes was to move away from the
file-per-block emubd and instead simulate storage with a single
contiguous file. The file-per-block format was marginally useful
at the beginning, but as the remaining bugs get more subtle, it
becomes more useful to inspect littlefs through scripts that
make the underlying metadata more human-readable.
The key benefit of switching to a contiguous file is that these same
scripts can be reused for real disk images, and can even read directly
from /dev/sdb or similar.
- ./scripts/readblock.py disk block_size block
off data
00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73 q.......littlefs
00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00 /...............
00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19 ...............
00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00 a....b 0......d.
...
readblock.py prints a hex dump of a given block on disk. It's basically
just "dd if=disk bs=block_size count=1 skip=block | xxd -g1 -" but with
less typing.
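As a sketch of the idea (argument handling trimmed down; the real
script may differ in details):

    import sys

    def readblock(disk, block_size, block):
        # seek to the block and hex dump it, one 16-byte row per line
        with open(disk, 'rb') as f:
            f.seek(block * block_size)
            data = f.read(block_size)
        for i in range(0, len(data), 16):
            row = data[i:i+16]
            print('%08x: %-47s %s' % (i,
                ' '.join('%02x' % b for b in row),
                ''.join(chr(b) if 0x20 <= b < 0x7f else '.' for b in row)))

    if __name__ == '__main__':
        readblock(sys.argv[1], int(sys.argv[2], 0), int(sys.argv[3], 0))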
- ./scripts/readmdir.py disk block_size block1 block2
off tag type id len data (truncated)
0000003b: 0020000a dir 0 10 63 6f 6c 64 63 6f 66 66 coldcoff
00000049: 20000008 dirstruct 0 8 02 02 00 00 03 02 00 00 ........
00000008: 00200409 dir 1 9 68 6f 74 63 6f 66 66 65 hotcoffe
00000015: 20000408 dirstruct 1 8 fe 01 00 00 ff 01 00 00 ........
readmdir.py prints info about the tags in a metadata pair on disk. It
can print the currently active tags as well as the raw log of the
metadata pair.
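A hedged sketch of the core decoding (the real readmdir.py also handles
commits and checksums; stopping conditions are omitted here). Tags are
32 bits, stored big-endian, and each tag is XORed with the previous one,
the first against 0xffffffff:

    import struct

    def tags(data):
        # data is one block of a metadata pair; the first 4 bytes are
        # the little-endian revision count
        prev, off = 0xffffffff, 4
        while off + 4 <= len(data):
            tag, = struct.unpack('>I', data[off:off+4])
            tag ^= prev
            type = (tag >> 20) & 0x7ff   # bit 31 above this is the valid bit
            id   = (tag >> 10) & 0x3ff
            size = tag & 0x3ff           # 0x3ff marks a tag with no payload
            yield off, type, id, size
            off += 4 + (size if size != 0x3ff else 0)
            prev = tag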
- ./scripts/readtree.py disk block_size
superblock "littlefs"
version v2.0
block_size 512
block_count 1024
name_max 255
file_max 2147483647
attr_max 1022
gstate 0x000000000000000000000000
dir "/"
mdir {0x0, 0x1} rev 3
v id 0 superblock "littlefs" inline size 24
mdir {0x77, 0x78} rev 1
id 0 dir "coffee" dir {0x1fc, 0x1fd}
dir "/coffee"
mdir {0x1fd, 0x1fc} rev 2
id 0 dir "coldcoffee" dir {0x202, 0x203}
id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
dir "/coffee/coldcoffee"
mdir {0x202, 0x203} rev 1
dir "/coffee/warmcoffee"
mdir {0x200, 0x201} rev 1
readtree.py parses the littlefs tree and prints info about the
semantics of what's on disk. This includes the superblock,
global-state, and directories/metadata-pairs. It doesn't print the
filesystem tree, though; that could be a different tool.
Also finished migrating tests with test_relocations and test_exhaustion.
The issue I was running into when migrating these tests was a lack of
flexibility with what you could do with the block devices. It was
possible to hack in some hooks for things like bad blocks and power
loss, but it wasn't clean or easily extendable.
The solution here was to just put all of these test extensions into a
third block device, testbd, that uses the other two example block
devices internally.
testbd has several useful features for testing. Note this makes it a
pretty terrible block device _example_, since these hooks make it more
complicated than a block device needs to be.
- testbd can simulate different erase values, supporting 1s, 0s, other byte
patterns, or no erases at all (which can cause surprising bugs). This
actually depends on the simulated erase values in rambd and filebd.
I did try to move this out of rambd/filebd, but it's not possible to
simulate erases in testbd without buffering entire blocks and creating
an excessive amount of extra write operations.
- testbd also helps simulate power-loss by containing a "power cycles"
counter that is decremented on every write operation until it reaches
zero, at which point testbd calls exit (see the sketch after this list).
This is notably faster than the previous gdb approach, which is
valuable since the reentrant tests tend to take a while to resolve.
- testbd also tracks wear, which can be manually set and read. This is
very useful for testing things like bad block handling, wear leveling,
or even changing the effective size of the block device at runtime.
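As a conceptual sketch of the power-cycle and wear hooks (in Python for
brevity; the real testbd is C, and these names are invented):

    class TestBlockDevice:
        def __init__(self, bd, power_cycles=0):
            self.bd = bd                  # underlying rambd/filebd-like device
            self.power_cycles = power_cycles
            self.wear = {}                # per-block erase counts

        def prog(self, block, off, data):
            if self.power_cycles:
                self.power_cycles -= 1
                if self.power_cycles == 0:
                    # simulate power-loss with an agreed-upon exit code
                    raise SystemExit(33)
            return self.bd.prog(block, off, data)

        def erase(self, block):
            self.wear[block] = self.wear.get(block, 0) + 1
            return self.bd.erase(block)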
Even with the better reentrance testing, the bad-block tests are
still very useful for isolating the block eviction logic.
This also required rewriting a bit of the internal testing plumbing
to allow custom block devices, which opens up quite a few more
strategies for testing.
Both test_move and test_orphan needed internal knowledge, which comes
with the addition of the "in" attribute. This was in the plan for the
test-revamp from the beginning, as it really opens up the ability to
write more unit-style tests using internal knowledge of how littlefs
works. More unit-style tests should help _fix_ bugs by limiting the
scope of the test and where the bug could be hiding.
The "in" attribute effectively runs tests _inside_ the .c file
specified, giving the test access to all static members without
needing to change their visibility.
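A hypothetical sketch of what this looks like in a test description
(the case body is invented; lfs_dir_fetch stands in for any static
function):

    [[case]] # compiled inside lfs.c thanks to the "in" attribute
    in = "lfs.c"
    code = '''
        // static functions such as lfs_dir_fetch are callable here
        // without widening their visibility
        ...
    '''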
This involved some minor tweaks for the various types of tests,
adding predicates to the test framework (necessary for test_entries
and test_alloc), and cleaning up some of the testing semantics, such
as reporting how many tests are filtered, showing permutation config
on the result screen, and properly inheriting suite config in cases.
The idea behind emubd (file per block) was neat, but it doesn't add
much value over a block device that just operates on a single linear
file (other than adding a significant amount of overhead). Initially it
helped with debugging, but when the metadata format became more complex
in v2, most debugging ended up going through the debug.py script anyway.
Aside from being simpler, moving to filebd means it is also possible to
mount disk images directly.
Also introduced rambd, which keeps the disk contents in RAM. This is
very useful for testing, where it increases speed _significantly_:
- test_dirs w/ filebd - 0m7.170s
- test_dirs w/ rambd - 0m0.966s
These follow the emubd model of using the lfs_config for geometry. I'm
not convinced this is the best approach, but it gets the job done.
I've also added lfs_rambd_createcfg to add additional config similar to
lfs_file_opencfg. This is useful for specifying erase_value, which tells
the block device to simulate erases similar to flash devices. Note that
the default (-1) meets the minimum block device requirements and is the
most performant.
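Conceptually (a Python sketch, not the actual C rambd), erase_value
just controls what an erase writes back:

    class RamBlockDevice:
        def __init__(self, block_size, block_count, erase_value=None):
            # erase_value=None mirrors the C default of -1: erases
            # leave the block untouched, the minimum a block device
            # must guarantee, and also the fastest
            self.block_size = block_size
            self.erase_value = erase_value
            self.blocks = bytearray(block_size * block_count)

        def erase(self, block):
            if self.erase_value is not None:
                start = block * self.block_size
                self.blocks[start:start+self.block_size] = (
                    bytes([self.erase_value]) * self.block_size)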
Aside from reworking the internals of test_.py to work well with
inherited TestCase classes, this also provides the two features that
were the main reason for revamping the test framework:
1. ./scripts/test_.py --reentrant
Runs reentrant tests (tests with reentrant=true in the .toml
configuration) under gdb such that the program is killed on every
call to lfs_emubd_prog or lfs_emubd_erase.
Currently this just increments a number of prog/erases to skip, which
means it doesn't necessarily check every possible branch of the test,
but this should still provide good coverage of power-loss tests.
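The driver loop behind this is conceptually simple (a sketch with
invented helpers, not the actual test_.py code):

    # rerun the test, allowing one more prog/erase each time, until a
    # run survives to completion without a simulated power-loss
    cycles = 1
    while True:
        code = run_under_gdb(test, power_cycles=cycles)  # hypothetical helper
        if code == 0:
            break                        # completed; every prefix was tested
        elif code != EXIT_POWERLOSS:     # hypothetical exit-code constant
            report_failure(test, cycles) # hypothetical helper
            break
        cycles += 1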
2. ./scripts/test_.py --gdb
Run the tests and if a failure is hit, drop into GDB. In theory this
will be very useful for reproducing and debugging test failures.
Note this can be combined with --reentrant to drop into GDB on the
exact cycle of power-loss where the tests fail.
- Reworked how permutations work
- Now with global defines as well (apply to all code)
- Also supports lists of different permutation sets
- Added better cleanup in tests and "make clean"
This is the start of reworking littlefs's testing framework based on
lessons learned from the initial testing framework.
1. The testing framework needs to be _flexible_. It was hacky, which by
itself isn't a downside, but it wasn't _flexible_. This limited what
could be done with the tests and there ended up being many
workarounds just to reproduce bugs.
The idea behind this revamped framework is to separate the
description of tests (tests/test_dirs.toml) and the running of tests
(scripts/test.py).
Now, with the logic moved entirely to Python, it's possible to run
the tests under varying environments. In addition to the "just don't
assert" run, I'm also looking to run the tests in valgrind for memory
checking, and an environment with simulated power-loss.
The test description can also contain abstract attributes that help
control how tests can be run, such as "leaky" to identify tests where
memory leaks are expected. This keeps test limitations at a minimum
without limiting how the tests can be run.
2. Multi-stage-process tests didn't really add value and limited what
the testing environment could do.
Unmounting + mounting can be done in a single process to test the
same logic. It would be really difficult to make this fail only
when memory is zeroed, though that can still be caught by
power-resilient tests.
Requiring every test to be a single process adds several options
for test execution, such as using a RAM-backed block device for
speed, or even running the tests on a device.
3. Added fancy assert interception. This wasn't really a requirement,
but something I've been wanting to experiment with for a while.
During testing, scripts/explode_asserts.py is added to the build
process. This is a custom C-preprocessor that parses out assert
statements and replaces them with _very_ verbose asserts that
wouldn't normally be possible with just C macros.
It even goes as far as to report the arguments to strcmp, since the
lack of visibility here was very annoying.
tests_/test_dirs.toml:186:assert: assert failed with "..", expected eq "..."
assert(strcmp(info.name, "...") == 0);
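A heavily simplified sketch of the rewrite (the real script parses
properly; a regex like this can't handle nested parentheses or string
arguments):

    import re

    BINARY = re.compile(r'assert\((\w+)\s*(==|!=|<=|>=|<|>)\s*(\w+)\);')

    def explode(line, file='?', lineno=0):
        # rewrite assert(a OP b); into a block that captures both
        # sides and prints them on failure
        def repl(m):
            lh, op, rh = m.groups()
            return ('do { intmax_t _lh = (intmax_t)(%s); '
                    'intmax_t _rh = (intmax_t)(%s); '
                    'if (!(_lh %s _rh)) { '
                    'printf("%s:%d:assert: assert failed with %%jd, '
                    'expected %s %%jd\\n", _lh, _rh); '
                    'exit(-2); } } while (0);'
                    % (lh, rh, op, file, lineno, op))
        return BINARY.sub(repl, line)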
One downside is that simply parsing C in Python is slower than the
entire rest of the compilation, but fortunately this can be
alleviated by parallelizing the test builds through make.
Other neat bits:
- All generated files are named after the test description plus a
suffix, which helps cleanup and means it's (theoretically) possible
to parallelize the tests.
- The generated test.c is shoved base64-encoded into an ad-hoc
Makefile (see the sketch below), which means it doesn't force a
rebuild of the tests all the time.
- Test parameterizing is now easier.
- Hopefully this framework can also be repurposed for benchmarks in
the future.
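A hypothetical sketch of the base64 trick (names invented):

    import base64

    def write_case_makefile(path, source):
        # the generated makefile carries the test source itself as
        # base64, so there's a single generated file per case and a
        # rule that reproduces test.c on demand
        payload = base64.b64encode(source.encode()).decode()
        with open(path, 'w') as mk:
            mk.write('test.c: %s\n' % path)
            mk.write('\techo %s | base64 -d > $@\n' % payload)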