This is possible thanks to invoxiaamo's optimization of compacting
renames to avoid the O(n^3) nested filters. Not only does this
significantly reduce the runtime cost of that operation, but it also
reduces the maximum possible depth of recursion to 3 frames.
Deepest lfs_dir_traverse before:

    traverse with commit
    '-> traverse with filter
        '-> traverse with move
            '-> traverse with filter

Deepest lfs_dir_traverse after:

    traverse with commit
    '-> traverse with move
        '-> traverse with filter
Rename can be VERY time consuming. One of the reasons is the four-level
recursion depth of lfs_dir_traverse() seen when a compaction happens during
the rename:
lfs_dir_compact()
  size computation
    [1] lfs_dir_traverse(cb=lfs_dir_commit_size)
         - do 'duplicates and tag update'
      [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
        [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
          [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
  followed by the compaction itself:
    [1] lfs_dir_traverse(cb=lfs_dir_commit_commit)
         - do 'duplicates and tag update'
      [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
        [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
          [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
Yet analysis shows that levels [3] and [4] do no useful work when the
callback is lfs_dir_traverse_filter...
A practical example:
- format and mount a 4KB-block FS
- create 100 files of 256 bytes named "/dummy_%d"
- create a 1024-byte file "/test"
- rename "/test" to "/test_rename"
- create a 1024-byte file "/test" again
- rename "/test" to "/test_rename" again
This triggers a compaction in which lfs_dir_traverse was called 148,393
times, generating 25 million+ lfs_bd_read calls (~100 MB+ of data read).
With the optimization, lfs_dir_traverse is now called only 3,248 times
(589,000 lfs_bd_read calls, ~2.3 MB of data read)
=> a ~43x improvement...
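For reference, the scenario above can be reproduced against the public
littlefs API along the following lines. This is a hedged sketch: it assumes
a fully populated lfs_config named cfg with 4KB blocks is provided
elsewhere, and error checking is omitted for brevity.

    #include <stdio.h>
    #include "lfs.h"

    // assumed: a fully populated lfs_config with 4KB blocks
    extern const struct lfs_config cfg;

    static void write_file(lfs_t *lfs, const char *path, lfs_size_t size) {
        static const uint8_t buf[1024] = {0};
        lfs_file_t f;
        lfs_file_open(lfs, &f, path, LFS_O_WRONLY | LFS_O_CREAT);
        lfs_file_write(lfs, &f, buf, size);
        lfs_file_close(lfs, &f);
    }

    int main(void) {
        lfs_t lfs;
        lfs_format(&lfs, &cfg);
        lfs_mount(&lfs, &cfg);

        // fill the metadata pair so later commits force a compaction
        for (int i = 0; i < 100; i++) {
            char path[32];
            snprintf(path, sizeof(path), "/dummy_%d", i);
            write_file(&lfs, path, 256);
        }

        // each create+rename adds an LFS_FROM_MOVE tag to the commit; the
        // second round lands in a nearly full block and triggers compaction
        write_file(&lfs, "/test", 1024);
        lfs_rename(&lfs, "/test", "/test_rename");
        write_file(&lfs, "/test", 1024);
        lfs_rename(&lfs, "/test", "/test_rename");

        lfs_unmount(&lfs);
        return 0;
    }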
- nit: Move brace to end of if-statement line for consistency
- mount: Add more debug info per CR
- Fix compiler error from extra parentheses
- Fix superblock typo
The basic idea is simple: if we seek to a position in the currently
loaded cache, don't flush the cache. Notably, this ensures that seek is
always as fast as or faster than just reading the data.
This is a bit tricky, since we need to check that our new block and
offset match the cache. Fortunately we can skip the block check by
reevaluating the block index for both the current and new positions.
Note this only works when reading; for writing we need to always flush
the cache, or else we will lose the pending write data.
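A minimal standalone model of that fast path, with invented names (file_t,
BLOCK_SIZE). The real lfs_file_rawseek computes block indices with
lfs_ctz_index, since CTZ blocks also hold skip-list pointers, so the flat
division below is a simplification.

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096 // assumed geometry for the sketch

    typedef struct {
        uint32_t pos;  // current logical file position
        bool writing;  // cache holds unflushed write data
        // cache fields omitted
    } file_t;

    // returns true if seeking to npos may reuse the currently loaded cache
    static bool seek_can_skip_flush(const file_t *f, uint32_t npos) {
        if (f->writing) {
            // writes must always flush, or pending cache data is lost
            return false;
        }
        // rather than track which block is cached, recompute the block
        // index for the old and new positions; if they match, the cached
        // block is still the right one
        return (f->pos / BLOCK_SIZE) == (npos / BLOCK_SIZE);
    }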
Interestingly this was introduced by two different PRs which were not tested
together until pre-release testing:
- Fix lfs_file_seek doesn't update cache properties correctly
- Fix compiler warnings when LFS_READONLY defined
BUG CASE: Assume there are 6 blocks in littlefs and blocks 0, 1, 2, 3 are
already allocated, with 0 having a tail pair of {2, 3}. Now we try to write
more into 0.
When writing to block 0, we split (the FIRST split), allocating blocks 4
and 5. Up to now, everything is as expected.
Then we try to commit in block 4, during which a split (the SECOND split)
is triggered again (in our case some files are large and some are small,
so one split may not be enough). Still as expected.
The bug happens when we try to allocate a new block pair for the second
split: as the lookahead buffer reaches its end, a new lookahead buffer is
generated from flash content, and blocks 4 and 5 appear as unused blocks
in the new lookahead buffer because they have not been programmed yet.
However, blocks 4 and 5 are already occupied by the first split! The
result is that blocks 4 and 5 are allocated again, and this is where
things go wrong.
Commit ce2c01f introduced this bug: it added an lfs_alloc_ack in
lfs_dir_split, which causes split to reset lfs->free.ack to the block
count. In summary, this problem exists after 2.1.3.
Solution: don't call lfs_alloc_ack in lfs_dir_split.
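To make the invariant concrete, here is a toy model of the allocator's ack
window. The names (alloc_t, alloc_ack, alloc_block) are invented for
illustration; this is not littlefs source.

    #include <stdint.h>

    typedef struct {
        uint32_t block_count;
        uint32_t ack; // blocks we may still allocate before the next commit
    } alloc_t;

    // correct usage: only call once every outstanding allocation is
    // reachable from the on-disk tree, so a lookahead rebuilt by
    // traversing that tree will see those blocks as used
    static void alloc_ack(alloc_t *a) {
        a->ack = a->block_count;
    }

    static int alloc_block(alloc_t *a) {
        if (a->ack == 0) {
            return -1; // no safe blocks until the next commit (LFS_ERR_NOSPC)
        }
        a->ack -= 1;
        // ... scan the lookahead bitmap; if exhausted, rebuild it from the
        // on-disk tree -- only safe because ack guarantees all counted
        // allocations are already on disk
        return 0;
    }

Under this contract, acking inside lfs_dir_split resets the budget while
blocks 4 and 5 are still unreferenced on disk, which is exactly what lets
the rebuilt lookahead hand them out a second time.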
lfs_dir_traverse is a bit unpleasant in that it is an inherently
recursive function, but with a strict bound of 4 calls (commit -> filter ->
move -> filter), and efforts to unroll the recursion come at a
significant code cost.
It turns out the best solution I've found so far is to simply create an
explicit stack with an explicit bound of 4 calls (or more accurately,
3 pushed frames).
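As a self-contained illustration of the technique (not the littlefs
implementation itself), here is a bounded recursive traversal rewritten
with an explicit, fixed-size frame stack; the tree type and the depth
bound are assumptions for the sketch.

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_FRAMES 3 // mirrors the bound in the text: 3 pushed frames

    typedef struct node {
        int value;
        const struct node *left, *right;
    } node_t;

    static void traverse(const node_t *root) {
        const node_t *stack[MAX_FRAMES]; // the explicit "call stack"
        size_t sp = 0;
        const node_t *n = root;

        while (n || sp > 0) {
            if (n) {
                printf("%d\n", n->value);    // pre-order visit
                if (n->right) {
                    assert(sp < MAX_FRAMES); // bound is enforced, not hoped for
                    stack[sp++] = n->right;  // push the frame we would recurse into
                }
                n = n->left;                 // "tail call" into the left child
            } else {
                n = stack[--sp];             // "return" by resuming a saved frame
            }
        }
    }

With the frames in a fixed array, worst-case RAM on this path is a
measurable constant instead of whatever the compiler's stack frames cost.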
---
This actually highlights one of the bigger flaws in littlefs right now,
which is that this function, lfs_dir_traverse, takes O(n^2) disk reads
to traverse.
Note that LFS_FROM_MOVE can only occur once per commit, which is why
this code is O(n^2) and not O(n^4).
This mostly just required separate functions for "lfs_file_rawwrite" and
"lfs_file_flushedwrite", since lfs_file_flush recursively invokes
lfs_file_rawread and lfs_file_rawwrite.
This comes at a code cost, but gives us bounded and measurable RAM usage
on this code path.
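The resulting shape, with simplified stand-in signatures (the real
prototypes take lfs_t, file, buffer, and size arguments), looks roughly
like this:

    // flush calls the "flushed" variant directly, so the call graph is acyclic
    static int file_flushedwrite(void); // the write logic proper; never flushes
    static int file_flush(void);

    // public path: make the cache consistent first, then write
    static int file_rawwrite(void) {
        int err = file_flush();
        if (err) {
            return err;
        }
        return file_flushedwrite();
    }

    static int file_flush(void) {
        // writes out pending cache contents via file_flushedwrite(), never
        // via file_rawwrite(), so flush can no longer re-enter itself
        return file_flushedwrite();
    }

    static int file_flushedwrite(void) {
        // ... actual data path, which assumes the cache is already flushed
        return 0;
    }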
lfs_dir_commit originally relied heavily on tail-recursion, though at
least one path (through relocations) was not tail-recursive, and could
cause unbounded stack usage in extreme cases of bad blocks. (Keep in
mind even extreme cases of bad blocks should be in scope for littlefs).
In order to remove recursion from this code path, several changes were
required:
- The lfs_dir_compact logic had to be somewhat inverted. Instead of
first compacting and then resolving issues such as relocations and
orphans, the overarching lfs_dir_commit now contains a state machine
which, after committing or compacting, handles the extra changes to the
filesystem in a single, non-recursive loop.
- Instead of fixing all relocations recursively, >1 relocation requires
deferring to a full deorphan step. This step is unfortunately an
additional O(n^2) process. It also required some changes to lfs_deorphan
in order to ignore intentional orphans created as an intermediary in
lfs_mkdir. Maybe in the future we should remove these.
- Tail recursion normally found in lfs_fs_deorphan had to be rewritten
as a loop which restarts any time a new commit causes a relocation (a
sketch of this pattern follows this list). This does show that the
algorithm may not terminate, but only if every block is bad, which will
eventually cause littlefs to run out of blocks to write to.
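A compilable toy of that restart pattern, with invented names (fix_item,
item_count); this is not the littlefs deorphan itself.

    #include <stdbool.h>

    enum { OK = 0, RELOCATED = 1 };

    // assumed helpers for the sketch
    extern int fix_item(int i); // returns OK or RELOCATED
    extern int item_count(void);

    static int deorphan(void) {
        bool restart = true;
        while (restart) {           // replaces the tail call
            restart = false;
            for (int i = 0; i < item_count(); i++) {
                if (fix_item(i) == RELOCATED) {
                    // a commit moved blocks out from under us; earlier
                    // conclusions may be stale, so rescan from the top
                    restart = true;
                    break;
                }
            }
        }
        // terminates as long as relocations can't happen forever; in
        // littlefs that only fails if every block is bad
        return OK;
    }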
When the on-disk block count doesn't match the config block count, it is
possible to get file corruption. For instance, if the on-disk count was
0x200 and we re-mount with 0x100, files could be corrupted.
If we re-mount with a larger number of blocks, things should be safer;
that case could be handled with a resize option or perhaps a mount flag to
ignore this parameter.
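A hedged sketch of the kind of mount-time guard this implies;
lfs_superblock_t and the block_count fields exist in lfs.h, but this exact
check is illustrative rather than the shipped implementation.

    #include "lfs.h"

    static int check_geometry(const lfs_t *lfs,
            const lfs_superblock_t *superblock) {
        if (superblock->block_count != lfs->cfg->block_count) {
            // on-disk geometry disagrees with the config; refusing to
            // mount is safer than corrupting files past the smaller bound
            return LFS_ERR_INVAL;
        }
        return 0;
    }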
lfs_mdir_isopen goes unused if asserts are disabled, and this caused an
"unused function" warning on Clang (curiously not on GCC since the
function was static inline, commonly used for header-only functions).
Also removed "inline" from the lfs_mdir_* functions as these involve
linked-list operations and really shouldn't be inlined. And since they
are static, inlining should occur automatically if there is a benefit.
Found by dpgeorge
This was caused by the new lfs_file_rawseek optimization that can skip
flushing when the calculated file->pos is unchanged, combined with an
implicit expectation in lfs_file_truncate that lfs_file_rawseek
unconditionally sets file->pos.
Because of this assumption, lfs_file_truncate could leave file->pos in
an outdated state while changing the internal file metadata. Humorously,
this was always guaranteed to trigger the skip in lfs_file_rawseek when
we try to restore the file->pos, leaving the file->cache used to do the
CTZ skip-list lookup in a potentially bad state.
The easiest fix is to just update file->pos correctly. Note we don't
want to explicitly flush, since we can leverage the same noop
optimization if we truncate to the file position, which I've added a
test for.
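Something along these lines captures the test idea (a hedged sketch that
assumes the file already holds at least 17 bytes; the shipped test differs
in detail):

    #include <assert.h>
    #include "lfs.h"

    // truncating to the current position should leave file->pos untouched
    // and must not leave the cache inconsistent
    static void check_truncate_noop(lfs_t *lfs, lfs_file_t *file) {
        lfs_soff_t pos = lfs_file_seek(lfs, file, 17, LFS_SEEK_SET);
        assert(pos == 17);
        int err = lfs_file_truncate(lfs, file, 17); // truncate to pos itself
        assert(err == 0);
        assert(lfs_file_tell(lfs, file) == 17);     // pos survives the truncate
    }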
Mostly taken from .travis.yml, biggest changes were around how to get
the status updates to work.
We can't use a token on PRs the same way we could in Travis, so instead
we use a second workflow that checks every pull request for "status"
artifacts and creates the actual statuses in the "workflow_run" event,
where we have full access to repo secrets.
After a bit of tweaking in 9dde5c7 to write out all superblocks
during lfs_format, additional writes were added after the sanity
checking normally done at the end.
This turned out to be a problem when porting littlefs, as it made it
easy for addressing issues to go uncaught during lfs_format.
Found by marekr, tristanclare94, and mjs513
We have seen poor read performance on NAND flashes with 128kB blocks.
The root cause is inline files having to traverse many sets of metadata
pairs inside the current block before being fully reconstructed. Simply
disabling inline files is not enough, as the metadata will still fill up
the block and eventually need to be compacted.
By allowing configuration of how much space metadata takes up, along with
limiting (or disabling) inline file size, we achieve read performance
improvements of an order of magnitude.
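A hedged configuration sketch for such a large-block NAND: metadata_max is
a real lfs_config field, but the concrete numbers here are illustrative
assumptions rather than tuned recommendations, and the block-device
callbacks and remaining geometry are omitted.

    #include "lfs.h"

    const struct lfs_config cfg = {
        // block-device callbacks and remaining geometry omitted
        .block_size   = 128 * 1024, // large NAND erase block
        .cache_size   = 512,        // also caps the largest inline file
        .metadata_max = 4096,       // compact metadata at 4KB instead of
                                    // letting it fill the 128KB block
    };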
This indirectly solves an issue with lfs_file_rawclose asserting
when lfs_file_opencfg errors, since appending to the mlist occurs
after open. It also may speed up some of the internal operations, such as
the lfs_file_write used to resolve unflushed data.
The idea behind adopting mlist over flags is that realistically it's
unlikely for the user to open a significant number of files (enough for
big O to kick in). That being said, moving the mlist asserts into the
API wrappers does protect some of the internal operations from scaling
based on the number of open files.
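For context, the check in question amounts to a linear scan of the
open-handle list, along these lines (simplified; the actual littlefs
helper differs in detail), which is why keeping it in the public API
wrappers rather than on hot internal paths matters:

    #include <stdbool.h>

    struct mlist {
        struct mlist *next;
    };

    // O(number of open handles): walk the list looking for this node
    static bool mlist_isopen(const struct mlist *head,
            const struct mlist *node) {
        for (const struct mlist *p = head; p; p = p->next) {
            if (p == node) {
                return true; // handle is already registered as open
            }
        }
        return false;
    }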