Found during testing, the issue was with lfs_migrate in combination with
wear leveling.
Normally, we can expect lfs_migrate to be able to respect the user-configured
block_cycles. It already has allocation information on which blocks are
used by both v1 and v2, so it should be safe to relocate blocks as
needed.
However, this fell apart when root was relocated. If lfs_migrate found a
root that needed relocation, it would happily relocate the root. This
would normally be fine, except relocating the root has the side-effect
of needing to update the superblock, which, during migration, is in a
delicate state of containing both v1's and v2's superblocks in the same
metadata pair. If the superblock ends up needing to compact, this would
clobber the v1 superblock and corrupt the filesystem during migration.
The best fix I could come up with is to specifically disallow relocating
the root directory during migration. Fortunately this is behind the
LFS_MIGRATE macro, so the code cost for this check is not normally paid.
Due to the logging nature of metadata pairs, switching from inline files
(type3 = 0x201) to CTZ skip-lists (type3 = 0x202) does not explicitly
erase inline files, but instead leaves it to compaction to omit them.
To save code size, this is handled by the same logic that deduplicates
tags.
Unfortunately, this wasn't working. Due to a relatively late change in
v2, the struct's type field no longer takes part in determining a tag's
"uniqueness". A part of this change should have been updating directory
traversal filtering to respect type-dependent uniqueness, but I missed
this.
The fix is to add in correct type-dependent filtering. There was also
some cleanup necessary around removing delete tags during compaction
and outlining files.
Note that while this appears to conflict with the possibility of
combining inline + ctz files, we still have the device-side-only
LFS_TYPE_FROM tag that can be repurposed for 256 additional inline
"chunks".
Found by Johnxjj
Previously these returned LFS_ERR_BADF. But attempting to modify a file
opened read-only, or reading a write-only file, is a user error and
should not occur in normal use.
Changing this to an assert allows the logic to be omitted if the user
disables asserts to reduce the code footprint (not suggested unless the
user really really knows what they're doing).
This is a minor quality of life change to help debugging, specifically
when debugging test failures.
Before, the test framework used hex, while the log output used decimal.
This was slightly annoying to convert between.
Why not output lengths/offset in hex? I don't have a big reason. I find
it easier to reason about lengths in decimal and ids (such as addresses
or block numbers) in hex. But this may just be me.
A current limitation of the lfs tag is the 10-bit (1024) length field.
This field is used to indicate padding for commits and effectively
limits the size of commits to 1KiB. Because commits must be prog size
aligned, this is a problem on devices with prog size > 1024.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ]
This can be increased to 12-bit (4096), but for NAND devices this is
still too small to completely solve the issue.
The previous workaround was to just create unaligned commits. This can
occur naturally if littlefs is used on portable media as the prog size
does not have to be consistent on different drivers. If littlefs sees
an unaligned commit, it treats the dir as unerased and must compact the
dir if it creates any new commits.
Unfortunately this isn't great. It effectively means that every small
commit forces an erase on devices with prog size > 1024. This is pretty
terrible.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit |------------------- wasted ---------------------]
A different solution, implemented here, is to use multiple crc tags
to pad the commit until the remaining space fits in the padding. This
effectively looks like multiple empty commits and has a small runtime
cost to parse these tags, but otherwise does no harm.
[---- 6KiB erase block ----]
[-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --]
[ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ]
It was a bit tricky to implement, but now we can effectively support
unlimited prog sizes since there's no limit to the number of commits
in a block.
found by kazink and joicetm
Introduced in 0b76635, the workaround for erase sizes >1024 is to
commit with an unaligned CRC tag. Upon reading an unaligned CRC,
littlefs should treat the metadata pair as requiring an erase. While
necessary for portability, this also lets us work around the lack of
handling of erase sizes >1024.
Unfortunately, this workaround wasn't implemented correctly (by me)
in the case that the metadata pair does not immediately compact. This
is solved here by adding the erase check to lfs_dir_commit.
Note this is still only a part of a workaround which should be replaced.
One potential solution is to pad the commit with multiple smaller CRC
tags until we reach the next prog_size boundary.
found by kazink
As it is now, block_cycles = 0 disables wear leveling. This was a
mistake as 0 is the "default" value for several other config options.
It's even worse when migrating from v1 as it's easy to miss the addition
of block_cycles and end up with a filesystem that is not actually
wear-leveling.
Clearly, block_cycles = 0 should do anything but disable wear-leveling.
Here, I've changed block_cycles = 0 to assert, forcing users to set a
value for block_cycles (500 is suggested). block_cycles can be set to
-1 to explicitly disable wear leveling if desired.
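A sketch of the two valid configurations (other fields elided):

    const struct lfs_config cfg = {
        // ...
        .block_cycles = 500, // suggested: relocate metadata every ~500 erases
    };

    // or, to explicitly disable wear leveling:
    // .block_cycles = -1,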
This has been a large source of porting errors, partially my fault for
not having enough porting documentation, which is also planned.
In the short term, asserts should at least help catch these types of
errors instead of just letting the filesystem collapse after receiving
an odd error code.
To use, compile and run with LFS_YES_TRACE defined:
make CFLAGS+=-DLFS_YES_TRACE=1 test_format
The name LFS_YES_TRACE was chosen to match the LFS_NO_DEBUG and
LFS_NO_WARN defines for the similar levels of output. The YES is
necessary to avoid a conflict with the actual LFS_TRACE macro that
gets emitted. LFS_TRACE can also be defined directly to provide
a custom trace formatter.
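For example, a custom formatter might look something like this (my_log
is a hypothetical platform logger):

    // define before including lfs.h to override the default formatter
    #define LFS_TRACE(fmt, ...) \
        my_log("lfs trace: " fmt "\n", __VA_ARGS__)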
Hopefully having trace statements at the littlefs C API helps
debugging and reproducing issues.
Kind of a two-fold issue. One, programming to the middle of inline
files was causing the cache to get updated to a half-programmed state.
While fine, as all programs do occur in order in a block, this is less
efficient when writing to inline files as it would cause the inline
file to need to be reread even if it fits in the cache.
Two, the rereading of the inline file was broken and passed the file's
tag all the way to where a user would expect an error. This was easy to
fix but adds to the reasons we should have test coverage information.
Found by ebinans
The cause was mistakenly setting file->ctz.size directly instead of
file->pos; file->ctz.size gets overwritten with file->pos later in
lfs_file_flush anyway.
Also added better seek test cases specifically for inline files. This
should also catch most of the inline corner cases related to
lfs_file_size/lfs_file_tell.
Found by ebinans
This ensures that both blocks in the superblock pair are written with
the superblock info. While this does use an additional erase cycle, it
prevents older versions of littlefs from accidentally being picked up
in the case that the disk is mounted on a system that doesn't support
the newer version.
This does bring back the risk of picking up old littlefs versions on
a disk that has been formatted with a filesystem that doesn't use
block 2 (such as FAT), but this risk already exists, and moving between
versions of littlefs is more likely with the recent v1 -> v2 update.
Suggested by rojer
The data written to the prog cache would make littlefs internally
consistent, but because this was never written to disk, the filesystem
would become unmountable.
Unfortunately, this wasn't found during testing because caches automatically
flush if data is written up to a program boundary (maybe this was a mistake?).
Found by rojer
The maximum limits of inline files and attributes are unrelated, but
were not at one point during littlefs v2 development. This should be
checking against the bit-field limit in the littlefs tag.
Found by lsilvaalmeida
The problem was not setting the file state correctly after the truncate.
When truncating to a size smaller than the current size, we end up
using the cache to traverse the ctz skip-list far away from where our
file->pos is.
We can leave the last block in the cache in case we're going to append
to the file, but if we do this we need to set up file->block+file->off
to tell us where we are in the file, and set the LFS_F_READING flag to
indicate that our cache contains read data.
Note this is different than LFS_F_DIRTY, which we also need. The
purpose of the flags is as follows:
- LFS_F_DIRTY - file ctz skip-list branch is out of sync with
filesystem, need to update metadata
- LFS_F_READING - file cache is in use for reading, need to drop cache
- LFS_F_WRITING - file cache is in use for writing, need to write out
cache to disk
The difference between flags is subtle but important because read/prog
caches are handled differently. Prog caches have asserts in place to
catch programs without erases (the infamous pcache->block == 0xffffffff
assert).
Though maybe the names deserve an update...
Found by ebinans
In some cases specific alignment of buffer passed to underlying device
is required. For example SDMMC in STM32F7 (when used with DMA) requires
the buffers to be aligned to 16 bytes. If you enable data cache in
STM32F7, the alignment of buffer passed to any driver which uses DMA
should generally be at least 32 bytes.
While it is possible to provide sufficiently aligned "read", "prog" and
per-file caches to littlefs, the cases where caches are bypassed are
hard to control when littlefs is hidden under some additional layers.
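For reference, sufficiently aligned caches can be provided along these
lines (sizes are illustrative; the 32-byte alignment follows the
STM32F7 example above):

    // statically allocated caches aligned for a DMA-backed block device
    static uint8_t read_cache[512] __attribute__((aligned(32)));
    static uint8_t prog_cache[512] __attribute__((aligned(32)));

    const struct lfs_config cfg = {
        // ...
        .read_buffer = read_cache,
        .prog_buffer = prog_cache,
        .cache_size  = 512,
    };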
For example if you couple littlefs with stdio and use it via `FILE*`,
then littlefs functions will operate on the internal `FILE*` buffer,
usually allocated dynamically, which in these specific cases provides
insufficient alignment (8 bytes on ARM Cortex-M).
The easy path was taken - remove all cases of cache bypassing.
Fixes #158
To bring lfs_file_truncate in line with the ftruncate function, passing
a negative size or a size greater than the maximum file size should
return an invalid parameter error, in LFS's case LFS_ERR_INVAL.
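A rough sketch of the resulting behaviour (file handle setup assumed):

    // truncating past the maximum file size now fails cleanly
    int err = lfs_file_truncate(&lfs, &file, (lfs_off_t)LFS_FILE_MAX + 1);
    assert(err == LFS_ERR_INVAL);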
Signed-off-by: Ajay Bhargav <contact@rickeyworld.info>
This is an experiment to determine which field in the tag structure is
the most critical: tag id or tag size.
This came from looking at NAND storage and discussions around behaviour of
large prog_sizes. Initial exploration indicates that prog_sizes around
2KiB are not _that_ uncommon, and the 1KiB limitation is surprising.
It's possible to increase the lfs_tag size to 12-bits (4096), but at the
cost of only 8-bit ids (256).
  [----          32           ----]
a [1|-3-|-- 8 --|-- 10 --|-- 10 --]
b [1|-3-|-- 8 --|--  8 --|-- 12 --]
This requires more investigation, but in order to allow us to change
the tag sizes with minimal impact I've artificially limited the number
of file ids to 0xfe (255) per metadata pair. If 12-bit lengths turn out
to be a bad idea, we can remove the artificial limit without
backwards-incompatible changes.
To avoid breaking users already on v2-alpha, this change will refuse
_creating_ file ids > 255, but should read file ids > 255 without
issues.
- Shifting a signed 32-bit value by 31 bits is undefined behaviour
This was an interesting one as on initial inspection, `uint8_t & 1`
looks like it will result in an unsigned variable. However, due to
uint8_t being "smaller" than int, the result is promoted to a signed
int, causing an undefined shift operation (sketched below).
- Identical inner 'if' condition is always true (outer condition is
'true' and inner condition is 'true').
This was caused by the use of `if (true) {` to avoid "goto bypasses
variable initialization" warnings. Using just `{` instead seems to
avoid this problem (also sketched below).
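Hedged sketches of both fixes (do_thing and the surrounding code are
illustrative):

    // 1: due to integer promotion, (bit & 1) has type int (signed),
    // so shifting into bit 31 is undefined behaviour
    uint8_t bit = 1;
    uint32_t bad = (bit & 1) << 31;            // UB!
    uint32_t ok  = (uint32_t)(bit & 1) << 31;  // well-defined

    // 2: a bare block scopes locals just as well as `if (true) {`,
    // keeping later gotos from bypassing their initialization
    {
        int err = do_thing();
        if (err) {
            return err;
        }
    }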
found by keck-in-space and armandas
In v2, the lookahead_buffer was changed from requiring 4-byte alignment
to requiring 8-byte alignment. This was not documented as well as it
could be, and as FabianInostroza noted, this also implies that
lfs_malloc must provide 8-byte alignment.
To protect against this, I've also added an assert on the alignment of
both the lookahead_size and lookahead_buffer.
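A sketch of a correctly aligned, statically allocated buffer (the
64-byte size is illustrative; C11 alignas is one way to get the
alignment):

    #include <stdalign.h>

    // the lookahead buffer must now be 8-byte aligned
    static alignas(8) uint8_t lookahead_buffer[64];

    // in the config:
    // .lookahead_size   = sizeof(lookahead_buffer),
    // .lookahead_buffer = lookahead_buffer,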
found by FabianInostroza and amitv87
lfs_file_sync was not correctly setting the LFS_F_ERRED flag.
Fortunately this is a relatively easy fix. LFS_F_ERRED prevents
further issues from occurring when cleaning up resources with
lfs_file_close.
found by TheLoneWolfling
The issue here is how commits handle padding to the nearest program
size. This is done by exploiting the size field of the LFS_TYPE_CRC
tag that completes the commit. Unfortunately, during development, the
size field shrank to make room for more type information, limiting
it to 1024.
Normally this isn't a problem, as very rarely do program sizes exceed
1024 bytes. However, using a simulated block device, user earlephilhower
found that exceeding 1024 caused littlefs to crash.
To make this corner case behave in a more user-friendly manner, I've
modified this situation to treat >1024 program sizes as small commits
that don't match the prog size. As a part of this, littlefs also needed
to understand that non-matching commits indicate an "unerased" dir
block, which would be needed for portability (something which notably
lacks testing).
This raises the question of whether the tag's size field needs to be
reconsidered, but changing that at this point would require a new major
version.
found by earlephilhower
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the read-only operations of v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Defining LFS_MIGRATE
2. Calling lfs_migrate (only available with the above macro)
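A minimal usage sketch, assuming cfg is an already-populated
lfs_config:

    // built with -DLFS_MIGRATE to compile in the migration code
    lfs_t lfs;
    int err = lfs_migrate(&lfs, &cfg); // converts v1 metadata to v2
    if (!err) {
        err = lfs_mount(&lfs, &cfg);   // now mounts as littlefs v2
    }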
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro-controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included the
migration code in the standard build. We also can't rely on link-time
garbage collection to strip lfs_migrate, because of a dependency between
the allocator and v1 data structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during the
migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth fixing
this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
This only required adding NULLs where commit statements were not fully
initialized.
Unfortunately we still need -Wno-missing-field-initializers because
of a bug in GCC that persists on Travis.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60784
Found by apmorton
- Fixed uninitialized values found by valgrind.
- Fixed uninitialized value in lfs_dir_fetchmatch when handling revision
counts.
- Fixed mess left by lfs_dir_find when attempting to find the root
directory in lfs_rename and lfs_remove.
- Fixed corner case with definitions of lfs->cfg->block_cycles.
- Added test cases around different forms of the root directory.
I think all of these were found by TheLoneWolfling, so props!
This was caused by any commit containing entries large enough to
_always_ force a compaction. This would cause littlefs to think that it
would need to split infinitely because there was no base case.
The fix here is pretty simple: treat any commit with only a single entry
as unsplittable. This forces littlefs to first try overcompacting
(fitting more in a block than what has optimal runtime), and then
failing that return LFS_ERR_NOSPC for higher layers to handle.
found by TheLoneWolfling
The problem was when we allocate a dir-pair, it's possible for the
revision count to immediately overflow and the dir-pair be evicted and
returned to the unused blocks without being written even once. In the
case that block_cycles = 1, this made it impossible to ever create a
dir-pair, even in lfs_format.
I've also added a bit of logic to lfs_dir_alloc that will prevent
any immediate evictions because of the revision count.
found by TheLoneWolfling
- Fixed cache tarnishing issue where flush did not clean up read caches
- Removed extra alloc acks which would prevent file relocations from
resolving on an exhausted filesystem
- Removed unsigned comparison < 0 from changes in file seek
- Fixed bug in lfs_dir_getslice with using gtag's size
- Removed warnings around PRIu32 used with 16-bit types in debug info
Caught during power resilience testing, this was a bug that only occurs
when we need to compact in the middle of a move commit and we find that
the destination block is bad, forcing a relocate.
This series of events would cause littlefs to clear the "gpending" state
in preparation for fixing the move atomically, but this fix never gets
written out because of the relocate.
The fix here is to separate the update to the "gdelta" and "gpending"
state, marking "gdelta" in preparation for the move, but waiting to
update "gpending" until after our commit completes. This keeps our disk
state in sync without prematurely dropping moves.
Before this, there were some safety limits, but there was no real
default limit to the size of inline files other than the amount of RAM
available. On PCs, this meant that inline files were free to fill up
directory blocks to a little under the block size.
However this is very wasteful in terms of storage space. Because of
splitting limits to keep the compact runtime reasonable, each byte of
an inline file uses 4x the amount of storage.
Fortunately we can find an optimal inline limit:
Inline file waste for n bytes = 3n
CTZ file waste for n bytes    = B - n
Where B = block size. Setting 3n = B - n and solving gives n = B/4.
So the optimal inline limit is B/4. However, this assumes a perfect
inline file and no metadata. We can decrease this to B/8 to give a bit
more breathing room for directory+file metadata.
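For a concrete feel, with a typical 4KiB block:

    3n = B - n  =>  n = 4096/4 = 1024 bytes (optimal)
    with metadata headroom:  4096/8 = 512 byte inline limit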
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However however, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to
a directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
While linked-lists do have some minor benefits, arrays are more
idiomatic in C and may provide a more intuitive API.
Initially the linked-list approach was more beneficial than it is now,
since it allowed custom attributes to be chained to internal linked
lists of attributes. However, this was dropped because exposing the
internal attribute list in this way created a rather messy user
interface that required strictly encoding the attributes with the
on-disk tag format.
Minor downside, users can no longer introduce custom attributes in
different layers (think OS vs app). Minor upside, the code size and
stack usage was reduced a bit.
Fortunately, this API can always be changed in the future without
breaking anything (except maybe API compatibility).
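For reference, a sketch of the resulting array-based API (the attribute
type and value are illustrative):

    // custom attributes are now passed as an array in the file config
    uint32_t version = 1;
    struct lfs_attr attrs[] = {
        {.type = 'v', .buffer = &version, .size = sizeof(version)},
    };
    struct lfs_file_config filecfg = {
        .attrs = attrs,
        .attr_count = 1,
    };
    lfs_file_opencfg(&lfs, &file, "hello", LFS_O_RDWR | LFS_O_CREAT, &filecfg);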
The main difference here is a move away from encoding "hasorphans" and
"hasmove" bits in the tag itself. This worked with the old format, but
in the new format the space these bits take up must be consistent for
each tag type. The tradeoff is that the new tag format allows for up to
256 different global states which may be useful in the future (for
example, a global free list).
The new format encodes this info in the data blob, using an additional
word of storage. This word is actually formatted the same as though it
was a tag, which simplified internal handling and may allow other tag
types in the future.
Format for global state:
[----                       96 bits                       ----]
[1|- 11 -|- 10 -|- 10 -|---               64               ---]
 ^    ^      ^      ^                      ^- move dir pair
 |    |      |      \------------------------ unused, must be 0s
 |    |      \--------------------------------- move id
 |    \---------------------------------------- type, 0xfff for move
 \--------------------------------------------- has orphans
This also included another iteration over globals (renamed to gstate)
with some simplifications to how globals are handled.
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
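These fields decode with simple shifts and masks, along the lines of
littlefs's internal helpers:

    #include <stdbool.h>
    #include <stdint.h>

    // decode fields of the 32-bit tag described above
    static inline bool     lfs_tag_isvalid(uint32_t tag) { return !(tag & 0x80000000); }
    static inline uint16_t lfs_tag_type3(uint32_t tag)   { return (tag & 0x7ff00000) >> 20; }
    static inline uint8_t  lfs_tag_chunk(uint32_t tag)   { return (tag & 0x0ff00000) >> 20; }
    static inline uint16_t lfs_tag_id(uint32_t tag)      { return (tag & 0x000ffc00) >> 10; }
    static inline uint32_t lfs_tag_size(uint32_t tag)    { return tag & 0x000003ff; }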
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.