A relatively late change in v2 was to switch from implicit replacement
of inline files to explicit replacement, which requires explicitly
deleting the inline struct tag before creating a CTZ skip-list tag.
Unfortunately I never added the actual logic to delete the tag. This
went unnoticed as the later CTZ skip-list tag always overrides the
search for the struct tag during a directory fetch.
To fix this, we need to add an explicit delete. There was also some
cleanup necessary for actually removing delete tags during compaction,
as well as some refactoring around outlining.
Found by Johnxjj
Kind of a two-fold issue. One, programming to the middle of inline
files was causing the cache to be updated to a half-programmed state.
While fine, as all programs do occur in order in a block, this is less
efficient when writing to inline files, as it would cause the inline
file to need to be reread even if it fits in the cache.
Two, the rereading of the inline file was broken and passed the file's
tag all the way to where a user would expect an error. This was easy to
fix but adds to the reasons we should have test coverage information.
Found by ebinans
The cause was mistakenly setting file->ctz.size directly instead of
file->pos; file->ctz.size gets overwritten with file->pos later in
lfs_file_flush.
Also added better seek test cases specifically for inline files. This
should also catch most of the inline corner cases related to
lfs_file_size/lfs_file_tell.
Found by ebinans
This ensures that both blocks in the superblock pair are written with
the superblock info. While this does use an additional erase cycle, it
prevents older versions of littlefs from accidentally being picked up
in the case that the disk is mounted on a system that doesn't support
the newer version.
This does bring back the risk of picking up old littlefs versions on
a disk that has been formatted with a filesystem that doesn't use
block 2 (such as FAT), but this risk already exists, and moving between
versions of littlefs is more likely with the recent v1 -> v2 update.
Suggested by rojer
The data written to the prog cache would make littlefs internally
consistent, but because this was never written to disk, the filesystem
would become unmountable.
Unfortunately, this wasn't found during testing because caches automatically
flush if data is written up to a program boundary (maybe this was a mistake?).
Found by rojer
The maximum limits of inline files and attributes are unrelated, though
at one point in littlefs v2 development they were not. This should be
checking against the bit-field limit in the littlefs tag.
Found by lsilvaalmeida
The problem was not setting the file state correctly after the truncate.
When truncating to less than the current size, we end up using the cache
to traverse the CTZ skip-list far away from where our file->pos is.
We can leave the last block in the cache in case we're going to append
to the file, but if we do this we need to set up file->block+file->off
to tell us where we are in the file, and set the LFS_F_READING flag to
indicate that our cache contains read data.
Note this is different from LFS_F_DIRTY, which we also need. The
purposes of the flags are as follows:
- LFS_F_DIRTY - file ctz skip-list branch is out of sync with
filesystem, need to update metadata
- LFS_F_READING - file cache is in use for reading, need to drop cache
- LFS_F_WRITING - file cache is in use for writing, need to write out
cache to disk
The difference between flags is subtle but important because read/prog
caches are handled differently. Prog caches have asserts in place to
catch programs without erases (the infamous pcache->block == 0xffffffff
assert).
Though maybe the names deserve an update...
Found by ebinans
Both the littlefs-fuse and littlefs-migration test jobs depend on
the external littlefs-fuse repo. But unfortunately, the automatic
patching to update the external repo with the version under test
does not work with the prefix branches.
In this case we can just skip these tests; they've already been run
multiple times to get to this point.
In some cases specific alignment of buffer passed to underlying device
is required. For example SDMMC in STM32F7 (when used with DMA) requires
the buffers to be aligned to 16 bytes. If you enable data cache in
STM32F7, the alignment of buffer passed to any driver which uses DMA
should generally be at least 32 bytes.
While it is possible to provide sufficiently aligned "read", "prog" and
per-file caches to littlefs, the cases where caches are bypassed are
hard to control when littlefs is hidden under some additional layers.
For example, if you couple littlefs with stdio and use it via `FILE*`,
then littlefs functions will operate on the internal `FILE*` buffer,
which is usually allocated dynamically and in these specific cases has
insufficient alignment (8 bytes on ARM Cortex-M).
The easy path was taken: remove all cases of cache bypassing.
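For reference, providing sufficiently aligned caches looks roughly like
this; a sketch assuming the v2 configuration API and a GCC-style
alignment attribute, with sizes and the 32-byte DMA alignment chosen
arbitrarily:

    #include "lfs.h"

    // statically allocated caches with DMA-friendly alignment
    static uint8_t read_buf[512] __attribute__((aligned(32)));
    static uint8_t prog_buf[512] __attribute__((aligned(32)));
    static uint8_t lookahead_buf[64] __attribute__((aligned(32)));

    static const struct lfs_config cfg = {
        // .read/.prog/.erase/.sync must point at the block device
        // driver in real use, omitted here
        .read_size = 16,
        .prog_size = 16,
        .block_size = 4096,
        .block_count = 128,
        .block_cycles = 500,
        .cache_size = 512,
        .lookahead_size = 64,
        .read_buffer = read_buf,
        .prog_buffer = prog_buf,
        .lookahead_buffer = lookahead_buf,
    };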
Fixes #158
To make lfs_file_truncate consistent with the ftruncate function, it
should return an invalid parameter error (LFS_ERR_INVAL in littlefs's
case) when a negative size, or a size greater than the maximum file
size, is passed to the function.
Signed-off-by: Ajay Bhargav <contact@rickeyworld.info>
This is an experiment to determine which field in the tag structure is
the most critical: tag id or tag size.
This came from looking at NAND storage and discussions around behaviour of
large prog_sizes. Initial exploration indicates that prog_sizes around
2KiB are not _that_ uncommon, and the 1KiB limitation is surprising.
It's possible to increase the lfs_tag size to 12-bits (4096), but at the
cost of only 8-bit ids (256).
[---- 32 ----]
a [1|-3-|-- 8 --|-- 10 --|-- 10 --]
b [1|-3-|-- 8 --|-- 8 --|-- 12 --]
This requires more investigation, but in order to allow us to change
the tag sizes with minimal impact I've artificially limited the number
of file ids to 0xfe (255) different file ids per metadata pair. If
12-bit lengths turn out to be a bad idea, we can remove the artificial
limit without backwards incompatible changes.
To avoid breaking users already on v2-alpha, this change will refuse
_creating_ file ids > 255, but should read file ids > 255 without
issues.
- Shifting signed 32-bit value by 31 bits is undefined behaviour
This was an interesting one as on initial inspection, `uint8_t & 1`
looks like it will result in an unsigned variable. However, due to
uint8_t being "smaller" than int, this actually results in a signed
int, causing an undefined shift operation (see the example below).
- Identical inner 'if' condition is always true (outer condition is
'true' and inner condition is 'true').
This was caused by the use of `if (true) {` to avoid "goto bypasses
variable initialization" warnings. Using just `{` instead seems to
avoid this problem.
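A minimal illustration of the first issue, the signed-promotion pitfall
(a hypothetical standalone example, not the littlefs code itself):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t bit = 1;
        // (bit & 1) promotes to a signed int, so shifting it left by
        // 31 bits is undefined behaviour:
        //     uint32_t bad = (bit & 1) << 31;
        // casting before the shift keeps the operand unsigned:
        uint32_t good = (uint32_t)(bit & 1) << 31;
        printf("%08x\n", (unsigned)good);
        return 0;
    }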
found by keck-in-space and armandas
In v2, the lookahead_buffer was changed from requiring 4-byte alignment
to requiring 8-byte alignment. This was not documented as well as it
could be, and as FabianInostroza noted, this also implies that
lfs_malloc must provide 8-byte alignment.
To protect against this, I've also added an assert on the alignment of
both the lookahead_size and lookahead_buffer.
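The checks are along these lines (a sketch; the actual asserts live in
littlefs's init code, and check_lookahead is a hypothetical standalone
form):

    #include <assert.h>
    #include <stdint.h>
    #include "lfs.h"

    // verify a user-provided lookahead buffer meets the v2 8-byte
    // alignment requirement before mounting
    static void check_lookahead(const struct lfs_config *cfg) {
        assert((uintptr_t)cfg->lookahead_buffer % 8 == 0);
        assert(cfg->lookahead_size % 8 == 0);
    }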
found by FabianInostroza and amitv87
lfs_file_sync was not correctly setting the LFS_F_ERRED flag.
Fortunately this is a relatively easy fix. LFS_F_ERRED prevents
further issues from occurring when cleaning up resources with
lfs_file_close.
found by TheLoneWolfling
The issue here is how commits handle padding to the nearest program
size. This is done by exploiting the size field of the LFS_TYPE_CRC
tag that completes the commit. Unfortunately, during development, the
size field shrank to make room for more type information, limiting the
size field to 1024.
Normally this isn't a problem, as very rarely do program sizes exceed
1024 bytes. However, using a simulated block device, user earlephilhower
found that exceeding 1024 caused littlefs to crash.
To make this corner case behave in a more user-friendly manner, I've
modified this situation to treat >1024 program sizes as small commits
that don't match the prog size. As a part of this, littlefs also needed
to understand that non-matching commits indicate an "unerased" dir
block, which would be needed for portability (something which notably
lacks testing).
This raises the question of whether the tag's size field needs to be
reconsidered, but to change that at this point would need a new major
version.
found by earlephilhower
The script itself is a part of .travis.yml, using ./scripts/prefix.py
for applying prefixes to the source code.
The purpose of the automatic job is to provide a branch containing
version prefixes, to avoid name conflicts between binaries containing
different major versions of littlefs, with only a git clone.
As a part of each release, two branches and a tag are created:
- vM - moving branch
- vM-prefix - moving branch
- vM.M.P - immutable tag
The major version branch (vM) is created on major releases, but updated
every patch release. The patch version tag (vM.M.P) is created every
patch release. Patch releases occur every time a commit is merged into
master, though multiple merges may be coalesced.
The major prefix branch (vM-prefix) is modified with the ./scripts/prefix.py
script. Note that this branch is updated as a synthetic merge commit
with the previous history of vM-prefix. The reason for this is to allow
users to easily update vM-prefix with a `git pull` as they would for
other branches.
A---B---C---D---E  master, v1, v1.7.3
     \       \   \
      F-------G---H  v1-prefix
Now with graphs! Images are stored on the branch gh-images in an effort
to avoid binary bloat in the git history.
Also spruced up SPEC.md and README.md and ran a spellchecker over the
documentation. Favorite typo so far was dependendent, which is, in fact,
not a word.
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Defining LFS_MIGRATE
2. Calling lfs_migrate (only available with the above macro; see the
   sketch below)
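Usage ends up looking roughly like this (a sketch; boot_mount is a
hypothetical helper and the fallback policy is up to the application):

    #include "lfs.h"

    // try a v2 mount first; on failure, attempt the in-place
    // migration and mount again (requires -DLFS_MIGRATE)
    int boot_mount(lfs_t *lfs, const struct lfs_config *cfg) {
        int err = lfs_mount(lfs, cfg);
        if (err) {
            // possibly a v1 disk; the migration is atomic, so on
            // failure the v1 filesystem is left intact
            err = lfs_migrate(lfs, cfg);
            if (err) {
                return err;
            }
            err = lfs_mount(lfs, cfg);
        }
        return err;
    }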
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included the
migration code in the standard build; we can't rely on link-time gc to
remove the lfs_migrate function because of a dependency between the
allocator and the v1 data structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11%, at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
This only required adding NULLs where commit statements were not fully
initialized.
Unfortunately we still need -Wno-missing-field-initializers because
of a bug in GCC that persists on Travis.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60784
Found by apmorton
- Fixed uninitialized values found by valgrind.
- Fixed uninitialized value in lfs_dir_fetchmatch when handling revision
counts.
- Fixed mess left by lfs_dir_find when attempting to find the root
directory in lfs_rename and lfs_remove.
- Fixed corner case with definitions of lfs->cfg->block_cycles.
- Added test cases around different forms of the root directory.
I think all of these were found by TheLoneWolfling, so props!
This was caused by any commit containing entries large enough to
_always_ force a compaction. This would cause littlefs to think that it
would need to split infinitely because there was no base case.
The fix here is pretty simple: treat any commit with only a single entry
as unsplittable. This forces littlefs to first try overcompacting
(fitting more in a block than what has optimal runtime), and failing
that, to return LFS_ERR_NOSPC for higher layers to handle.
found by TheLoneWolfling
The problem was when we allocate a dir-pair, it's possible for the
revision count to immediately overflow and the dir-pair be evicted and
returned to the unused blocks without being written even once. In the
case that block_cycles = 1, this made it impossible to ever create a
dir-pair, even in lfs_format.
I've also added a bit of logic to lfs_dir_alloc that will prevent
any immediate evictions because of the revision count.
found by TheLoneWolfling
- Fixed cache tarnishing issue where flush did not clean up read caches
- Removed extra alloc acks which would prevent file relocations from
resolving on an exhausted filesystem
- Removed an unsigned comparison < 0 in file seek
- Fixed bug in lfs_dir_getslice with using gtag's size
- Removed warnings around PRIu32 used with 16-bit types in debug info
Caught during power resilience testing, this was a bug that only occurs
when we need to compact in the middle of a move commit and we find that
the destination block is bad, forcing a relocate.
This series of events would cause littlefs to clear the "gpending" state
in preparation for fixing the move atomically, but this fix never gets
written out because of the relocate.
The fix here is to separate the update to the "gdelta" and "gpending"
state, marking "gdelta" in preparation for the move, but waiting to
update "gpending" until after our commit completes. This keeps our disk
state in sync without prematurely dropping moves.
Before this, there were some safety limits, but there was no real
default limit to the size of inline files other than the amount of RAM
available. On PCs, this meant that inline files were free to fill up
directory blocks to a little under the block size.
However, this is very wasteful in terms of storage space. Because of
splitting limits to keep the compact runtime reasonable, each byte of an
inline file uses 4x the amount of storage.
Fortunately we can find an optimal inline limit:
Inline file waste for n bytes = 3n
CTZ file waste for n bytes = B - n
Where B = block size
Setting 3n = B - n and solving gives n = B/4
So the optimal inline limit is B/4. However, this assumes a perfect inline
file and no metadata. We can decrease this to B/8 to give a bit more
breathing room for directory+file metadata.
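For example, with a 4096-byte block: 3n = 4096 - n gives n = 1024 (B/4),
and the more conservative default lands at 4096/8 = 512 bytes for the
inline limit.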
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However however, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
While linked-lists do have some minor benefits, arrays are more
idiomatic in C and may provide a more intuitive API.
Initially the linked-list approach was more beneficial than it is now,
since it allowed custom attributes to be chained to internal linked
lists of attributes. However, this was dropped because exposing the
internal attribute list in this way created a rather messy user
interface that required strictly encoding the attributes with the
on-disk tag format.
Minor downside, users can no longer introduce custom attributes in
different layers (think OS vs app). Minor upside, the code size and
stack usage were reduced a bit.
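For reference, the array-based form looks roughly like this (a sketch
against the v2 API; the attribute types 'v' and 't' are arbitrary
examples):

    #include "lfs.h"

    uint8_t vers = 1;
    uint32_t mtime = 0;

    // attributes are read when the file is opened and written back
    // when the file is synced
    struct lfs_attr attrs[] = {
        {.type = 'v', .buffer = &vers,  .size = sizeof(vers)},
        {.type = 't', .buffer = &mtime, .size = sizeof(mtime)},
    };

    const struct lfs_file_config fcfg = {
        .attrs = attrs,
        .attr_count = 2,
    };

    // lfs_file_opencfg(&lfs, &file, "hello", LFS_O_RDWR, &fcfg);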
Fortunately, this API can always be changed in the future without
breaking anything (except maybe API compatibility).
The main difference here is a change away from encoding "hasorphans" and
"hasmove" bits in the tag itself. This worked with the old format, but
in the new format the space these bits take up must be consistent for
each tag type. The tradeoff is that the new tag format allows for up to
256 different global states which may be useful in the future (for
example, a global free list).
The new format encodes this info in the data blob, using an additional
word of storage. This word is actually formatted the same as though it
was a tag, which simplified internal handling and may allow other tag
types in the future.
Format for global state:
[----         96 bits         ----]
[1|- 11 -|- 10 -|- 10 -|--- 64 ---]
 ^   ^      ^      ^        ^- move dir pair
 |   |      |      \---------- unused, must be 0s
 |   |      \----------------- move id
 |   \------------------------ type, 0x7ff for move
 \---------------------------- has orphans
This also included another iteration over globals (renamed to gstate)
with some simplifications to how globals are handled.
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .    ^          ^- entry length
 |.     |     .    \------------- file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  .           .
  .           .-> [-3-|--  8  --]
  .                 ^      ^- chunk info
  .                 \--------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact, giving us an O(n^3)
compact and an O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have
a failed compact. This means we waste an erase, which could be very
expensive. It is possible to work around this by keeping our work, but
with only a single prog cache this was very tricky and also somewhat
hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attr in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history; this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, one that handles deletes and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and an O(n^2 log n) split, which is a
nice minor improvement (remember n is bounded by block size).
This is to prepare for future compatibility with other implementations
of the allocator's lookahead that are under consideration. The most
promising design so far is a sort of segments-list data structure that
stores pointer+size pairs, requiring 64-bits of alignment.
Changing this now takes advantage of the major version to avoid a
compatibility break in the future. If we end up not changing the
allocator or don't need 64-bit alignment we can easily drop this
requirement without breaking anyone's code.
There was an interesting subtlety with the existing layout of tags that
could become a problem in the future. Basically, littlefs avoids writing to
any region of storage it is not absolutely sure has been erased
beforehand. This is a part of limiting the number of assumptions about
storage. It's possible a storage technology can't support writes without
erases in a way that is undetectable at write time (Maybe changing a bit
without an erase decreases the longevity of the information stored on
the bit).
But the existing layout had a very tiny corner case where this wasn't
true. Consider the location of the valid bit in the tag struct:
[1|--- 31 ---]
 ^--- valid bit
The responsibility of this bit is to indicate if an attempt has been
made to write the following commit. If it is not set (the specific value
is dependent on a previous read and identified by the preceding commit),
the assumption is that it is safe to write to the next region because it
has been erased previously. If it is set, we check if the next commit is
valid, if it isn't (because of CRC failure, likely due to power-loss), we
discard the commit. But because an attempt has been made to write to
that storage, we must then do a compaction to move to the other block in
the metadata-pair.
This plan looks good on paper, but what does it look like on storage?
The problem is that words in littlefs are in little-endian. So on
storage the tag actually looks like this:
[- 8 -|- 8 -|- 8 -|1|- 7 -]
                   ^-- valid bit
This means that we don't actually set the valid bit before writing the
tag! We write the lower bytes first. If we lose power, we may have
written 3 bytes without this fact being detectable.
We could restructure the tag structure to store the valid bit lower,
however because none of the fields are 7 bits, this would make the
extraction more costly, and we then lose the ability to check this
valid bit with a sign comparison.
The simple solution is to just store the tag in big-endian. A small
benefit is that this will actually have a negative code cost on
big-endian machines.
This mixture of endiannesses is frustrating, however it is a pragmatic
solution with only a 20-byte code size cost.
- Fixed underflow issue caused by search id shortcuts that would result
in early termination from lfs_dir_get
- Fixed issue where entry file delete would toss out the best id during
lfs_dir_fetchmatch
- Fixed globals going out of date when canceling in same metadata-pair
- Fixed early removal of metadata-pair when attribute list contains
creates after deletes bring dir->count to zero
Interesting open-source projects that I've run into around embedded
storage. May be interesting to others in the embedded space.
Added mklfs, SPIFFS, and Dhara.
Also a thanks to jolivepetrus for posting the mklfs tool he put
together.
On disk, littlefs uses 32-bit integers to track file size. This sets a
theoretical limit of 4GiB for files.
However, the API passes file sizes around as signed numbers, with
negative values representing error codes. This means that not all of the
APIs will work with file sizes > 2GiB.
Because of related complications over in FUSE land, I've added the LFS_FILE_MAX
constant and proper error reporting if file writes/seeks exceed the 2GiB limit.
In v2 this will join the other constants that get stored in the
superblock to help portability. Since littlefs is targeting
microcontrollers, it's likely this will be a sufficient solution.
Note that it's still possible to enable partial-support for 4GiB files
by defining LFS_FILE_MAX during compilation. This will work for most of
the APIs, except lfs_file_seek, lfs_file_tell, and lfs_file_size.
We can also consider improving support for 4GiB files, by making seek a
bit more complicated and adding a lfs_file_stat function. I'll leave
this for a future improvement if there's interest.
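Partial 4GiB support can be enabled roughly like so (a sketch; this
works because lfs.h only defines LFS_FILE_MAX if it hasn't been defined
already):

    // before including lfs.h, or via -DLFS_FILE_MAX=4294967295
    #define LFS_FILE_MAX 4294967295
    #include "lfs.h"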
Found by cgrozemuller
The issue happens when a rename causes a split in the destination pair.
If the destination pair is the same as the source pair, this triggers the
logic to keep both pairs in sync. Unfortunately, this logic didn't work,
because the source entry still resides in the old source pair, unlike
the destination pair, which is now in the new pair created by the split.
The best fix for now is to refetch the source pair after the changes to the
destination pair. This isn't the most efficient solution, but fortunately
this bug has already been fixed in the revamped move logic in littlefs v2
(currently in progress).
Found by ohoc
This was an oversight on my part when adding strict ordering to
directories. Unfortunately now we can't take advantage of the atomic
creation of tail+dir entries. Now we need to first create the tail, then
create the actual directory entry. If we lose power, the orphan is
cleaned up like orphans created during remove.
Note that we still take advantage of the atomic tail+dir entries if we
are an end block. This is actually because this corner case is
complicated to _not_ do atomically, needing to update the directory we
just committed to.
A rather humorous issue, we accidentally ended up mixing our file
namespace with our superblocks. This meant if we created a file named
"littlefs" it would reference the superblock and all sorts of things
would break.
Fixing this also highlighted another issue, the fact that the superblock
always needs to come before any file entries in the directory. I didn't
account for this in the initial B-tree design, but we need a higher
ordering for superblocks + children + files than just name. To fix this
I added ordering information in the 2 bits currently unused in the tag
type. Though note that the size of these fields are flexible.
9-bit type field:
[--- 9 ---]
[1|- 3 -|- 2 -|- 3 -]
 ^   ^     ^     ^- type-specific info
 |   |     \------- ordering info
 |   \------------- subtype
 \----------------- user bit
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us a O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
The fact that the lookahead buffer uses bits instead of bytes is an
internal detail. Poking this through to the user API has caused a decent
amount of confusion. Most buffers are provided as bytes and the
inconsistency here can be surprising.
The use of bytes instead of bits also makes us forward compatible in
the case that we want to change the lookahead internal representation
(hint segment list).
Additionally, we change the configuration name to lookahead_size. This
matches other configurations, such as cache_size and read_size, while
also notifying the user that something important changed at compile time
(by breaking).
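For users this ends up being a one-line change (a sketch; the old field
name and its bit-based unit are from the pre-rename configuration):

    #include "lfs.h"

    struct lfs_config cfg = {
        // ... device callbacks and geometry as before ...
        // previously: .lookahead = 512,   // counted in bits
        .lookahead_size = 64,              // counted in bytes
    };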
While technically, both system and user attributes share the same disk
limitations, that's not what attr_max represents when considered from
the user's perspective. To the user, attr_max applies only to custom
attributes. This means attr_max should not impact other configurable
limitations, such as inline files, and the ordering should be
reconsidered with what the user finds most important.
The downside of smarter caching is that now there are more complicated
corner cases to consider. Here we weren't considering our pcaches when
aligning reads to the rcache. This meant if things were unaligned, we
would read a cache-line that overlaps the pcache and then proceed to
ignore whatever we overlapped.
The fix is to determine the limit of an rcache read not from cache
alignment but from the available caches, which we check anyway to find
cached data.
This is a minor tweak that resulted from looking at some other use cases
for the littlefs data-structure on disk. Consider an implementation that
does not need to buffer inline-files in RAM. In this case we should have
as large a tag size field as possible. Unfortunately, we don't have much
space to work with in the 32-bit tag struct, so we have to make some
compromises. These limitations could be removed with a 64-bit tag
struct, at the cost of code size.
32-bit tag structure:
[--- 32 ---]
[1|- 9 -|- 9 -|-- 13 --]
 ^   ^     ^      ^- entry length
 |   |     \-------- file id
 |   \-------------- tag type
 \------------------ valid bit
Depending on your perspective, this may not be a necessary operation,
given that a nearly-full filesystem is already prone to ENOSPC errors,
especially a COW filesystem. However, splitting metadata-pairs can
happen in really unfortunate situations, such as removing files.
The solution here is to allow "overcompaction", that is, a compaction
without bounds checking to allow splitting. This unfortunately pushes
our metadata-pairs past their reasonable limit of saturation, which
means writes get exponentially costly. However it does allow littlefs to
continue working in extreme situations.
Added separate bit for "hasmove", which means we don't need to check
the move id, and allows us to add more sync-related global states in
the future, as long as they never happen simultaneously (such as
orphans and moves).
Also refactored some of the logic and removed the union in the global
structure, which didn't really add anything of value.
Conceptually these are two separate operations. However, they are both
only needed during mount, both require iteration over the linked-list of
metadata-pairs, and both are independent from each other.
Combining these into one gives us a nice code savings.
Additionally, this greatly simplifies the lookup of the root directory.
Initially we used a flag to indicate which superblock was root, since we
didn't want to fetch more pairs than we needed to. But since we're going
to fetch all metadata-pairs anyways, we can just use the last superblock
we find as the indicator of our root directory.
The xored-globals have a very large footprint. In the worst case, the
xored-globals are stored on each metadata-pair, twice in memory. They
must be very small, but are also very useful, so at risk of growing
in the future (hint global free-list?).
Initially we also stored a copy in each mdir structure, since this
avoided extra disk access to look up the globals when we need to modify
the global state on a metadata-pair. But we can easily just fetch the
globals when needed.
This is more costly in terms of runtime, but reduces RAM impact of
globals, which was previously needed for each open dir and file.
- Callbacks for get/match. This does have a code cost, but allows more
code reuse, which almost balances out the code cost, while also reducing
maintenance and increasing flexibility. Also, callbacks may be able to
be gc-ed in some cases.
- Consistent struct vs _t usage: _t for external-facing structs that
shouldn't be messed with outside the library, plain structs for external
and internal structs where anyone with access is allowed to modify them.
- Reorganized several high-level function groups
- Inlined structures that didn't need separate definitions in header
This follows from enabling tag deletion, however does require some
consideration with the APIs.
Now we can remove custom attributes, as well as determine if an attribute
exists or not.
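With the v2 API this looks like the following (a sketch; 'v' is an
arbitrary attribute type and attr_demo a hypothetical helper):

    #include "lfs.h"

    // assumes an lfs_t has already been mounted
    void attr_demo(lfs_t *lfs) {
        uint8_t buf[4];

        // remove a custom attribute
        lfs_removeattr(lfs, "hello", 'v');

        // probe for existence: a missing attribute now reports
        // LFS_ERR_NOATTR rather than appearing zero-length
        lfs_ssize_t res = lfs_getattr(lfs, "hello", 'v',
                buf, sizeof(buf));
        if (res == LFS_ERR_NOATTR) {
            // attribute does not exist
        }
    }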
littlefs has a mechanism for deleting file entries, but it doesn't have
a mechanism for deleting individual tags. This _is_ sufficient for a
filesystem, but limits our flexibility. Deleting attributes would be
useful in the custom attribute API and for future improvements (hint the
child pointers in B-trees).
However, deleting attributes is tricky. We can't just omit the
attribute, since we can only add new tags. Additionally, we need a way
to track what attributes have been deleted during compaction, which
currently relies on writing out attributes to disk.
The solution here is pretty nifty. First we have to come up with a way
to represent a "deleted" attribute. Rather than adding an additional
bit to the already squished tag structure, we use a -1 length field,
specifically 0xfff. Now we can commit a delete attribute, and this
deleted tag acts as a placeholder during compacts.
However our delete tag will never leave our metadata log. We need some
way to discard our delete tag if we know it's the only representation of
that tag on the metadata log. Ah! We know it's the only tag if it's in
the first commit on the metadata log. So we add an additional bit to the
CRC entry to indicate if we're on the first commit, and use that to
decide if we need to keep delete tags around.
Now we have working tag deletion.
Interestingly enough, tag deletion is actually indirectly more efficient
than entry deletion, since compacting entries requires multiple passes,
whereas tag deletion gets cleaned up lazily. However we can't adopt the
same strategy in entry deletion because of the compact ordering of
entries. Tag deletion works because tag types are unique and static.
Managing entry deletion in this manner would require static id
allocation, which would cause problems when creating files or running
out of space, and would disallow arbitrary insertion of files.
Currently unused, the insertion of new file entries in arbitrary
locations in a metadata-pair is very easy to add into the existing
metadata logging.
The only tricky things:
1. Name tags must strictly precede any tags related to a file. We can
pull this off during a compact, but must make two passes. One for the
name tag, one for the file. Though a benefit of this is that now our
scans during moves can exit early upon finding the name tag.
2. We need to handle name tags appearing out of order. This makes name
tags symmetric to deletes, although it doesn't seem like we can
leverage this fact very well. Note this also means we need to make
the superblock tag a type of name tag.
The valid bit present in tags is a requirement to properly detect the
end of commits in metadata logs. The way it works is that the CRC entry is
allowed to specify what is needed from the next tag's valid bit. If it's
incorrect, we've reached the end of the commit. We then set the valid bit to
indicate when we tried to program a new commit. If we lose power, this
commit will still be thrown out by a bad checksum.
However, the valid bit is unused outside of the CRC entry. Here we turn on the
valid bit for all tags, which means we have a decent chance of exiting early
if we hit a half-written commit. We still need to guarantee detection of
the valid bit on commits following the CRC entry, so we allow the CRC
entry to flip the expected valid bit.
The only tricky part is what valid bit we expect by default, since this
is used on the first commit on a metadata log. Here we default to a 1,
which gives us the fastest exit on blocks that erase to 0. This is
because blocks that erase to 1s will implicitly flip the valid bit of
the next tag, allowing us to exit on the next tag.
If we defaulted to 0, we could exit faster on disks that erase to 1, but
would need to scan the entire block on disks that erase to 0 before we
realize a CRC commit is never coming.
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log, if
that fails we try to compact the metadata log to remove duplicates and make
room for the commit, if that still fails we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works, we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuring and optimizing littlefs.
The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4 byte piece of
data, we may need to read a full 512 byte cache line. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.
Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can do:
1. Program caches write to prog_size aligned units, but eagerly cache as
much as possible. There's no downside to using the full cache in
program operations.
2. Add a hint parameter to cached reads. This internal API allows callers
to tell the cache how much data they expect to need. This avoids
excess bus traffic, and now we can even bypass the cache if the
caller provides enough of a buffer.
We can still fall back to reading full cache-lines in the cases where
we don't know how much data we need by providing the block size as
the hint. We do this for directory fetches and for file reads.
This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
While ECORRUPT is not a wrong error code, it doesn't match other
instances of hitting a corrupt block during write. During writes, if
blocks are detected as corrupt their data is evicted and moved to a new
clean block. This means that at the end of a disk's lifetime, exhaustion
errors will be reported as ENOSPC when littlefs can't find any new block
to store the data.
This has the benefit of matching behaviour when a new file is written
and no more blocks can be found, due to either a small disk or corrupted
blocks on disk. To littlefs it's like the disk shrinks in size over
time.
The initial implementation of inline files was thrown together fairly
quickly, however it has worked well so far and there hasn't been much
reason to change it.
One shortcut was to trick file writes into thinking they are writing to
imaginary blocks. This works well and reuses most of the file code
paths, as long as we don't flush the imaginary block out to disk.
Initially we did this by limiting inline_max to cache_max-1, ensuring
that the cache never fills up and gets flushed. This was a rather dirty
hack, the better solution, implemented here, is to handle the
representation of an "imaginary" block correctly all the way down into
the cache layer.
So now for files specifically, the value -1 represents a null pointer,
and the value -2 represents an "imaginary" block. This may become a
problem if the number of blocks approaches the max, however this -2
value is never written to disk and can be changed in the future without
breaking compatibility.
Because a block can go bad at any time, if we're unlucky, we may end up
generating multiple orphans in a single metadata write. This is
exacerbated by the early eviction in dynamic wear-leveling.
We can't track _all_ orphans, because that would require unbounded
storage and significantly complicate things, but there are a handful of
intentional orphans we do track because they are easy to resolve without
the O(n^2) deorphan scan. These are anytime we intentionally remove a
metadata-pair.
Initially we cleaned up orphans as they occur with whatever knowledge we
do have, and just accepted the extra O(n^2) deorphan scans in the
unlucky case. However we can do a bit better by being lazy and leaving
deorphaning up to the next metadata write. This needs to work with the known
orphans while still setting the orphan flag on disk correctly. To
accomplish this we replace the internal flag with a small counter.
Note, this means that our internal representation of orphans differs
from what's on disk. This is annoying but not the end of the world.
This was a pretty simple oversight on my part. Conceptually, there's no
difference between lfs_fs_getattr and lfs_getattr("/"). Any operations
on directories can be applied "globally" by referring to the root
directory.
Implementation wise, this actually fixes the "corner case" of storing
attributes on the root directory, which is broken since the root
directory doesn't have a related entry. Instead we need to use the root
superblock for this purpose.
Fewer functions means less code to document and maintain, so this is a
nice benefit. Now we just have a single lfs_getattr/setattr/removeattr set
of functions along with the ability to access attributes atomically in
lfs_file_opencfg.
This implements the second step of full dynamic wear-leveling, block
allocation randomization. This is the key part that uniformly distributes
wear across the filesystem, even through reboots.
The entropy actually comes from the filesystem itself, by xoring
together all of the CRCs in the metadata-pairs on the filesystem. While
this sounds like a ridiculous operation, it's easy to do when we already
scan the metadata-pairs at mount time.
This gives us a random number we can use for block allocation.
Unfortunately it's not a great general purpose random generator as the
output only changes every filesystem write. Fortunately that's exactly
when we need our allocator.
---
Additionally, the randomization created a mess for the testing
framework. Fortunately, this method of randomization is deterministic.
A very useful property for reproducing bugs.
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices; rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect; if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
Found while testing big-endian support. Basically, if littlefs is really
really unlucky, the block allocator could kick in while committing a
file's CTZ reference. If this happens, the block allocator will need to
traverse all CTZ skip-lists in memory, including the skip-list we're
committing. This means we can't convert the CTZ's endianness in place,
and need to make a copy on big-endian systems.
We rely on dead-code elimination from the compiler to make the
conditional behaviour for big-endian vs little-endian systems a noop
determined by the lfs_tole32 intrinsic.
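The helper is along these lines (a sketch; the real lfs_tole32 lives in
lfs_util.h and handles endianness detection more thoroughly):

    #include <stdint.h>

    // on little-endian targets this is a noop, letting the compiler
    // eliminate the endianness conversion entirely
    static inline uint32_t tole32(uint32_t a) {
    #if defined(__BYTE_ORDER__) && \
            __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        return a;
    #else
        return ((a & 0x000000ff) << 24) |
               ((a & 0x0000ff00) <<  8) |
               ((a & 0x00ff0000) >>  8) |
               ((a & 0xff000000) >> 24);
    #endif
    }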
In looking at the common CRC APIs out there, this seemed the most
common. At least more common than the current modified-in-place pointer
API. It also seems to have a slightly better code footprint. I'm blaming
pointer optimization issues.
One downside is that lfs_crc can't report errors, however it was already
assumed that lfs_crc can not error.
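A call site with the new shape looks like this (a sketch; checksum is a
hypothetical helper, and the 0xffffffff seed matches littlefs's usual
initial CRC):

    #include <stddef.h>
    #include "lfs_util.h"

    // uint32_t lfs_crc(uint32_t crc, const void *buffer, size_t size);
    uint32_t checksum(const void *a, size_t a_size,
            const void *b, size_t b_size) {
        uint32_t crc = 0xffffffff;
        crc = lfs_crc(crc, a, a_size);  // chain seed-in, seed-out
        crc = lfs_crc(crc, b, b_size);
        return crc;
    }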
This was mostly tweaking test cases to accommodate variable-sized
superblock-lists. Though there were a few bugs that needed fixing:
- Changed compact to use source dir for move since the original dir
could have changed as a result of an expand.
- Created copy of current directory so we don't overwrite ourselves
during an internal commit update.
Also made sure all of the test suites provide reproducible results when
run independently (the entry tests were behaving differently based on
which tests were run before).
(Some were legitimate test failures.)
In v1, littlefs didn't trust blocks that had been previously erased and
conservatively erased any blocks before writing to them. This was a part
of the design since the beginning because of the complexity of managing
erased blocks when we can lose power at any time.
However, we theoretically could keep track of files that have been
properly erased by marking them with an "erased bit". A file marked this
way could be opened and appended to without needing to COW the last
block. The requirement would be that the "erased bit" is cleared during
a write, since a power-loss would require that littlefs no longer trust
the erased state of the file.
This commit just shuffles the struct types around to make space for an
"erased bit" in the struct type field to be added in the future. This
ordering also makes more sense, since there will likely be more file
representations than directory representations on disk.
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining a fixed offset from blocks {0, 1} to
the root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large
filesystems (extreme being relative of course). On the small end, we can
actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level  cycles  limit    layout
1      E^2     390 MiB  s0 -> root
Now we expand every time a fixed offset is exceeded:
level  cycles  limit    layout
0      E       4 KiB    s0+root
1      E^2     390 MiB  s0 -> root
2      E^3     37 TiB   s0 -> s1 -> root
3      E^4     3.6 EiB  s0 -> s1 -> s2 -> root
...
Where the cycles are the number of cycles before death, and the limit is
the worst-case size a filesystem where early superblock death becomes a
concern (all writes to root using this formula: E^|s| = E*B, E = erase
cycles = 100000, B = block count, assuming 4096 byte blocks).
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recovery tools in the future.
This is a downside caused by relying on an external repo for testing,
but also storing the CI configuration inside this repo. Fortunately we
can use a temporary v2-alpha branch in the FUSE repo mirroring the
v2-alpha branch for testing.
LFS_ERR_CORRUPT is unfortunately not a well defined error code. It's
very important in the context of littlefs, but missing from the standard
error codes defined in Linux.
After some discussions with other developers, I was encouraged to use
the encoding for EILSEQ over EBADE to represent on-disk corruption, as
EILSEQ implies that there is something wrong with the data.
I've changed this now to take advantage of the breaking changes in v2 to
avoid a risky change to a return value.
Because of limitations in how littlefs manages attributes on disk,
littlefs views zero-length attributes and missing attributes as the same
thing. The simplest implementation of attributes mirrors this behaviour
transparently for the user.
State stealing is a tricky part of managing the xored-globals. When
removing a metadata-pair from the metadata chain, whichever
metadata-pair does the removing is also responsible for stealing the
removed metadata-pair's global delta and incorporating it into its own
global delta. Otherwise the global state would become corrupted.
- Updated documentation where needed
- Added asserts which take into account relationships with the new
cache_size configuration
- Restructured ordering to be consistent for the three main
configurables: LFS_ATTR_MAX, LFS_NAME_MAX, and LFS_INLINE_MAX
The introduction of an explicit cache_size configuration allows
customization of the cache buffers independently from the hardware
read/write sizes.
This has been one of littlefs's main handicaps. Without a distinction
between cache units and hardware limitations, littlefs isn't able to
read or program _less_ than the cache size. This leads to the
counter-intuitive case where larger cache sizes can actually be harmful,
since larger read/prog sizes require sending more data over the bus if
we're only accessing a small set of data (for example the CTZ skip-list
traversal).
This is compounded with metadata logging, since a large program size
limits the number of commits we can write out in a single metadata
block. It really doesn't make sense to link program size + cache
size here.
With a separate cache_size configuration, we can be much smarter about
what we actually read/write from disk.
This also simplifies cache handling a bit. Before there were two
possible cache sizes, but these were rarely used. Note that the
cache_size is NOT written to the superblock and can be freely changed
without breaking backwards compatibility.
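As a sketch, a configuration taking advantage of this might look like the
following (field names follow the v2-era API and should be treated as
assumptions):
    const struct lfs_config cfg = {
        // hardware limits, as small as the device allows
        .read_size   = 16,
        .prog_size   = 16,
        .block_size  = 4096,
        .block_count = 1024,
        // cache unit, now decoupled from the read/prog sizes; a
        // multiple of both, and a factor of the block size
        .cache_size  = 512,
        // block device callbacks and static buffers omitted
    };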
There wasn't much use (and inconsistent compiler support) for storing
small values next to the unaligned lfs_global_t struct. So instead, I've
rounded the struct up to the nearest word to try to take advantage of
the alignment in xor and memset operations.
I've also moved the global fetching into lfs_mount, since that was the
only use of the operation. This allows for some variable reuse in the
mount function.
This is an effort to try to consolidate the handling of in-flight files
and dirs opened by the user (and possibly opened internally). Both files
and dirs have metadata state that need to be kept in sync by the commit
logic.
This metadata state is mostly contained in the lfs_mdir_t type, which is
present in both the lfs_file_t and lfs_dir_t. Unfortunately both of
these structs have some relatively unrelated metadata that needs to be
kept in sync:
- Files store an id representing the open file
- Dirs store an id during iteration
While these take up the same space, they unfortunately need to be
managed differently by the commit logic.
The best solution I can come up with is to simply store a general
purpose list and tag both structures with LFS_TYPE_REG and LFS_TYPE_DIR
respectively. This is kinda funky, but wins out over duplicating the
commit logic.
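A sketch of what this general purpose list might look like (struct and
field names here are assumptions based on the description):
    // one list tracks every in-flight file and dir; type tells the
    // commit logic how to keep the id in sync
    struct lfs_mlist {
        struct lfs_mlist *next; // next opened file/dir
        uint16_t id;            // open file's id, or dir iteration position
        uint8_t type;           // LFS_TYPE_REG or LFS_TYPE_DIR
        lfs_mdir_t m;           // metadata state kept in sync by commits
    };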
Other than removing outdated TODOs, there are several tweaks:
- Standardized naming of fs-level functions (mostly internal names)
- Tweaked low-level use of subtype to hopefully take advantage of
redundant code removal
- Moved root-handling into lfs_dir_getinfo
- Updated DEBUG statements around move/orphan fixes
- Removed trailing 1s in type fields
- Removed unused code
Unfortunately for us, even with the new ability to store global state,
orphans can not be handled as gracefully as moves. This is due to the
fact that directory operations can create an unbounded number of
orphans. While usually small, the fact that it's unbounded means we can't
store the orphan info in xored-globals.
However, one thing we can do to leverage the xored-global state is store
a bit indicating if _any_ orphans are present. This means in the common
case we can completely avoid the deorphan step, while only using a
single bit of the global state, which is effectively free since we can
store it in the globals tag itself.
If a littlefs drive does not want to consider the orphan bit, it's free
to use the previous behaviour of always checking for orphans on first
write.
Result of testing on zero-granularity blocks, where the prog size and
read size equals the block size. This represents SD cards and other
traditional forms of block storage where we don't really get a benefit
from the metadata logging.
Unfortunately, since updates in both are tested by the same script,
we can't really use simple bash commands. Added a more complex
script to simulate corruption. Fortunately this should be more robust
than the previous solutions.
The main fixes were around corner cases where the commit logic fell
apart when it didn't have room to complete commits, but these were
fixable in the current design.
The main change here was to drop the in-place twiddling of custom
attributes to match the internal attribute structures. The original
thought was that this could allow the compiler to garbage collect more
of the custom attribute logic when not used, but since this occurs in
the common lfs_file_opencfg function, gc can't really happen.
Not twiddling the user's structure is the polite thing to do, opens up
the ability to store the lfs_attr structure in ROM, and avoids surprising
the user if they attempt to use the structure for their own purposes.
This means we can make the lfs_attr structure const and rely on the list
in the lfs_file_config structure, similar to how we rely on the global
lfs_config structure.
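With this, attaching attributes at open time might look roughly like so
(a sketch; exact field names are assumptions):
    // attribute buffers are populated on open and committed atomically
    // with the file on sync/close
    uint32_t timestamp = 0;
    struct lfs_attr attrs[] = {
        {.type = 't', .buffer = &timestamp, .size = sizeof(timestamp)},
    };
    const struct lfs_file_config fcfg = {
        .attrs = attrs,
        .attr_count = 1,
    };
    lfs_file_opencfg(&lfs, &file, "hello.txt",
            LFS_O_RDWR | LFS_O_CREAT, &fcfg);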
Some other tweaks:
- Dropped the global file_buffer, replaced entirely by per-file buffers.
- Updated LFS_INLINE_MAX and LFS_ATTR_MAX to correct values
- Added workaround for compiler bug related to zero initializer:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
This is a very minor thing but it has been bugging me. On one hand, all
a callback ever needs is a single pointer for context. On the other
hand, you could make the argument that in the context of littlefs, the
lfs_t struct represents global state and should always be available to
callbacks passed to littlefs.
In the end I'm sticking with only a single context pointer, since this
satisfies the minimum requirements and has the highest chance of
function reuse. If a user needs access to the lfs_t struct, it can be
passed by reference in the context provided to the callback.
This also matches callbacks used in other languages with more emphasis
on objects and classes. Usually the callback doesn't get a reference to
the caller.
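For example, a caller that wants littlefs state in its callback can just
pack it into the context (a sketch with hypothetical names):
    struct my_ctx {
        lfs_t *lfs;  // littlefs state, passed by reference
        int count;   // callback-specific state
    };

    static int my_cb(void *p) {
        struct my_ctx *ctx = p;
        ctx->count += 1;  // ctx->lfs is available here if needed
        return 0;
    }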
The main thing to consider was how lfs_dir_fetchwith reacts to
corruption it finds and to make sure falling back to old values works
correctly.
Some of the tricky bits involved making sure we could fall back to both old
commits and old metadata blocks while still handling things like
synthetic moves correctly.
The main issue here was that the old orphan test relied on deleting the
block that contained the most recent update. In the new design this
doesn't really work since updates get appended to metadata-pairs
incrementally.
This is fixed by instead using the truncate command on the appropriate
block. We're now passing orphan tests.
Now that littlefs has been rebuilt almost from the ground up with the
intention to support custom attributes, adding in custom attribute
support is relatively easy.
The highest bit in the 9-bit type structure indicates that an attribute
is a user-specified custom attribute. The user then has a full 8-bits to
specify the attribute type. Other than that, custom attributes are
treated the same as system-level attributes.
Also made some tweaks to custom attributes:
- Adopted the opencfg for file-level attributes provided by dpgeorge
- Changed setattrs/getattrs to the simpler setattr/getattr functions
users will probably be more familiar with. Note that multiple
attributes can still be committed atomically with files, though not
with directories.
- Changed LFS_ATTRS_MAX -> LFS_ATTR_MAX since there's no longer a global
limit on the sum of attribute sizes, which was rather confusing.
Though they are still limited by what can fit in a metadata-pair.
Updated to account for changes as a result of commits/compacts. And
changed instances of iteration over both files and dirs to use a single
nested loop.
This does rely implicitly on the structure layout of dirs/files and
their location in lfs_t, which isn't great. But it gets the job done
with less code duplication.
This was a surprisingly tricky issue. One of the subtle requirements for
the new move handling to work is that the block containing the move does
not change until the move is resolved. Initially, this seemed easy to
implement, given that a move is always immediately followed by its
resolution.
However, the extra metadata-pair operations needed to maintain integrity
present a challenge. At any commit, a directory block may end up moved
as a side effect of relocation due to a bad block.
The fix here is to move the move resolution directly into the commit
logic. This means that any commit to a block containing a move will be
implicitly resolved, leaving the later attempt at move resolution as a
noop.
This fix required quite a bit of restructuring, but as a nice
side-effect some of the complexity around moves actually went away.
Additionally, the new move handling is surprisingly powerful at
combining moves with nearby commits. And we now get same-metadata-pair
renames for free! A win for procrasination on that minor feature.
Restructured function organization to make a bit more sense, and made
some small refactoring tweaks, specifically around the commit logic and
globals-related functions.
The biggest change here is to make littlefs less obsessed with the
lfs_mattr_t struct. It was limiting our flexibility and can be entirely
replaced by passing the tag + data explicitly. The remaining use of
lfs_mattr_t is specific to the commit logic, where it replaces the
lfs_mattrlist_t struct.
Other changes:
- Added global lfs_diskoff struct for embedding disk references inside
the lfs_mattr_t.
- Reordered lfs_mattrlist_t to squeeze out some code savings
- Added commit_get for explicit access to entries from unfinished
metadata-pairs
- Parameterized the "stop_at_commit" flag instead of hackily storing it
in the lfs_mdir_t temporarily
- Changed return value of lfs_pred to error-only with ENOENT representing
a missing predecessor
- Adopted const where possible
One neat (if gimmicky) trick is that each tag has a valid bit in the
highest bit position of the 32-bit word. This is used to determine when
to stop a fetch operation, but after fetch, the bit is free to use in
the driver. This means we can create a typed-union of sorts with error
codes and tags, returning both as the return value from a function.
Say what you will about this trick, it does have a significant impact on
code size. I suspect this is primarily due to the compiler having a hard
time optimizing around pointer access.
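A sketch of the trick (lfs_tag_t/lfs_stag_t show up later in this
series; the helper here is hypothetical):
    typedef uint32_t lfs_tag_t;  // raw tag, valid bit in the MSB
    typedef int32_t lfs_stag_t;  // signed view, MSB doubles as sign bit

    // after fetch, in-use tags have the valid bit clear, so negative
    // values can unambiguously carry error codes instead of tags
    static lfs_stag_t find_tag(int found) {
        if (!found) {
            return LFS_ERR_NOENT;       // negative error code
        }
        return (lfs_stag_t)0x20e00000;  // positive value, a real tag
    }

    // callers check the sign once for both errors and results
    lfs_stag_t tag = find_tag(1);
    if (tag < 0) {
        return tag;
    }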
While it makes sense to reuse as many code paths as possible, it turns
out that the logic behind the traversal of littlefs's metadata-pairs is
so simple that it's actually cheaper to duplicate the traversal code
where needed.
This means instead of the code path move -> traverse -> movescan -> get
-> traverse -> getscan, we can use the relatively flatter code path of
move -> get.
Tags offer all of the necessary info from the find functions (which
makes sense, this is the structure that stores the info on disk).
Passing around a single tag instead of separate id and type fields
simplifies the internal functions while leveraging the tag's compactness.
Unfortunately, the three different sets of get functions were not
contributing very much, and having three different get functions means
we may be wasting code on redundant code paths.
By dropping the use of the lfs_mattr_t struct in favor of a buffer, we
can combine the three code paths with a bit of tweaking.
Now that littlefs's fetchwith operations have stabilized a bit, there's
actually only a single fetchwith operation, the findscan function.
Given that there's no need for the internal functions to be a forward
compatible API, we can integrate the findscan behaviour directly into
fetchwith and avoid the (annoyingly) costly generalization overhead.
As an added benefit, we can easily add additional tag modifications
during fetch, such as the synthetic moves needed to resolve in-flight
move operations without disk modifications.
An interesting observation about the find and parent scanning functions
is that at their core, they're both actually doing the same operation.
They search a metadata-pair during fetch for an entry, matching on the
entry's data instead of the entry's tag. This means we can combine these functions and
get a decent code savings.
It's a little bit trickier because pair ordering isn't guaranteed. But
to work around that we can simply search for both pair orderings. It's a
bit more expensive but may be worth the code savings. A fancier
implementation in the future can avoid the 2x lfs_parent scans.
I've been trying to keep tag types organized with an encoding that hints
if a tag uses its id field for file ids. However this seems to have been
a mistake. Using a null id of 0x3ff greatly simplified quite a bit of
the logic around managing file related tags.
The downside is one less id we can use, but if we look at the encoding
cost, donating one full bit costs us 2^9 id permutations, vs only 1 id
permutation for a null id. So even with a perfect encoding it's in our
favor to use a null id. The cost of null ids is code size, but with the
complexity around figuring out whether a type uses its id or not, it
just works out better to use a null id.
Recall that the 32-bit tag structure contains a 9-bit type. The type
structure then decomposes into a bit more information:
    [----- 9 -----]
    [1|- 4 -|- 4 -]
     ^   ^     ^- specific type
     |   \------- subtype
     \----------- user bit
The main change is an observation from moving type info to the name tag
from the struct tag. Since we don't need the type info in the struct
tag, we can significantly simplify the type structure.
Originally, I had type info encoded in the struct tag. This initially
made sense because the type info only directly impacts the struct tag.
However this was a case of focusing too much on the details instead of
the bigger picture.
Many file operations need to figure out the type of a file, but only a
small number of file operations actually need to interact with the
file's structure. For the common case, providing the type of the file
early shortens operations by a full tag access.
Additionally, by storing the type in the file name tag, this opens up
the struct tag to use those bits for storing more struct descriptions.
The introduction of xored-globals required quite a bit of work to
integrate. But now that that is working, we can strip out the old move
logic.
It's worth noting that the xored-globals integration with commits is
relatively complex and subtle.
The issue lies in the reuse of the id field for globals. Before globals,
the only tags with a non-null (0x3ff) id field were names, structs, and
other file-specific metadata. But globals are also using this field for
the indirect delete, since otherwise the globals structure would be very
unaligned (74-bits long).
To make matters worse, the id field for globals contains the delta used
to reconstruct the globals at mount time. Which means the id field could
take on very absurd values and break the dir fetch logic if we're not
careful.
The solution is to use the scope portion of the type field where
necessary, although unfortunately this does add some code cost.
Unfortunately, the behaviour needed of lfs_dir_fetchwith is as subtle as
it is important. When fetching from a block corrupted by power-loss,
lfs_dir_fetch must be able to rewind any state it picks up to before the
corruption. This is not limited to the directory state, but includes
find results and other side-effects.
This gets a bit complicated when trying to generalize littlefs's
fetchwith mechanics. Being able to scan a directory block during a fetch
greatly impacts the runtime of littlefs operations, but if the state is
generic how do we know what to rollback to?
The fix here is to leave the management of rolling back state to the
fetchwith match functions, and transparently pass a CRC tag to indicate
the temporary state can be saved.
Originally I tried to reuse the indirect delete to accomplish truely
atomic directory removes, however this fell apart when it came to
implementing directory removes as a side-effect of renames.
A single indirect-delete simply can't handle renames with removes as
a side effects. When copying an entry to its destination, we need to
atomically delete both the old entry, and the source of our copy. We
can't delete both with only a single indirect-delete. It is possible to
accomplish this with two indirect-deletes, but this is such an uncommon
case that it's really not worth supporting efficiently due to how
expensive globals are.
I also dropped indirect-deletes for normal directory removes. I may add
them back later, but at the moment it's extra code cost for a path
that's not traveled very often.
As a result, restructured the indirect delete handling to be a bit more
generic, now with a multipurpose lfs_globals_t struct instead of the
delete specific lfs_entry_t struct.
Also worked on integrating xored-globals, now with several primitive
global operations to manage fetching/updating globals on disk.
32-bit tag structure:
    [---------- 32 ---------]
    [1|- 9 -|- 10 -|-- 12 --]
     ^   ^     ^       ^- entry length
     |   |     \--------- file id
     |   \--------------- tag type
     \------------------- valid
In this tag, the type decomposes into some more information:
    [-------- 9 --------]
    [1|- 2 -|- 3 -|- 3 -]
     ^   ^     ^     ^- struct
     |   |     \------- type
     |   \------------- scope
     \----------------- user
The change in this encoding is the addition of a global scope:
    LFS_SCOPE_STRUCT = 0 00 xxx xxx
    LFS_SCOPE_ENTRY  = 0 01 xxx xxx
    LFS_SCOPE_DIR    = 0 10 xxx xxx
    LFS_SCOPE_FS     = 0 11 xxx xxx
    LFS_SCOPE_USER   = 1 xx xxx xxx
This was a big roadblock for a while: with the new feature of inlined
files, the existing move logic was fundamentally flawed.
To pull off atomic moves between two different metadata-pairs, littlefs
uses a simple, if a bit clumsy, trick:
1. Marks entry as "moving"
2. Copies entry to new metadata-pair
3. Deletes old entry
If power is lost before the move operation is completed, we will find the
"moving" tag. This means there may or may not be an incomplete move on
the filesystem. In this case, we simply search for the moved entry, if
we find it, we remove the old entry, otherwise we just remove the
"moving" tag.
This worked perfectly, until we introduced inlined files. See, unlike
the existing directory and ctz entries, inlined files have no guarantee
they are unique. There is nothing we can search for that will allow us
to find a moved file unless we assign entries globally-unique ids. (note
that moves are fundamentally rename operations, so searching for names
does not make sense).
---
Solving this problem required completely restructuring how littlefs
handled moves, and pulled out a really old idea that had been left on
the cutting room floor back when littlefs was going through many
designs: xored-globals.
The problem xored-globals solves is the need to maintain some global state
via commits to these distributed, independent metadata-pairs. The idea
is that we can use some sort of symmetric operation, such as xor, to
introduce deltas of the global state that can be committed atomically
along with any other info to these metadata-pairs.
This means that to figure out our global state, we xor together the global
delta stored in every metadata-pair.
Which means any commit can update the global state atomically, opening
up a whole new set of atomic possibilities.
There are a couple of downsides. These globals may end up with deltas on
every single metadata-pair, effectively duplicating the data for each
block. Additionally, these globals need to have multiple copies in RAM.
This means globals need to be a bounded size and very small, since even
small globals will have a large footprint.
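A minimal sketch of the mechanic, assuming a small fixed-size global
state:
    // the true global state is the xor of the deltas committed to
    // every metadata-pair
    typedef struct { uint8_t d[10]; } globals_t;

    static void globals_xor(globals_t *a, const globals_t *b) {
        for (unsigned i = 0; i < sizeof(a->d); i++) {
            a->d[i] ^= b->d[i];
        }
    }

    // at mount: start from zero and xor in each metadata-pair's delta
    // to update: commit delta = current ^ desired alongside any commit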
---
On top of xored-globals, it's trivial to fix our move logic. Here we've
added an indirect delete tag which allows us to atomically specify a
delete of any entry on the filesystem.
Our move operation is now:
1. Copy entry to new metadata-pair and atomically xor globals to
indirectly delete our original entry.
2. Delete the original entry and xor globals to remove the indirect
delete.
Extra exciting is that this turns our relatively clumsy move
operation into a sexy, guaranteed O(1) move operation with no searching
necessary (though we do need to xor globals during mount).
Also reintroduced entry struct, now with a specific purpose to describe
the metadata-pair + id combo needed by indirect deletes to locate an
entry.
Unfortunately, it's hard to make directory lookups for root not a
special case, and we don't like special cases when we're trying to keep
code size small.
Since there are a handful of code paths where opening root should return
EISDIR (such as lfs_file_open("/")), we use EISDIR to note that the
argument is in fact a path to the root.
This is needed because we no longer look up an entry's contents in
lfs_dir_find for free, since entries are more ephemeral.
Similarly to the internal "meta-attributes", I was finding quite a bit
of need for an internal structure that mirrors the user-facing directory
structure for when I need to do an operation on a metadata-pair, but
don't need all of the state associated with a fully iterable directory
chain.
    lfs_mdir_t - meta-directory, describes a single metadata-pair
    lfs_dir_t  - directory, describes an iterable directory chain
While it may seem complex to have all these structures lying around,
they only complicate the code at compile time. To the machine, any
number of nested structures all look the same.
Attributes are used to describe more than just entries, so calling these
lists of attributes "entries" was inaccurate. However, the name
"attributes" would conflict with "user attributes", user-facing
attributes with a very similar purpose. "user attributes" must be kept
distinct due to differences in binary layout (internal attributes can
use a more compact tag+buffer representation, but expecting users to
jump through hoops to get their data to look like that isn't very
user-friendly).
Decided to go with "mattr" as shorthand for "meta-attributes", similar
to "metadata".
I've found that duplicating functions under temporary names
(lfs_dir_commit + lfs_dir_commit_) is a great way to introduce sweeping
changes while keeping the code base functional and (mostly) passing
tests.
It does mean at some point I need to deduplicate all these functions.
lfs_dir_fetchwith did not recover from failed dir fetches correctly,
added a temporary dir variable to hold dir contents while being
populated, allowing us to fall back to a known good dir state if a
commit is corrupted.
There is a RAM cost, but the upside is that our lfs_dir_fetchwith
actually works.
Also added better handling of move ids during some get functions.
Integration with the new journaling metadata has now progressed to the
point where all of the basic functionality tests are passing. This
includes:
- test_format
- test_dirs
- test_files
- test_seek
- test_truncate
- test_interspersed
- test_paths
Some of the fixes:
- Modified move to correctly change entry ids
- Called lfs_commit_move directly from compact, avoiding commit parsing
logic during a compact
- Opened up commit filters to be passed down from compact for moves
- Added correct drop logic to lfs_dir_delete
- Updated lfs_dir_seek to use ids instead of offsets
- Caught id updates manually where possible (this needs to be fixed)
Now with some tweaks to commit/compact, and committers for entrylists and
moves specifically. No longer relying on a commitwith callback, the
types of commits are now inferred from their tags.
This means we can now commit things atomically with special commits,
such as moves. Now lfs_rename can move entries to new names correctly.
Passing more tests now with the journalling change, but still have more
work to do.
The most humorous bug was one where during the three step move
process, the entry move logic would dumbly copy over any tags associated
with the moving entry, including the tag used to temporarily mark the
entry as "moving".
Also combined dir and commit traversal using a "stop_at_commit" flag in
directory struct as a short-term hack to combine the code paths.
Also refactored lfs_dir_compact a bit, adding begin and end as arguments
since they simplify a bit of the logic and can be computed much more
easily earlier in the commit logic.
Also changed add -> append and drop -> delete and cleaned up some of the
logic around there.
This was the simpler part of transitioning since file operations only
interact with metadata at sync time.
Also switched from array to linked-list of entries.
- Integrated into lfs_file_t_, duplicating functions where necessary
- Added lfs_dir_fetchwith_ as common parent to both lfs_dir_fetch_ and
lfs_dir_find_
- Added similar parent with lfs_dir_commitwith_
- Made matching find/get operations with getbuffer/getentry and
findbuffer/findentry
- lfs_dir_alloc now populates tail, since almost all directory block
allocations need to populate tail
- Integrated journaling into lfs_dir_t_ struct and operations,
duplicating functions where necessary
- Added internal lfs_tag_t and lfs_stag_t
- Consolidated lfs_region and lfs_entry structures
This is a big change stemming from the fact that resizable entries
were surprisingly complicated to implement and came in with a sizable
code cost.
The theory is that the journalling has a comparable cost to resizable
entries. Both need to handle overflowing blocks, and managing offsets is
comparable to managing attribute IDs. But by jumping all the way to full
journaling, we can statically wear-level the metadata written to
metadata pairs.
The idea of journaling littlefs's metadata has been mentioned several times in
discussions and fits well into how littlefs works. You could even view the
existing metadata log as a log of size 2.
The downside of this approach is that changing the metadata in this way
would break compatibility from the existing layout on disk. Something
that resizable entries does not do.
That being said, adopting journaling at the metadata layer offers a big
improvement to littlefs's performance and wear-leveling, with very
little cost (maybe even none or negative after resizable entries?).
This has existed for some time in the form of the lfs_traverse
function, through which a user could provide a simple callback that
would just count the number of blocks lfs_traverse finds. However,
this approach is relatively unconventional and has proven to be confusing
for most users.
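Previously, measuring usage looked roughly like this (a sketch against
the lfs_traverse signature of the time):
    // count every block the filesystem currently references
    static int count_block(void *p, lfs_block_t block) {
        (void)block;
        *(lfs_size_t *)p += 1;
        return 0;
    }

    lfs_size_t in_use = 0;
    int err = lfs_traverse(&lfs, count_block, &in_use);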
In the form of lfs_file_setattr, lfs_file_getattr, lfs_fs_setattr,
lfs_fs_getattr.
This enables atomic updates of custom attributes as described in
6c754c8, and provides a custom attribute API that allows custom attributes
to be stored on the filesystem itself.
Mostly just removed LFS_FROM_DROP and changed the DSL grammar a bit to
allow drops to occur naturally through the oldsize -> newsize diff expressed
in the region struct. This prevents us from having to add a drop every
time we want to update an entry in-place.
Although it's simple and probably what most users expect, the previous
custom attributes API suffered from one problem: the inability to update
attributes atomically.
If we consider our timestamp use case, updating a file would require:
1. Update the file
2. Update the timestamp
If a power loss occurs during this sequence of updates, we could end up
with a file with an incorrect timestamp.
Is this a big deal? Probably not, but it could be a surprise only found
after a power-loss. And littlefs was developed _specifically_ to avoid
surprises during power-loss.
littlefs is perfectly capable of bundling multiple attribute updates
in a single directory commit. That's kind of what it was designed to do.
So all we need is a new committer opcode for a list of attributes, and
then poking that list of attributes through the API.
We could provide single-attribute functions, but don't, because fewer
functions make for a smaller codebase, and these are already the
more advanced functions, so we can expect more from users. This also
changes semantics about what happens when we don't find an attribute,
since erroring would throw away all of the other attributes we're
processing.
To atomically commit both custom attributes and file updates, we need a
new API, lfs_file_setattr. Unfortunately the semantics are a bit more
confusing than lfs_setattr, since the attributes aren't written out
immediately.
A much requested feature (mostly because of littlefs's notable lack of
timestamps), this commit adds support for user-specified custom
attributes.
Planned (though underestimated) since v1, custom attributes provide a
route for OSs and applications to provide their own metadata in
littlefs, without limiting portability.
However, unlike custom attributes that can be found on much more
powerful PC filesystems, these custom attributes are very limited,
intended for only a handful of bytes for very important metadata. Each
attribute has only a single byte to identify the attribute, and the
size of all attributes attached to a file is limited to 64 bytes.
Custom attributes can be accessed through the lfs_getattr, lfs_setattr,
and lfs_removeattr functions.
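Usage might look roughly like this (a sketch; the attribute type 't' and
the timestamp are arbitrary examples):
    // store and retrieve a 4-byte timestamp under attribute type 't'
    uint32_t time = 1500000000;
    int err = lfs_setattr(&lfs, "hello.txt", 't', &time, sizeof(time));

    uint32_t time2 = 0;
    lfs_ssize_t res = lfs_getattr(&lfs, "hello.txt", 't',
            &time2, sizeof(time2));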
One of the big benefits of inline files is that small files no longer need to
take up a full block. This opens up an opportunity to provide much better
support for storage devices with only a handful of very large blocks, such as
the internal flash found on most microcontrollers.
After investigating some use cases for a filesystem on internal flash,
it has become apparent that the 255-byte limit is going to be too
restrictive to be useful in many cases. Most uses I found needed files
~4-64 bytes in size, but it wasn't uncommon to find files ~512 bytes in
length.
To try to remedy this, I've pushed the 255 byte limit up to 1023 bytes,
by stealing some bits from the previously-unused attributes' size.
Unfortunately this limits attributes to 63 bytes in total and has a
minor code cost, but I'm not sure even 1023 bytes will be sufficient for
a lot of cases.
littlefs will probably never be as efficient with internal flash as
other filesystems such as SPIFFS; it just wasn't designed for this sort of
limited geometry. However, this feature has been heavily requested, even
with limitations, because of the opportunity for code reuse on
microcontrollers with both internal and external flash.
Being a portable, microcontroller-scale embedded filesystem, littlefs is
presented with a relatively unique challenge. The amount of RAM
available is on completely different scales from machine to machine, and
what is normally a reasonable RAM assumption may break completely on an
embedded system.
A great example of this is file names. On almost every PC these days, the limit
for a file name is 255 bytes. It's a very convenient limit for a number
of reasons. However, on microcontrollers, allocating 255 bytes of RAM to
do a file search can be unreasonable.
The simplest solution (and one that has existed in littlefs for a
while) is to let this limit be redefined to a smaller value on devices
that need to save RAM. However, this presents an interesting portability
issue. If these devices are plugged into a PC with relatively infinite
RAM, nothing stops the PC from writing files with full 255-byte file
names, which can't be read on the small device.
One solution here is to store this limit in the superblock at format
time. When mounting a disk, the filesystem implementation is responsible for
checking this limit in the superblock. If it's larger than what can be
read, raise an error. If it's smaller, respect the limit on the
superblock and raise an error if the user attempts to exceed it.
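A minimal sketch of that mount-time check (names are assumptions):
    // the superblock records the limits used at format time
    if (superblock.name_max > LFS_NAME_MAX) {
        return LFS_ERR_INVAL;  // formatted with limits we can't support
    }
    // otherwise respect the (possibly smaller) on-disk limit
    lfs->name_max = superblock.name_max;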
In this commit, this strategy is adopted for file names, inline files,
and the size of all attributes, since these could impact the memory
consumption of the filesystem. (Recording the attribute's limit is
iffy, but is the only other arbitrary limit and could be used for disabling
support of custom attributes).
Note! This change makes it very important to configure littlefs
correctly at format time. If littlefs is formatted on a PC without
changing the limits appropriately, it will be rejected by a smaller
device.
This move was surprisingly complex, but offers the ultimate opportunity for
code reuse in terms of resizable entries. Instead of needing to provide
separate functions for adding and removing entries, adding and removing
entries can just be viewed as changing an entry's size to-and-from zero.
Unfortunately, it's not _quite_ that simple, since append and remove
hide some relatively complex operations for when directory blocks
overflow or need to be cleaned up.
However, with enough shoehorning, and a new committer type that allows
specifying recursive commit lists (is this now a push-down automaton?),
it does seem to be possible to shove all of the entry update logic into
a single function.
Sidenote, I switched back to an enum-based DSL, since the addition of a
recursive region opcode breaks the consistency of what needs to be
passed to the DSL callback functions. It's much simpler to handle each
opcode explicitly inside a recursive lfs_commit_region function.
Making the superblock look like "just another entry" allows us to treat
the superblock like "just another entry" and reuse a decent amount of
logic that would otherwise only be used at format and mount time. In this
case we can use append to write out the superblock like it was creating
a new entry on the filesystem.
Now when a file overflows the max inline file size, it will be correctly
written out to a proper block. Additionally, tweaked corner cases around
inline files; however, this still needs significant testing.
A real neat part that surprised me is that littlefs _already_ contains
the logic for writing out inline files: in lfs_file_relocate! With a bit
of tweaking, littlefs can pull off both the overflow from inline to
normal files _and_ the relocating of bad blocks in files with the same
piece of logic.
Proof-of-concept implementation of inline files that stores the file's
content directly in its parent's directory pair.
Inline files are indicated by a different type stored in an entry's
struct field, and take advantage of resizable entries. Where a normal
file's entry would normally hold the reference to the CTZ skip-list, an
inline file's entry contains the contents of the actual file.
Unfortunately, storing the inline file on disk is the easy part. We also
need to manage inline files in the internals of littlefs and provide the
same operations that we do on normal files, all while reusing as much
code as possible to avoid a significant increase in code cost.
There is a relatively simple, though maybe a bit hacky, solution here. If a
file fits entirely in a cache line, the file logic never actually has to go to
disk. This means we can just give the file a "pretend" block (hopefully
one that would assert if ever written to), and carry out file operations
as normal, as long as we catch the file before it exceeds the cache line
and write out the file to an actual disk.
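A sketch of the pretend-block idea (the constant is hypothetical; any
reserved address that no real block can have would do):
    // a reserved block address; file logic proceeds as normal, but any
    // attempt to actually program this block is a logic error
    #define BLOCK_INLINE ((lfs_block_t)-2)

    if (file->block == BLOCK_INLINE) {
        // file lives entirely in the cache, nothing to read from disk
    }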
The size field is redundant, since an entry's size can be determined
from nlen+elen+alen+4. However, as you may have guessed from that
expression, calculating the size this way is a bit roundabout and
inefficient. Despite its redundancy, it's cheaper to store the size in the
entry, though with a minor RAM cost.
Note, extra care must now be taken to make sure these size and len fields
don't fall out of sync.
Tweaked the commit callback to pass the arguments for from-memory
commits explicitly, with non-from-memory commits still being able to
hijack the opaque data pointer for additional state.
The from-memory commits make up the vast majority of commits in
littlefs, so this small change has a noticable impact.
Before, when appending new entries to a directory, we tried to find empty space
in the last block of a directory chain. This has a nice side-effect that
the order of directory entries is maintained. However, this isn't strictly
necessary.
We're already scanning the directory chain in order, so other than changes to
directory order, there's no downside to taking advantage of any free
space we come across.
Now, instead of passing an enum for mem/disk commits, we pass a function
pointer that can specify any behaviour.
This has the benefit of opening up the possibility to pass any sort of
commit logic to the committers, and unused logic can be garbage-collected
by the compiler. The downside is that unfortunately compilers have
a harder time optimizing around function pointers than enums, and
fitting the state into structs for the callbacks may be costly.
Before, tags were implicitly updated by the dir update functions, which
have a strong understanding of the entry struct. However, most of the
time the tag was already a part of the entry struct being committed.
By making tag updates explicit, this does add cost to commits that
now have to pass tag updates explicitly, but it reduces cost where that
tag and entry update can be combined into one commit region.
It also simplifies the dir update functions.
Now, with the off, diff, and len parameters in each commit entry, we can build
up directory commits that resize entries. This adds complexity but opens
up the directory blocks to be much more flexible.
The main concern is that resizing entries can push around neighboring entries
in surprising ways, such as pushing them into new directory blocks when a
directory splits. This can break littlefs's internal logic in how it tracks
in-flight entries. The most problematic example being open files.
Fortunately, this is helped by a global linked-list of all files and
directories opened by the filesystem. As entries change size, the state
of open files/dirs may be updated as needed. Note this already needed to
exist for the ability to remove files/dirs, which has the same issue.
Experimental implementation. This opens up the opportunity to use the same
commit description for both commits and appends, which effectively do the same
thing.
This should lead to better code reuse.
Really all this means is that the internal commit function was changed
from taking an array of "commit structures" to a linked-list of "commit
structures". The benefit of a linked-list is that layers of commit
functions can pull off some minor modifications to the description of
the commit. Most notably, commit functions can add additional entries
that will be atomically written out and CRCed along with the initial
commit.
Also a minor benefit, this is one less parameter when committing a
directory with zero entries.
Previously, commits could only come from memory in RAM. This meant any
entries had to be buffered in their entirety before they could be moved
to a different directory pair. By adding parameters for specifying
commits from existing entries stored on disk, we allow any sized entries
to be moved between directory pairs with a fixed RAM cost.
The separation of data-structure vs entry type has been implicit for a
while now, and even taken advantage of to simplify the traverse logic.
Explicitly separating the data-struct and entry types allows us to
introduce new data structures (inlined files).
Before, release notes with a list of changes were created every
patch release. Unfortunately, it looks like this will create a lot of
noise on github, with a notification every patch release, which may be
as often as every time a PR is merged.
Rather than creating all of this noise for relatively uninteresting
changes, the script will now stick to simple tags, and create the
release notes only on minor releases.
I think this is what several of you were originally suggesting,
sorry about the journey, at least I learned a lot.
Fetching all tags was triggering the pagination system inside the github
API. This prevented version tags from being found.
Modified to use the version tag prefix in the ref lookup, however this
may still cause an issue if there are enough patch releases to trigger
pagination.
A simple-ish solution is to grab the link header to jump to the last page,
since pagination results appear to be in sorted order.
Normally, the linked-list of directory pairs should terminate at a null
pointer. However, it is possible, if the filesystem is corrupted, that
this linked-list forms a cycle.
This should never happen with littlefs's power resilience, but if it does
we should recover appropriately.
Modified lfs_deorphan to notice if we have a cycle and return
LFS_ERR_CORRUPT in that situation.
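A sketch of one way to notice the cycle, bounding the traversal by the
number of metadata-pairs that could possibly exist (helpers here are
hypothetical):
    // a valid chain can never contain more than block_count/2
    // metadata-pairs, so any longer walk must be a cycle
    lfs_size_t steps = 0;
    while (!pair_isnull(pair)) {
        if (steps > lfs->cfg->block_count/2) {
            return LFS_ERR_CORRUPT;  // cycle detected
        }
        steps += 1;
        // ... fetch the pair and follow its tail pointer
    }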
Found by kneko715
The lfs_cache_zero function that was recently added assumed a single cache
size, which is incorrect. This would cause a buffer overflow if
read_size != prog_size.
Since lfs_cache_zero is only used for scrubbing prog caches, the fix
here is to use lfs_cache_drop instead on read caches. Info in read
caches should never make its way to disk.
Found by nstcl
Previously, littlefs had mutable versions. That is, anytime a new commit
landed on master, the bot would update the most recent version to
contain the patch. The idea was that this would make sure users always
had the most recent bug fixes. Immutable snapshots could be accessed
through the git hashes.
However, at this point multiple developers have pointed out that this is
confusing, with mutable versions being non-standard and surprising.
This new release process adopts SemVer in its entirety, with
incrementing patch numbers and immutable versions.
When a new commit lands on master:
1. The major/minor version is taken from lfs.h
2. The most recent patch version is looked up on GitHub and incremented
3. A changelog is built out of the commits to the previous version
4. A new release is created on GitHub
Additionally, any commits that land while CI is still running are
coalesced together. Which means multiple PRs can land in a single
release.
The optional config structure opens up the possibility of adding
file-level configuration in a backwards compatible manner.
Also adds the possibility to open multiple files with LFS_NO_MALLOC
enabled, thanks to dpgeorge.
Also bumped minor version to v1.5
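With LFS_NO_MALLOC, each open file can now bring its own statically
allocated buffer through the config struct (a sketch):
    // one cache per simultaneously-open file, sized to the prog size
    static uint8_t fbuf[256];
    const struct lfs_file_config fcfg = {
        .buffer = fbuf,
    };
    lfs_file_opencfg(&lfs, &file, "log.txt",
            LFS_O_RDWR | LFS_O_CREAT, &fcfg);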
Before this, littlefs incorrectly assumed corrupt blocks were only the result
of our own modification. This would be fine for most cases of freshly
erased storage, but for storage with block-level ECC this wasn't always
true.
Fortunately, it's quite easy for littlefs to handle this case correctly,
as long as corrupt storage always reports that it is corrupt, which for
most forms of ECC is the case unless we perform a write on the storage.
found by rojer
When using "%d" or "%x" with uint32_t types, arm-none-eabi-gcc reports
warnings like below:
-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
In file included from lfs.c:8:
lfs_util.h:45:12: warning: format '%d' expects argument of type 'int', but argument 4 has type 'lfs_block_t' {aka 'long unsigned int'} [-Wformat=]
printf("lfs debug:%d: " fmt "\n", __LINE__, __VA_ARGS__)
^~~~~~~~~~~~~~~~
lfs.c:2512:21: note: in expansion of macro 'LFS_DEBUG'
LFS_DEBUG("Found partial move %d %d",
^~~~~~~~~
lfs.c:2512:55: note: format string is defined here
LFS_DEBUG("Found partial move %d %d",
~^
%ld
-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
Fix this by replacing "%d" and "%x" with `"%" PRIu32` and `"%" PRIx32`.
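So the call from the warning ends up along these lines (a sketch):
    #include <inttypes.h>

    // lfs_block_t is a uint32_t, so format it with the PRI macros
    LFS_DEBUG("Found partial move %"PRIu32" %"PRIu32,
            pair[0], pair[1]);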
This was causing code sizes to be reported with several of the logging
functions still built in. A useful number, but not the minimum
achievable code size.
As a shortcut, littlefs never bothered to zero any of the buffers it used.
It didn't need to, because it would always write out the entirety of the
data it needed.
Unfortunately, this, combined with the extra padding used to align
buffers to the nearest prog size, would lead to uninitialized data
getting written out to disk.
This means unrelated file data could be written to different parts of
storage, or worse, information leaked from the malloc calls could be
written out to disk unnecessarily.
found by rojer
1. Added check for main.c and test.c to decide compilation target
2. Added step to remove test.c after successful test completion
The test.c file, which contains the expanded test main, is useful when
debugging why tests are failing. However, keeping the test.c file around
causes problems when a later attempt is made to compile a larger project
containing the littlefs directory.
Under (hopefully) normal operation, tests always pass. So it should be ok
to remove the test.c file after a successful test run. Hopefully this
behaviour doesn't cause too much confusion for contributors using the
tests.
On the other side of things, compiling the library with no main ends
(successfully) with the "main not found" error message. By defaulting
to lfs.a when neither test.c nor main.c is present, we avoid this in the
common cases.
found by armijnhemel and Sim4n6
- Fixed shadowed variable warnings in lfs_dir_find.
- Fixed unused parameter warnings when LFS_NO_MALLOC is enabled.
- Added extra warning flags to CFLAGS.
- Updated tests so they don't shadow the "size" variable for -Wshadow
Opening multiple files simultaneously is not supported without dynamic memory,
but the previous behaviour would just let the files overwrite each other, which
could lead to bad errors down the line
found by husigeza
Paths such as the following were causing issues:
/tea/hottea/.
/tea/hottea/..
Unfortunately the existing structure for path lookup didn't make it very
easy to introduce proper handling in this case without duplicating the
entire skip logic for paths. So the lfs_dir_find function had to be
restructured a bit.
One odd side-effect of this is that now lfs_dir_find includes the
initial fetch operation. This kinda breaks the fetch -> op pattern of
the dir functions, but does come with a nice code size reduction.
As pointed out by davidefer, the lookahead pointer modular arithmetic
does not work around integer overflow when the pointer size is not a
multiple of the block count.
To avoid overflow problems, the easy solution is to stop trying to
work around integer overflows and keep the lookahead offset inside the
block device. To make this work, the ack was modified into a resettable
counter that is decremented every block allocation.
As a plus, quite a bit of the allocation logic ended up simplified.
One of the big simplifications in littlefs's implementation is the
complete lack of tracking free blocks, allowing operations to simply
drop blocks that are no longer in use.
However, this means the lookahead buffer can easily contain outdated
blocks that were previously deleted. This is usually fine, as littlefs
will rescan the storage if it can't find a free block in the lookahead
buffer, but after changes that caused littlefs to more conservatively
respect the alloc acks (e611cf5), any scanned blocks after an ack would
be incorrectly trusted.
The fix is to eagerly scan ahead in the lookahead when we allocate so
that alloc acks are better able to discredit old lookahead blocks. Since
usually alloc acks are tightly coupled to allocations of one or two blocks,
this allows littlefs to properly rescan every set of allocations.
This may still be a concern if there is a long series of worn out
blocks, but in the worst case littlefs will conservatively avoid using
blocks it's not sure about.
Found by davidefer
Like most of the lfs_dir_t functions, lfs_dir_append is responsible for
updating the lfs_dir_t struct if the underlying directory block is
moved. This property makes handling worn out blocks much easier by
reducing the amount of state that needs to be considered during a
directory update.
However, extending the dir chain is a bit of a corner case. It's not
changing the old block, but callers of lfs_dir_append do assume the
"entry" will reside in "dir" after lfs_dir_append completes.
This issue only occurs when creating files, since mkdir does not use
the entry after lfs_dir_append. Unfortunately, the tests against
extending the directory chain were all made using mkdir.
Found by schouleu
Before this patch, when calling lfs_mkdir or lfs_file_open with root
as the target, littlefs wouldn't find the path properly and happily
run into undefined behaviour.
The fix is to populate a directory entry for root in the lfs_dir_find
function. As an added plus, this allowed several special cases around
root to be completely dropped.
Suggested by sn00pster, LFS_CONFIG is an opt-in user provided
configuration file that will override the util implementation in
lfs_util.h. This is useful for allowing system-specific overrides
without needing to rely on git merges or other forms of patching
for updates.
Using gcc cross compilers and qemu:
- make test CC="arm-linux-gnueabi-gcc --static -mthumb" EXEC="qemu-arm"
- make test CC="powerpc-linux-gnu-gcc --static" EXEC="qemu-ppc"
- make test CC="mips-linux-gnu-gcc --static" EXEC="qemu-mips"
Also separated out Travis jobs and added some size reporting
Note: It's still expected to modify lfs_util.h when porting littlefs
to a new target/system. There's just too much room for system-specific
improvements, such as taking advantage of CRC hardware.
Rather, encouraging modification of lfs_util.h and making it easy to
modify and debug should result in better integration with the consuming
systems.
This just adds a bunch of quality-of-life improvements that should help
development and integration in littlefs.
- Macros that require no side-effects are all-caps
- System includes are only brought in when needed
- Malloc/free wrappers
- LFS_NO_* checks for quickly disabling things at the command line
- At least a little-bit more docs
Required to support big-endian processors, with the most notable being
the PowerPC architecture.
On little-endian architectures, these conversions can be optimized out
and have no code impact.
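The usual shape of such a conversion, as a sketch with a hypothetical
name: build the value byte-by-byte so the code is endianness-agnostic,
and let the compiler collapse it to a plain load on little-endian
machines:
    static inline uint32_t example_fromle32(const uint8_t *p) {
        return ((uint32_t)p[0] <<  0) |
               ((uint32_t)p[1] <<  8) |
               ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 24);
    }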
Initial patch provided by gmouchard
This helps significantly with supporting different compilers. Intrinsics for
different compilers can be added as they are found.
Note that for ARMCC, __builtin_ctz is not used. This was the result of a
strange issue where ARMCC only emits __builtin_ctz when passed the
--gnu flag, but __builtin_clz and __builtin_popcount are always emitted.
This isn't a big problem since the ARM instruction set doesn't have a
ctz instruction, and the npw2 based implementation is one of the most
efficient.
Also note that for littlefs's purposes, we consider ctz(0) to be
undefined. This lets us save a branch in the software lfs_ctz
implementation.
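The npw2-based fallback mentioned here looks roughly like this
(mirroring lfs_util.h; treat exact forms as a sketch):
    // log2 of the next power of 2, via clz where available
    static inline uint32_t lfs_npw2(uint32_t a) {
        return 32 - __builtin_clz(a-1);
    }

    // count trailing zeros; ctz(0) is left undefined, saving a branch
    static inline uint32_t lfs_ctz(uint32_t a) {
        // (a & -a) isolates the lowest set bit
        return lfs_npw2((a & -a) + 1) - 1;
    }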
Rather than tracking all in-flight blocks during a lookahead,
littlefs uses an ack scheme to mark the first allocated block that
hasn't reached the disk yet. littlefs assumes all blocks since the
last ack are bad or in-flight, and uses this to know when it's out
of storage.
However, these unacked allocations were still being populated in the
lookahead buffer. If the whole block device fits in the lookahead
buffer, _and_ littlefs managed to scan around the whole storage while
an unacked block was still in-flight, it would assume the block was
free and misallocate it.
The fix is to only fill the lookahead buffer up to the last ack.
The internal free structure was restructured to simplify the runtime
calculation of lookahead size.
- Write on read-only file to return LFS_ERR_BADF
- Renaming directory onto file to return LFS_ERR_NOTEMPTY
- Changed LFS_ERR_INVAL in lfs_file_seek to assert
- no previous prototype for ‘test_assert’
- no previous prototype for ‘test_count’
- unused parameter ‘b’ in test_count
- function declaration isn’t a prototype for main
The most useful part of -Werror is preventing code from being
merged that has warnings. However it is annoying for users who may have
different compilers with different warnings. Limiting -Werror to CI only
covers the main concern about warnings without limiting users.
Some of the tests were creating a variable `res`, however the test
system itself relies on its own `res` variable. This worked out by
luck, but could lead to problems if the res variables were different
types.
Changed the generated variable in the test system to the less common
name `test`, which also works out to share the same prefix as other test
functions.
Found by user iamscottmoyers, this was an interesting bug with the test
system. If the new test.c file is generated fast enough, it may not have
a new timestamp and not get recompiled.
To fix, we can remove the specific files that need to be rebuilt (lfs and
test.o).
An annoying part of filesystems is that the software library can change
independently of the on-disk structures. For this reason versioning is
very important, and must be handled separately for the software and
on-disk parts.
In this patch, littlefs provides two version numbers at compile time,
with major and minor parts, in the form of 6 macros.
    LFS_VERSION             // Library version, uint32_t encoded
    LFS_VERSION_MAJOR       // Major - Backwards incompatible changes
    LFS_VERSION_MINOR       // Minor - Feature additions
    LFS_DISK_VERSION        // On-disk version, uint32_t encoded
    LFS_DISK_VERSION_MAJOR  // Major - Backwards incompatible changes
    LFS_DISK_VERSION_MINOR  // Minor - Feature additions
Note that littlefs will error if it finds a major version number that
is different, or a minor version number that has regressed.
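A sketch of the encoding and the resulting mount-time check (the
constant shown is just an example):
    // major << 16 | minor, so v1.1 encodes as 0x00010001
    #define LFS_DISK_VERSION        0x00010001
    #define LFS_DISK_VERSION_MAJOR  (0xffff & (LFS_DISK_VERSION >> 16))
    #define LFS_DISK_VERSION_MINOR  (0xffff & (LFS_DISK_VERSION >>  0))

    // error if major differs or minor has regressed
    uint16_t major = 0xffff & (version >> 16);
    uint16_t minor = 0xffff & (version >>  0);
    if (major != LFS_DISK_VERSION_MAJOR ||
            minor > LFS_DISK_VERSION_MINOR) {
        return LFS_ERR_INVAL;
    }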
When running the tests, the emubd erase function relied on the value of
errno to not change over a possible call to unlink. Annoyingly, I've
only seen this cause problems on a couple of specific Travis instances
while self-hosting littlefs on top of littlefs-fuse.
As a copy-on-write filesystem, the truncate function is a very nice
function to have, as it can take advantage of reusing the data already
written out to disk.
Unfortunately for us, the ctz skip-list does not offer very much benefit
for full traversals. Since the information about which blocks are in
use are spread throughout the file, we can't use the fast-lanes
embedded in the skip-list without missing blocks.
However, it turns out we can at least use the 2nd level of the skip-list
without missing any blocks. From an asymptotic analysis, a constant speed
up isn't interesting, but from a pragmatic perspective, a 2x speedup is
not bad.
littlefs had an unwritten assumption that the block device's program
size would be a multiple of the read size, and the block size would
be a multiple of the program size. This has already caused confusion
for users. Added a note and assert to catch unexpected geometries
early.
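The new asserts amount to roughly the following sketch:
    // geometry must nest: read size divides prog size, prog size
    // divides block size
    LFS_ASSERT(lfs->cfg->prog_size % lfs->cfg->read_size == 0);
    LFS_ASSERT(lfs->cfg->block_size % lfs->cfg->prog_size == 0);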
Also found that the prog/erase functions indicated they must return
LFS_ERR_CORRUPT to catch bad blocks. This is no longer true as errors
are found by CRC.
In the open call, the LFS_O_TRUNC flag was correctly zeroing the file, but
it wasn't actually writing the change out to disk. This went unnoticed because
in the cases where the truncate was followed by a file write, the
updated contents would be written out correctly.
Marking the file as dirty if the file isn't already truncated fixes the
problem with the least impact. Also added better test cases around
truncating files.
This bug was a result of an annoying corner case around intermingling
signed and unsigned offsets. The boundary check that prevents seeking
a file to a position before the start of the file was preventing valid seeks with
positive offsets.
This corner case is a bit more complicated than it looks because the
offset is signed, while the size of the file is unsigned. Simply
casting both to signed or unsigned offsets won't handle large files.
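One way to write the check without casts that break on large files (a
sketch):
    // off is signed (lfs_soff_t), file->pos unsigned (lfs_off_t);
    // compare the magnitude of a negative offset against pos directly
    if (off < 0 && (lfs_off_t)-off > file->pos) {
        return LFS_ERR_INVAL;  // seeking before the start of the file
    }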
This was a small hole in the logic that handles initializing the
lookahead buffer. To imitate exhaustion (so the block allocator
will trigger a scan), the lookahead buffer is rewound a full
lookahead and set up to look like it is exhausted. However,
unlike normal allocation, this rewind was not kept aligned to
a multiple of the scan size, which is limited by both the
lookahead buffer and the total storage size.
This bug went unnoticed for so long because it only causes
problems when the block device is both:
1. Not aligned to the lookahead buffer (not a power of 2)
2. Smaller than the lookahead buffer
While this seems like a strange corner case for a block device,
this turned out to be very common for internal flash, especially
when a handful of blocks are reserved for code.
As it was, if a user operated on a directory while at the same
time iterating over it, the directory objects could fall out of
sync. In the best case, files may be skipped while removing
everything in a directory; in the worst case, a very poorly
timed directory relocate could be missed.
Simple fix is to add the same directory tracking that is currently
in use for files, at a small code+complexity cost.
Mostly changed the wording around the goals/features of littlefs based
on feedback from other developers.
Also added in project links now that there are a few of those floating
around. And made the README a bit easier to navigate.
Short story, files are no longer committed to directories during
file sync/close if the last write did not complete successfully.
This avoids a set of interesting user-experience issues related
to the end-of-life behaviour of the filesystem.
As a filesystem approaches end-of-life, the chances of running into
LFS_ERR_NOSPC grow rather quickly. Since this condition occurs at
the end of a device's life, it's likely that operating in these
conditions hasn't been tested thoroughly.
In the specific case of file writes, you can hit an LFS_ERR_NOSPC after
parts of the file have been written out. If the program simply continues
and closes the file, the file is written out half completed. Since
littlefs has a strong guarantee that prevents half-writes, it's unlikely
this state of the file would be expected.
To make things worse, since close is also responsible for memory
cleanup, it's actually _impossible_ to continue working as it was
without leaking memory.
By preventing the file commits, end-of-life behaviour should at least
retain a previous copy of the filesystem without any surprises.
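From the user's side, the behaviour looks roughly like this (a sketch;
the surrounding mount and open setup is elided):

    uint8_t buf[512];
    // ... fill buf ...
    lfs_ssize_t res = lfs_file_write(&lfs, &file, buf, sizeof(buf));
    if (res < 0) {
        // the write failed (perhaps with LFS_ERR_NOSPC); close still
        // releases the file's memory, but the half-written contents
        // are not committed, so the previous version survives intact
        lfs_file_close(&lfs, &file);
    }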
Specifically around error handling. As is, incorrectly handled
errors could cause higher-level code to see uninitialized blocks,
potentially leading to writes to arbitrary blocks on storage.
This is only an issue in the weird case where a worn-down block is
left in the odd state of not being able to change the data that resides
on it. That being said, this does pop up often when simulating
wear on block devices.
Currently, directory commits check if the write succeeded by crcing the
block, to avoid the additional RAM cost of another buffer. However,
before this commit, directory commits just checked that the block's crc
was valid, rather than comparing it to the expected crc. This would
usually work, unless the block was stuck in a state with a valid crc.
The fix is to simply compare with the expected crc to find errors.
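A sketch of the check (bd_read and lfs_crc stand in for the real
block-device read and crc routines; block, size, and expected_crc are
assumed from the surrounding commit logic):

    // crc the freshly-programmed region by reading it back...
    uint32_t crc = 0xffffffff;
    for (lfs_off_t i = 0; i < size; i++) {
        uint8_t data;
        int err = bd_read(block, i, &data, 1);
        if (err) {
            return err;
        }
        crc = lfs_crc(crc, &data, 1);
    }

    // ...and compare against the crc we committed, not just "any
    // valid crc", so a stuck block with stale contents is caught
    if (crc != expected_crc) {
        return LFS_ERR_CORRUPT;
    }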
The previous math for determining if we had scanned all of the disk
wasn't set up correctly in the lfs_mount function. If lookahead ==
block_count, the lfs_alloc function would think we had already searched
the entire disk.
This is only an issue if we manage to exhaust a block on the first
pass after mount, since lfs_alloc_ack resets the lookahead region
into a valid state after a successful block allocation.
littlefs allows buffers to be passed statically in the case
that a system does not have a heap. Unfortunately, this means we
can't round up in the case of an unaligned lookahead buffer.
Double unfortunately, rounding down after clamping to the block device
size could result in a lookahead of zero for block devices < 32 blocks
large.
The assert in littlefs does catch this case, but rounding down prevents
support for < 32 block devices.
The solution is to simply require a 32-bit aligned buffer with an
assert. This avoids runtime problems while allowing a user to pass
in the correct buffer for < 32 block devices. Rounding up can be
handled at higher API levels.
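The requirement reduces to a couple of asserts during initialization.
A sketch, assuming the v1-style lookahead config measured in blocks:

    // the lookahead must be a non-zero multiple of 32 bits, so the
    // buffer divides evenly into words without any runtime rounding
    assert(lfs->cfg->lookahead % 32 == 0);
    assert(lfs->cfg->lookahead > 0);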
Same runtime cost, however this reduces the logic and avoids one of
the two big branches. See DESIGN.md for more info.
Now uses these equations instead of the messy guess-and-correct method:
    n = (N - (w/8)(popcount(N / (B - 2w/8)) + 2)) / (B - 2w/8)
    off = N - (B - 2w/8)n - (w/8)popcount(n)
This reduces the O(n^2 log n) runtime to read a file to only O(n log n).
The extra O(n) did not touch the disk, so it isn't a problem until the
files become very large, but this solution comes with very little cost.
Long story short, you can find the block index + offset pair for a
CTZ linked-list with this series of formulas:
    n' = floor(N / (B - 2w/8))
    N' = (B - 2w/8)n' + (w/8)popcount(n')
    off' = N - N'
    n, off =
        n'-1, off'+B                   if off' < 0
        n',   off'+(w/8)(ctz(n')+1)    if off' >= 0
For the long story, you will need to see the updated DESIGN.md
Initially, I was concerned that the number of pointers in the ctz
linked-list could exceed the storage in a block. Long story short
this isn't really possible outside of extremely small block sizes.
Since clamping impacts the layout of files on disk, removing the
block size clamp removed quite a bit of logic and corner cases. It was
replaced with an assert on the block size during initialization.
---
Long story long, the minimum block size needed to store all ctz
pointers in a filesystem can be found with this formula:
B = (w/8)*log2(2^w / (B - 2*(w/8)))
where:
B = block size in bytes
w = pointer width in bits
It's not a very pretty formula, but does give us some useful info
if we apply some math:
min block size:
32 bit ctz linked-list = 104 bytes
64 bit ctz linked-list = 448 bytes
For littlefs, 128 bytes is a perfectly reasonable minimum block size.
Deduplication and deorphan steps aren't required under identical
conditions, but they can be processed in the same iteration of the
filesystem. Since lfs_alloc (requires deorphan) occurs on most write
calls to the filesystem (requires deduplication), it was simpler to
just combine the steps into a single lfs_deorphan step.
Also traded out the places where lfs_rename/lfs_remove just defer
operations to the deorphan step. This adds a bit of code, but also
significantly speeds up directory operations.
The "move problem" has been present in littlefs for a while, but I haven't
come across a solution worth implementing for various reasons.
The problem is simple: how do we move directory entries across
directories atomically? Since multiple directory entries are involved,
we can't rely entirely on the atomic block updates. It ends up being
a bit of a puzzle.
To make the problem more complicated, any directory block update can
fail due to wear, and cause the directory block to need to be relocated.
This happens rarely, but brings a large number of corner cases.
---
The solution in this patch is simple:
1. Mark source as "moving"
2. Copy source to destination
3. Remove source
If littlefs ever runs into a "moving" entry, that means a power loss
occurred during a move. Either the destination entry exists or it
doesn't. In this case we just search the entire filesystem for the
destination entry.
This is expensive, however the chance of a power loss during a move
is relatively low.
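A sketch of the recovery side (all names here are hypothetical; the
real bookkeeping lives in the directory entries):

    // a "moving" entry found during fetch means power was lost
    // mid-move; whether the destination exists tells us whether
    // the move completed
    if (entry_is_moving(&entry)) {
        if (entry_exists_elsewhere(fs, &entry)) {
            remove_entry(fs, &entry);       // move completed, drop the
                                            // stale source entry
        } else {
            clear_moving_flag(fs, &entry);  // move never landed, keep
                                            // the source as-is
        }
    }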
lfs_file_seek returned the _previous_ file offset on success, where
most standards return the _calculated_ offset on success.
This just comes down to me not noticing a mistake, and shows why it's
always helpful to have a second set of eyes on code.
Simply limiting the lookahead region to the size of
the block device fixes the problem.
Also added logic to limit the allocated region and
floor to nearest word, since the additional memory
couldn't really be used effectively.
Mostly just a hole in testing. Dir blocks were not being
correctly collected when removing entries from very large
files, due to forgetting about the tail-bit in the directory
block size. The test hole has now been filled.
Also added lfs_entry_size to avoid having to repeat that
expression, since it is a bit ridiculous.
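For reference, the helper is tiny; a sketch of its likely shape using
the v1 entry fields (my guess, see lfs.h for the real struct):

    // on-disk entry size: 4-byte header plus entry, attribute,
    // and name lengths
    static inline lfs_size_t lfs_entry_size(const lfs_entry_t *entry) {
        return 4 + entry->d.elen + entry->d.alen + entry->d.nlen;
    }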
- out-of-bound read results in eof
- out-of-bound write will fill missing area with zeros
The write behaviour matches expected POSIX behaviour, but was
under consideration for being dropped, since littlefs does
not support holes, and support for out-of-bounds seeks adds complexity.
However, it turned out filling with zeros was trivial, and only
cost an extra 74 bytes of flash (0.48%).
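As a usage sketch (error handling elided):

    // seeking past the end is allowed; the resulting gap reads as zeros
    lfs_file_seek(&lfs, &file, 16, LFS_SEEK_END);  // 16 bytes past eof
    uint8_t b = 0xff;
    lfs_file_write(&lfs, &file, &b, 1);  // zero-fills the 16-byte gap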
When the lookahead buffer wraps around in an unaligned filesystem, it's
possible for blocks at the beginning of the disk to have a negative distance
from the lookahead, but still reside in the lookahead buffer.
Switching to signed modulo doesn't quite work due to how negative modulo
is implemented in C, so the simple solution is to shift the region to be
positive.
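The fix looks something like this (a sketch with names along the lines
of the lookahead bookkeeping; both block and free.off are indices below
block_count, so the unsigned wraparound behaves):

    // C's % keeps the sign of the dividend, so instead of a signed
    // modulo, shift the distance into the positive range first
    lfs_block_t off = ((block - lfs->free.off)
            + lfs->cfg->block_count) % lfs->cfg->block_count;

    if (off < lfs->free.size) {
        lfs->free.buffer[off / 32] |= 1U << (off % 32);
    }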
This off-by-one error was caused by a slight difference between the
pos argument to lfs_index_find and lfs_index_extend. When pos is on
a block boundary, lfs_index_extend expects block to point before pos,
as it would when writing a file linearly. But when seeking to that
pos, the lfs_index_find call used to warm things up just supplies the
block it expects pos to be in.
Fixed the off-by-one error and added a test case for several of these
cold seek+writes.