Added documentation over the underlying design

2025-10-31 16:14:16 +01:00 · 2017-05-18 23:44:18 -05:00
parent 69294ac418
commit d2bf2bbc72
2 changed files with 1005 additions and 14 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -0,0 +1,973 @@
 ## The design of the little filesystem
 The littlefs is a little fail-safe filesystem designed for embedded systems.
 ```
   | | |     .---._____
  .-----.   |          |
 --|o    |---| littlefs |
 --|     |---|          |
  '-----'   '----------'
   | | |
 ```
 For a bit of backstory, the littlefs was developed with the goal of learning
 more about filesystem design by tackling the relatively unsolved problem of
 managing a robust filesystem resilient to power loss on devices
 with limited RAM and ROM.
 The embedded systems the littlefs is targeting are usually 32bit
 microcontrollers with around 32Kbytes of RAM and 512Kbytes of ROM. These are
 often paired with SPI NOR flash chips with about 4Mbytes of flash storage.
 Flash itself is a very interesting piece of technology with quite a bit of
 nuance. Unlike most other forms of storage, writing to flash requires two
 operations: erasing and programming. The programming operation is relatively
 cheap, and can be very granular. For NOR flash specifically, byte-level
 programs are quite common. Erasing, however, requires an expensive operation
 that forces the state of large blocks of memory to reset in a destructive
 reaction that gives flash its name. The [Wikipedia entry](https://en.wikipedia.org/wiki/Flash_memory)
 has more information if you are interested in how this works.
 This leaves us with an interesting set of limitations that can be simplified
 to three strong requirements:
 1. **Fail-safe** - This is actually the main goal of the littlefs and the focus
   of this project. Embedded systems are usually designed without a shutdown
   routine and a notable lack of user interface for recovery, so filesystems
   targeting embedded systems should be prepared to lose power at any given
   time.
   Despite this state of things, there are very few embedded filesystems that
   handle power loss in a reasonable manner, and most can become corrupted if
   the user is unlucky enough.
 2. **Wear awareness** - Due to the destructive nature of flash, most flash
   chips have a limited number of erase cycles, usually on the order of
   100,000 erases per block for NOR flash. Filesystems that don't take wear
   into account can easily burn through blocks used to store frequently updated
   metadata.
   Consider the [FAT filesystem](https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system),
   which stores a file allocation table (FAT) at a specific offset from the
   beginning of the disk. Every block allocation will update this table, and after
   100,000 updates, the block will likely go bad, rendering the filesystem
   unusable even if there are many more erase cycles available on the storage.
 3. **Bounded RAM/ROM** - Even with the design difficulties presented by the
   previous two limitations, we have already seen several flash filesystems
   developed on PCs that handle power loss just fine, such as the
   logging filesystems. However, these filesystems take advantage of the
   relatively cheap access to RAM, and use some rather... opportunistic...
   techniques, such as reconstructing the entire directory structure in RAM.
   These operations make perfect sense when the filesystem's only concern is
   erase cycles, but the idea is a bit silly on embedded systems.
   To cater to embedded systems, the littlefs has the simple limitation of
   using only a bounded amount of RAM and ROM. That is, no matter what is
   written to the filesystem, and no matter how large the underlying storage
   is, the littlefs will always use the same amount of RAM and ROM. This
   presents a unique challenge, and makes presumably simple operations,
   such as iterating through the directory tree, surprisingly difficult.
 ## Existing designs?
 There are, of course, many different existing filesystems. Here's a very rough
 summary of the general ideas behind some of them.
 Most of the existing filesystems fall into the one big category of filesystems
 designed in the early days of spinny magnet disks. While there is a vast amount
 of interesting technology and ideas in this area, the nature of spinny magnet
 disks encourages properties, such as grouping writes near each other, that don't
 make as much sense on recent storage types. For instance, on flash, write
 locality is not as important and can actually increase wear destructively.
 One of the most popular designs for flash filesystems is called the
 [logging filesystem](https://en.wikipedia.org/wiki/Log-structured_file_system).
 The flash filesystems [jffs](https://en.wikipedia.org/wiki/JFFS)
 and [yaffs](https://en.wikipedia.org/wiki/YAFFS) are good examples. In a
 logging filesystem, data is not stored in a data structure on disk; instead,
 the changes to the files are stored on disk. This has several neat advantages,
 such as the fact that writing data in a cyclic log naturally wear levels as a
 side effect. And, with a bit of error detection, the entire
 filesystem can easily be designed to be resilient to power loss. The
 journalling component of most modern day filesystems is actually a reduced
 form of a logging filesystem. However, logging filesystems have difficulty
 scaling as the size of storage increases, and most of them compensate by
 caching large parts of the filesystem in RAM, a strategy that is unavailable
 for embedded systems.
 Another interesting filesystem design technique, and the one the littlefs borrows
 from the most, is [copy-on-write (COW)](https://en.wikipedia.org/wiki/Copy-on-write).
 A good example of this is the [btrfs](https://en.wikipedia.org/wiki/Btrfs)
 filesystem. COW filesystems can easily recover from corrupted blocks and have
 natural protection against power loss. However, if they are not designed with
 wear in mind, a COW filesystem could unintentionally wear down the root block
 where the COW data structures are synchronized.
 ## Metadata pairs
 The core piece of technology that provides the backbone for the littlefs is
 the concept of metadata pairs. The key idea here is that any metadata that
 needs to be updated atomically is stored on a pair of blocks tagged with
 a revision count and checksum. Every update alternates between these two
 blocks, so that at any time there is always a backup containing the previous
 state of the metadata.
 Consider a small example where each metadata pair has a revision count,
 a number as data, and the xor of the block as a quick checksum. If
 we update the data to a value of 9, and then to a value of 5, here is
 what the pair of blocks may look like after each update:
 ```
  block 1   block 2        block 1   block 2        block 1   block 2
 .---------.---------.    .---------.---------.    .---------.---------.
 | rev: 1  | rev: 0  |    | rev: 1  | rev: 2  |    | rev: 3  | rev: 2  |
 | data: 3 | data: 0 | -> | data: 3 | data: 9 | -> | data: 5 | data: 9 |
 | xor: 2  | xor: 0  |    | xor: 2  | xor: 11 |    | xor: 6  | xor: 11 |
 '---------'---------'    '---------'---------'    '---------'---------'
                 let data = 9             let data = 5
 ```
 After each update, we can find the most up to date value of data by looking
 at the revision count.
 Now consider what the blocks may look like if we suddenly lose power while
 changing the value of data to 5:
 ```
  block 1   block 2        block 1   block 2        block 1   block 2
 .---------.---------.    .---------.---------.    .---------.---------.
 | rev: 1  | rev: 0  |    | rev: 1  | rev: 2  |    | rev: 3  | rev: 2  |
 | data: 3 | data: 0 | -> | data: 3 | data: 9 | -x | data: 3 | data: 9 |
 | xor: 2  | xor: 0  |    | xor: 2  | xor: 11 |    | xor: 2  | xor: 11 |
 '---------'---------'    '---------'---------'    '---------'---------'
                 let data = 9             let data = 5
                                          powerloss!!!
 ```
 In this case, block 1 was partially written with a new revision count, but
 the littlefs hadn't made it to updating the value of data. However, if we
 check our checksum we notice that block 1 was corrupted. So we fall back to
 block 2 and use the value 9.
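 To make the read and update rules concrete, here is a minimal C sketch of the
 toy example above. The `struct mblock` layout, the xor checksum, and the
 function names are assumptions that mirror the diagrams. They are not the
 littlefs on-disk format or API.
 ```c
 #include <stdint.h>
 /* Toy metadata block from the example: a revision count, one data word,
  * and an xor "checksum" (littlefs itself uses a 32bit CRC). */
 struct mblock { uint32_t rev, data, check; };
 static int is_valid(const struct mblock *b) {
     return (b->rev ^ b->data) == b->check;
 }
 /* Index (0 or 1) of the valid block with the newest revision, or -1 if
  * neither block is valid. Plain '>' ignores revision count overflow. */
 static int newest(const struct mblock pair[2]) {
     int v0 = is_valid(&pair[0]), v1 = is_valid(&pair[1]);
     if (v0 && v1)
         return pair[1].rev > pair[0].rev ? 1 : 0;
     return v0 ? 0 : (v1 ? 1 : -1);
 }
 /* Always write into the *other* block, so the current one survives as a
  * backup if power is lost partway through the write. */
 static void update(struct mblock pair[2], uint32_t data) {
     int cur = newest(pair);
     int next = (cur == 0) ? 1 : 0;
     uint32_t rev = (cur < 0) ? 1 : pair[cur].rev + 1;
     pair[next].rev = rev;
     pair[next].data = data;
     pair[next].check = rev ^ data;
 }
 ```
 Starting from `{1, 3, 2}` and `{0, 0, 0}`, updating to 9 and then 5 reproduces
 the first diagram. A block left as `{3, 3, 2}` by a power loss fails
 `is_valid`, so `newest` falls back to the block holding 9.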
 Using this concept, the littlefs is able to update metadata blocks atomically.
 There are a few other tweaks, such as using a 32bit crc and using sequence
 arithmetic to handle revision count overflow, but the basic concept
 is the same. These metadata pairs define the backbone of the littlefs, and the
 rest of the filesystem is built on top of these atomic updates.
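 The sequence arithmetic trick for revision counts fits in a few lines of C.
 This is an illustration assuming 32bit revision counts. `rev_newer` is a
 hypothetical name, not a littlefs function.
 ```c
 #include <stdint.h>
 /* Hypothetical helper: nonzero if revision a is newer than revision b.
  * The unsigned subtraction wraps modulo 2^32, and reading the difference
  * as signed tells us which count is ahead, as long as the two counts are
  * less than 2^31 apart. (The unsigned-to-signed conversion is
  * implementation-defined before C23, but wraps on common two's complement
  * compilers.) */
 static int rev_newer(uint32_t a, uint32_t b) {
     return (int32_t)(a - b) > 0;
 }
 ```
 With this comparison, `rev_newer(0, 0xffffffff)` is true, so a block whose count
 just wrapped around to 0 still wins over one holding the maximum value.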
 ## Files
 Now, the metadata pairs do come with some drawbacks. Most notably, each pair
 requires two blocks for each block of data. I'm sure users would be very
 unhappy if their storage was suddenly cut in half! Instead of storing
 everything in these metadata blocks, the littlefs uses a COW data structure
 for files which is in turn pointed to by a metadata block. When
 we update a file, we create copies of any blocks that are modified, and only
 then update the metadata blocks to point to the new copies. Once the metadata block
 points to the new copy, we deallocate the old blocks that are no longer in use.
 Here is what updating a one-block file may look like:
 ```
  block 1   block 2        block 1   block 2        block 1   block 2
 .---------.---------.    .---------.---------.    .---------.---------.
 | rev: 1  | rev: 0  |    | rev: 1  | rev: 0  |    | rev: 1  | rev: 2  |
 | file: 4 | file: 0 | -> | file: 4 | file: 0 | -> | file: 4 | file: 5 |
 | xor: 5  | xor: 0  |    | xor: 5  | xor: 0  |    | xor: 5  | xor: 7  |
 '---------'---------'    '---------'---------'    '---------'---------'
    |                        |                                  |
    v                        v                                  v
 block 4                  block 4    block 5       block 4    block 5
 .--------.               .--------. .--------.    .--------. .--------.
 | old    |               | old    | | new    |    | old    | | new    |
 | data   |               | data   | | data   |    | data   | | data   |
 |        |               |        | |        |    |        | |        |
 '--------'               '--------' '--------'    '--------' '--------'
            update data in file        update metadata pair
 ```
 It doesn't matter if we lose power while writing block 5 with the new data,
 since the old data remains unmodified in block 4. This example also
 highlights how the atomic updates of the metadata blocks provide a
 synchronization barrier for the rest of the littlefs.
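 Here's a rough C model of the ordering that makes this safe. The block array,
 `file_head`, and `cow_update` are hypothetical stand-ins for the storage and the
 metadata pair, not littlefs code.
 ```c
 #include <string.h>
 #define BLOCK_SIZE 16
 #define NUM_BLOCKS 8
 /* Toy storage: raw data blocks, plus the one pointer that the directory's
  * metadata pair holds for a single-block file ("file: 4" above). */
 static char blocks[NUM_BLOCKS][BLOCK_SIZE];
 static int file_head = 4;
 /* Copy-on-write update: fill a fresh block first, then redirect the
  * metadata. Power loss before the final assignment (standing in for the
  * atomic metadata pair commit) leaves file_head on the untouched old block.
  * Returns the old block, which is now free to be reused. */
 static int cow_update(int fresh, const char *data) {
     strncpy(blocks[fresh], data, BLOCK_SIZE - 1);
     blocks[fresh][BLOCK_SIZE - 1] = '\0';
     int old = file_head;
     file_head = fresh;
     return old;
 }
 ```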
 At this point, it may look like we are wasting an awfully large amount
 of space on the metadata. Just looking at that example, we are using
 three blocks to represent a file that fits comfortably in one! So instead
 of giving each file a metadata pair, we actually store the metadata for
 all files contained in a single directory in a single metadata block.
 Now we could just leave files here: copying the entire file on write
 provides the synchronization without the duplicated memory requirements
 of the metadata blocks. However, we can do a bit better.
 ## CTZ linked-lists
 There are many different data structures for representing the actual
 files in filesystems. Of these, the littlefs uses a rather unique [COW](https://upload.wikimedia.org/wikipedia/commons/0/0c/Cow_female_black_white.jpg)
 data structure that allows the filesystem to reuse unmodified parts of the
 file without additional metadata pairs.
 First, let's consider storing files in a simple linked-list. What happens when
 we append a block? We have to change the last block in the linked-list to point
 to this new block, which means we have to copy out the last block, and change
 the second-to-last block, and then the third-to-last, and so on until we've
 copied out the entire file.
 ```
 Exhibit A: A linked-list
 .--------.  .--------.  .--------.  .--------.  .--------.  .--------.
 | data 0 |->| data 1 |->| data 2 |->| data 3 |->| data 4 |->| data 5 |
 |        |  |        |  |        |  |        |  |        |  |        |
 |        |  |        |  |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'  '--------'
 ```
 To get around this, the littlefs, at its heart, stores files backwards. Each
 block points to its predecessor, with the first block containing no pointers.
 If you think about this, it makes a bit of sense. Appending blocks just point
 to their predecessor and no other blocks need to be updated. If we update
 a block in the middle, we will need to copy out the blocks that follow,
 but can reuse the blocks before the modified block. Since most file operations
 either reset the file each write or append to files, this design avoids
 copying the file in the most common cases.
 ```
 Exhibit B: A backwards linked-list
 .--------.  .--------.  .--------.  .--------.  .--------.  .--------.
 | data 0 |<-| data 1 |<-| data 2 |<-| data 3 |<-| data 4 |<-| data 5 |
 |        |  |        |  |        |  |        |  |        |  |        |
 |        |  |        |  |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'  '--------'
 ```
 However, a backwards linked-list does come with a rather glaring problem.
 Iterating over a file _in order_ has a runtime of O(n^2). Gah! A quadratic
 runtime to just _read_ a file? That's awful. Keep in mind reading files is
 usually the most common filesystem operation.
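 To see where the quadratic comes from, this small C sketch counts the pointer
 hops needed to read a backwards linked-list in order. It models only the
 pointer walk, with block indices standing in for blocks. `inorder_read_hops`
 is a hypothetical helper.
 ```c
 #include <stdint.h>
 /* Toy backwards linked-list: block i stores only the index of block i-1,
  * and the metadata points at the last block. Reading block i means walking
  * back from the tail, so an in-order read of n blocks costs
  * (n-1) + (n-2) + ... + 0 = n(n-1)/2 pointer hops. */
 static uint64_t inorder_read_hops(uint32_t n) {
     uint64_t hops = 0;
     for (uint32_t want = 0; want < n; want++) {
         uint32_t cur = n - 1;          /* start at the tail every time */
         while (cur != want) {
             cur = cur - 1;             /* follow block cur's only pointer */
             hops++;
         }
     }
     return hops;
 }
 ```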
 To avoid this problem, the littlefs uses a multilayered linked-list. For
 every power of two that divides a block's index, the block contains a
 pointer that points back by that power of two. Another way of
 thinking about this design is that there are actually many linked-lists
 threaded together, with each linked-list skipping an increasing number
 of blocks. If you're familiar with data structures, you may have also
 recognized that this is a deterministic skip-list.
 To find the power of two factors efficiently, we can use the instruction
 [count trailing zeros (CTZ)](https://en.wikipedia.org/wiki/Count_trailing_zeros),
 which is where this linked-list's name comes from.
 ```
 Exhibit C: A backwards CTZ linked-list
 .--------.  .--------.  .--------.  .--------.  .--------.  .--------.
 | data 0 |<-| data 1 |<-| data 2 |<-| data 3 |<-| data 4 |<-| data 5 |
 |        |<-|        |--|        |<-|        |--|        |  |        |
 |        |<-|        |--|        |--|        |--|        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'  '--------'
 ```
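 Since the pointers depend only on a block's index, they can be computed rather
 than looked up in any table. Here is a minimal C sketch, assuming a GCC or Clang
 compiler for `__builtin_ctz`. `ctz_pointers` is a hypothetical name.
 ```c
 #include <stdint.h>
 /* Block i (i > 0) of a CTZ linked-list holds ctz(i)+1 pointers, to blocks
  * i - 2^x for x = 0..ctz(i); block 0 holds none. __builtin_ctz is a
  * GCC/Clang builtin and is undefined for 0, hence the guard. Writes the
  * targets to out[] and returns how many there are. */
 static int ctz_pointers(uint32_t i, uint32_t out[32]) {
     if (i == 0)
         return 0;
     int count = __builtin_ctz(i) + 1;
     for (int x = 0; x < count; x++)
         out[x] = i - (UINT32_C(1) << x);
     return count;
 }
 ```
 For block 4 this yields pointers to blocks 3, 2 and 0, matching the three
 arrows leaving data block 4 in exhibit C.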
 Taking exhibit C for example, here is the path from data block 5 to data
 block 1. You can see how data block 3 was completely skipped:
 ```
 .--------.  .--------.  .--------.  .--------.  .--------.  .--------.
 | data 0 |  | data 1 |<-| data 2 |  | data 3 |  | data 4 |<-| data 5 |
 |        |  |        |  |        |<-|        |--|        |  |        |
 |        |  |        |  |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'  '--------'
 ```
 The path to data block 0 is even more quick, requiring only two jumps:
 ```
 .--------.  .--------.  .--------.  .--------.  .--------.  .--------.
 | data 0 |  | data 1 |  | data 2 |  | data 3 |  | data 4 |<-| data 5 |
 |        |  |        |  |        |  |        |  |        |  |        |
 |        |<-|        |--|        |--|        |--|        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'  '--------'
 ```
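 The jumps above can be found greedily: from the current block, take the longest
 pointer that doesn't overshoot the target. The following C sketch is one way to
 do that, given the pointer layout in exhibit C. It uses GCC/Clang's
 `__builtin_ctz`, and `ctz_find` is hypothetical, not taken from littlefs.
 ```c
 #include <stdint.h>
 /* Greedy lookup in a CTZ skip-list: from block `cur`, take the longest
  * pointer (cur - 2^x with x <= ctz(cur)) that does not jump past `target`.
  * Records the visited indices in path[] and returns the number of jumps. */
 static int ctz_find(uint32_t cur, uint32_t target, uint32_t path[64]) {
     int jumps = 0;
     path[0] = cur;
     while (cur > target) {
         int x = __builtin_ctz(cur);              /* cur > 0 here */
         while (cur - target < (UINT32_C(1) << x))
             x--;                                 /* shrink until no overshoot */
         cur -= UINT32_C(1) << x;
         path[++jumps] = cur;
     }
     return jumps;
 }
 ```
 From block 5, this visits 4, 2, 1 when looking for block 1 and 4, 0 when
 looking for block 0, the same paths as the two diagrams above.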
 The CTZ linked-list has quite a few interesting properties. All of the pointers
 in the block can be found by just knowing the index in the list of the current
 block, and, with a bit of math, the amortized overhead for the linked-list is
 only two pointers per block. Most importantly, the CTZ linked-list has a
 worst case lookup runtime of O(log n), which brings the runtime of reading a
 file down to O(n log n). Given that this runtime is also divided by the
 amount of data we can store in a block, this is pretty reasonable.
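 The two-pointers-per-block figure is easy to check numerically. This C sketch
 (hypothetical `ctz_total_pointers`, using GCC/Clang's `__builtin_ctz`) just sums
 ctz(i)+1 over every block after the first:
 ```c
 #include <stdint.h>
 /* Total pointers stored in a CTZ linked-list of n blocks (indices 0..n-1),
  * where block 0 holds none and block i > 0 holds ctz(i)+1. */
 static uint64_t ctz_total_pointers(uint32_t n) {
     uint64_t total = 0;
     for (uint32_t i = 1; i < n; i++)
         total += (uint64_t)__builtin_ctz(i) + 1;
     return total;
 }
 ```
 For 1024 blocks it returns 2036 pointers, just under two per block.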
 Here is what it might look like to update a file stored with a CTZ linked-list:
 ```
                                      block 1   block 2
                                    .---------.---------.
                                    | rev: 1  | rev: 0  |
                                    | file: 6 | file: 0 |
                                    | size: 4 | size: 0 |
                                    | xor: 3  | xor: 0  |
                                    '---------'---------'
                                        |
                                        v
  block 3     block 4     block 5     block 6
 .--------.  .--------.  .--------.  .--------.
 | data 0 |<-| data 1 |<-| data 2 |<-| data 3 |
 |        |<-|        |--|        |  |        |
 |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'
 |  update data in file
 v
                                      block 1   block 2
                                    .---------.---------.
                                    | rev: 1  | rev: 0  |
                                    | file: 6 | file: 0 |
                                    | size: 4 | size: 0 |
                                    | xor: 3  | xor: 0  |
                                    '---------'---------'
                                        |
                                        v
  block 3     block 4     block 5     block 6
 .--------.  .--------.  .--------.  .--------.
 | data 0 |<-| data 1 |<-| old    |<-| old    |
 |        |<-|        |--| data 2 |  | data 3 |
 |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'
     ^ ^           ^
     | |           |      block 7     block 8     block 9    block 10
     | |           |    .--------.  .--------.  .--------.  .--------.
     | |           '----| new    |<-| new    |<-| new    |<-| new    |
     | '----------------| data 2 |<-| data 3 |--| data 4 |  | data 5 |
     '------------------|        |--|        |--|        |  |        |
                        '--------'  '--------'  '--------'  '--------'
 |  update metadata pair
 v
                                                   block 1   block 2
                                                 .---------.---------.
                                                 | rev: 1  | rev: 2  |
                                                 | file: 6 | file: 10|
                                                 | size: 4 | size: 6 |
                                                 | xor: 3  | xor: 14 |
                                                 '---------'---------'
                                                                |
                                                                |
  block 3     block 4     block 5     block 6                   |
 .--------.  .--------.  .--------.  .--------.                  |
 | data 0 |<-| data 1 |<-| old    |<-| old    |                  |
 |        |<-|        |--| data 2 |  | data 3 |                  |
 |        |  |        |  |        |  |        |                  |
 '--------'  '--------'  '--------'  '--------'                  |
     ^ ^           ^                                            v
     | |           |      block 7     block 8     block 9    block 10
     | |           |    .--------.  .--------.  .--------.  .--------.
     | |           '----| new    |<-| new    |<-| new    |<-| new    |
     | '----------------| data 2 |<-| data 3 |--| data 4 |  | data 5 |
     '------------------|        |--|        |--|        |  |        |
                        '--------'  '--------'  '--------'  '--------'
 ```
 ## Block allocation
 So those two ideas provide the grounds for the filesystem. The metadata pairs
 give us directories, and the CTZ linked-lists give us files. But this leaves
 one big [elephant](https://upload.wikimedia.org/wikipedia/commons/3/37/African_Bush_Elephant.jpg)
 of a question. How do we get those blocks in the first place?
 One common strategy is to store unallocated blocks in a big free list, and
 initially the littlefs was designed with this in mind. By storing a reference
 to the free list in every single metadata pair, additions to the free list
 could be updated atomically at the same time the replacement blocks were
 stored in the metadata pair. During boot, every metadata pair had to be
 scanned to find the most recent free list, but once the list was found the
 state of all free blocks became known.
 However, this approach had several issues:
 - There was a lot of nuanced logic for adding blocks to the free list without
  modifying the blocks, since the blocks remain active until the metadata is
  updated.
 - The free list had to support both additions and removals in FIFO order while
  minimizing block erases.
 - The free list had to handle the case where the file system completely ran
  out of blocks and may no longer be able to add blocks to the free list.
 - If we used a revision count to track the most recently updated free list,
  metadata blocks that were left unmodified were ticking time bombs that would
  cause the system to go haywire if the revision count overflowed.
 - Every single metadata block wasted space to store these free list references.
 Actually, to simplify, this approach had one massive glaring issue: complexity.
 > Complexity leads to fallibility.  
 > Fallibility leads to unmaintainability.  
 > Unmaintainability leads to suffering.  
 Or at least, complexity leads to increased code size, which is a problem
 for embedded systems.
 In the end, the littlefs adopted more of a "drop it on the floor" strategy.
 That is, the littlefs doesn't actually store information about which blocks
 are free on the storage. The littlefs already stores which files _are_ in
 use, so to find a free block, the littlefs just takes all of the blocks that
 exist and subtracts the blocks that are in use.
 Of course, it's not quite that simple. Most filesystems that adopt this "drop
 it on the floor" strategy either rely on some properties inherent to the
 filesystem, such as the cyclic-buffer structure of logging filesystems,
 or use a bitmap or table stored in RAM to track free blocks, which scales
 with the size of storage and is problematic when you have limited RAM. You
 could iterate through every single block in storage and check it against
 every single block in the filesystem on every single allocation, but that
 would have an abhorrent runtime.
 So the littlefs compromises. It doesn't store a bitmap the size of the storage,
 but it does store a little bit-vector that contains a fixed-size lookahead
 for block allocations. During a block allocation, the lookahead vector is
 checked for any free blocks; if there are none, the lookahead region jumps
 forward and the entire filesystem is scanned for free blocks.
 Here's what it might look like to allocate 4 blocks on a decently busy
 filesystem with a 32bit lookahead and a total of
 128 blocks (512Kbytes of storage if blocks are 4Kbyte):
 ```
 boot...         lookahead:
                fs blocks: fffff9fffffffffeffffffffffff0000
 scanning...     lookahead: fffff9ff
                fs blocks: fffff9fffffffffeffffffffffff0000
 alloc = 21      lookahead: fffffdff
                fs blocks: fffffdfffffffffeffffffffffff0000
 alloc = 22      lookahead: ffffffff
                fs blocks: fffffffffffffffeffffffffffff0000
 scanning...     lookahead:         fffffffe
                fs blocks: fffffffffffffffeffffffffffff0000
 alloc = 63      lookahead:         ffffffff
                fs blocks: ffffffffffffffffffffffffffff0000
 scanning...     lookahead:         ffffffff
                fs blocks: ffffffffffffffffffffffffffff0000
 scanning...     lookahead:                 ffffffff
                fs blocks: ffffffffffffffffffffffffffff0000
 scanning...     lookahead:                         ffff0000
                fs blocks: ffffffffffffffffffffffffffff0000
 alloc = 112     lookahead:                         ffff8000
                fs blocks: ffffffffffffffffffffffffffff8000
 ```
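 Here's a rough C model of the lookahead allocator that replays the trace above.
 The hex string stands in for a full filesystem traversal, and the `handed_out`
 array stands in for newly allocated blocks becoming referenced by the
 filesystem. All names are hypothetical. This is a sketch of the idea, not the
 littlefs allocator.
 ```c
 #include <stdint.h>
 #define BLOCK_COUNT 128
 #define LOOKAHEAD   32   /* bits in the lookahead vector */
 /* Stand-in for traversing the filesystem: the "fs blocks" map from the
  * trace, one bit per block, most significant bit of each hex digit first. */
 static const char fs_map[] = "fffff9fffffffffeffffffffffff0000";
 static uint8_t handed_out[BLOCK_COUNT];  /* blocks allocated since boot */
 static int block_in_use(uint32_t b) {
     char c = fs_map[b / 4];
     int digit = (c <= '9') ? c - '0' : c - 'a' + 10;
     return ((digit >> (3 - b % 4)) & 1) || handed_out[b];
 }
 /* Bit i set means block la_start + i is in use. Starting out "full" on the
  * last window forces the first allocation to scan window 0. */
 static uint32_t la_start = BLOCK_COUNT - LOOKAHEAD;
 static uint32_t la_bits = UINT32_MAX;
 static void scan_window(uint32_t start) {
     la_start = start;
     la_bits = 0;
     for (uint32_t i = 0; i < LOOKAHEAD; i++)
         if (block_in_use(start + i))
             la_bits |= UINT32_C(1) << i;
 }
 /* Return a free block, or -1 once every window has been scanned in vain. */
 static int alloc_block(void) {
     for (uint32_t tries = 0; tries <= BLOCK_COUNT / LOOKAHEAD; tries++) {
         for (uint32_t i = 0; i < LOOKAHEAD; i++) {
             if (!(la_bits & (UINT32_C(1) << i))) {
                 la_bits |= UINT32_C(1) << i;
                 handed_out[la_start + i] = 1;
                 return (int)(la_start + i);
             }
         }
         scan_window((la_start + LOOKAHEAD) % BLOCK_COUNT);
     }
     return -1;
 }
 ```
 Called four times, `alloc_block` returns 21, 22, 63 and 112, matching the trace.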
 While this lookahead approach still has an asymptotic runtime of O(n^2) to
 scan all of storage, the lookahead reduces the practical runtime to a
 reasonable amount. Bit-vectors are surprisingly compact: given only 16 bytes,
 the lookahead could track 128 blocks. For a 4Mbyte flash chip with 4Kbyte
 blocks, the littlefs would only need 8 passes to scan the entire storage.
 The real benefit of this approach is just how much it simplified the design
 of the littlefs. Deallocating blocks is as simple as forgetting they
 exist, and there is absolutely no risk of bugs in the deallocation code
 causing difficult-to-detect memory leaks.
 ## Directories
 Now we just need directories to store our files. Since we already have
 metadata blocks that store information about files, let's just use these
 metadata blocks as the directories. Maybe turn the directories into linked
 lists of metadata blocks so they aren't limited by the number of files that fit
 in a single block. Add entries that represent other nested directories.
 Drop "." and ".." entries, because who needs them. Dust off our hands and
 we now have a directory tree.
 ```
            .--------.
            |root dir|
            | pair 0 |
            |        |
            '--------'
            .-'    '-------------------------.
           v                                  v
      .--------.        .--------.        .--------.
      | dir A  |------->| dir A  |        | dir B  |
      | pair 0 |        | pair 1 |        | pair 0 |
      |        |        |        |        |        |
      '--------'        '--------'        '--------'
      .-'    '-.            |             .-'    '-.
     v          v           v            v          v
 .--------.  .--------.  .--------.  .--------.  .--------.
 | file C |  | file D |  | file E |  | file F |  | file G |
 |        |  |        |  |        |  |        |  |        |
 |        |  |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'
 ```
 Unfortunately it turns out it's not that simple. See, iterating over a
 directory tree isn't actually all that easy, especially when you're trying
 to fit in a bounded amount of RAM, which rules out any recursive solution.
 And since our block allocator involves iterating over the entire filesystem
 tree, possibly multiple times in a single allocation, iteration needs to be
 efficient.
 So, as a solution, the littlefs adopted a sort of threaded tree. Each
 directory not only contains pointers to all of its children, but also a
 pointer to the next directory. These pointers create a linked-list that
 is threaded through all of the directories in the filesystem. Since we
 only use this linked list to check for existence, the order doesn't actually
 matter. As an added plus, we can repurpose the pointer for the individual
 directory linked-lists and avoid using any additional space.
 ```
            .--------.
            |root dir|-.
            | pair 0 | |
   .--------|        |-'
   |        '--------'
   |        .-'    '-------------------------.
   |       v                                  v
   |  .--------.        .--------.        .--------.
   '->| dir A  |------->| dir A  |------->| dir B  |
      | pair 0 |        | pair 1 |        | pair 0 |
      |        |        |        |        |        |
      '--------'        '--------'        '--------'
      .-'    '-.            |             .-'    '-.
     v          v           v            v          v
 .--------.  .--------.  .--------.  .--------.  .--------.
 | file C |  | file D |  | file E |  | file F |  | file G |
 |        |  |        |  |        |  |        |  |        |
 |        |  |        |  |        |  |        |  |        |
 '--------'  '--------'  '--------'  '--------'  '--------'
 ```
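 Walking the thread needs nothing more than a single cursor. This C sketch uses a
 hypothetical `struct mpair` whose `tail` field plays the role of the threaded
 pointer. It is a toy model, not the littlefs on-disk layout.
 ```c
 /* Toy model: each metadata pair has a `tail` index threading it into one
  * filesystem-wide list (-1 ends the list); index 0 is the root. */
 struct mpair { const char *name; int tail; };
 /* Visit every metadata pair with a single cursor: O(1) RAM regardless of
  * how deep the directory tree is, which a recursive walk can't promise. */
 static int count_pairs(const struct mpair *pairs) {
     int n = 0;
     for (int p = 0; p != -1; p = pairs[p].tail)
         n++;
     return n;
 }
 ```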
 This threaded tree approach does come with a few tradeoffs. Now, anytime we
 want to manipulate the directory tree, we find ourselves having to update two
 pointers instead of one. For anyone familiar with creating atomic data
 structures this should set off a whole bunch of red flags.
 But unlike the data structure guys, we can update a whole block atomically! So
 as long as we're really careful (and cheat a little bit), we can still
 manipulate the directory tree in a way that is resilient to power loss.
 Consider how we might add a new directory. Since both pointers that reference
 it can come from the same directory, we only need a single atomic update to
 finagle the directory into the filesystem:
 ```
   .--------.
   |root dir|-.
   | pair 0 | |
 .--|        |-'
 |  '--------'
 |      |
 |      v
 |  .--------.
 '->| dir A  |
   | pair 0 |
   |        |
   '--------'
 |  create the new directory block
 v
               .--------.
               |root dir|-.
               | pair 0 | |
            .--|        |-'
            |  '--------'
            |      |
            |      v
            |  .--------.
 .--------.  '->| dir A  |
 | dir B  |---->| pair 0 |
 | pair 0 |     |        |
 |        |     '--------'
 '--------'
 |  update root to point to directory B
 v
         .--------.
         |root dir|-.
         | pair 0 | |
 .--------|        |-'
 |        '--------'
 |        .-'    '-.
 |       v          v
 |  .--------.  .--------.
 '->| dir B  |->| dir A  |
   | pair 0 |  | pair 0 |
   |        |  |        |
   '--------'  '--------'
 ```
 Note that even though directory B was added after directory A, we insert
 directory B before directory A in the linked-list because it is convenient.
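 The same two steps in a toy C model. Each directory is a single pair here.
 `struct dir`, `mkdir_under`, and the fixed `MAX_CHILDREN` are assumptions for
 illustration, and the struct assignment stands in for the atomic metadata pair
 update.
 ```c
 #define MAX_CHILDREN 4
 /* Toy directory pair: a threaded-list tail plus its child entries. */
 struct dir { int tail; int nchildren; int children[MAX_CHILDREN]; };
 /* Insert new directory `nd` under `parent` (no bounds check in this toy). */
 static void mkdir_under(struct dir *dirs, int parent, int nd) {
     /* step 1: write the new pair; its tail takes over the parent's old
      * tail. Nothing references it yet, so power loss here is harmless. */
     dirs[nd].tail = dirs[parent].tail;
     dirs[nd].nchildren = 0;
     /* step 2: build the parent's new contents, entry and tail together... */
     struct dir updated = dirs[parent];
     updated.children[updated.nchildren++] = nd;
     updated.tail = nd;
     /* ...and commit them in one go, like the atomic metadata pair update */
     dirs[parent] = updated;
 }
 ```
 With the root as index 0 and directory A as index 1, creating B as index 2
 leaves the thread as root, B, A, just like the last diagram.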
 Now how about removal:
 ```
         .--------.        .--------.
         |root dir|------->|root dir|-.
         | pair 0 |        | pair 1 | |
 .--------|        |--------|        |-'
 |        '--------'        '--------'
 |        .-'    '-.            |
 |       v          v           v
 |  .--------.  .--------.  .--------.
 '->| dir A  |->| dir B  |->| dir C  |
   | pair 0 |  | pair 0 |  | pair 0 |
   |        |  |        |  |        |
   '--------'  '--------'  '--------'
 |  update root to no longer contain directory B
 v
   .--------.              .--------.
   |root dir|------------->|root dir|-.
   | pair 0 |              | pair 1 | |
 .--|        |--------------|        |-'
 |  '--------'              '--------'
 |      |                       |
 |      v                       v
 |  .--------.  .--------.  .--------.
 '->| dir A  |->| dir B  |->| dir C  |
   | pair 0 |  | pair 0 |  | pair 0 |
   |        |  |        |  |        |
   '--------'  '--------'  '--------'
 |  remove directory B from the linked-list
 v
   .--------.  .--------.
   |root dir|->|root dir|-.
   | pair 0 |  | pair 1 | |
 .--|        |--|        |-'
 |  '--------'  '--------'
 |      |           |
 |      v           v
 |  .--------.  .--------.
 '->| dir A  |->| dir C  |
   | pair 0 |  | pair 0 |
   |        |  |        |
   '--------'  '--------'
 ```
 Wait, wait, wait, wait, wait, that's not atomic at all! If power is lost after
 removing directory B from the root, directory B is still in the linked-list.
 We've just created a memory leak!
 And to be honest, I don't have a clever solution for this case. As a
 side-effect of using multiple pointers in the threaded tree, the littlefs
 can end up with orphan blocks that have no parents and should have been
 removed.
 To keep these orphan blocks from becoming a problem, the littlefs has a
 deorphan step that simply iterates through every directory in the linked-list
 and checks it against every directory entry in the filesystem to see if it
 has a parent. The deorphan step occurs on the first block allocation after
 boot, so orphans should never cause the littlefs to run out of storage
 prematurely.
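 The deorphan step can be sketched against a toy in-memory model. This is not
 the littlefs source: the `struct dir` layout, the index-based `tail` and
 `children` fields, and the `has_parent`/`deorphan` helpers are hypothetical
 names chosen for illustration. The sketch assumes directory 0 is the root
 (which is found through the superblock, so it never needs a parent) and that
 the removed directory was empty, as in the diagram above.

 ```c
 #include <stdbool.h>

 #define MAX_CHILDREN 4
 #define NONE         (-1)  /* end of the linked-list */

 /* Hypothetical in-memory model: directories are indices into an array,
  * `tail` threads them into one linked-list, and `children` holds the
  * directory entries that name subdirectories. Index 0 is the root. */
 struct dir {
     int tail;
     int children[MAX_CHILDREN];
     int child_count;
 };

 /* True if any directory still on the linked-list has an entry for target. */
 bool has_parent(const struct dir *dirs, int target) {
     for (int d = 0; d != NONE; d = dirs[d].tail) {
         for (int i = 0; i < dirs[d].child_count; i++) {
             if (dirs[d].children[i] == target) {
                 return true;
             }
         }
     }
     return false;
 }

 /* Unlinks every non-root directory that no entry refers to and returns the
  * number of orphans removed. */
 int deorphan(struct dir *dirs) {
     int removed = 0;
     int prev = 0;  /* the root is always at the head of the list */
     while (dirs[prev].tail != NONE) {
         int cur = dirs[prev].tail;
         if (!has_parent(dirs, cur)) {
             dirs[prev].tail = dirs[cur].tail;  /* splice the orphan out */
             removed++;
         } else {
             prev = cur;
         }
     }
     return removed;
 }
 ```

 Note the nested walk: every directory on the list is checked against every
 entry, so the cost of this sketch grows quadratically with the number of
 directories. Running it a second time after a successful pass finds nothing.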
 And for my final trick, moving a directory:
 ```
         .--------.
         |root dir|-.
         | pair 0 | |
 .--------|        |-'
 |        '--------'
 |        .-'    '-.
 |       v          v
 |  .--------.  .--------.
 '->| dir A  |->| dir B  |
   | pair 0 |  | pair 0 |
   |        |  |        |
   '--------'  '--------'
 |  update directory B to point to directory A
 v
         .--------.
         |root dir|-.
         | pair 0 | |
 .--------|        |-'
 |        '--------'
 |    .-----'    '-.
 |    |             v
 |    |           .--------.
 |    |        .->| dir B  |
 |    |        |  | pair 0 |
 |    |        |  |        |
 |    |        |  '--------'
 |    |     .-------'
 |    v    v   |
 |  .--------. |
 '->| dir A  |-'
   | pair 0 |
   |        |
   '--------'
 |  update root to no longer contain directory A
 v
     .--------.
     |root dir|-.
     | pair 0 | |
 .----|        |-'
 |    '--------'
 |        |
 |        v
 |    .--------.
 | .->| dir B  |
 | |  | pair 0 |
 | '--|        |-.
 |    '--------' |
 |        |      |
 |        v      |
 |    .--------. |
 '--->| dir A  |-'
     | pair 0 |
     |        |
     '--------'
 ```
 Note that once again we don't care about the ordering of directories in the
 linked-list, so we can simply leave directories in their old positions. This
 does make the diagrams a bit hard to draw, but the littlefs doesn't really
 care.
 It's also worth noting that once again we have an operation that isn't actually
 atomic. After we add directory A to directory B, we could lose power, leaving
 directory A as a part of both the root directory and directory B. However,
 there isn't anything inherent to the littlefs that prevents a directory from
 having multiple parents, so in this case, we just allow that to happen. Extra
 care is taken to only remove a directory from the linked-list if there are
 no parents left in the filesystem.
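 A toy model makes the ordering argument concrete. Again the names here
 (`add_entry`, `remove_entry`, `count_parents`, `move_dir`) are hypothetical,
 and each of the first two functions stands in for one atomic metadata commit.
 Adding the entry to the new parent before removing it from the old one means a
 power loss between the two steps leaves the directory with two parents rather
 than none, and the directory only becomes a candidate for removal from the
 linked-list once `count_parents` reaches zero.

 ```c
 #include <stdbool.h>

 #define MAX_CHILDREN 4

 /* Hypothetical model: each directory records the subdirectories it contains. */
 struct dir {
     int children[MAX_CHILDREN];
     int child_count;
 };

 /* One atomic commit: add a directory entry for `child` to `parent`. */
 bool add_entry(struct dir *parent, int child) {
     if (parent->child_count == MAX_CHILDREN) {
         return false;  /* no room for another entry */
     }
     parent->children[parent->child_count++] = child;
     return true;
 }

 /* One atomic commit: drop the entry for `child` from `parent`. */
 void remove_entry(struct dir *parent, int child) {
     for (int i = 0; i < parent->child_count; i++) {
         if (parent->children[i] == child) {
             /* entry order doesn't matter, so fill the hole with the last one */
             parent->children[i] = parent->children[--parent->child_count];
             return;
         }
     }
 }

 /* Number of directory entries anywhere in the filesystem naming `target`. */
 int count_parents(const struct dir *dirs, int ndirs, int target) {
     int parents = 0;
     for (int d = 0; d < ndirs; d++) {
         for (int i = 0; i < dirs[d].child_count; i++) {
             if (dirs[d].children[i] == target) {
                 parents++;
             }
         }
     }
     return parents;
 }

 /* Moves `child` from `from` to `to`: new parent first, old parent second. */
 bool move_dir(struct dir *dirs, int from, int to, int child) {
     if (!add_entry(&dirs[to], child)) {
         return false;  /* nothing changed, the old entry is still intact */
     }
     /* power loss here leaves `child` with two parents, which is allowed */
     remove_entry(&dirs[from], child);
     return true;
 }
 ```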
 ## Wear awareness
 So now that we have all of the pieces of a filesystem, we can look at a more
 subtle attribute of embedded storage: the wear down of flash blocks.
 The first concern for the littlefs is that perfectly valid blocks can suddenly
 become unusable. As a nice side-effect of using a COW data-structure for files,
 we can simply move on to a different block when a file write fails. All
 modifications to files are performed in copies, so we will only replace the
 old file when we are sure none of the new file has errors. Directories, on
 the other hand, need a different strategy.
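 The file side of this can be sketched with a simulated block device. The
 `struct bd` device, its `bad` flags, and the `bd_prog`/`alloc_block`/`cow_write`
 helpers are hypothetical stand-ins, not the littlefs block device interface;
 a failed program is modelled as a negative return. The property that matters
 is that the old file's blocks are never touched, so a failure only costs us a
 fresh block.

 ```c
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>

 #define BLOCK_COUNT 8
 #define BLOCK_SIZE  16

 /* Hypothetical simulated device; `bad` marks blocks whose program fails. */
 struct bd {
     unsigned char data[BLOCK_COUNT][BLOCK_SIZE];
     bool bad[BLOCK_COUNT];
     bool used[BLOCK_COUNT];
 };

 /* Programs one block, reporting an error for blocks that have gone bad. */
 static int bd_prog(struct bd *bd, int block, const void *buf, size_t size) {
     if (bd->bad[block]) {
         return -1;  /* simulated write error from the driver */
     }
     memcpy(bd->data[block], buf, size);
     return 0;
 }

 /* Hands out the first block not already in use, or -1 if none are left. */
 static int alloc_block(struct bd *bd) {
     for (int b = 0; b < BLOCK_COUNT; b++) {
         if (!bd->used[b]) {
             bd->used[b] = true;
             return b;
         }
     }
     return -1;
 }

 /* Writes a copy of `buf` into a fresh block, moving on to another block
  * whenever programming fails. Returns the block holding the new copy, or -1. */
 int cow_write(struct bd *bd, const void *buf, size_t size) {
     if (size > BLOCK_SIZE) {
         return -1;
     }
     for (;;) {
         int block = alloc_block(bd);
         if (block < 0) {
             return -1;  /* out of space: the old file is still intact */
         }
         if (bd_prog(bd, block, buf, size) == 0) {
             return block;
         }
         /* the failed block stays marked in use so it isn't handed out again */
     }
 }
 ```

 Only once `cow_write` returns a good block would the caller go on to point the
 file at it. Leaving failed blocks marked in use for the rest of the run is a
 simplification of this sketch.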
 The solution to directory corruption in the littlefs relies on the redundant
 nature of the metadata pairs. If an error is detected during a write to one
 of the metadata pairs, we seek out a new block to take its place. Once we find
 a block without errors, we iterate through the directory tree, updating any
 references to the corrupted metadata pair to point to the new metadata block.
 Just like when we remove directories, we can lose power during this operation
 and end up with a desynchronized metadata pair in our filesystem. And just like
 when we remove directories, we leave the possibility of a desynchronized
 metadata pair up to the deorphan step to clean up.
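 The reference update can be sketched over a toy directory table. The
 `struct dir` layout (a directory's own pair, its `tail` pair in the
 linked-list, and the pairs named by its entries) and the `relocate` helper are
 hypothetical. In the filesystem itself each directory that changes would need
 its own commit, which is exactly where a power loss can leave things
 desynchronized.

 ```c
 #include <stdint.h>

 #define MAX_ENTRIES 4
 #define NO_BLOCK    0xffffffffu  /* marks the end of the linked-list */

 /* Hypothetical directory model: block addresses are plain integers. */
 struct dir {
     uint32_t pair[2];                  /* this directory's metadata pair */
     uint32_t tail[2];                  /* next directory in the linked-list */
     uint32_t entries[MAX_ENTRIES][2];  /* metadata pairs of child directories */
     int entry_count;
 };

 /* Rewrites any half of a pair pointer that names `old`; returns 1 on change. */
 static int swap_ref(uint32_t ptr[2], uint32_t old, uint32_t replacement) {
     int changed = 0;
     for (int i = 0; i < 2; i++) {
         if (ptr[i] == old) {
             ptr[i] = replacement;
             changed = 1;
         }
     }
     return changed;
 }

 /* Points every reference to the corrupted block `old` at `replacement`.
  * Returns how many directories had to change their contents (each of those
  * would be a separate commit in a real filesystem). */
 int relocate(struct dir *dirs, int ndirs, uint32_t old, uint32_t replacement) {
     int updated = 0;
     for (int d = 0; d < ndirs; d++) {
         /* the relocated directory's own notion of where it lives */
         swap_ref(dirs[d].pair, old, replacement);

         int changed = swap_ref(dirs[d].tail, old, replacement);
         for (int e = 0; e < dirs[d].entry_count; e++) {
             changed |= swap_ref(dirs[d].entries[e], old, replacement);
         }
         updated += changed;
     }
     return updated;
 }
 ```

 With the block numbers from the diagram below, relocating block 6 to block 9
 changes two directories: the root, whose entry for directory B changes, and
 directory A, whose tail pointer changes.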
 Here's what encountering a directory error may look like with all of
 the directories and directory pointers fully expanded:
 ```
         root dir
         block 1   block 2
       .---------.---------.
       | rev: 1  | rev: 0  |--.
       |         |         |-.|
 .------|         |         |-|'
 |.-----|         |         |-'
 ||     '---------'---------'
 ||       |||||'--------------------------------------------------.
 ||       ||||'-----------------------------------------.         |
 ||       |||'-----------------------------.            |         |
 ||       ||'--------------------.         |            |         |
 ||       |'-------.             |         |            |         |
 ||       v         v            v         v            v         v
 ||    dir A                  dir B                  dir C
 ||    block 3   block 4      block 5   block 6      block 7   block 8
 ||  .---------.---------.  .---------.---------.  .---------.---------.
 |'->| rev: 1  | rev: 0  |->| rev: 1  | rev: 0  |->| rev: 1  | rev: 0  |
 '-->|         |         |->|         |         |->|         |         |
    |         |         |  |         |         |  |         |         |
    |         |         |  |         |         |  |         |         |
    '---------'---------'  '---------'---------'  '---------'---------'
 |  update directory B
 v
         root dir
         block 1   block 2
       .---------.---------.
       | rev: 1  | rev: 0  |--.
       |         |         |-.|
 .------|         |         |-|'
 |.-----|         |         |-'
 ||     '---------'---------'
 ||       |||||'--------------------------------------------------.
 ||       ||||'-----------------------------------------.         |
 ||       |||'-----------------------------.            |         |
 ||       ||'--------------------.         |            |         |
 ||       |'-------.             |         |            |         |
 ||       v         v            v         v            v         v
 ||    dir A                  dir B                  dir C
 ||    block 3   block 4      block 5   block 6      block 7   block 8
 ||  .---------.---------.  .---------.---------.  .---------.---------.
 |'->| rev: 1  | rev: 0  |->| rev: 1  | rev: 2  |->| rev: 1  | rev: 0  |
 '-->|         |         |->|         | corrupt!|->|         |         |
    |         |         |  |         | corrupt!|  |         |         |
    |         |         |  |         | corrupt!|  |         |         |
    '---------'---------'  '---------'---------'  '---------'---------'
 |  oh no! corruption detected
 v  allocate a replacement block
         root dir
         block 1   block 2
       .---------.---------.
       | rev: 1  | rev: 0  |--.
       |         |         |-.|
 .------|         |         |-|'
 |.-----|         |         |-'
 ||     '---------'---------'
 ||       |||||'----------------------------------------------------.
 ||       ||||'-------------------------------------------.         |
 ||       |||'-----------------------------.              |         |
 ||       ||'--------------------.         |              |         |
 ||       |'-------.             |         |              |         |
 ||       v         v            v         v              v         v
 ||    dir A                  dir B                    dir C
 ||    block 3   block 4      block 5   block 6        block 7   block 8
 ||  .---------.---------.  .---------.---------.    .---------.---------.
 |'->| rev: 1  | rev: 0  |->| rev: 1  | rev: 2  |--->| rev: 1  | rev: 0  |
 '-->|         |         |->|         | corrupt!|--->|         |         |
    |         |         |  |         | corrupt!| .->|         |         |
    |         |         |  |         | corrupt!| |  |         |         |
    '---------'---------'  '---------'---------' |  '---------'---------'
                                       block 9   |
                                     .---------. |
                                     | rev: 2  |-'
                                     |         |
                                     |         |
                                     |         |
                                     '---------'
 |  update root directory to contain block 9
 v
        root dir
        block 1   block 2
      .---------.---------.
      | rev: 1  | rev: 2  |--.
      |         |         |-.|
 .-----|         |         |-|'
 |.----|         |         |-'
 ||    '---------'---------'
 ||       .--------'||||'----------------------------------------------.
 ||       |         |||'-------------------------------------.         |
 ||       |         ||'-----------------------.              |         |
 ||       |         |'------------.           |              |         |
 ||       |         |             |           |              |         |
 ||       v         v             v           v              v         v
 ||    dir A                   dir B                      dir C
 ||    block 3   block 4       block 5     block 9        block 7   block 8
 ||  .---------.---------.   .---------. .---------.    .---------.---------.
 |'->| rev: 1  | rev: 0  |-->| rev: 1  |-| rev: 2  |--->| rev: 1  | rev: 0  |
 '-->|         |         |-. |         | |         |--->|         |         |
    |         |         | | |         | |         | .->|         |         |
    |         |         | | |         | |         | |  |         |         |
    '---------'---------' | '---------' '---------' |  '---------'---------'
                          |               block 6   |
                          |             .---------. |
                          '------------>| rev: 2  |-'
                                        | corrupt!|
                                        | corrupt!|
                                        | corrupt!|
                                        '---------'
 |  remove corrupted block from linked-list
 v
        root dir
        block 1   block 2
      .---------.---------.
      | rev: 1  | rev: 2  |--.
      |         |         |-.|
 .-----|         |         |-|'
 |.----|         |         |-'
 ||    '---------'---------'
 ||       .--------'||||'-----------------------------------------.
 ||       |         |||'--------------------------------.         |
 ||       |         ||'--------------------.            |         |
 ||       |         |'-----------.         |            |         |
 ||       |         |            |         |            |         |
 ||       v         v            v         v            v         v
 ||    dir A                  dir B                  dir C
 ||    block 3   block 4      block 5   block 9      block 7   block 8
 ||  .---------.---------.  .---------.---------.  .---------.---------.
 |'->| rev: 1  | rev: 2  |->| rev: 1  | rev: 2  |->| rev: 1  | rev: 0  |
 '-->|         |         |->|         |         |->|         |         |
    |         |         |  |         |         |  |         |         |
    |         |         |  |         |         |  |         |         |
    '---------'---------'  '---------'---------'  '---------'---------'
 ```
 Also, one question I've been getting is: what about the root directory?
 It can't move, so wouldn't the filesystem die as soon as the root blocks
 develop errors? And you would be correct, if the root lived at a fixed
 location. So instead of storing the root in the first few blocks of the
 storage, the root is actually pointed to by the superblock. The superblock
 contains a few bits of static data, but outside of when the filesystem is
 formatted, it is only updated when the root develops errors and needs to be
 moved.
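 In sketch form, the indirection looks like the following. The field names in
 this `struct superblock` and the `find_root`/`move_root` helpers are
 hypothetical and are not the littlefs on-disk format; the point is only that
 mounting reads the root's location from the superblock, so moving the root
 means rewriting that one pointer.

 ```c
 #include <stdint.h>

 /* Hypothetical superblock: a few static fields plus the root's metadata pair. */
 struct superblock {
     uint32_t block_size;   /* static, written at format time */
     uint32_t block_count;  /* static, written at format time */
     uint32_t root[2];      /* where the root directory currently lives */
 };

 /* At mount, the root is found through the superblock, not a fixed address. */
 const uint32_t *find_root(const struct superblock *sb) {
     return sb->root;
 }

 /* After a root block goes bad and the root is rewritten into `replacement`,
  * this is the only change the superblock needs. */
 void move_root(struct superblock *sb, uint32_t bad, uint32_t replacement) {
     for (int i = 0; i < 2; i++) {
         if (sb->root[i] == bad) {
             sb->root[i] = replacement;
         }
     }
 }
 ```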
 ## Wear leveling
 The second concern for the littlefs is that blocks in the filesystem may wear
 unevenly. In this situation, a filesystem may meet an early demise where
 there are no more non-corrupted blocks that aren't in use. It may be entirely
 possible that files were written once and left unmodified, wasting the
 potential erase cycles of the blocks they sit on.
 Wear leveling is a term that describes distributing block writes evenly to
 avoid the early termination of a flash part. There are typically two levels
 of wear leveling:
 1. Dynamic wear leveling - Blocks are distributed evenly during block writes.
   Note that the issue with write-once files still exists in this case.
 2. Static wear leveling - Unmodified blocks are evicted for new block writes.
   This provides the longest lifetime for a flash device.
 Now, it's possible to use the revision count on metadata pairs to approximate
 the wear of a metadata block. And combined with the COW nature of files, the
 littlefs could provide a form of dynamic wear leveling.
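 To make the idea concrete, a sketch of using revision counts as a wear
 estimate might look like the following. It only illustrates the idea and is
 not littlefs code: `struct meta_block` and `least_worn` are hypothetical, and
 the assumption is that each commit to a metadata block costs roughly one erase,
 so a lower revision count suggests a less worn block.

 ```c
 #include <stdint.h>

 /* Hypothetical candidate: a block address and the last revision count seen
  * in the metadata pair it belongs to. */
 struct meta_block {
     uint32_t block;
     uint32_t rev;
 };

 /* Returns the index of the candidate with the lowest revision count, or -1
  * if there are no candidates. Ties go to the earliest candidate. */
 int least_worn(const struct meta_block *cands, int n) {
     if (n <= 0) {
         return -1;
     }
     int best = 0;
     for (int i = 1; i < n; i++) {
         if (cands[i].rev < cands[best].rev) {
             best = i;
         }
     }
     return best;
 }
 ```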
 However, the littlefs does not. This is for a few reasons. Most notably, even
 if the littlefs did implement dynamic wear leveling, this would still not
 handle the case of write-once files, and near the end of the lifetime of a
 flash device, you would likely end up with uneven wear on the blocks anyways.
 As a flash device reaches the end of its life, the metadata blocks will
 naturally be the first to go since they are updated most often. In this
 situation, the littlefs is designed to simply move on to another set of
 metadata blocks. This travelling means that at the end of a flash device's
 life, the filesystem will have worn the device down as evenly as a dynamic
 wear leveling filesystem could anyways. Simply put, if the lifetime of flash
 is a serious concern, static wear leveling is the only valid solution.
 This is a very important takeaway to note. If your storage stack uses highly
 sensitive storage such as NAND flash, in most cases you are going to be better
 off just using a [flash translation layer (FTL)](https://en.wikipedia.org/wiki/Flash_translation_layer).
 NAND flash already has many limitations that make it poorly suited for an
 embedded system: low erase cycles, very large blocks, errors that can develop
 even during reads, and errors that can develop during writes of neighboring
 blocks. Managing sensitive storage such as NAND flash is out of scope for the
 littlefs.
 The littlefs does have some properties that may be beneficial on top of an FTL,
 such as limiting the number of writes where possible. But if you have storage
 requirements that necessitate NAND flash, you should have the RAM to match and
 just use an FTL or flash filesystem.
 ## Summary
 So, to summarize:
 1. The littlefs is composed of directory blocks
 2. Each directory is a linked-list of metadata pairs
 3. These metadata pairs can be updated atomically by alternating which
   metadata block is active
 4. Directory blocks contain either references to other directories or files
 5. Files are represented by copy-on-write CTZ linked-lists
 6. The CTZ linked-lists support appending in O(1) and reading in O(n log n)
 7. Blocks are allocated by scanning the filesystem for used blocks in a
   fixed-size lookahead region that is stored in a bit-vector
 8. To facilitate scanning the filesystem, all directories are part of a
   linked-list that is threaded through the entire filesystem
 9. If a block develops an error, the littlefs allocates a new block, and
   moves the data and references of the old block to the new one.
 10. Any case where an atomic operation is not possible is taken care of
   by a deorphan step that occurs on the first allocation after boot
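 Item 7 is the least obvious of these, so here is a rough sketch of a
 lookahead allocator. Everything in it is hypothetical: `struct alloc`, the
 window size, and the `used` array, which stands in for traversing every
 directory and file to find in-use blocks. The sketch also assumes each block it
 hands out becomes reachable from the filesystem before the next rescan, which
 it fakes by appending the block to `used`.

 ```c
 #include <stdint.h>
 #include <string.h>

 #define BLOCK_COUNT 64
 #define LOOKAHEAD   16  /* blocks per window; must divide BLOCK_COUNT */

 /* Hypothetical allocator state. */
 struct alloc {
     uint8_t bits[LOOKAHEAD / 8];  /* bit set = block in use, relative to start */
     uint32_t start;               /* first block covered by the window */
     uint32_t next;                /* next window offset to try */
     uint32_t used[BLOCK_COUNT];   /* stand-in for scanning the filesystem */
     uint32_t used_count;
 };

 /* Rebuilds the bit-vector for the current window from the in-use blocks. */
 static void scan_window(struct alloc *a) {
     memset(a->bits, 0, sizeof a->bits);
     for (uint32_t i = 0; i < a->used_count; i++) {
         uint32_t off = (a->used[i] + BLOCK_COUNT - a->start) % BLOCK_COUNT;
         if (off < LOOKAHEAD) {
             a->bits[off / 8] |= (uint8_t)(1u << (off % 8));
         }
     }
     a->next = 0;
 }

 void alloc_init(struct alloc *a) {
     a->start = 0;
     scan_window(a);
 }

 /* Returns a free block, sliding the window forward and rescanning when it is
  * exhausted, or -1 once every window has been tried. */
 int32_t alloc_block(struct alloc *a) {
     for (uint32_t scanned = 0; scanned < BLOCK_COUNT; ) {
         while (a->next < LOOKAHEAD) {
             uint32_t off = a->next++;
             if (!(a->bits[off / 8] & (1u << (off % 8)))) {
                 a->bits[off / 8] |= (uint8_t)(1u << (off % 8));
                 uint32_t block = (a->start + off) % BLOCK_COUNT;
                 a->used[a->used_count++] = block;  /* pretend it was committed */
                 return (int32_t)block;
             }
         }
         a->start = (a->start + LOOKAHEAD) % BLOCK_COUNT;
         scanned += LOOKAHEAD;
         scan_window(a);
     }
     return -1;
 }
 ```

 Apart from the stand-in `used` array, the allocator's RAM cost is just the
 bit-vector: one bit per block in the window, independent of the size of the
 storage.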
 Welp, that's the little filesystem. Thanks for reading!
--- a/README.md
+++ b/README.md
@@ -1,23 +1,33 @@
 ## The little filesystem
-A little fail-safe filesystem designed for low ram/rom footprint.
+A little fail-safe filesystem designed for embedded systems.
-**Fail-safe** - The littlefs is designed to work consistently with random power
+```
-failures. During filesystem operations the storage on disk is always kept
+   | | |     .---._____
-in a valid state. The filesystem also has strong copy-on-write garuntees.
+  .-----.   |          |
 --|o    |---| littlefs |
 --|     |---|          |
  '-----'   '----------'
   | | |
 ```
 **Fail-safe** - The littlefs is designed to work consistently with random
 power failures. During filesystem operations the storage on disk is always
 kept in a valid state. The filesystem also has strong copy-on-write guarantees.
 When updating a file, the original file will remain unmodified until the
 file is closed, or sync is called.
-**Handles bad blocks** - While the littlefs does not implement static wear
+**Wear awareness** - While the littlefs does not implement static wear
-leveling, if the underlying block device reports write errors, the littlefs
+leveling, the littlefs takes into account write errors reported by the
-uses a form of dynamic wear leveling to manage blocks that go bad during
+underlying block device and uses a limited form of dynamic wear leveling
-the lifetime of the filesystem.
+to manage blocks that go bad during the lifetime of the filesystem.
-**Constrained memory** - The littlefs is designed to work in bounded memory,
+**Bounded ram/rom** - The littlefs is designed to work in a
-recursion is avoided, and dynamic memory is kept to a minimum. The littlefs
+limited amount of memory, recursion is avoided, and dynamic memory is kept
-allocates two fixed-size buffers for general operations, and one fixed-size
+to a minimum. The littlefs allocates two fixed-size buffers for general
-buffer per file. If there is only ever one file in use, these buffers can be
+operations, and one fixed-size buffer per file. If there is only ever one file
-provided statically.
+in use, all memory can be provided statically and the littlefs can be used
 in a system without dynamic memory.
 ## Example
@@ -74,7 +84,7 @@ int main(void) {
    // remember the storage is not updated until the file is closed successfully
    lfs_file_close(&lfs, &file);
-    // release and resources we were using
+    // release any resources we were using
    lfs_unmount(&lfs);
 }
 ```
@@ -113,6 +123,14 @@ long as the machines involved share endianness and don't have really
 strange padding requirements. If the question does come up, the littlefs
 metadata should be stored on disk in little-endian format.
 ## Design
 The littlefs was developed with the goal of learning more about filesystem
 design by tackling the relatively unsolved problem of managing a robust
 filesystem resilient to power loss on devices with limited RAM and ROM.
 More detail on the solutions and tradeoffs incorporated into this filesystem
 can be found in [DESIGN.md](DESIGN.md).
 ## Testing
 The littlefs comes with a test suite designed to run on a pc using the