summaryrefslogtreecommitdiff
path: root/static/src/_posts/2022-01-23-the-cryptic-filesystem.md
diff options
context:
space:
mode:
Diffstat (limited to 'static/src/_posts/2022-01-23-the-cryptic-filesystem.md')
-rw-r--r--static/src/_posts/2022-01-23-the-cryptic-filesystem.md258
1 files changed, 0 insertions, 258 deletions
diff --git a/static/src/_posts/2022-01-23-the-cryptic-filesystem.md b/static/src/_posts/2022-01-23-the-cryptic-filesystem.md
deleted file mode 100644
index 1c4138b..0000000
--- a/static/src/_posts/2022-01-23-the-cryptic-filesystem.md
+++ /dev/null
@@ -1,258 +0,0 @@
----
-title: >-
- The Cryptic Filesystem
-description: >-
- Hey, I'm brainstorming here!
-series: nebula
-tags: tech
----
-
-Presently the cryptic-net project has two components: a VPN layer (implemented
-using [nebula][nebula], and DNS component which makes communicating across that
-VPN a bit nicer. All of this is wrapped up in a nice bow using an AppImage and a
-simple process manager. The foundation is laid for adding the next major
-component: a filesystem layer.
-
-I've done a lot of research and talking about this layer, and you can see past
-posts in this series talking about it. Unfortunately, I haven't really made much
-progress on a solution. It really feels like there's nothing out there already
-implemented, and we're going to have to do it from scratch.
-
-To briefly recap the general requirements of the cryptic network filesystem
-(cryptic-fs), it must have:
-
-* Sharding of the fs dataset, so each node doesn't need to persist the full
- dataset.
-
-* Replication factor (RF), so each piece of content must be persisted by at
- least N nodes of the clusters.
-
-* Nodes are expected to be semi-permanent. They are expected to be in it for the
- long-haul, but they also may flit in and out of existence frequently.
-
-* Each cryptic-fs process should be able to track multiple independent
- filesystems, with each node in the cluster not necessarily tracking the same
- set of filesystems as the others.
-
-This post is going to be a very high-level design document for what, in my head,
-is the ideal implementation of cryptic-fs. _If_ cryptic-fs is ever actually
-implemented it will very likely differ from this document in major ways, but one
-must start somewhere.
-
-[nebula]: https://github.com/slackhq/nebula
-
-## Merkle DAG
-
-It wouldn't be a modern network filesystem project if there wasn't a [Merkle
-DAG][mdag]. The minutia of how a Merkle DAG works isn't super important here,
-the important bits are:
-
-* Each file is represented by a content identifier (CID), which is essentially a
- consistent hash of the file's contents.
-
-* Each directory is also represented by a CID which is generated by hashing the
- CIDs of the directory's files and their metadata.
-
-* Since the root of the filesystem is itself a directory, the entire filesystem
- can be represented by a single CID. By tracking the changing root CID all
- hosts participating in the network filesystem can cheaply identify the latest
- state of the entire filesystem.
-
-A storage system for a Merkle DAG is implemented as a key-value store which maps
-CID to directory node or file contents. When nodes in the cluster communicate
-about data in the filesystem they will do so using these CIDs; one node might
-ask the other "can you give me CID `AAA`", and the other would respond with the
-contents of `AAA` without really caring about whether or not that CID points to
-a file or directory node or whatever. It's quite a simple system.
-
-As far as actual implementation of the storage component, it's very likely we
-could re-use some part of the IPFS code-base rather than implementing this from
-scratch.
-
-[mdag]: https://docs.ipfs.io/concepts/merkle-dag/
-
-## Consensus
-
-The cluster of nodes needs to (roughly) agree on some things in order to
-function:
-
-* What the current root CID of the filesystem is.
-
-* Which nodes have which CIDs persisted.
-
-These are all things which can change rapidly, and which _every_ node in the
-cluster will need to stay up-to-date on. On the other hand, given efficient use
-of the boolean tagged CIDs mentioned in the previous section, this is a dataset
-which could easily fit in memory even for large filesystems.
-
-I've done a bunch of research here and I'm having trouble finding anything
-existing which fits the bill. Most databases expect the set of nodes to be
-pretty constant, so that eliminates most of them. Here's a couple of other ideas
-I spitballed:
-
-* Taking advantage of the already written [go-ds-crdt][crdt] package which the
- [IPFS Cluster][ipfscluster] project uses. My biggest concern with this
- project, however, is that the entire history of the CRDT must be stored on
- each node, which in our use-case could be a very long history.
-
-* Just saying fuck it and using a giant redis replica-set, where each node in
- the cluster is a replica and one node is chosen to be the primary. [Redis
- sentinel][sentinel] could be used to decide the current primary. The issue is
- that I don't think sentinel is designed to handle hundreds or thousands of
- nodes, which places a ceiling on cluster capacity. I'm also not confident that
- the primary node could handle hundreds/thousands of replicas syncing from it
- nicely; that's not something Redis likes to do.
-
-* Using a blockchain engine like [Tendermint][tendermint] to implement a custom,
- private blockchain for the cluster. This could work performance-wise, but I
- think it would suffer from the same issue as CRDT.
-
-It seems to me like some kind of WAN-optimized gossip protocol would be the
-solution here. Each node already knows which CIDs it itself has persisted, so
-what's left is for all nodes to agree on the latest root CID, and to coordinate
-who is going to store what long-term.
-
-[crdt]: https://github.com/ipfs/go-ds-crdt
-[ipfscluster]: https://cluster.ipfs.io/
-[sentinel]: https://redis.io/topics/sentinel
-[tendermint]: https://tendermint.com/
-
-### Gossip
-
-The [gossipsub][gossipsub] library which is built into libp2p seems like a good
-starting place. It's optimized for WANs and, crucially, is already implemented.
-
-Gossipsub makes use of different topics, onto which peers in the cluster can
-publish messages which other peers who are subscribed to those topics will
-receive. It makes sense to have a topic-per-filesystem (remember, from the
-original requirements, that there can be multiple filesystems being tracked), so
-that each node in the cluster can choose for itself which filesystems it cares
-to track.
-
-The messages which can get published will be dependent on the different
-situations in which nodes will want to communicate, so it's worth enumerating
-those.
-
-**Situation #1: Node A wants to obtain a CID**: Node A will send out a
-`WHO_HAS:<CID>` message (not the actual syntax) to the topic. Node B (and
-possibly others), which has the CID persisted, will respond with `I_HAVE:<CID>`.
-The response will be sent directly from B to A, not broadcast over the topic,
-since only A cares. The timing of B's response to A could be subject to a delay
-based on B's current load, such that another less loaded node might get its
-response in first.
-
-From here node A would initiate a download of the CID from B via a direct
-connection. If node A has enough space then it will persist the contents of the
-CID for the future.
-
-This situation could arise because the user has opened a file in the filesystem
-for reading, or has attempted to enumerate the contents of a directory, and the
-local storage doesn't already contain that CID.
-
-**Situation #2: Node A wants to delete a CID which it has persisted**: Similar
-to #1, Node A needs to first ensure that other nodes have the CID persisted, in
-order to maintain the RF across the filesystem. So node A first sends out a
-`WHO_HAS:<CID>` message. If >=RF nodes respond with `I_HAVE:<CID>` then node A
-can delete the CID from its storage without concern. Otherwise it should not
-delete the CID.
-
-**Situation #2a: Node A wants to delete a CID which it has persisted, and which
-is not part of the current filesystem**: If the filesystem is in a state where
-the CID in question is no longer present in the system, then node A doesn't need
-to care about the RF and therefore doesn't need to send any messages.
-
-**Situation #3: Node A wants to update the filesystem root CID**: This is as
-simple as sending out a `ROOT:<CID>` message on the topic. Other nodes will
-receive this and note the new root.
-
-**Situation #4: Node A wants to know the current filesystem root CID**: Node A
-sends out a `ROOT?` message. Other nodes will respond to node A directly telling
-it the current root CID.
-
-These describe the circumstances around the messages used across the gossip
-protocol in a very shallow way. In order to properly flesh out the behavior of
-the consistency mechanism we need to dive in a bit more.
-
-### Optimizations, Replication, and GC
-
-A key optimization worth hitting straight away is to declare that each node will
-always immediately persist all directory CIDs whenever a `ROOT:<CID>` message is
-received. This will _generally_ only involve a couple of round-trips with the
-host which issued the `ROOT:<CID>` message, with opportunity for
-parallelization.
-
-This could be a problem if the directory structure becomes _huge_, at which
-point it might be worth placing some kind of limit on what percent of storage is
-allowed for directory nodes. But really... just have less directories people!
-
-The next thing to dive in on is replication. We've already covered in situation
- #1 what happens if a user specifically requests a file. But that's not enough
-to ensure the RF of the entire filesystem, as some files might not be requested
-by any users except the original user to add the file.
-
-We can note that each node knows when a file has been added to the filesystem,
-thanks to each node knowing the full directory tree. So upon seeing that a new
-file has been added, a node can issue a `WHO_HAS:<CID>` message for it, and if
-less than RF nodes respond then it can persist the CID. This is all assuming
-that the node has enough space for the new file.
-
-One wrinkle in that plan is that we don't want all nodes to send the
-`WHO_HAS:<CID>` at the same time for the same CID, otherwise they'll all end up
-downloading the CID and over-replicating it. A solution here is for each node to
-delay it's `WHO_HAS:<CID>` based on how much space it has left for storage, so
-nodes with more free space are more eager to pull in new files.
-
-Additionally, we want to have nodes periodically check the replication status of
-each CID in the filesystem. This is because nodes might pop in and out of
-existence randomly, and the cluster needs to account for that. The way this can
-work is that each node periodically picks a CID at random and checks the
-replication status of it. If the period between checks is calculated as being
-based on number of online nodes in the cluster and the number of CIDs which can
-be checked, then it can be assured that all CIDs will be checked within a
-reasonable amount of time with minimal overhead.
-
-This dovetails nicely with garbage collection. Given that nodes can flit in and
-out of existence, a node might come back from having been down for a time, and
-all CIDs it had persisted would then be over-replicated. So the same process
-which is checking for under-replicated files will also be checking for
-over-replicated files.
-
-### Limitations
-
-This consistency mechanism has a lot of nice properties: it's eventually
-consistent, it nicely handles nodes coming in and out of existence without any
-coordination between the nodes, and it _should_ be pretty fast for most cases.
-However, it has its downsides.
-
-There's definitely room for inconsistency between each node's view of the
-filesystem, especially when it comes to the `ROOT:<CID>` messages. If two nodes
-issue `ROOT:<CID>` messages at the same time then it's extremely likely nodes
-will have a split view of the filesystem, and there's not a great way to
-resolve this until another change is made on another node. This is probably the
-weakest point of the whole design.
-
-[gossipsub]: https://github.com/libp2p/specs/tree/master/pubsub/gossipsub
-
-## FUSE
-
-The final piece is the FUSE connector for the filesystem, which is how users
-actually interact with each filesystem being tracked by their node. This is
-actually the easiest component, if we use an idea borrowed from
-[Tahoe-LAFS][tahoe], cryptic-fs can expose an SFTP endpoint and that's it.
-
-The idea is that hooking up an existing SFTP implementation to the rest of
-cryptic-fs should be pretty straightforward, and then every OS should already
-have some kind of mount-SFTP-as-FUSE mechanism already, either built into it or
-as an existing application. Exposing an SFTP endpoint also allows a user to
-access the cryptic-fs remotely if they want to.
-
-[tahoe]: https://tahoe-lafs.org/trac/tahoe-lafs
-
-## Ok
-
-So all that said, clearly the hard part is the consistency mechanism. It's not
-even fully developed in this document, but it's almost there. The next step,
-beyond polishing up the consistency mechanism, is going to be roughly figuring
-out all the interfaces and types involved in the implementation, planning out
-how those will all interact with each other, and then finally an actual
-implementation!