summaryrefslogtreecommitdiff
path: root/_drafts/program-structure-and-composability.md
diff options
context:
space:
mode:
authorBrian Picciano <mediocregopher@gmail.com>2019-08-02 19:20:31 -0400
committerBrian Picciano <mediocregopher@gmail.com>2019-08-02 19:20:31 -0400
commit6ae2ab2e708607070779446124c2b70581fefbef (patch)
tree53eee0b978b95126195ec7d187f02abda0fa8055 /_drafts/program-structure-and-composability.md
parent7f9d4504e1bde949ba202940541d93b5930685fa (diff)
soft-launch the program structure post
Diffstat (limited to '_drafts/program-structure-and-composability.md')
-rw-r--r--_drafts/program-structure-and-composability.md581
1 files changed, 0 insertions, 581 deletions
diff --git a/_drafts/program-structure-and-composability.md b/_drafts/program-structure-and-composability.md
deleted file mode 100644
index ef04300..0000000
--- a/_drafts/program-structure-and-composability.md
+++ /dev/null
@@ -1,581 +0,0 @@
----
-title: >-
- Program Structure and Composability
-description: >-
- Discussing the nature of program structure, the problems presented by
- complex structures, and a pattern which helps in solving those problems.
----
-
-## Part 0: Introduction
-
-This post is focused on a concept I call "program structure", which I will try
-to shed some light on before discussing complex program structures, then
-discussing why complex structures can be problematic to deal with, and finally
-discussing a pattern for dealing with those problems.
-
-My background is as a backend engineer working on large projects that have had
-many moving parts; most had multiple programs interacting with each other, using
-many different databases in various contexts, and facing large amounts of load
-from millions of users. Most of this post will be framed from my perspective,
-and will present problems in the way I have experienced them. I believe,
-however, that the concepts and problems I discuss here are applicable to many
-other domains, and I hope those with a foot in both backend systems and a second
-domain can help to translate the ideas between the two.
-
-Also note that I will be using Go as my example language, but none of the
-concepts discussed here are specific to Go. To that end, I've decided to favor
-readable code over "correct" code, and so have elided things that most gophers
-hold near-and-dear, such as error checking and proper documentation, in order to
-make the code as accessible as possible to non-gophers as well. As with before,
-I trust someone with a foot in Go and another language can translate help me
-translate between the two.
-
-## Part 1: Program Structure
-
-In this section I will discuss the difference between directory and program
-structure, show how global state is antithetical to compartmentalization (and
-therefore good program structure), and finally discuss a more effective way to
-think about program structure.
-
-### Directory Structure
-
-For a long time I thought about program structure in terms of the hierarchy
-present in the filesystem. In my mind, a program's structure looked like this:
-
-```
-// The directory structure of a project called gobdns.
-src/
- config/
- dns/
- http/
- ips/
- persist/
- repl/
- snapshot/
- main.go
-```
-
-What I grew to learn was that this conflation of "program structure" with
-"directory structure" is ultimately unhelpful. While can't be denied that every
-program has a directory structure (and if not, it ought to), this does not mean
-that the way the program looks in a filesystem in any way corresponds to how it
-looks in our mind's eye.
-
-The most notable way to show this is to consider a library package. Here is the
-structure of a simple web-app which uses redis (my favorite database) as a
-backend:
-
-```
-src/
- redis/
- http/
- main.go
-```
-
-If I were to ask you, based on that directory strucure, what the program does,
-in the most abstract terms, you might say something like: "The program
-establishes an http server which listens for requests. It also establishes a
-connection to the redis server. The program then interacts with redis in
-different ways, based on the http requests which are received on the server."
-
-And that would be a good guess. Here's a diagram which depicts the program
-structure, wherein the root node, `main.go`, takes in requests from `http` and
-processes them using `redis`.
-
-{% include image.html
- dir="program-structure" file="diag1.jpg" width=519
- descr="Example 1"
- %}
-
-This is certainly a viable guess for how a program with that directory structure
-operates, but consider another answer: "A component of the program called
-`server` establishes an http server which listens for requests. `server` also
-establishes a connection to a redis server. `server` then interacts with that
-redis connection in different ways, based on the http requests which are
-received on the http server. Additionally, `server` tracks statistics about
-these interactions and makes them available to other components. The root
-component of the program establishes a connection to a second redis server, and
-stores those statistics in that redis server." Here's another diagram to depict
-_that_ program.
-
-{% include image.html
- dir="program-structure" file="diag2.jpg" width=712
- descr="Example 2"
- %}
-
-The directory structure could apply to either description; `redis` is just a
-library which allows for interacting with a redis server, but it doesn't specify
-_which_ server, or _how many_. And those are extremely important factors which
-are definitely reflected in our concept of the program's structure, and yet not
-in the directory structure. **What the directory structure reflects are the
-different _kinds_ of components available to use, but it does not reflect how a
-program will use those components.**
-
-### Global State vs Compartmentalization
-
-The directory-centric view of structure often leads to the use of global
-singletons to manage access to external resources like RPC servers and
-databases. In examples 1 and 2 the `redis` library might contain code which
-looks something like:
-
-```go
-// A mapping of connection names to redis connections.
-var globalConns = map[string]*RedisConn{}
-
-func Get(name string) *RedisConn {
- if globalConns[name] == nil {
- globalConns[name] = makeRedisConnection(name)
- }
- return globalConns[name]
-}
-```
-
-Even though this pattern would work, it breaks with our conception of the
-program structure in more complexes cases like example 2. Rather than the
-`redis` component being owned by the `server` component, which actually uses it,
-it would be practically owned by _all_ components, since all are able to use it.
-Compartmentalization has been broken, and can only be held together through
-sheer human discipline.
-
-**This is the problem with all global state. It's shareable amongst all components
-of a program, and so is accountable to none of them.** One must look at an
-entire codebase to understand how a globally held component is used, which might
-not even be possible for a large codebase. And so the maintainers of these
-shared components rely entirely on the discipline of their fellow coders when
-making changes, usually discovering where that discipline broke down once the
-changes have been pushed live.
-
-Global state also makes it easier for disparate programs/components to share
-datastores for completely unrelated tasks. In example 2, rather than creating a
-new redis instance for the root component's statistics storage, the coder might
-have instead said "well, there's already a redis instance available, I'll just
-use that." And so compartmentalization would have been broken further. Perhaps
-the two instances _could_ be coalesced into the same one, for the sake of
-resource efficiency, but that decision would be better made at runtime via the
-configuration of the program, rather than being hardcoded into the code.
-
-From the perspective of team management, global state-based patterns do nothing
-except slow teams down. The person/team responsible for maintaining the central
-library in which shared components live (`redis`, in the above examples) becomes
-the bottleneck for creating new instances for new components, which will further
-lead to re-using existing instances rather than creating new ones, further
-breaking compartmentalization. The person/team responsible for the central
-library often finds themselves as the maintainers of the shared resource as
-well, rather than the team actually using it.
-
-### Component Structure
-
-So what does proper program structure look like? In my mind the structure of a
-program is a hierarchy of components, or, in other words, a tree. The leaf nodes
-of the tree are almost _always_ IO related components, e.g. database
-connections, RPC server frameworks or clients, message queue consumers, etc...
-The non-leaf nodes will _generally_ be components which bring together the
-functionalities of their children in some useful way, though they may also have
-some IO functionality of their own.
-
-Let's look at an even more complex structure, still only using the `redis` and
-`http` component types:
-
-{% include image.html
- dir="program-structure" file="diag3.jpg" width=729
- descr="Example 3"
- %}
-
-This component structure contains the addition of the `debug` component. Clearly
-the `http` and `redis` components are reusable in different contexts, but for
-this example the `debug` endpoint is as well. It creates a separate http server
-which can be queried to perform runtime debugging of the program, and can be
-tacked onto virtually any program. The `rest-api` component is specific to this
-program and therefore not reusable. Let's dive into it a bit to see how it might
-be implemented:
-
-```go
-// RestAPI is very much not thread-safe, hopefully it doesn't have to handle
-// more than one request at once.
-type RestAPI struct {
- redisConn *redis.RedisConn
- httpSrv *http.Server
-
- // Statistics exported for other components to see
- RequestCount int
- FooRequestCount int
- BarRequestCount int
-}
-
-func NewRestAPI() *RestAPI {
- r := new(RestAPI)
- r.redisConn := redis.NewConn("127.0.0.1:6379")
-
- // mux will route requests to different handlers based on their URL path.
- mux := http.NewServeMux()
- mux.HandleFunc("/foo", r.fooHandler)
- mux.HandleFunc("/bar", r.barHandler)
- r.httpSrv := http.NewServer(mux)
-
- // Listen for requests and serve them in the background.
- go r.httpSrv.Listen(":8000")
-
- return r
-}
-
-func (r *RestAPI) fooHandler(rw http.ResponseWriter, r *http.Request) {
- r.redisConn.Command("INCR", "fooKey")
- r.RequestCount++
- r.FooRequestCount++
-}
-
-func (r *RestAPI) barHandler(rw http.ResponseWriter, r *http.Request) {
- r.redisConn.Command("INCR", "barKey")
- r.RequestCount++
- r.BarRequestCount++
-}
-```
-
-As can be seen, `rest-api` coalesces `http` and `redis` into a simple REST-like
-api, using pre-made library components. `main.go`, the root component, does much
-the same:
-
-```go
-func main() {
- // Create debug server and start listening in the background
- debugSrv := debug.NewServer()
-
- // Set up the RestAPI, this will automatically start listening
- restAPI := NewRestAPI()
-
- // Create another redis connection and use it to store statistics
- statsRedisConn := redis.NewConn("127.0.0.1:6380")
- for {
- time.Sleep(1 * time.Second)
- statsRedisConn.Command("SET", "numReqs", restAPI.RequestCount)
- statsRedisConn.Command("SET", "numFooReqs", restAPI.FooRequestCount)
- statsRedisConn.Command("SET", "numBarReqs", restAPI.BarRequestCount)
- }
-}
-```
-
-One thing which is clearly missing in this program is proper configuration,
-whether from command-line, environment variables, etc.... As it stands, all
-configuration parameters, such as the redis addresses and http listen addresses,
-are hardcoded. Proper configuration actually ends up being somewhat difficult,
-as the ideal case would be for each component to set up its own configuration
-variables, without its parent needing to be aware. For example, `redis` could
-set up `addr` and `pool-size` parameters. The problem is that there are two
-`redis` components in the program, and their parameters would therefore conflict
-with each other. An elegant solution to this problem is discussed in the next
-section.
-
-## Part 2: Components, Configuration, and Runtime
-
-The key to the configuration problem is to recognize that, even if there are two
-of the same component in a program, they can't occupy the same place in the
-program's structure. In the above example there are two `http` components, one
-under `rest-api` and the other under `debug`. Since the structure is represented
-as a tree of components, the "path" of any node in the tree uniquely represents
-it in the structure. For example, the two `http` components in the previous
-example have these paths:
-
-```
-root -> rest-api -> http
-root -> debug -> http
-```
-
-If each component were to know its place in the component tree, then it would
-easily be able to ensure that its configuration and initialization didn't
-conflict with other components of the same type. If the `http` component sets up
-a command-line parameter to know what address to listen on, the two `http`
-components in that program would set up:
-
-```
---rest-api-listen-addr
---debug-listen-addr
-```
-
-So how can we enable each component to know its path in the component structure?
-To answer this we'll have to take a detour through a type, called `Component`.
-
-### Component and Configuration
-
-The `Component` type is a made up type (though you'll be able to find an
-implementation of it at the end of this post). It has a single primary purpose,
-and that is to convey the program's structure to new components.
-
-To see how this is done, let's look at a couple of `Component`'s methods:
-
-```go
-// Package mcmp
-
-// New returns a new Component which has no parents or children. It is therefore
-// the root component of a component hierarchy.
-func New() *Component
-
-// Child returns a new child of the called upon Component.
-func (*Component) Child(name string) *Component
-
-// Path returns the Component's path in the component hierarchy. It will return
-// an empty slice if the Component is the root component.
-func (*Component) Path() []string
-```
-
-
-`Child` is used to create a new `Component`, corresponding to a new child node
-in the component structure, and `Path` is used retrieve the path of any
-`Component` within that structure. For the sake of keeping the examples simple
-let's pretend these functions have been implemented in a package called `mcmp`.
-Here's an example of how `Component` might be used in the `redis` component's
-code:
-
-```go
-// Package redis
-
-func NewConn(cmp *mcmp.Component, defaultAddr string) *RedisConn {
- cmp = cmp.Child("redis")
- paramPrefix := strings.Join(cmp.Path(), "-")
-
- addrParam := flag.String(paramPrefix+"-addr", defaultAddr, "Address of redis instance to connect to")
- // finish setup
-
- return redisConn
-}
-```
-
-In our above example, the two `redis` components' parameters would be:
-
-```
-// This first parameter is for the stats redis, whose parent is the root and
-// therefore doesn't have a prefix. Perhaps stats should be broken into its own
-// component in order to fix this.
---redis-addr
---rest-api-redis-addr
-```
-
-`Component` definitely makes it easier to instantiate multiple redis components
-in our program, since it allows them to know their place in the component
-structure.
-
-Having to construct the prefix for the parameters ourselves is pretty annoying
-though, so let's introduce a new package, `mcfg`, which acts like `flag` but is
-aware of `Component`. Then `redis.NewConn` is reduced to:
-
-```go
-// Package redis
-
-func NewConn(cmp *mcmp.Component, defaultAddr string) *RedisConn {
- cmp = cmp.Child("redis")
- addrParam := flag.String(cmp, "-addr", defaultAddr, "Address of redis instance to connect to")
- // finish setup
-
- return redisConn
-}
-```
-
-Easy-peazy.
-
-#### But What About Parse?
-
-Sharp-eyed gophers will notice that there's a key piece missing: When is
-`flag.Parse`, or its `mcfg` counterpart, called? When does `addrParam` actually
-get populated? You can't use the redis connection until that happens, but that
-can't happen inside `redis.NewConn` because there might be other components
-after `redis.NewConn` which want to set up parameters. To illustrate the
-problem, let's look at a simple program which wants to set up two `redis`
-components:
-
-```go
-func main() {
- // Create the root Component, an empty Component.
- cmp := mcmp.New()
-
- // Create the Components for two sub-components, foo and bar.
- cmpFoo := cmp.Child("foo")
- cmpBar := cmp.Child("bar")
-
- // Now we want to try to create a redis sub-component for each component.
-
- // This will set up the parameter "--foo-redis-addr", but bar hasn't had a
- // chance to set up its corresponding parameter, so the command-line can't
- // be parsed yet.
- fooRedis := redis.NewConn(cmpFoo, "127.0.0.1:6379")
-
- // This will set up the parameter "--bar-redis-addr", but, as mentioned
- // before, redis.NewConn can't parse command-line.
- barRedis := redis.NewConn(cmpBar, "127.0.0.1:6379")
-
- // If the command-line is parsed here, then how can fooRedis and barRedis
- // have been created yet? It's only _after_ this point that `fooRedis` and
- // `barRedis` could possibly be usable.
- mcfg.Parse()
-}
-```
-
-We will solve this problem in the next section.
-
-### Instantiation vs Initialization
-
-Let's break down `redis.NewConn` into two phases: instantiation and initialization.
-Instantiation refers to creating the component on the component structure and
-having it declare what it needs in order to initialize (e.g. configuration
-parameters). During instantiation nothing external to the program is performed;
-no IO, no reading of the command-line, no logging, etc... All that's happened is
-that the empty template of a `redis` component has been created.
-
-Initialization is the phase when that template is filled in. Configuration
-parameters are read, startup actions like the creation of database connections
-are performed, and logging is output for informational and debugging purposes.
-
-The key to making effective use of this dichotemy is to allow _all_ components
-to instantiate themselves before they initialize themselves. By doing this we
-can ensure that, for example, all components have had the chance to declare
-their configuration parameters before configuration parsing is done.
-
-So let's modify `redis.NewConn` so that it follows this dichotemy. It makes
-sense to leave instantiation related code where it is, but we need a mechanism
-by which we can declare initialization code before actually calling it. For
-this, I will introduce the idea of a "hook".
-
-#### But First: Augment Component
-
-In order to support hooks, however, `Component` will need to be augmented with
-a few new methods. Right now it can only carry with it information about the
-component structure, but here we will add the ability to carry arbitrary
-key/value information as well:
-
-```go
-// Package mcmp
-
-// SetValue sets the given key to the given value on the Component, overwriting
-// any previous value for that key.
-func (*Component) SetValue(key, value interface{})
-
-// Value returns the value which has been set for the given key, or nil if the
-// key was never set.
-func (*Component) Value(key interface{}) interface{}
-
-// Children returns the Component's children in the order they were created.
-func (*Component) Children() []*Component
-```
-
-The final method allows us to, starting at the root `Component`, traverse the
-component structure, interacting with each `Component`'s key/value store. This
-will be useful for implementing hooks.
-
-#### Hooks
-
-A hook is, simply a function which will run later. We will declare a new
-package, calling it `mrun`, and say that it has two new functions:
-
-```go
-// Package mrun
-
-// InitHook registers the given hook to the given Component.
-func InitHook(cmp *mcmp.Component, hook func())
-
-// Init runs all hooks registered using InitHook. Hooks are run in the order
-// they were registered.
-func Init(cmp *mcmp.Component)
-```
-
-With these two functions we are able to defer the initialization phase of
-startup by using the same `Component`s we were passing around for the purpose of
-denoting component structure.
-
-Now, with these few extra pieces of functionality in place, let's reconsider the
-most recent example, and make a program which creates two redis components which
-exist independently of each other:
-
-```go
-// Package redis
-
-// NOTE that NewConn has been renamed to InstConn, to reflect that the returned
-// *RedisConn is merely instantiated, not initialized.
-
-func InstConn(cmp *mcmp.Component, defaultAddr string) *RedisConn {
- cmp = cmp.Child("redis")
-
- // we instantiate an empty RedisConn instance and parameters for it. Neither
- // has been initialized yet. They will remain empty until initialization has
- // occurred.
- redisConn := new(RedisConn)
- addrParam := mcfg.String(cmp, "-addr", defaultAddr, "Address of redis instance to connect to")
-
- mrun.InitHook(cmp, func() {
- // This hook will run after parameter initialization has happened, and
- // so addrParam will be usable. Once this hook as run, redisConn will be
- // usable as well.
- *redisConn = makeRedisConnection(*addrParam)
- })
-
- // Now that cmp has had configuration parameters and intialization hooks
- // set into it, return the empty redisConn instance back to the parent.
- return redisConn
-}
-```
-
-```go
-// Package main
-
-func main() {
- // Create the root Component, an empty Component.
- cmp := mcmp.New()
-
- // Create the Components for two sub-components, foo and bar.
- cmpFoo := cmp.Child("foo")
- cmpBar := cmp.Child("bar")
-
- // Add redis components to each of the foo and bar sub-components.
- redisFoo := redis.InstConn(cmpFoo, "127.0.0.1:6379")
- redisBar := redis.InstConn(cmpBar, "127.0.0.1:6379")
-
- // Parse will descend into the Component and all of its children,
- // discovering all registered configuration parameters and filling them from
- // the command-line.
- mcfg.Parse(cmp)
-
- // Now that configuration parameters have been initialized, run the Init
- // hooks for all Components.
- mrun.Init(cmp)
-
- // At this point the redis components have been fully initialized and may be
- // used. For this example we'll copy all keys from one to the other.
- keys := redisFoo.Command("KEYS", "*")
- for i := range keys {
- val := redisFoo.Command("GET", keys[i])
- redisBar.Command("SET", keys[i], val)
- }
-}
-```
-
-## Conclusion
-
-While the examples given here are fairly simplistic, the pattern itself is quite
-powerful. Codebases naturally accumulate small, domain specific behaviors and
-optimizations over time, especially around the IO components of the program.
-Databases are used with specific options that an organization finds useful,
-logging is performed in particular places, metrics are counted around certain
-pieces of code, etc...
-
-By programming with component structure in mind we are able to keep these
-optimizations while also keeping the clarity and compartmentalization of the
-code in-tact. We are able to keep our code flexible and configurable, while also
-re-usable and testable. And the simplicity of the tools involved means it can be
-extended and retrofitted for nearly any situation or use-case.
-
-Overall, it's a powerful pattern that I've found myself unable to do without
-once I began using it.
-
-### Implementation
-
-As a final note, you can find an example implementation of the packages
-described in this post here:
-
-* [mcmp](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mcmp)
-* [mcfg](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mcfg)
-* [mrun](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mrun)
-
-The packages are not stable and are likely to change frequently. You'll also
-find that they have been extended quite a bit from the simple descriptions found
-here, based on what I've found useful as I've implemented programs using
-component structures. With these two points in mind, I would encourage you to
-look in and take whatever functionality you find useful for yourself, and not
-use the packages directly. The core pieces are not different from what has been
-described in this post.