diff options
-rw-r--r-- | static/src/_posts/2021-12-31-ginger-its-alive.md | 422 |
1 files changed, 422 insertions, 0 deletions
diff --git a/static/src/_posts/2021-12-31-ginger-its-alive.md b/static/src/_posts/2021-12-31-ginger-its-alive.md new file mode 100644 index 0000000..efd6bd0 --- /dev/null +++ b/static/src/_posts/2021-12-31-ginger-its-alive.md @@ -0,0 +1,422 @@ +--- +title: >- + Ginger: It's Alive! +description: >- + The new best language for computing fibonacci numbers. +series: ginger +tags: tech +--- + +As a kind of Christmas present to myself I took a whole week off of work +specifically to dedicate myself to working on ginger. + +My concrete goal was to be able to run a ginger program to compute any Nth +fibonacci number, a goal I chose because it would require the implementation of +conditionals, some kind of looping or recursion, and basic addition/subtraction. +In other words, it would require all the elements which comprise a Turing +complete language. + +And you know what? I actually succeeded! + +The implementation can be found [here][impl]. At this point ginger is an +interpreted language running in a golang-based VM. The dream is for it to be +self-hosted on LLVM (and other platforms after), but as an intermediate step to +that I decided on sticking to what I know (golang) rather than having to learn +two things at once. + +In this post I'm going to describe the components of this VM at a high level, +show a quick demo of it working, and finally talk about the roadmap going +forward. + +[impl]: https://github.com/mediocregopher/ginger/tree/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2 + +## Graph + +The core package of the whole project is the [`graph`][graph] package. This +package implements a generic directed graph datastructure. + +The generic part is worth noting; I was able to take advantage of go's new +generics which are currently [in beta][go118]. I'd read quite a bit on how the +generic system would work even before the beta was announced, so I was able to +hit the ground running and start using them without much issue. + +Ginger's unique graph datastructure has been discussed in previous posts in this +series quite a bit, and this latest implementation doesn't deviate much at a +high level. Below are the most up-to-date core datatypes and functions which are +used to construct ginger graphs: + +```go + +// Value is any value which can be stored within a Graph. Values should be +// considered immutable, ie once used with the graph package their internal +// value does not change. +type Value interface { + Equal(Value) bool + String() string +} + +// OpenEdge consists of the edge value (E) and source vertex value (V) of an +// edge in a Graph. When passed into the AddValueIn method a full edge is +// created. An OpenEdge can also be sourced from a tuple vertex, whose value is +// an ordered set of OpenEdges of this same type. +type OpenEdge[E, V Value] struct { ... } + +// ValueOut creates a OpenEdge which, when used to construct a Graph, represents +// an edge (with edgeVal attached to it) coming from the vertex containing val. +func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V] + +// TupleOut creates an OpenEdge which, when used to construct a Graph, +// represents an edge (with edgeVal attached to it) coming from the vertex +// comprised of the given ordered-set of input edges. +func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V] + +// Graph is an immutable container of a set of vertices. The Graph keeps track +// of all Values which terminate an OpenEdge. E indicates the type of edge +// values, while V indicates the type of vertex values. +type Graph[E, V Value] struct { ... } + +// AddValueIn takes a OpenEdge and connects it to the Value vertex containing +// val, returning the new Graph which reflects that connection. +func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V] + +// ValueIns returns, if any, all OpenEdges which lead to the given Value in the +// Graph (ie, all those added via AddValueIn). +func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V] + +``` + +The current `Graph` implementation is _incredibly_ inefficient, it does a lot of +copying, looping, and equality checks which could be optimized out one day. +That's going to be a recurring theme of this post, as I had to perform a +balancing act between actually reaching my goal for the week while not incurring +too much tech debt for myself. + +[graph]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go +[go118]: https://go.dev/blog/go1.18beta1 + +### MapReduce + +There's a final operation I implemented as part of the `graph` package: +[MapReduce][mapreduce]. It's a difficult operation to describe, but I'm going to +do my best in this section for those who are interested. If you don't understand +it, or don't care, just know that `MapReduce` is a generic tool for transforming +graphs. + +For a description of `MapReduce` we need to present an example graph: + +``` + +<--b--- + + \ +X <--a--+<--c----+<--f-- A + + / + + +<---g--- + +<--d--+ + +<---h--- + \ +Y <---------e----------- B +``` + +Plus signs indicate tuples, and lowercase letters are edge values while upper +case letters are vertex values. The pseudo-code to construct this graph in go +might look like: + +```go + g := new(Graph) + + fA := ValueOut("f", "A") + + g = g.AddValueIn( + "X", + TupleOut( + "a", + TupleOut("b", fA), + TupleOut("c", fA), + TupleOut( + "d", + ValueOut("g", "A"), + ValueOut("h", "B"), + ), + ), + ) + + g = g.AddValueIn("e", "B") +``` + +As can be seen in the [code][mapreduce], `MapReduce`'s first argument is an +`OpenEdge`, _not_ a `Graph`. Fundamentally `MapReduce` is a reduction of the +_dependencies_ of a particular value into a new value; to reduce the +dependencies of multiple values at the same time would be equivalent to looping +over those values and calling `MapReduce` on each individually. Having +`MapReduce` only deal with one edge at a time is more flexible. + +So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by +`TupleOut("a", etc...)`. `MapReduce` is going to descend into this `OpenEdge` +recursively, in order to first find all value vertices (ie the leaf vertices, +those without any children of their own). + +At this point `MapReduce` will use its second argument, the `mapVal` function, +which accepts a value of one type and returns a value of another type. This +function is called on each value from every value vertex encountered. In this +case both `A` and `B` are connectable from `X`, so `mapVal` will be called on +each _only once_. This is the case even though `A` is connected to multiple +times (once with an edge value of `f`, another with an edge value of `b`). +`mapVal` only gets called once per vertex, not per connection. + +With all values mapped, `MapReduce` will begin reducing. For each edge leaving +each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as +arguments the edge value of the edge and the _mapped value_ (not the original +value) of the vertex, and returns a new value of the same type that `mapVal` +returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our +example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be +called on it once. + +With each value vertex edge having been reduced, `reduceEdge` is called again on +each edge leaving _those_ edges, which must be tuple edges. An array of the +values returned from the previous `reduceEdge` calls for each of the tuples' +input edges is used as the value argument in the next call. This is done until +the `OpenEdge` is fully reduced into a single value. + +To flesh out our example, let's imagine a `mapVal` which returns the input +string repeated twice, and a `reduceEdge` which returns the input values joined +with the edge value, and then wrapped with the edge value (eg `reduceEdge(a, [B, +C]) -> aBaCa`). + +Calling `MapReduce` on the edge leading into `X` will then give us the following +calls: + +``` +# Map the value vertices + +mapVal(A) -> AA +mapVal(B) -> BB + +# Reduce the value vertex edges + +reduceEdge(f, [AA]) -> fAAf +reduceEdge(g, [AA]) -> gAAg +reduceEdge(h, [BB]) -> hBBh + +# Reduce tuple vertex edges + +reduceEdge(b, [fAAf]) -> bfAAfb +reduceEdge(c, [fAAf]) -> cfAAfc +reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd + +reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda +``` + +Beautiful, exactly what we wanted. + +`MapReduce` will prove extremely useful when it comes time for the VM to execute +the graph. It enables the VM to evaluate only the values which are needed to +produce an output, and to only evaluate each value once no matter how many times +it's used. `MapReduce` also takes care of the recursive traversal of the +`Graph`, which simplifies the VM code significantly. + +[mapreduce]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go#L338 + +## gg + +With a generic graph implementation out of the way, it was then required to +define a specific implementation which could be parsed from a file and later +used for execution in the VM. + +The file extension used for ginger code is `.gg`, as in "ginger graph" (of +course). The package name for decoding this file format is, therefore, also +called `gg`. + +The core datatype for the `gg` package is the [`Value`][ggvalue], since the +`graph` package takes care of essentially everything else in the realm of graph +construction and manipulation. The type definition is: + +```go +// Value represents a value which can be serialized by the gg text format. +type Value struct { + + // Only one of these fields may be set + Name *string + Number *int64 + Graph *Graph + + // Optional fields indicating the token which was used to construct this + // Value, if any. + LexerToken *LexerToken +} + +type Graph = graph.Graph[Value, Value] // type alias for convenience +``` + +Note that it's currently only possible to describe three different types in a +`gg` file, and one of them is the `Graph`! These are the only ones needed to +implement a fibonacci function, so they're all I implemented. + +The lexing/parsing of `gg` files is not super interesting, you can check out the +package code for more details. The only other thing worth noting is that, for +now, all statements are required to end with a `;`. I had originally wanted to +be less strict with this, and allow newlines and other tokens to indicate the +end of statements, but it was complicating the code and I wanted to move on. + +Another small thing worth noting is that I decided to make each entire `.gg` +file implicitly define a graph. So you can imagine each file's contents wrapped +in curly braces. + +With the `gg` package out of the way I was able to finally parse ginger +programs! The following is the actual, real-life implementation of the fibonacci +function (though at this point it didn't actually work, because the VM was still +not implemented: + +``` +out = { + + decr = { out = add < (in; -1;); }; + + n = tupEl < (in; 0;); + a = tupEl < (in; 1;); + b = tupEl < (in; 2;); + + out = if < ( + isZero < n; + a; + recur < ( + decr < n; + b; + add < (a;b;); + ); + ); + +} < (in; 0; 1;); +``` + +[ggvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/gg/gg.go#L14 + +## VM + +Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy, +well constructed foundations of a tall building, then the `vm` package is the +extremely long, flimsy stick someone propped up vertically so they could say +they built a structure of impressive height. + +In other words, it's very likely that the current iteration of the VM will not +be long for this world, and so I won't waste time describing it in super detail. + +What I will say about it is that within the `vm` package I've defined a [new +`Value` type][vmvalue], which extends the one defined in `gg`. The necessity of +this was that there are types which cannot be represented syntactically in a +`.gg` file, but which _can_ be used as values within a program being run. + +The first of these is the `Operation`, which is essentially a first-class +function. The VM will automatically interpret a graph as an `Operation` when it +is used as an edge value, as has been discussed in previous posts, but there are +also built-in operations (like `if` and `recur`) which cannot be represented as +datastructures, and so it was necessary to introduce a new in-memory type to +properly represent operations. + +The second is the `Tuple` type. This may seem strange, as ginger graphs already +have a concept of a tuple. But the ginger graph tuple is a _vertex type_, not a +value type. The distinction is small, but important. Essentially the graph tuple +is a structural element which describes how to create a tuple value, but it is +not yet that value. So we need a new Value type to hold the tuple once it _has_ +been created during runtime. + +Another thing worth describing about the `vm` package, even though I think they +might change drastically, are [`Thunk`s][thunk]: + +```go +// Thunk is returned from the performance of an Operation. When called it will +// return the result of that Operation having been called with the particular +// arguments which were passed in. +type Thunk func() (Value, error) +``` + +The term "thunk" is borrowed from Haskell, which I don't actually know so I'm +probably using it wrong, but anyway... + +A thunk is essentially a value which has yet to be evaluated; the VM knows +exactly _how_ to evaluate it, but it hasn't done so yet. The primary reason for +their existence within ginger is to account for conditionals, ie the `if` +operation. The VM can't evaluate each of an `if`'s arguments all at once, it +must only evaluate the first argument (to obtain a boolean), and then based on +that evaluate the second or third argument. + +This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to +reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on +the operation (the edge's value) and the inputs to the edge (which will +themselves be `Thunk`s). Because each `Thunk` represents a potential value, not +an actual one, the VM is able to completely parse the program to be executed +(using `graph.MapReduce`) while allowing conditionals to still work correctly. + +[EvaluateEdge][evaledge] is where all that happens, if you're interested, but be +warned that the code is a hot mess right now and it's probably not worth +spending a ton of time understanding it as it will change a lot. + +A final thing I'll mention is that the `recur` operation is, I think, broken. Or +probably more accurately, the entire VM is broken in a way which prevents +`recur` from working correctly. It _does_ produce the correct output, so I +haven't prioritized debugging it, but for any large number of iterations it +takes a very long time to run. + +[vmvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/vm.go#L18 +[thunk]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/op.go#L11 +[evaledge]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/scope.go#L29 + +## Demo + +Finally, to show it off! I put together a super stupid `eval` binary which takes +two arguments: a graph to be used as an operation, and a value to be used as an +argument to that operation. It doesn't even read the code from a file, you have +to `cat` it in. + +The [README][readme] documents how to run the demo, so if you'd like to do so +then please clone the repo and give it a shot! It should look like this when you +do: + +``` +# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8 +21 +``` + +You can put any number you like instead of `8`, but as mentioned, `recur` is +broken so it can take a while for larger numbers. + +[readme]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/README.md + +## Next Steps + +The following are all the things I'd like to address the next time I work on +ginger: + +* `gg` + + * Allow for newlines (and `)` and `}`) to terminate statements, not just + `;`. + + * Allow names to have punctuation characters in them (maybe?). + + * Don't read all tokens into memory prior to parsing. + +* `vm` + + * Fix `recur`. + + * Implement tail call optimization. + +* General + + * A bit of polish on the `eval` tool. + + * Expose graph creation, traversal, and transformation functions as + builtins. + + * Create plan (if not actually implement it yet) for how code will be + imported from one file to another. Namespacing in general will fall into + this bucket. + + * Create plan (if not actually implement it yet) for how users can + extend/replace the lexer/parser. + +I don't know _when_ I'll get to work on these next, ginger will come back up in +my rotation of projects eventually. It could be a few months. In the meantime I +hope you're as excited about this progress as I am, and if you have any feedback +I'd love to hear it. + +Thanks for reading! |