summaryrefslogtreecommitdiff
path: root/static/src/_posts/2022-04-03-ginger-a-small-vm-update.md
blob: 74395f0c3c3b6482a40bf3101b818604808713b7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
---
title: >-
    Ginger: A Small VM Update
description: >-
    It works gooder now.
tags: tech
series: ginger
---

During some recent traveling I had to be pulled away from cryptic-net work for a
while. Instead I managed to spend a few free hours, and the odd international
plane ride, to fix the ginger vm.

The problem, as it stood, was that it only functioned "correctly" in a very
accidental sense. I knew from the moment that I published it that it would get
mostly rewritten immediately.

And so here we are, with a rewritten vm and some new realizations.

## Operation

The `Operation` type was previously defined like so:

```
type Operation interface {
	Perform([]Thunk, Operation) (Thunk, error)
}
```

I'm not going to explain it, because it's both confusing and wrong.

One thing that is helpful in a refactor, especially in a strongly typed
language, is to tag certain interfaces as being axiomatic, and conforming the
rest of your changes around those. If those interfaces are simple enough to
apply broadly _and_ accurately describe desired behavior, they will help
pre-decide many difficult decisions you'd otherwise have to make.

So with that mind, I tagged `Operation` as being an axiomatic interface, given
that ginger is aiming to be a functional language (and I'm wondering if I should
just rename `Operation` to `Function`, while I'm at it). The new definition of
the interface is:

```
type Operation interface {
	Perform(Value) Value
}
```

`Operation` takes and argument and returns a result, it could not possibly be
boiled down any further. By holding `Operation` to this definition and making
decisions from there, it was pretty clear what the next point of attack was.

## If/Recur

The reason that `Operation` had previously been defined in such a fucked up way
was to support the `if` and `recur` `Operation`s, as if they weren't different
than any other `Operation`s. But truthfully they are different, as they are
actually control flow constructs, and so require capabilities that no other
`Operation` would be allowed to use anyway.

The new implementation reflects this. `if` and `recur` are now both handled
directly by the compiler, while global `Operation`s like `tupEl` are
implementations of the `Operation` interface.

## Compile Step

The previous iteration of the vm hadn't distinguished between a compile step and
a run step. In a way it did both at the same time, by abusing the `Thunk` type.
Separating the two steps, and ditching the `Thunk` type in the process, was the
next major change in the refactoring.

The compile step can be modeled as a function which takes a `Graph` and returns
an `Operation`, where the `Graph`'s `in` and `out` names correspond to the
`Operation`'s argument and return, respectively. The run step then reads an
input from the user, calls the compiled `Operation` with that input, and outputs
the result back to the user.

As an example, given the following program:

```
* six-or-more.gg

max = {
    a = tupEl < (in, 0)
    b = tupEl < (in, 1)
    out = if < (gt < (a, b), a, b)
}

out = max < (in, 6)
```

we want to compile an `Operation` which accepts a number and returns the greater
of that number and 6. I'm going to use anonymous go functions to demonstrate the
anatomy of the compiled `Operation`, as that's what's happening in the current
compiler anyway.

```
// After compilation, this function will be in-memory and usable as an
// Operation.

sixOrMore := func(in Value) Value {

    max := func(in Value) Value {

        a := tupEl(in, 0)
        b := tupEl(in, 1)

        if a > b {
            return a
        }

        return b
    }

    return max(in, 6)
}
```

Or at least, this is what I tried for _initially_. What I found was that it was
easier, in the context of how `graph.MapReduce` works, to make even the leaf
values, e.g. `in`, `0`, `1`, and `6`, map to `Operations` as well. `in` is
replaced with an anonymous function which returns its argument, and the numbers
are replaced with anonymous functions which ignore their argument and always
return their respective number.

So the compiled `Operation` looks more like this:

```
// After compilation, this function will be in-memory and usable as an
// Operation.

sixOrMore := func(in Value) Value {

    max := func(in Value) Value {

        a := tupEl(
            func(in Value) Value { return in }(in),
            func(_ Value) Value { return 0}(in),
        )

        b := tupEl(
            func(in Value) Value { return in }(in),
            func(_ Value) Value { return 1}(in),
        )

        if a > b {
            return a
        }

        return b
    }

    return max(
        func(in Value) Value { return in }(in),
        func(_ Value) Value { return 6}(in),
    )
}
```

This added layer of indirection for all leaf values is not great for
performance, and there's probably further refactoring which could be done to
make the result look more like the original ideal.

To make things a bit messier, even that representation isn't quite accurate to
the current result. The compiler doesn't properly de-duplicate work when
following name values. In other words, everytime `a` is referenced within `max`,
the `Operation` which the compiler produces will recompute `a` via `tupEl`.

So the _actual_ compiled `Operation` looks more like this:

```
// After compilation, this function will be in-memory and usable as an
// Operation.

sixOrMore := func(in Value) Value {

    return func(in Value) Value {

        if tupEl(func(in Value) Value { return in }(in), func(_ Value) Value { return 0}(in)) >
            tupEl(func(in Value) Value { return in }(in), func(_ Value) Value { return 1}(in)) {

            return tupEl(func(in Value) Value { return in }(in), func(_ Value) Value { return 0}(in))
        }

        return tupEl(func(in Value) Value { return in }(in), func(_ Value) Value { return 1}(in))
    }(
        func(in Value) Value { return in }(in),
        func(_ Value) Value { return 6}(in),
    )
}
```

Clearly, there's some optimization to be done still.

## Results

While it's still not perfect, the new implementation is far and away better than
the old. This can be seen just in the performance for the fibonacci program:

```
# Previous VM

$ time ./eval "$(cat examples/fib.gg)" 10
55

real    0m8.737s
user    0m9.871s
sys     0m0.309s
```

```
# New VM

$ time ./eval "$(cat examples/fib.gg)" 50
12586269025

real    0m0.003s
user    0m0.003s
sys     0m0.000s
```

They're not even comparable.