Diffstat (limited to 'src/_posts')
21 files changed, 4764 insertions, 0 deletions
diff --git a/src/_posts/2013-04-09-erlang-tcp-socket-pull-pattern.md b/src/_posts/2013-04-09-erlang-tcp-socket-pull-pattern.md new file mode 100644 index 0000000..3e5f0af --- /dev/null +++ b/src/_posts/2013-04-09-erlang-tcp-socket-pull-pattern.md @@ -0,0 +1,256 @@ +--- +title: "Erlang, tcp sockets, and active true" +description: >- + Using `{active,once}` isn't always the best way to handle connections. +--- + +If you don't know erlang then [you're missing out][0]. If you do know erlang, +you've probably at some point done something with tcp sockets. Erlang's highly +concurrent model of execution lends itself well to server programs where a high +number of active connections is desired. Each thread can autonomously handle its +single client, greatly simplifying the logic of the whole application while +still retaining [great performance characteristics][1]. + +## Background + +For an erlang thread which owns a single socket there are three different ways +to receive data off of that socket. These all revolve around the `active` +[setopts][2] flag. A socket can be set to one of: + +* `{active,false}` - All data must be obtained through [recv/2][3] calls. This + amounts to synchronous socket reading. + +* `{active,true}` - All data on the socket gets sent to the controlling thread + as a normal erlang message. It is the thread's + responsibility to keep up with the buffered data in the + message queue. This amounts to asynchronous socket reading. + +* `{active,once}` - When set the socket is placed in `{active,true}` for a + single packet. That is, once set the thread can expect a + single message to be sent to it when data comes in. To receive + any more data off of the socket the socket must either be + read from using [recv/2][3] or be put in `{active,once}` or + `{active,true}`. + +## Which to use? + +Many (most?) tutorials advocate using `{active,once}` in your application +\[0]\[1]\[2]. This has to do with usability and security. 
+When in `{active,true}` +it's possible for a client to flood the connection faster than the receiving +process will process those messages, potentially eating up a lot of memory in +the VM. However, if you want to be able to receive both tcp data messages as +well as other messages from other erlang processes at the same time you can't +use `{active,false}`. So `{active,once}` is generally preferred because it +deals with both of these problems quite well. + +## Why not to use `{active,once}` + +Here's what your classic `{active,once}` enabled tcp socket implementation will +probably look like: + +```erlang +-module(tcp_test). +-compile(export_all). + +-define(TCP_OPTS, [ + binary, + {packet, raw}, + {nodelay,true}, + {active, false}, + {reuseaddr, true}, + {keepalive,true}, + {backlog,500} +]). + +%Start listening +listen(Port) -> + {ok, L} = gen_tcp:listen(Port, ?TCP_OPTS), + ?MODULE:accept(L). + +%Accept a connection +accept(L) -> + {ok, Socket} = gen_tcp:accept(L), + ?MODULE:read_loop(Socket), + io:fwrite("Done reading, connection was closed\n"), + ?MODULE:accept(L). + +%Read everything it sends us +read_loop(Socket) -> + inet:setopts(Socket, [{active, once}]), + receive + {tcp, _, _} -> + do_stuff_here, + ?MODULE:read_loop(Socket); + {tcp_closed, _}-> donezo; + {tcp_error, _, _} -> donezo + end. +``` + +This code isn't actually usable for a production system; it doesn't even spawn a +new process for the new socket. But that's not the point I'm making. If I run it +with `tcp_test:listen(8000)`, and in another window do: + +```bash +while [ 1 ]; do echo "aloha"; done | nc localhost 8000 +``` + +We'll be flooding the server with data pretty well. Using [eprof][4] we can +get an idea of how our code performs, and where the hang-ups are: + +```erlang +1> eprof:start(). +{ok,<0.34.0>} + +2> P = spawn(tcp_test,listen,[8000]). +<0.36.0> + +3> eprof:start_profiling([P]). +profiling + +4> running_the_while_loop. +running_the_while_loop + +5> eprof:stop_profiling(). 
+profiling_stopped + +6> eprof:analyze(procs,[{sort,time}]). + +****** Process <0.36.0> -- 100.00 % of profiled time *** +FUNCTION CALLS % TIME [uS / CALLS] +-------- ----- --- ---- [----------] +prim_inet:type_value_2/2 6 0.00 0 [ 0.00] + +....snip.... + +prim_inet:enc_opts/2 6 0.00 8 [ 1.33] +prim_inet:setopts/2 12303599 1.85 1466319 [ 0.12] +tcp_test:read_loop/1 12303598 2.22 1761775 [ 0.14] +prim_inet:encode_opt_val/1 12303599 3.50 2769285 [ 0.23] +prim_inet:ctl_cmd/3 12303600 4.29 3399333 [ 0.28] +prim_inet:enc_opt_val/2 24607203 5.28 4184818 [ 0.17] +inet:setopts/2 12303598 5.72 4533863 [ 0.37] +erlang:port_control/3 12303600 77.13 61085040 [ 4.96] +``` + +eprof shows us where our process is spending the majority of its time. The `%` +column indicates percentage of time the process spent during profiling inside +any function. We can pretty clearly see that the vast majority of time was spent +inside `erlang:port_control/3`, the BIF that `inet:setopts/2` uses to switch the +socket to `{active,once}` mode. Amongst the calls which were called on every +loop, it takes up by far the most amount of time. In addition all of those other +calls are also related to `inet:setopts/2`. + +I'm gonna rewrite our little listen server to use `{active,true}`, and we'll do +it all again: + +```erlang +-module(tcp_test). +-compile(export_all). + +-define(TCP_OPTS, [ + binary, + {packet, raw}, + {nodelay,true}, + {active, false}, + {reuseaddr, true}, + {keepalive,true}, + {backlog,500} +]). + +%Start listening +listen(Port) -> + {ok, L} = gen_tcp:listen(Port, ?TCP_OPTS), + ?MODULE:accept(L). + +%Accept a connection +accept(L) -> + {ok, Socket} = gen_tcp:accept(L), + inet:setopts(Socket, [{active, true}]), %Well this is new + ?MODULE:read_loop(Socket), + io:fwrite("Done reading, connection was closed\n"), + ?MODULE:accept(L). 
+ +%Read everything it sends us +read_loop(Socket) -> + %inet:setopts(Socket, [{active, once}]), + receive + {tcp, _, _} -> + do_stuff_here, + ?MODULE:read_loop(Socket); + {tcp_closed, _}-> donezo; + {tcp_error, _, _} -> donezo + end. +``` + +And the profiling results: + +```erlang +1> eprof:start(). +{ok,<0.34.0>} + +2> P = spawn(tcp_test,listen,[8000]). +<0.36.0> + +3> eprof:start_profiling([P]). +profiling + +4> running_the_while_loop. +running_the_while_loop + +5> eprof:stop_profiling(). +profiling_stopped + +6> eprof:analyze(procs,[{sort,time}]). + +****** Process <0.36.0> -- 100.00 % of profiled time *** +FUNCTION CALLS % TIME [uS / CALLS] +-------- ----- --- ---- [----------] +prim_inet:enc_value_1/3 7 0.00 1 [ 0.14] +prim_inet:decode_opt_val/1 1 0.00 1 [ 1.00] +inet:setopts/2 1 0.00 2 [ 2.00] +prim_inet:setopts/2 2 0.00 2 [ 1.00] +prim_inet:enum_name/2 1 0.00 2 [ 2.00] +erlang:port_set_data/2 1 0.00 2 [ 2.00] +inet_db:register_socket/2 1 0.00 3 [ 3.00] +prim_inet:type_value_1/3 7 0.00 3 [ 0.43] + +.... snip .... + +prim_inet:type_opt_1/1 19 0.00 7 [ 0.37] +prim_inet:enc_value/3 7 0.00 7 [ 1.00] +prim_inet:enum_val/2 6 0.00 7 [ 1.17] +prim_inet:dec_opt_val/1 7 0.00 7 [ 1.00] +prim_inet:dec_value/2 6 0.00 10 [ 1.67] +prim_inet:enc_opt/1 13 0.00 12 [ 0.92] +prim_inet:type_opt/2 19 0.00 33 [ 1.74] +erlang:port_control/3 3 0.00 59 [ 19.67] +tcp_test:read_loop/1 20716370 100.00 12187488 [ 0.59] +``` + +This time our process spent almost no time at all (according to eprof, 0%) +fiddling with the socket opts. Instead it spent all of its time in the +read_loop doing the work we actually want to be doing. + +## So what does this mean? + +I'm by no means advocating never using `{active,once}`. The security concern is +still a completely valid concern and one that `{active,once}` mitigates quite +well. 
I'm simply pointing out that this mitigation has some fairly serious +performance implications which have the potential to bite you if you're not +careful, especially in cases where a socket is going to be receiving a large +amount of traffic. + +## Meta + +These tests were done using R15B03, but I've done similar ones in R14 and found +similar results. I have not tested R16. + +* \[0] http://learnyousomeerlang.com/buckets-of-sockets +* \[1] http://www.erlang.org/doc/man/gen_tcp.html#examples +* \[2] http://erlycoder.com/25/erlang-tcp-server-tcp-client-sockets-with-gen_tcp + +[0]: http://learnyousomeerlang.com/content +[1]: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1 +[2]: http://www.erlang.org/doc/man/inet.html#setopts-2 +[3]: http://www.erlang.org/doc/man/gen_tcp.html#recv-2 +[4]: http://www.erlang.org/doc/man/eprof.html diff --git a/src/_posts/2013-07-11-goplus.md b/src/_posts/2013-07-11-goplus.md new file mode 100644 index 0000000..5ee121e --- /dev/null +++ b/src/_posts/2013-07-11-goplus.md @@ -0,0 +1,77 @@ +--- +title: Go+ +description: >- + A simple proof-of-concept script for doing go dependency management. +--- + +Compared to other languages go has some strange behavior regarding its project +root settings. If you import a library called `somelib`, go will look for a +`src/somelib` folder in all of the folders in the `$GOPATH` environment +variable. This works nicely for globally installed packages, but it makes +encapsulating a project with a specific version, or modified version, rather +tedious. Whenever you go to work on this project you'll have to add its path to +your `$GOPATH`, or add the path permanently, which could break other projects +which may use a different version of `somelib`. + +My solution is in the form of a simple script I'm calling go+. go+ will search +in the current directory and all of its parents for a file called `GOPROJROOT`. 
If +it finds that file in a directory, it prepends that directory's absolute path to +your `$GOPATH` and stops the search. Regardless of whether or not `GOPROJROOT` +was found go+ will passthrough all arguments to the actual go call. The +modification to `$GOPATH` will only last the duration of the call. + +As an example, consider the following: +``` +/tmp + /hello + GOPROJROOT + /src + /somelib/somelib.go + /hello.go +``` + +If `hello.go` depends on `somelib`, as long as you run go+ from `/tmp/hello` or +one of its children your project will still compile. + +Here is the source code for go+: + +```bash +#!/bin/sh + +SEARCHING_FOR=GOPROJROOT +ORIG_DIR=$(pwd) + +STOPSEARCH=0 +SEARCH_DIR=$ORIG_DIR +while [ $STOPSEARCH = 0 ]; do + + RES=$( find "$SEARCH_DIR" -maxdepth 1 -type f -name $SEARCHING_FOR | \ + grep -P "$SEARCHING_FOR$" | \ + head -n1 ) + + if [ "$RES" = "" ]; then + if [ "$SEARCH_DIR" = "/" ]; then + STOPSEARCH=1 + fi + cd .. + SEARCH_DIR=$(pwd) + else + export GOPATH=$SEARCH_DIR:$GOPATH + STOPSEARCH=1 + fi +done + +cd "$ORIG_DIR" +exec go "$@" +``` + +## UPDATE: Goat + +I'm leaving this post for posterity, but go+ has some serious flaws in it. For +one, it doesn't allow for specifying the version of a dependency you want to +use. To this end, I wrote [goat][0] which does all the things go+ does, plus +real dependency management, PLUS it is built in a way that if you've been +following go's best-practices for code organization you shouldn't have to change +any of your existing code AT ALL. It's cool, check it out. + +[0]: http://github.com/mediocregopher/goat diff --git a/src/_posts/2013-10-08-generations.md b/src/_posts/2013-10-08-generations.md new file mode 100644 index 0000000..c1c433d --- /dev/null +++ b/src/_posts/2013-10-08-generations.md @@ -0,0 +1,100 @@ +--- +title: Generations +description: >- + A simple file distribution strategy for very large scale, high-availability + file-services. 
+--- + +## The problem + +At [cryptic.io][cryptic] we plan on having millions of different +files, any of which could be arbitrarily chosen to be served any given time. +These files are uploaded by users at arbitrary times. + +Scaling such a system is no easy task. The solution I've seen implemented in the +past involves shuffling files around on a nearly constant basis, making sure +that files which are more "popular" are on fast drives, while at the same time +making sure that no drives are at capacity and at the same time that all files, +even newly uploaded ones, are stored redundantly. + +The problem with this solution is one of coordination. At any given moment the +app needs to be able to "find" a file so it can give the client a link to +download the file from one of the servers that it's on. Fulfilling this simple +requirement means that all datastores/caches where information about where a +file lives need to be up-to-date at all times, and even then there are +race-conditions and network failures to contend with, while at all times the +requirements of the app evolve and change. + +## A simpler solution + +Let's say you want all files which get uploaded to be replicated in triplicate +in some capacity. You buy three identical hard-disks, and put each on a separate +server. As files get uploaded by clients, each file gets put on each drive +immediately. When the drives are filled (which should be at around the same +time), you stop uploading to them. + +That was generation 0. + +You buy three more drives, and start putting all files on them instead. This is +going to be generation 1. Repeat until you run out of money. + +That's it. + +### That's it? + +It seems simple and obvious, and maybe it's the standard thing which is done, +but as far as I can tell no-one has written about it (though I'm probably not +searching for the right thing, let me know if this is the case!). 
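To make the scheme above a bit more concrete, here is a rough sketch in Python. It is only a toy under loud assumptions: drives are in-memory dicts, and the names `Generation`, `Store`, and the `g<N>-` file-name prefix are hypothetical, made up for illustration rather than taken from any real system.

```python
class Generation:
    """One set of identical drives; every drive holds a full copy."""

    def __init__(self, number, capacity, replicas=3):
        self.number = number
        self.capacity = capacity  # bytes each drive can hold (assumed equal)
        self.used = 0
        self.drives = [{} for _ in range(replicas)]  # stand-ins for real disks

    def has_room(self, size):
        return self.used + size <= self.capacity

    def put(self, name, data):
        # Each file goes onto every drive in the generation immediately.
        for drive in self.drives:
            drive[name] = data
        self.used += len(data)


class Store:
    def __init__(self, capacity):
        self.capacity = capacity
        self.generations = [Generation(0, capacity)]

    def upload(self, name, data):
        if len(data) > self.capacity:
            raise ValueError("file is larger than a whole drive")
        current = self.generations[-1]
        if not current.has_room(len(data)):
            # Current generation is full: stop uploading to it, start a new one.
            current = Generation(len(self.generations), self.capacity)
            self.generations.append(current)
        # Hypothetical naming scheme: the generation is part of the file name.
        stored_name = f"g{current.number}-{name}"
        current.put(stored_name, data)
        return stored_name

    def locate(self, stored_name):
        # No index lookup needed; the name alone says which drives to ask.
        number = int(stored_name[1:stored_name.index("-")])
        return self.generations[number].drives


store = Store(capacity=10)
first = store.upload("a.txt", b"12345678")
second = store.upload("b.txt", b"12345678")  # does not fit, so generation 1
print(first, second)
```

The only state the app has to track is the list of generations and which drives belong to each; finding a file is a matter of parsing its name.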
+ +### Advantages + +* It's so simple to implement, you could probably do it in a day if you're +starting a project from scratch + +* By definition of the scheme all files are replicated in multiple places. + +* Minimal information about where a file "is" needs to be stored. When a file is +uploaded all that's needed is to know what generation it is in, and then what +nodes/drives are in that generation. If the file's name is generated +server-side, then the file's generation could be *part* of its name, making +lookup even faster. + +* Drives don't need to "know" about each other. What I mean by this is that +whatever is running as the receive point for file-uploads on each drive doesn't +have to coordinate with its siblings running on the other drives in the +generation. In fact it doesn't need to coordinate with anyone. You could +literally rsync files onto your drives if you wanted to. I would recommend using +[marlin][0] though :) + +* Scaling is easy. When you run out of space you can simply start a new +generation. If you don't like playing that close to the chest there's nothing to +say you can't have two generations active at the same time. + +* Upgrading is easy. As long as a generation is not marked-for-upload, you can +easily copy all files in the generation into a new set of bigger, badder drives, +add those drives into the generation in your code, remove the old ones, then +mark the generation as uploadable again. + +* Distribution is easy. You just copy a generation's files onto a new drive in +Europe or wherever you're getting an uptick in traffic from and you're good to +go. + +* Management is easy. It's trivial to find out how many times a file has been +replicated, or how many countries it's in, or what hardware it's being served +from (given you have easy access to information about specific drives). + +### Caveats + +The big caveat here is that this is just an idea. It has NOT been tested in +production. 
But we have enough faith in it that we're going to give it a shot at +[cryptic.io][cryptic]. I'll keep this page updated. + +The second caveat is that this scheme does not inherently support caching. If a +file suddenly becomes super popular the world over your hard-disks might not be +able to keep up, and it's probably not feasible to have an FIO drive in *every* +generation. I think that [groupcache][1] may be the answer to this problem, +assuming your files are reasonably small, but again I haven't tested it yet. + +[cryptic]: https://cryptic.io +[0]: https://github.com/cryptic-io/marlin +[1]: https://github.com/golang/groupcache diff --git a/src/_posts/2013-10-25-namecoind-ssl.md b/src/_posts/2013-10-25-namecoind-ssl.md new file mode 100644 index 0000000..2711a92 --- /dev/null +++ b/src/_posts/2013-10-25-namecoind-ssl.md @@ -0,0 +1,248 @@ +--- +title: Namecoin, A Replacement For SSL +description: >- + If we use the namecoin chain as a DNS service we get security almost for + free, along with lots of other benefits. +--- + +At [cryptic.io][cryptic] we are creating a client-side, in-browser encryption +system where a user can upload their already encrypted content to our storage +system and be 100% confident that their data can never be decrypted by anyone +but them. + +One of the main problems with this approach is that the client has to be sure +that the code that's being run in their browser is the correct code; that is, +that they aren't the subject of a man-in-the-middle attack where an attacker is +turning our strong encryption into weak encryption that they could later break. + +A component of our current solution is to deliver the site's javascript (and all +other assets, for that matter) using SSL encryption. This protects the files +from tampering in-between leaving our servers and being received by the client. +Unfortunately, SSL isn't 100% foolproof. This post aims to show why SSL is +faulty, and propose a solution. 
+ +## SSL + +SSL is the mechanism by which web-browsers establish an encrypted connection to +web-servers. The goal of this connection is that only the destination +web-browser and the server know what data is passing between them. Anyone spying +on the connection would only see gibberish. To do this a secret key is first +established between the client and the server, and used to encrypt/decrypt all +data. As long as no-one but those parties knows that key, that data will never +be decrypted by anyone else. + +SSL is what's used to establish that secret key on a per-session basis, so that +a key isn't ever re-used and so only the client and the server know it. + +### Public-Private Key Cryptography + +SSL is based around public-private key cryptography. In a public-private key +system, you have a public key which is generated from a private key. The +public key can be given to anyone, but the private key must remain hidden. There +are two main uses for these two keys: + +* Someone can encrypt a message with your public key, and only you (with the + private key) can decrypt it. + +* You can sign a message with your private key, and anyone with your public key + can verify that it was you and not someone else who signed it. + +These are both extremely useful functions, not just for internet traffic but for +any kind of communication form. Unfortunately, there remains a fundamental flaw. +At some point you must give your public key to the other person in an insecure +way. If an attacker were to intercept your message containing your public key and +swap it for their own, then all future communications could be compromised. That +attacker could create messages the other person would think are from you, and +the other person would encrypt messages meant for you but which would be +decrypt-able by the attacker. + +### How does SSL work? + +SSL is at its heart a public-private key system, but its aim is to be more +secure against the attack described above. 
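For a feel of the two key-pair uses and of the key-swap attack, here is a toy sketch in Python using textbook RSA with tiny primes. Everything in it is an assumption for illustration: the helper names (`make_keys`, `encrypt`, `decrypt`, `sign`, `verify`) are made up, messages are plain small integers, and real systems use large keys, padding and hashing, so this is nowhere near what SSL actually does on the wire.

```python
# Textbook RSA with tiny primes. For illustration only, never for real use.

def make_keys(p, q, e=17):
    """Return (public_key, private_key) built from two primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+
    return (e, n), (d, n)

def encrypt(public_key, message):
    e, n = public_key
    return pow(message, e, n)

def decrypt(private_key, ciphertext):
    d, n = private_key
    return pow(ciphertext, d, n)

def sign(private_key, message):
    d, n = private_key
    return pow(message, d, n)

def verify(public_key, message, signature):
    e, n = public_key
    return pow(signature, e, n) == message

my_public, my_private = make_keys(61, 53)
message = 65  # messages are just small integers in this toy

# Use 1: anyone can encrypt with the public key, only the private key decrypts.
assert decrypt(my_private, encrypt(my_public, message)) == message

# Use 2: only the private key can sign, anyone with the public key can verify.
assert verify(my_public, message, sign(my_private, message))

# The flaw: an attacker with their own key pair can sign too. Their signature
# fails against the real public key, but passes if the victim was handed the
# attacker's public key instead.
evil_public, evil_private = make_keys(47, 59)
forged = sign(evil_private, message)
assert not verify(my_public, message, forged)
assert verify(evil_public, message, forged)
```

The last two lines are the whole problem in miniature: the math only tells you a signature matches *some* public key, not whose key it is.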
+ +SSL uses a trust-chain to verify that a public key is the intended one. Your web +browser has a built-in set of public keys, called the root certificates, that it +implicitly trusts. These root certificates are managed by a small number of +companies designated by some agency who decides on these things. + +When you receive a server's SSL certificate (its public key) that certificate +will be signed by a root certificate. You can verify that signature since you +have the root certificate's public key built into your browser. If the signature +checks out then you know a certificate authority trusts the public key the site +gave you, which means you can trust it too. + +There's a bit (a lot!) more to SSL than this, but this is enough to understand +the fundamental problems with it. + +### How SSL doesn't work + +SSL has a few glaring problems. One, it implies we trust the companies holding +the root certificates to not be compromised. If some malicious agency was to get +ahold of a root certificate they could listen in on any connection on the +internet by swapping a site's real certificate with one they generate on the +fly. They could trivially steal any data we send on the internet. + +The second problem is that it's expensive. Really expensive. If you're running a +business you'll have to shell out about $200 a year to keep your SSL certificate +signed (those signatures have an expiration date attached). Since there's very +few root authorities there's an effective monopoly on signatures, and there's +nothing we can do about it. For 200 bucks I know most people simply say "no +thanks" and go unencrypted. The solution is creating a bigger problem. + +## Bitcoins + +Time to switch gears, and propose a solution to the above issues: namecoins. I'm +going to first talk about what namecoins are, how they work, and why we need +them. To start with, namecoins are based on bitcoins. 
+ +If you haven't yet checked out bitcoins, [I highly encourage you to do +so][bitcoins]. They're awesome, and I think they have a chance of really +changing the way we think of and use money in the future. At the moment they're +still a bit of a novelty in the tech realm, but they're growing in popularity. + +The rest of this post assumes you know more or less what bitcoins are, and how +they work. + +## Namecoins + +Few people actually know about bitcoins. Even fewer know that there's other +crypto-currencies besides bitcoins. Basically, developers of these alternative +currencies (altcoins, in the parlance of our times) took the original bitcoin +source code and modified it to produce a new, separate blockchain from the +original bitcoin one. The altcoins are based on the same idea as bitcoins +(namely, a chain of blocks representing all the transactions ever made), but +have slightly different characteristics. + +One of these altcoins is called namecoin. Where other altcoins aim to be digital +currencies, and used as such (like bitcoins), namecoin has a different goal. The +point of namecoin is to create a global, distributed, secure key-value store. +You spend namecoins to claim arbitrary keys (once you've claimed it, you own it +for a set period of time) and to give those keys arbitrary values. Anyone else +with namecoind running can see these values. + +### Why use it? + +A blockchain based on a digital currency seems like a weird idea at first. I +know when I first read about it I was less than thrilled. How is this better +than a DHT? It's a key-value store, why is there a currency involved? + +#### DHT + +DHT stands for Distributed Hash-Table. I'm not going to go too into how they +work, but suffice it to say that they are essentially a distributed key-value +store. Like namecoin. The difference is in the operation. DHTs operate by +spreading and replicating keys and their values across nodes in a P2P mesh. 
They +have [lots of issues][dht] as far as security goes, the main one being that it's +fairly easy for an attacker to forge the value for a given key, and very +difficult to stop them from doing so or even to detect that it's happened. + +Namecoins don't have this problem. To forge a particular key an attacker would +essentially have to create a new blockchain from a certain point in the existing +chain, and then replicate all the work put into the existing chain into that new +compromised one so that the new one is longer and other clients in the network +will accept it. This is extremely non-trivial. + +#### Why a currency? + +To answer why a currency needs to be involved, we need to first look at how +bitcoin/namecoin work. When you take an action (send someone money, set a value +to a key) that action gets broadcast to the network. Nodes on the network +collect these actions into a block, which is just a collection of multiple +actions. Their goal is to find a hash of this new block, combined with some data +from the top-most block in the existing chain, combined with some arbitrary +data, such that the first n characters in the resulting hash are zeros (with n +constantly increasing). When they find one they broadcast it out on the network. +Assuming the block is legitimate they receive some number of coins as +compensation. + +That compensation is what keeps a blockchain based currency going. If there +were no compensation there would be no reason to mine except out of goodwill, so +far fewer people would do it. Since the chain can be compromised if a malicious +group has more computing power than all legitimate miners combined, having few +legitimate miners is a serious problem. + +In the case of namecoins, there's even more reason to involve a currency. Since +you have to spend money to make changes to the chain there's a disincentive for +attackers (read: idiots) to spam the chain with frivolous changes to keys. + +#### Why a *new* currency? 
+ +I'll admit, it's a bit annoying to see all these altcoins popping up. I'm sure +many of them have some solid ideas backing them, but it also makes things +confusing for newcomers and dilutes the "market" of cryptocoin users; the more +users a particular chain has, the stronger it is. If we have many chains, all we +have are a bunch of weak chains. + +The exception to this gripe, for me, is namecoin. When I was first thinking +about this problem my instinct was to just use the existing bitcoin blockchain +as a key-value storage. However, the maintainers of the bitcoin clients +(who are, in effect, the maintainers of the chain) don't want the bitcoin +blockchain polluted with non-commerce related data. At first I disagreed; it's a +P2P network, no-one gets to say what I can or can't use the chain for! And +that's true. But things work out better for everyone involved if there's two +chains. + +Bitcoin is a currency. Namecoin is a key-value store (with a currency as its +driving force). Those are two completely different use-cases, with two +completely different usage characteristics. And we don't know yet what those +characteristics are, or if they'll change. If the chain-maintainers have to deal +with a mingled chain we could very well be tying their hands with regards to +what they can or can't change with regards to the behavior of the chain, since +improving performance for one use-case may hurt the performance of the other. +With two separate chains the maintainers of each are free to do what they see +fit to keep their respective chains operating as smoothly as possible. +Additionally, if for some reason bitcoins fall by the wayside, namecoin will +still have a shot at continuing operation since it isn't tied to the former. +Tldr: separation of concerns. + +## Namecoin as an alternative to SSL + +And now to tie it all together. 
+ +There are already a number of proposed formats for standardizing how we store +data on the namecoin chain so that we can start building tools around it. I'm +not hugely concerned with the particulars of those standards, only that we can, +in some way, standardize on attaching a public key (or a fingerprint of one) to +some key on the namecoin blockchain. When you visit a website, the server +would then send both its public key and the namecoin chain key to be checked +against to the browser, and the browser would validate that the public key it +received is the same as the one on the namecoin chain. + +The main issue with this is that it requires another round-trip when visiting a +website: One for DNS, and one to check the namecoin chain. And where would this +chain even be hosted? + +My proposition is there would exist a number of publicly available servers +hosting a namecoind process that anyone in the world could send requests for +values on the chain. Browsers could then be made with a couple of these +hardwired in. ISPs could also run their own copies at various points in their +network to improve response-rates and decrease load on the globally public +servers. Furthermore, the paranoid could host their own and be absolutely sure +that the data they're receiving is valid. + +If the above scheme sounds a lot like what we currently use for DNS, that's +because it is. In fact, one of namecoin's major goals is that it be used as a +replacement for DNS, and most of the talk around it is focused on this subject. +DNS has many of the same problems as SSL, namely single-point-of-failure and +that it's run by a centralized agency that we have to pay arbitrarily high fees +to. By switching our DNS and SSL infrastructure to use namecoin we could kill +two horribly annoying, monopolized, expensive birds with a single stone. + +That's it. If we use the namecoin chain as a DNS service we get security almost +for free, along with lots of other benefits. 
To make this happen we need +cooperation from browser makers, and to standardize on a simple way of +retrieving DNS information from the chain that the browsers can use. The +protocol doesn't need to be very complex, I think HTTP/REST should suffice, +since the meat of the data will be embedded in the JSON value on the namecoin +chain. + +If you want to contribute or learn more please check out [namecoin][nmc] and +specifically the [d namespace proposal][dns] for it. + +[cryptic]: http://cryptic.io +[bitcoins]: http://vimeo.com/63502573 +[dht]: http://www.globule.org/publi/SDST_acmcs2009.pdf +[nsa]: https://www.schneier.com/blog/archives/2013/09/new_nsa_leak_sh.html +[nmc]: http://dot-bit.org/Main_Page +[dns]: http://dot-bit.org/Namespace:Domain_names_v2.0 diff --git a/src/_posts/2014-01-11-diamond-square.md b/src/_posts/2014-01-11-diamond-square.md new file mode 100644 index 0000000..665e07c --- /dev/null +++ b/src/_posts/2014-01-11-diamond-square.md @@ -0,0 +1,494 @@ +--- +title: Diamond Square +description: >- + Tackling the problem of semi-realistic looking terrain generation in + clojure. +updated: 2018-09-06 +--- + +![terrain][terrain] + +I recently started looking into the diamond-square algorithm (you can find a +great article on it [here][diamondsquare]). The following is a short-ish +walkthrough of how I tackled the problem in clojure and the results. You can +find the [leiningen][lein] repo [here][repo] and follow along within that, or +simply read the code below to get an idea. + +Also, Marco ported my code into clojurescript, so you can get random terrain +in your browser. [Check it out!][marco] + +```clojure +(ns diamond-square.core) + +; == The Goal == +; Create a fractal terrain generator using clojure + +; == The Algorithm == +; Diamond-Square. We start with a grid of points, each with a height of 0. +; +; 1. Take each corner point of the square, average the heights, and assign that +; to be the height of the midpoint of the square. 
Apply some random error to +; the midpoint. +; +; 2. Creating a line from the midpoint to each corner we get four half-diamonds. +; Average the heights of the points (with some random error) and assign the +; heights to the midpoints of the diamonds. +; +; 3. We now have four square sections, start at 1 for each of them (with +; decreasing amount of error for each iteration). +; +; This picture explains it better than I can: +; https://blog.mediocregopher.com/img/diamond-square/dsalg.png +; (http://nbickford.wordpress.com/2012/12/21/creating-fake-landscapes/dsalg/) +; +; == The Strategy == +; We begin with a vector of vectors of numbers, and iterate over it, filling in +; spots as they become available. Our grid will have the top-left being (0,0), +; y pointing down and x going to the right. The outermost vector +; indicates row number (y) and the inner vectors indicate the column number (x) +; +; = Utility = +; First we create some utility functions for dealing with vectors of vectors. + +(defn print-m + "Prints a grid in a nice way" + [m] + (doseq [n m] + (println n))) + +(defn get-m + "Gets a value at the given x,y coordinate of the grid, with [0,0] being in the + top left" + [m x y] + ((m y) x)) + +(defn set-m + "Sets a value at the given x,y coordinate of the grid, with [0,0] being in the + top left" + [m x y v] + (assoc m y + (assoc (m y) x v))) + +(defn add-m + "Like set-m, but adds the given value to the current one instead of overwriting + it" + [m x y v] + (set-m m x y + (+ (get-m m x y) v))) + +(defn avg + "Returns the truncated average of all the given arguments" + [& l] + (int (/ (reduce + l) (count l)))) + +; = Grid size = +; Since we're starting with a blank grid we need to find out what sizes the +; grids can be. For convenience the size (height and width) should be odd, so we +; easily get a midpoint. 
And on each iteration we'll be halving the grid, so +; whenever we do that the two resultant grids should be odd and halvable as +; well, and so on. +; +; The formula that fits this is size = 2^n + 1, where 1 <= n. For the rest of +; this guide I'll be referring to n as the "degree" of the grid. + + +(def exp2-pre-compute + (vec (map #(int (Math/pow 2 %)) (range 31)))) + +(defn exp2 + "Returns 2^n as an integer. Uses pre-computed values since we end up doing + this so much" + [n] + (exp2-pre-compute n)) + +(def grid-sizes + (vec (map #(inc (exp2 %)) (range 1 31)))) + +(defn grid-size [degree] + (inc (exp2 degree))) + +; Available grid heights/widths are as follows: +;[3 5 9 17 33 65 129 257 513 1025 2049 4097 8193 16385 32769 65537 131073 +;262145 524289 1048577 2097153 4194305 8388609 16777217 33554433 67108865 +;134217729 268435457 536870913 1073741825]) + +(defn blank-grid + "Generates a grid of the given degree, filled in with zeros" + [degree] + (let [gsize (grid-size degree)] + (vec (repeat gsize + (vec (repeat gsize 0)))))) + +(comment + (print-m (blank-grid 3)) +) + +; = Coordinate Pattern (The Tricky Part) = +; We now have to figure out which coordinates need to be filled in on each pass. +; A pass is defined as a square step followed by a diamond step. The next pass +; will be the square/diamond steps on all the smaller squares generated in the +; pass. It works out that the number of passes required to fill in the grid is +; the same as the degree of the grid, where the first pass is 1. +; +; So we can easily find patterns in the coordinates for a given degree/pass, +; I've laid out below all the coordinates for each pass for a 3rd degree grid +; (which is 9x9). + +; Degree 3 Pass 1 Square +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . 1 . . . .] (4,4) +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] + +; Degree 3 Pass 1 Diamond +; [. . . . 2 . . . .]
(4,0) +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [2 . . . . . . . 2] (0,4) (8,4) +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . 2 . . . .] (4,8) + +; Degree 3 Pass 2 Square +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . 3 . . . 3 . .] (2,2) (6,2) +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . . . . . . . .] +; [. . 3 . . . 3 . .] (2,6) (6,6) +; [. . . . . . . . .] +; [. . . . . . . . .] + +; Degree 3 Pass 2 Diamond +; [. . 4 . . . 4 . .] (2,0) (6,0) +; [. . . . . . . . .] +; [4 . . . 4 . . . 4] (0,2) (4,2) (8,2) +; [. . . . . . . . .] +; [. . 4 . . . 4 . .] (2,4) (6,4) +; [. . . . . . . . .] +; [4 . . . 4 . . . 4] (0,6) (4,6) (8,6) +; [. . . . . . . . .] +; [. . 4 . . . 4 . .] (2,8) (6,8) + +; Degree 3 Pass 3 Square +; [. . . . . . . . .] +; [. 5 . 5 . 5 . 5 .] (1,1) (3,1) (5,1) (7,1) +; [. . . . . . . . .] +; [. 5 . 5 . 5 . 5 .] (1,3) (3,3) (5,3) (7,3) +; [. . . . . . . . .] +; [. 5 . 5 . 5 . 5 .] (1,5) (3,5) (5,5) (7,5) +; [. . . . . . . . .] +; [. 5 . 5 . 5 . 5 .] (1,7) (3,7) (5,7) (7,7) +; [. . . . . . . . .] + +; Degree 3 Pass 3 Diamond +; [. 6 . 6 . 6 . 6 .] (1,0) (3,0) (5,0) (7,0) +; [6 . 6 . 6 . 6 . 6] (0,1) (2,1) (4,1) (6,1) (8,1) +; [. 6 . 6 . 6 . 6 .] (1,2) (3,2) (5,2) (7,2) +; [6 . 6 . 6 . 6 . 6] (0,3) (2,3) (4,3) (6,3) (8,3) +; [. 6 . 6 . 6 . 6 .] (1,4) (3,4) (5,4) (7,4) +; [6 . 6 . 6 . 6 . 6] (0,5) (2,5) (4,5) (6,5) (8,5) +; [. 6 . 6 . 6 . 6 .] (1,6) (3,6) (5,6) (7,6) +; [6 . 6 . 6 . 6 . 6] (0,7) (2,7) (4,7) (6,7) (8,7) +; [. 6 . 6 . 6 . 6 .] (1,8) (3,8) (5,8) (7,8) +; +; I make two different functions, one to give the coordinates for the square +; portion of each pass and one for the diamond portion of each pass. To find the +; actual patterns it was useful to first look only at the pattern in the +; y-coordinates, and figure out how that translated into the pattern for the +; x-coordinates.
+ +(defn grid-square-coords + "Given a grid degree and pass number, returns all the coordinates which need + to be computed for the square step of that pass" + [degree pass] + (let [gsize (grid-size degree) + start (exp2 (- degree pass)) + interval (* 2 start) + coords (map #(+ start (* interval %)) + (range (exp2 (dec pass))))] + (mapcat (fn [y] + (map #(vector % y) coords)) + coords))) +; +; (grid-square-coords 3 2) +; => ([2 2] [6 2] [2 6] [6 6]) + +(defn grid-diamond-coords + "Given a grid degree and a pass number, returns all the coordinates which need + to be computed for the diamond step of that pass" + [degree pass] + (let [gsize (grid-size degree) + interval (exp2 (- degree pass)) + num-coords (grid-size pass) + coords (map #(* interval %) (range 0 num-coords))] + (mapcat (fn [y] + (if (even? (/ y interval)) + (map #(vector % y) (take-nth 2 (drop 1 coords))) + (map #(vector % y) (take-nth 2 coords)))) + coords))) + +; (grid-diamond-coords 3 2) +; => ([2 0] [6 0] [0 2] [4 2] [8 2] [2 4] [6 4] [0 6] [4 6] [8 6] [2 8] [6 8]) + +; = Height Generation = +; We now work on functions which, given a coordinate, will return what value +; that coordinate will have. + +(defn avg-points + "Given a grid and an arbitrary number of points (of the form [x y]) returns + the average of all the given points that are on the map. Any points which are + off the map are ignored" + [m & coords] + (let [grid-size (count m)] + (apply avg + (map #(apply get-m m %) + (filter + (fn [[x y]] + (and (< -1 x) (> grid-size x) + (< -1 y) (> grid-size y))) + coords))))) + +(defn error + "Returns a number between -e and e, inclusive" + [e] + (- (rand-int (inc (* 2 e))) e)) + +; The next function is a little weird. It primarily takes in a point, then +; figures out the distance from that point to the points we'll take the average +; of. The locf (locator function) is used to return back the actual points to +; use.
For the square portion it'll be the points diagonal from the given one, +; for the diamond portion it'll be the points to the top/bottom/left/right from +; the given one. +; +; Once it has those points, it finds the average and applies the error. The +; error function is nothing more than a number between -interval and +interval, +; where interval is the distance between the given point and one of the averaged +; points. It is important that the error decreases the more passes you do, which +; is why the interval is used. +; +; The error function is what should be messed with primarily if you want to +; change what kind of terrain you generate (a giant mountain instead of +; hills/valleys, for example). The one we use is uniform for all intervals, so +; it generates a uniform terrain. + +(defn- grid-fill-point + [locf m degree pass x y] + (let [interval (exp2 (- degree pass)) + leftx (- x interval) + rightx (+ x interval) + upy (- y interval) + downy (+ y interval) + v (apply avg-points m + (locf x y leftx rightx upy downy))] + (add-m m x y (+ v (error interval))))) + +(def grid-fill-point-square + "Given a grid, the grid's degree, the current pass number, and a point on the + grid, fills in that point with the average (plus some error) of the + appropriate corner points, and returns the resultant grid" + (partial grid-fill-point + (fn [_ _ leftx rightx upy downy] + [[leftx upy] + [rightx upy] + [leftx downy] + [rightx downy]]))) + +(def grid-fill-point-diamond + "Given a grid, the grid's degree, the current pass number, and a point on the + grid, fills in that point with the average (plus some error) of the + appropriate edge points, and returns the resultant grid" + (partial grid-fill-point + (fn [x y leftx rightx upy downy] + [[leftx y] + [rightx y] + [x upy] + [x downy]]))) + +; = Filling in the Grid = +; We finally compose the functions we've been creating to fill in the entire +; grid + +(defn- grid-fill-point-passes + "Given a grid, a function to fill in 
coordinates, and a function to generate + those coordinates, fills in all coordinates for a given pass, returning the + resultant grid" + [m fill-f coord-f degree pass] + (reduce + (fn [macc [x y]] (fill-f macc degree pass x y)) + m + (coord-f degree pass))) + +(defn grid-pass + "Given a grid and a pass number, does the square then the diamond portion of + the pass" + [m degree pass] + (-> m + (grid-fill-point-passes + grid-fill-point-square grid-square-coords degree pass) + (grid-fill-point-passes + grid-fill-point-diamond grid-diamond-coords degree pass))) + +; The most important function in this guide, does all the work +(defn terrain + "Given a grid degree, generates a uniformly random terrain on a grid of that + degree" + ([degree] + (terrain (blank-grid degree) degree)) + ([m degree] + (reduce + #(grid-pass %1 degree %2) + m + (range 1 (inc degree))))) + +(comment + (print-m + (terrain 5)) +) + +; == The Results == +; We now have a generated terrain, probably. We should check it. First we'll +; create an ASCII representation. But to do that we'll need some utility +; functions. + +(defn max-terrain-height + "Returns the maximum height found in the given terrain grid" + [m] + (reduce max + (map #(reduce max %) m))) + +(defn min-terrain-height + "Returns the minimum height found in the given terrain grid" + [m] + (reduce min + (map #(reduce min %) m))) + +(defn norm + "Given x in the range (A,B), normalizes it into the range (0,new-height)" + [A B new-height x] + (int (/ (* (- x A) new-height) (- B A)))) + +(defn normalize-terrain + "Given a terrain map and a number of \"steps\", normalizes the terrain so all + heights in it are in the range (0,steps)" + [m steps] + (let [max-height (max-terrain-height m) + min-height (min-terrain-height m) + norm-f (partial norm min-height max-height steps)] + (vec (map #(vec (map norm-f %)) m)))) + +; We now define which ASCII characters we want to use for which heights. 
The +; vector starts with the character for the lowest height and ends with the +; character for the highest height. + +(def tiles + [\~ \~ \" \" \x \x \X \$ \% \# \@]) + +(defn tile-terrain + "Given a terrain map, converts it into an ASCII tile map" + [m] + (vec (map #(vec (map tiles %)) + (normalize-terrain m (dec (count tiles)))))) + +(comment + (print-m + (tile-terrain + (terrain 5))) + +; [~ ~ " " x x x X % $ $ $ X X X X X X $ x x x X X X x x x x " " " ~] +; [" ~ " " x x X X $ $ $ X X X X X X X X X X X X X X x x x x " " " "] +; [" " " x x x X X % $ % $ % $ $ X X X X $ $ $ X X X X x x x x " " "] +; [" " " x x X $ % % % % % $ % $ $ X X $ $ $ $ X X x x x x x x " " x] +; [" x x x x X $ $ # % % % % % % $ X $ X X % $ % X X x x x x x x x x] +; [x x x X $ $ $ % % % % % $ % $ $ $ % % $ $ $ $ X X x x x x x x x x] +; [X X X $ % $ % % # % % $ $ % % % % $ % $ $ X $ X $ X X x x x X x x] +; [$ $ X $ $ % $ % % % % $ $ $ % # % % % X X X $ $ $ X X X x x x x x] +; [% X X % % $ % % % $ % $ % % % # @ % $ $ X $ X X $ X x X X x x x x] +; [$ $ % % $ $ % % $ $ X $ $ % % % % $ $ X $ $ X X X X X X x x x x x] +; [% % % X $ $ % $ $ X X $ $ $ $ % % $ $ X X X $ X X X x x X x x X X] +; [$ $ $ X $ $ X $ X X X $ $ $ $ % $ $ $ $ $ X $ X x X X X X X x X X] +; [$ $ $ $ X X $ X X X X X $ % % % % % $ X $ $ $ X x X X X $ X X $ $] +; [X $ $ $ $ $ X X X X X X X % $ % $ $ $ X X X X X x x X X x X X $ $] +; [$ $ X X $ X X x X $ $ X X $ % X X X X X X X X X x X X x x X X X X] +; [$ $ X X X X X X X $ $ $ $ $ X $ X X X X X X X x x x x x x x X X X] +; [% % % $ $ X $ X % X X X % $ $ X X X X X X x x x x x x x x x X X $] +; [$ % % $ $ $ X X $ $ $ $ $ $ X X X X x X x x x x " x x x " x x x x] +; [$ X % $ $ $ $ $ X X X X X $ $ X X X X X X x x " " " " " " " " x x] +; [$ X $ $ % % $ X X X $ X X X x x X X x x x x x " " " " " ~ " " " "] +; [$ $ X X % $ % X X X X X X X X x x X X X x x x " " " " " " ~ " " "] +; [$ $ X $ % $ $ X X X X X X x x x x x x x x x " " " " " " " " " ~ ~] +; [$ $ $ $ $ X X $ X X X X X x x x x x x
x x " " " " " " " ~ " " " ~] +; [$ % X X $ $ $ $ X X X X x x x x x x x x x x " " " " ~ " " ~ " " ~] +; [% $ $ X $ X $ X $ X $ X x x x x x x x x x x " " " " ~ ~ ~ " ~ " ~] +; [$ X X X X $ $ $ $ $ X x x x x x x x x x x " " " " ~ ~ ~ ~ ~ ~ ~ ~] +; [X x X X x X X X X X X X X x x x x x x x x x " " " ~ ~ " " ~ ~ ~ ~] +; [x x x x x x X x X X x X X X x x x x x x x " x " " " " " ~ ~ ~ ~ ~] +; [x x x x x x x x X X X X $ X X x X x x x x x x x x " ~ ~ ~ ~ ~ ~ ~] +; [" x x x x x X x X X X X X X X X X x x x x x x " " " " ~ ~ ~ ~ ~ ~] +; [" " " x x x X X X X $ $ $ X X X X X X x x x x x x x x " " ~ ~ ~ ~] +; [" " " " x x x X X X X X $ $ X X x X X x x x x x x x " " " " " ~ ~] +; [~ " " x x x x X $ X $ X $ $ X x X x x x x x x x x x x x x " " " ~] +) + +; = Pictures! = +; ASCII is cool, but pictures are better. First we import some java libraries +; that we'll need, then define the colors for each level just like we did tiles +; for the ascii representation. + +(import + 'java.awt.image.BufferedImage + 'javax.imageio.ImageIO + 'java.io.File) + +(def colors + [0x1437AD 0x04859D 0x007D1C 0x007D1C 0x24913C + 0x00C12B 0x38E05D 0xA3A3A4 0x757575 0xFFFFFF]) + +; Finally we reduce over a BufferedImage instance to output every tile as a +; single pixel on it. + +(defn img-terrain + "Given a terrain map and a file name, outputs a png representation of the + terrain map to that file" + [m file] + (let [img (BufferedImage. (count m) (count m) BufferedImage/TYPE_INT_RGB)] + (reduce + (fn [rown row] + (reduce + (fn [coln tile] + (.setRGB img coln rown (colors tile)) + (inc coln)) + 0 row) + (inc rown)) + 0 (normalize-terrain m (dec (count colors)))) + (ImageIO/write img "png" (File. file)))) + +(comment + (img-terrain + (terrain 10) + "resources/terrain.png") + + ; https://blog.mediocregopher.com/img/diamond-square/terrain.png +) + +; == Conclusion == +; There's still a lot of work to be done. 
The algorithm starts taking a +; non-trivial amount of time around the 10th degree, which is only a 1025x1025px +; image. I need to profile the code and find out where the bottlenecks are. It's +; possible re-organizing the code to use pmaps instead of reduces in some places +; could help. +``` + +[marco]: http://marcopolo.io/diamond-square/ +[terrain]: /img/diamond-square/terrain.png +[diamondsquare]: http://www.gameprogrammer.com/fractal.html +[lein]: https://github.com/technomancy/leiningen +[repo]: https://github.com/mediocregopher/diamond-square diff --git a/src/_posts/2014-10-29-erlang-pitfalls.md b/src/_posts/2014-10-29-erlang-pitfalls.md new file mode 100644 index 0000000..32a8095 --- /dev/null +++ b/src/_posts/2014-10-29-erlang-pitfalls.md @@ -0,0 +1,192 @@ +--- +title: Erlang Pitfalls +description: >- + Common pitfalls that people may run into when designing and writing + large-scale erlang applications. +--- + +I've been involved with a large-ish scale erlang project at Grooveshark since +sometime around 2011. I started this project knowing absolutely nothing about +erlang, but now I feel I have accumulated enough knowledge over time that I could +conceivably give some back. Specifically, common pitfalls that people may run +into when designing and writing a large-scale erlang application. Some of these +may show up when searching for them, but some of them you may not even know you +need to search for. + +## now() vs timestamp() + +The canonical way of getting the current timestamp in erlang is to use +`erlang:now()`. This works great at small loads, but if you find your +application slowing down greatly at highly parallel loads and you're calling +`erlang:now()` a lot, it may be the culprit. + +A property of this method you may not realize is that it is monotonically +increasing, meaning even if two processes call it at the *exact* same time they +will both receive different output.
This is done through some low-level +locking, as well as a bit of math to balance out the time getting out of sync +in that scenario. + +There are situations where fetching always unique timestamps is useful, such as +seeding RNGs and generating unique identifiers for things, but usually when +people fetch a timestamp they just want a timestamp. For these cases, +`os:timestamp()` can be used. It is not blocked by any locks, it simply returns +the time. + +## The rpc module is slow + +The built-in `rpc` module is slower than you'd think. This mostly stems from it +doing a lot of extra work for every `call` and `cast` that you do, ensuring that +certain conditions are accounted for. If, however, it's sufficient for the +calling side to know that a call timed-out on them and not worry about it any +further you may benefit from simply writing your own rpc module. Alternatively, +use [one which already exists](https://github.com/cloudant/rexi). + +## Don't send anonymous functions between nodes + +One of erlang's niceties is transparent message sending between two physical +erlang nodes. Once nodes are connected, a process on one can send any message to +a process on the other exactly as if they existed on the same node. This is fine +for many data-types, but for anonymous functions it should be avoided. + +For example: + +```erlang +RemotePid ! {fn, fun(I) -> I + 1 end}. +``` + +Would be better written as + +```erlang +incr(I) -> + I + 1. + +RemotePid ! {fn, ?MODULE, incr}. +``` + +and then using an `apply` on the RemotePid to actually execute the function. + +This is because hot-swapping code messes with anonymous functions quite a bit. +Erlang isn't actually sending a function definition across the wire; it's simply +sending a reference to a function. If you've changed the code within the +anonymous function on a node, that reference changes.
The sending node is +sending a reference to a function which may not exist anymore on the receiving +node, and you'll get a weird error which Google doesn't return many results for. + +Alternatively, if you simply send atoms across the wire and use `apply` on the +other side, only atoms are sent and the two nodes involved can have totally +different ideas of what the function itself does without any problems. + +## Hot-swapping code is a convenience, not a crutch + +Hot swapping code is the bees-knees. It lets you not have to worry about +rolling-restarts for trivial code changes, and so adds stability to your +cluster. My warning is that you should not rely on it. If your cluster can't +survive a node being restarted for a code change, then it can't survive if that +node fails completely, or fails and comes back up. Design your system pretending +that hot-swapping does not exist, and only once you've done that allow yourself +to use it. + +## GC sometimes needs a boost + +Erlang garbage collection (GC) acts on a per-erlang-process basis, meaning that +each process decides on its own to garbage collect itself. This is nice because +it means stop-the-world isn't a problem, but it does have some interesting +effects. + +We had a problem with our node memory graphs looking like an upwards facing +line, instead of a nice sinusoid relative to the number of connections during +the day. We couldn't find a memory leak *anywhere*, and so started profiling. We +found that the memory seemed to be composed mostly of binary data in process +heaps. On a hunch my coworker Mike Cugini (who gets all the credit for this) ran +the following on a node: + +```erlang +lists:foreach(fun erlang:garbage_collect/1, erlang:processes()). +``` + +and saw memory drop in a huge way. We made that code run every 10 minutes or so +and suddenly our memory problem went away.
+ +The problem is that we had a lot of processes which individually didn't have +much heap data, but all-together were crushing the box. Each didn't think it had +enough to garbage collect very often, so memory just kept going up. Calling the +above forces all processes to garbage collect, and thus throw away all those +little binary bits they were hoarding. + +## These aren't the solutions you are looking for + +The `erl` process has tons of command-line options which allow you to tweak all +kinds of knobs. We've had tons of performance problems with our application, as +of yet not a single one has been solved with turning one of these knobs. They've +all been design issues or just run-of-the-mill bugs. I'm not saying the knobs +are *never* useful, but I haven't seen it yet. + +## Erlang processes are great, except when they're not + +The erlang model of allowing processes to manage global state works really well +in many cases. Possibly even most cases. There are, however, times when it +becomes a performance problem. This became apparent in the project I was working +on for Grooveshark, which was, at its heart, a pubsub server. + +The architecture was very simple: each channel was managed by a process, client +connection processes subscribed to that channel and received publishes from it. +Easy right? The problem was that extremely high volume channels were simply not +able to keep up with the load. The channel process could do certain things very +fast, but there were some operations which simply took time and slowed +everything down. For example, channels could have arbitrary properties set on +them by their owners. Retrieving an arbitrary property from a channel was a +fairly fast operation: client `call`s the channel process, channel process +immediately responds with the property value. No blocking involved. 
+ +But as soon as there was any kind of call which required the channel process to +talk to yet *another* process (unfortunately necessary), things got hairy. On +high volume channels publish/get/set operations would get massively backed up +in the message queue while the process was blocked on another process. We tried +many things, but ultimately gave up on the process-per-channel approach. + +We instead decided on keeping *all* channel state in a transactional database. +When client processes "called" operations on a channel, they really are just +acting on the database data inline, no message passing involved. This means that +read-only operations are super-fast because there is minimal blocking, and if +some random other process is being slow it only affects the one client making +the call which is causing it to be slow, and not holding up a whole host of +other clients. + +## Mnesia might not be what you want + +This one is probably a bit controversial, and definitely subject to use-cases. +Do your own testing and profiling, find out what's right for you. + +Mnesia is erlang's solution for global state. It's an in-memory transactional +database which can scale to N nodes and persist to disk. It is hosted +directly in the erlang process's memory so you interact with it in erlang +directly in your code; no calling out to database drivers and such. Sounds great +right? + +Unfortunately mnesia is not a very full-featured database. It is essentially a +key-value store which can hold arbitrary erlang data-types, albeit in a set +schema which you lay out for it during startup. This means that more complex +types like sorted sets and hash maps (although this was addressed with the +introduction of the map data-type in R17) are difficult to work with within +mnesia. Additionally, erlang's data model of immutability, while awesome +usually, can bite you here because it's difficult (impossible?)
to pull out +chunks of data within a record without accessing the whole record. + +For example, when retrieving the list of processes subscribed to a channel our +application doesn't simply pull the full list and iterate over it. This is too +slow, and in some cases the subscriber list was so large it wasn't actually +feasible. The channel process wasn't cleaning up its heap fast enough, so +multiple publishes would end up with multiple copies of the giant list in +memory. This became a problem. Instead we chain spawned processes, each of which +pull a set chunk of the subscriber list, and iterate over that. This is very +difficult to implement in mnesia without pulling the full subscriber list into +the process' memory at some point in the process. + +It is, however, fairly trivial to implement in redis using sorted sets. For this +case, and many other cases after, the motto for performance improvements became +"stick it in redis". The application is at the point where *all* state which +isn't directly tied to a specific connection is kept in redis, encoded using +`term_to_binary`. The performance hit of going to an outside process for data +was actually much less than we'd originally thought, and ended up being a plus +since we had much more freedom to do interesting hacks to speed up our +accesses. diff --git a/src/_posts/2015-03-11-rabbit-hole.md b/src/_posts/2015-03-11-rabbit-hole.md new file mode 100644 index 0000000..97c2b80 --- /dev/null +++ b/src/_posts/2015-03-11-rabbit-hole.md @@ -0,0 +1,165 @@ +--- +title: Rabbit Hole +description: >- + Complex systems sometimes require complex debugging. +--- + +We've begun rolling out [SkyDNS][skydns] at my job, which has been pretty neat. +We're basing a couple future projects around being able to use it, and it's made +dynamic configuration and service discovery nice and easy. + +This post chronicles catching a bug because of our switch to SkyDNS, and how we +discover its root cause.
I like to call these kinds of bugs "rabbit holes"; they +look shallow at first, but anytime you make a little progress forward a little +more is always required, until you discover the ending somewhere totally +unrelated to the start. + +## The Bug + +We are seeing *tons* of these in the SkyDNS log: + +``` +[skydns] Feb 20 17:21:15.168 INFO | no nameservers defined or name too short, can not forward +``` + +I fire up tcpdump to see if I can see anything interesting, and sure enough run +across a bunch of these: + +``` +# tcpdump -vvv -s 0 -l -n port 53 +tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes + ... + $fen_ip.50257 > $skydns_ip.domain: [udp sum ok] 16218+ A? unknown. (25) + $fen_ip.27372 > $skydns_ip.domain: [udp sum ok] 16218+ A? unknown. (25) + $fen_ip.35634 > $skydns_ip.domain: [udp sum ok] 59227+ A? unknown. (25) + $fen_ip.64363 > $skydns_ip.domain: [udp sum ok] 59227+ A? unknown. (25) +``` + +It appears that some of our front end nodes (FENs) are making tons of DNS +requests trying to find the A record of `unknown`. Something on our FENs is +doing something insane and is breaking. + +## The FENs + +Hopping over to my favorite FEN we're able to see the packets in question +leaving on a tcpdump as well, but that's not helpful for finding the root cause. +We have lots of processes running on the FENs and any number of them could be +doing something crazy. + +We fire up sysdig, which is similar to systemtap and strace in that it allows +you to hook into the kernel and view various kernel activities in real time, but +it's easier to use than both. The following command dumps all UDP packets being +sent and what process is sending them: + +``` +# sysdig fd.l4proto=udp +...
+2528950 22:17:35.260606188 0 php-fpm (21477) < connect res=0 tuple=$fen_ip:61173->$skydns_ip:53 +2528961 22:17:35.260611327 0 php-fpm (21477) > sendto fd=102(<4u>$fen_ip:61173->$skydns_ip:53) size=25 tuple=NULL +2528991 22:17:35.260631917 0 php-fpm (21477) < sendto res=25 data=.r...........unknown..... +2530470 22:17:35.261879032 0 php-fpm (21477) > ioctl fd=102(<4u>$fen_ip:61173->$skydns_ip:53) request=541B argument=7FFF82DC8728 +2530472 22:17:35.261880574 0 php-fpm (21477) < ioctl res=0 +2530474 22:17:35.261881226 0 php-fpm (21477) > recvfrom fd=102(<4u>$fen_ip:61173->$skydns_ip:53) size=1024 +2530476 22:17:35.261883424 0 php-fpm (21477) < recvfrom res=25 data=.r...........unknown..... tuple=$skydns_ip:53->$fen_ip:61173 +2530485 22:17:35.261888997 0 php-fpm (21477) > close fd=102(<4u>$fen_ip:61173->$skydns_ip:53) +2530488 22:17:35.261892626 0 php-fpm (21477) < close res=0 +``` + +Aha! We can see php-fpm is requesting something over udp with the string +`unknown` in it. We've now narrowed down the guilty process, the rest should be +easy right? + +## Which PHP? + +Unfortunately we're a PHP shop; knowing that php-fpm is doing something on a FEN +narrows down the guilty codebase little. Taking the FEN out of our load-balancer +stops the requests for `unknown`, so we *can* say that it's some user-facing +code that is the culprit. Our setup on the FENs involves users hitting nginx +for static content and nginx proxying PHP requests back to php-fpm. Since all +our virtual domains are defined in nginx, we are able to do something horrible. + +On the particular FEN we're on we make a guess about which virtual domain the +problem is likely coming from (our main app), and proxy all traffic from all +other domains to a different FEN. We still see requests for `unknown` leaving +the box, so we've narrowed the problem down a little more. 
+ +## The Despair + +Nothing in our code is doing any direct DNS calls as far as we can find, and we +don't see any places PHP might be doing it for us. We have lots of PHP +extensions in place, all written in C and all black boxes; any of them could be +the culprit. Grepping through the likely candidates' source code for the string +`unknown` proves fruitless. + +We try xdebug at this point. xdebug is a profiler for php which will create +cachegrind files for the running code. With cachegrind you can see every +function which was ever called, how long was spent within each function, a full +call-graph, and lots more. Unfortunately xdebug outputs cachegrind files on a +per-php-fpm-process basis, and overwrites the previous file on each new request. +So xdebug is pretty much useless, since what is in the cachegrind file isn't +necessarily what spawned the DNS request. + +## Gotcha (sorta) + +We turn back to the tried and true method of dumping all the traffic using +tcpdump and perusing through that manually. + +What we find is that nearly every time there is a DNS request for `unknown`, if +we scroll up a bit there is (usually) a particular request to memcache. The +requested key is always in the style of `function-name:someid:otherstuff`. When +looking in the code around that function name we find this ominous looking call: + +```php +$ipAddress = getIPAddress(); +$geoipInfo = getCountryInfoFromIP($ipAddress); +``` + +This points us in the right direction. On a hunch we add some debug +logging to print out the `$ipAddress` variable, and sure enough it comes back as +`unknown`. AHA! + +So what we surmise is happening is that for some reason our geoip extension, +which we use to get the location data of an IP address and which +`getCountryInfoFromIP` calls, is seeing something which is *not* an IP address +and trying to resolve it. + +## Gotcha (for real) + +So the question becomes: why are we getting the string `unknown` as an IP +address?
+ +Adding some debug logging around the area we found before showed that +`$_SERVER['REMOTE_ADDR']`, which is the variable populated with the IP address +of the client, is sometimes `unknown`. We guess that this has something to do +with some magic we are doing on nginx's side to populate `REMOTE_ADDR` with the +real IP address of the client in the case of them going through a proxy. + +Many proxies send along the header `X-Forwarded-For` to indicate the real IP of +the client they're proxying for, otherwise the server would only see the proxy's +IP. In our setup I decided that in those cases we should set the `REMOTE_ADDR` +to the real client IP so our application logic doesn't even have to worry about +it. There are a couple problems with this which render it a bad decision, one +being that if some misbehaving proxy were to, say, start sending +`X-Forwarded-For: unknown` then some written applications might mistake that to +mean the client's IP is `unknown`. + +## The Fix + +The fix here was two-fold: + +1) We now always set `$_SERVER['REMOTE_ADDR']` to be the remote address of the +requests, regardless of if it's a proxy, and also send the application the +`X-Forwarded-For` header to do with as it pleases. + +2) Inside our app we look at all the headers sent and do some processing to +decide what the actual client IP is. PHP can handle a lot more complex logic +than nginx can, so we can do things like check to make sure the IP is an IP, and +also that it's not some NAT'd internal ip, and so forth. + +And that's it. From some weird log messages on our DNS servers to an nginx +mis-configuration on an almost unrelated set of servers, this is one of those +strange bugs that never has a nice solution and goes unsolved for a long time. +Spending the time to dive down the rabbit hole and find the answer is often +tedious, but also often very rewarding.
+ +[skydns]: https://github.com/skynetservices/skydns diff --git a/src/_posts/2015-07-15-go-http.md b/src/_posts/2015-07-15-go-http.md new file mode 100644 index 0000000..7da7d6b --- /dev/null +++ b/src/_posts/2015-07-15-go-http.md @@ -0,0 +1,547 @@ +--- +title: Go's http package by example +description: >- + The basics of using, testing, and composing apps built using go's net/http + package. +--- + +Go's [http](http://golang.org/pkg/net/http/) package has turned into one of my +favorite things about the Go programming language. Initially it appears to be +somewhat complex, but in reality it can be broken down into a couple of simple +components that are extremely flexible in how they can be used. This guide will +cover the basic ideas behind the http package, as well as examples in using, +testing, and composing apps built with it. + +This guide assumes you have some basic knowledge of what an interface in Go is, +and some idea of how HTTP works and what it can do. + +## Handler + +The building block of the entire http package is the `http.Handler` interface, +which is defined as follows: + +```go +type Handler interface { + ServeHTTP(ResponseWriter, *Request) +} +``` + +Once implemented the `http.Handler` can be passed to `http.ListenAndServe`, +which will call the `ServeHTTP` method on every incoming request. + +`http.Request` contains all relevant information about an incoming http request +which is being served by your `http.Handler`. + +The `http.ResponseWriter` is the interface through which you can respond to the +request. It implements the `io.Writer` interface, so you can use methods like +`fmt.Fprintf` to write a formatted string as the response body, or ones like +`io.Copy` to write out the contents of a file (or any other `io.Reader`). The +response code can be set before you begin writing data using the `WriteHeader` +method. 
+ +Here's an example of an extremely simple http server: + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +type helloHandler struct{} + +func (h helloHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "hello, you've hit %s\n", r.URL.Path) +} + +func main() { + err := http.ListenAndServe(":9999", helloHandler{}) + log.Fatal(err) +} +``` + +`http.ListenAndServe` serves requests using the handler, listening on the given +address:port. It will block unless it encounters an error listening, in which +case we `log.Fatal`. + +Here's an example of using this handler with curl: + +``` + ~ $ curl localhost:9999/foo/bar + hello, you've hit /foo/bar +``` + + +## HandlerFunc + +Often defining a full type to implement the `http.Handler` interface is a bit +overkill, especially for extremely simple `ServeHTTP` functions like the one +above. The `http` package provides a helper function, `http.HandlerFunc`, which +wraps a function which has the signature +`func(w http.ResponseWriter, r *http.Request)`, returning an `http.Handler` +which will call it in all cases. + +The following behaves exactly like the previous example, but uses +`http.HandlerFunc` instead of defining a new type. + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +func main() { + h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "hello, you've hit %s\n", r.URL.Path) + }) + + err := http.ListenAndServe(":9999", h) + log.Fatal(err) +} +``` + +## ServeMux + +On their own, the previous examples don't seem all that useful. If we wanted to +have different behavior for different endpoints we would end up with having to +parse path strings as well as numerous `if` or `switch` statements. Luckily +we're provided with `http.ServeMux`, which does all of that for us. 
Here's an +example of it being used: + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +func main() { + h := http.NewServeMux() + + h.HandleFunc("/foo", func(w http.ResponseWriter, r *http.Request) { + fmt.Fprintln(w, "Hello, you hit foo!") + }) + + h.HandleFunc("/bar", func(w http.ResponseWriter, r *http.Request) { + fmt.Fprintln(w, "Hello, you hit bar!") + }) + + h.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(404) + fmt.Fprintln(w, "You're lost, go home") + }) + + err := http.ListenAndServe(":9999", h) + log.Fatal(err) +} +``` + +The `http.ServeMux` is itself an `http.Handler`, so it can be passed into +`http.ListenAndServe`. When it receives a request it matches the request's +path against its registered patterns: a pattern ending in `/` matches any path +with that prefix (the longest such pattern wins), while a pattern without a +trailing slash, like `/foo`, matches only that exact path. We use the `/` +pattern as a catch-all for requests to unknown endpoints. Here are some examples +of it being used: + +``` + ~ $ curl localhost:9999/foo +Hello, you hit foo! + + ~ $ curl localhost:9999/bar +Hello, you hit bar! + + ~ $ curl localhost:9999/baz +You're lost, go home +``` + +`http.ServeMux` has both `Handle` and `HandleFunc` methods. These do the same +thing, except that `Handle` takes in an `http.Handler` while `HandleFunc` merely +takes in a function, implicitly wrapping it just as `http.HandlerFunc` does. + +### Other muxes + +There are numerous replacements for `http.ServeMux` like +[gorilla/mux](http://www.gorillatoolkit.org/pkg/mux) which give you things like +automatically pulling variables out of paths, easily asserting what http methods +are allowed on an endpoint, and more. Most of these replacements will implement +`http.Handler` like `http.ServeMux` does, and accept `http.Handler`s as +arguments, and so are easy to use in conjunction with the rest of the things +I'm going to talk about in this post. 
+ +## Composability + +When I say that the `http` package is composable I mean that it is very easy to +create re-usable pieces of code and glue them together into a new working +application. The `http.Handler` interface is the way all pieces communicate with +each other. Here's an example of where we use the same `http.Handler` to handle +multiple endpoints, each slightly differently: + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +type numberDumper int + +func (n numberDumper) ServeHTTP(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "Here's your number: %d\n", n) +} + +func main() { + h := http.NewServeMux() + + h.Handle("/one", numberDumper(1)) + h.Handle("/two", numberDumper(2)) + h.Handle("/three", numberDumper(3)) + h.Handle("/four", numberDumper(4)) + h.Handle("/five", numberDumper(5)) + + h.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(404) + fmt.Fprintln(w, "That's not a supported number!") + }) + + err := http.ListenAndServe(":9999", h) + log.Fatal(err) +} +``` + +`numberDumper` implements `http.Handler`, and can be passed into the +`http.ServeMux` multiple times to serve multiple endpoints. Here's it in action: + +``` + ~ $ curl localhost:9999/one +Here's your number: 1 + ~ $ curl localhost:9999/five +Here's your number: 5 + ~ $ curl localhost:9999/bazillion +That's not a supported number! +``` + +## Testing + +Testing http endpoints is extremely easy in Go, and doesn't even require you to +actually listen on any ports! The `httptest` package provides a few handy +utilities, including `NewRecorder` which implements `http.ResponseWriter` and +allows you to effectively make an http request by calling `ServeHTTP` directly. +Here's an example of a test for our previously implemented `numberDumper`, +commented with what exactly is happening: + +```go +package main + +import ( + "fmt" + "net/http" + "net/http/httptest" + . 
"testing" +) + +func TestNumberDumper(t *T) { + // We first create the http.Handler we wish to test + n := numberDumper(1) + + // We create an http.Request object to test with. The http.Request is + // totally customizable in every way that a real-life http request is, so + // even the most intricate behavior can be tested + r, _ := http.NewRequest("GET", "/one", nil) + + // httptest.Recorder implements the http.ResponseWriter interface, and as + // such can be passed into ServeHTTP to receive the response. It will act as + // if all data being given to it is being sent to a real client, when in + // reality it's being buffered for later observation + w := httptest.NewRecorder() + + // Pass in our httptest.Recorder and http.Request to our numberDumper. At + // this point the numberDumper will act just as if it was responding to a + // real request + n.ServeHTTP(w, r) + + // httptest.Recorder gives a number of fields and methods which can be used + // to observe the response made to our request. Here we check the response + // code + if w.Code != 200 { + t.Fatalf("wrong code returned: %d", w.Code) + } + + // We can also get the full body out of the httptest.Recorder, and check + // that its contents are what we expect + body := w.Body.String() + if body != fmt.Sprintf("Here's your number: 1\n") { + t.Fatalf("wrong body returned: %s", body) + } + +} +``` + +In this way it's easy to create tests for your individual components that you +are using to build your application, keeping the tests near to the functionality +they're testing. + +Note: if you ever do need to spin up a test server in your tests, `httptest` +also provides a way to create a server listening on a random open port for use +in tests as well. + +## Middleware + +Serving endpoints is nice, but often there's functionality you need to run for +*every* request before the actual endpoint's handler is run. For example, access +logging. 
A middleware component is one which implements `http.Handler`, but will +actually pass the request off to another `http.Handler` after doing some set of +actions. The `http.ServeMux` we looked at earlier is actually an example of +middleware, since it passes the request off to another `http.Handler` for actual +processing. Here's an example of our previous example with some logging +middleware: + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +type numberDumper int + +func (n numberDumper) ServeHTTP(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "Here's your number: %d\n", n) +} + +func logger(h http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + log.Printf("%s requested %s", r.RemoteAddr, r.URL) + h.ServeHTTP(w, r) + }) +} + +func main() { + h := http.NewServeMux() + + h.Handle("/one", numberDumper(1)) + h.Handle("/two", numberDumper(2)) + h.Handle("/three", numberDumper(3)) + h.Handle("/four", numberDumper(4)) + h.Handle("/five", numberDumper(5)) + + h.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(404) + fmt.Fprintln(w, "That's not a supported number!") + }) + + hl := logger(h) + + err := http.ListenAndServe(":9999", hl) + log.Fatal(err) +} +``` + +`logger` is a function which takes in an `http.Handler` called `h`, and returns +a new `http.Handler` which, when called, will log the request it was called with +and then pass off its arguments to `h`. To use it we pass in our +`http.ServeMux`, so all incoming requests will first be handled by the logging +middleware before being passed to the `http.ServeMux`. + +Here's an example log entry which is output when the `/five` endpoint is hit: + +``` +2015/06/30 20:15:41 [::1]:34688 requested /five +``` + +## Middleware chaining + +Being able to chain middleware together is an incredibly useful ability which we +get almost for free, as long as we use the signature +`func(http.Handler) http.Handler`. 
A middleware component returns the same type +which is passed into it, so simply passing the output of one middleware +component into the other is sufficient. + +However, more complex behavior with middleware can be tricky. For instance, what +if you want a piece of middleware which takes in a parameter upon creation? +Here's an example of just that, with a piece of middleware which will set a +header and its value for all requests: + +```go +package main + +import ( + "fmt" + "log" + "net/http" +) + +type numberDumper int + +func (n numberDumper) ServeHTTP(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "Here's your number: %d\n", n) +} + +func logger(h http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + log.Printf("%s requested %s", r.RemoteAddr, r.URL) + h.ServeHTTP(w, r) + }) +} + +type headerSetter struct { + key, val string + handler http.Handler +} + +func (hs headerSetter) ServeHTTP(w http.ResponseWriter, r *http.Request) { + w.Header().Set(hs.key, hs.val) + hs.handler.ServeHTTP(w, r) +} + +func newHeaderSetter(key, val string) func(http.Handler) http.Handler { + return func(h http.Handler) http.Handler { + return headerSetter{key, val, h} + } +} + +func main() { + h := http.NewServeMux() + + h.Handle("/one", numberDumper(1)) + h.Handle("/two", numberDumper(2)) + h.Handle("/three", numberDumper(3)) + h.Handle("/four", numberDumper(4)) + h.Handle("/five", numberDumper(5)) + + h.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(404) + fmt.Fprintln(w, "That's not a supported number!") + }) + + hl := logger(h) + hhs := newHeaderSetter("X-FOO", "BAR")(hl) + + err := http.ListenAndServe(":9999", hhs) + log.Fatal(err) +} +``` + +And here's the curl output: + +``` + ~ $ curl -i localhost:9999/three + HTTP/1.1 200 OK + X-Foo: BAR + Date: Wed, 01 Jul 2015 00:39:48 GMT + Content-Length: 22 + Content-Type: text/plain; charset=utf-8 + + Here's your number: 3 + +``` + 
+`newHeaderSetter` returns a function which accepts and returns an +`http.Handler`. Calling that returned function with an `http.Handler` then gets +you an `http.Handler` which will set the header given to `newHeaderSetter` +before continuing on to the given `http.Handler`. + +This may seem like a strange way of organizing this; for this example the +signature for `newHeaderSetter` could very well have looked like this: + +```go +func newHeaderSetter(key, val string, h http.Handler) http.Handler +``` + +And that implementation would have worked fine. But it would have been more +difficult to compose going forward. In the next section I'll show what I mean. + +## Composing middleware with alice + +[Alice](https://github.com/justinas/alice) is a very simple and convenient +helper for working with middleware using the function signature we've been using +thus far. Alice is used to create and use chains of middleware. Chains can even +be appended to each other, giving even further flexibility. Here's our previous +example with a couple more headers being set, but also using alice to manage the +added complexity. 
+ +```go +package main + +import ( + "fmt" + "log" + "net/http" + + "github.com/justinas/alice" +) + +type numberDumper int + +func (n numberDumper) ServeHTTP(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "Here's your number: %d\n", n) +} + +func logger(h http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + log.Printf("%s requested %s", r.RemoteAddr, r.URL) + h.ServeHTTP(w, r) + }) +} + +type headerSetter struct { + key, val string + handler http.Handler +} + +func (hs headerSetter) ServeHTTP(w http.ResponseWriter, r *http.Request) { + w.Header().Set(hs.key, hs.val) + hs.handler.ServeHTTP(w, r) +} + +func newHeaderSetter(key, val string) func(http.Handler) http.Handler { + return func(h http.Handler) http.Handler { + return headerSetter{key, val, h} + } +} + +func main() { + h := http.NewServeMux() + + h.Handle("/one", numberDumper(1)) + h.Handle("/two", numberDumper(2)) + h.Handle("/three", numberDumper(3)) + h.Handle("/four", numberDumper(4)) + + fiveHS := newHeaderSetter("X-FIVE", "the best number") + h.Handle("/five", fiveHS(numberDumper(5))) + + h.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(404) + fmt.Fprintln(w, "That's not a supported number!") + }) + + chain := alice.New( + newHeaderSetter("X-FOO", "BAR"), + newHeaderSetter("X-BAZ", "BUZ"), + logger, + ).Then(h) + + err := http.ListenAndServe(":9999", chain) + log.Fatal(err) +} +``` + +In this example all requests will have the headers `X-FOO` and `X-BAZ` set, but +the `/five` endpoint will *also* have the `X-FIVE` header set. + +## Fin + +Starting with a simple idea of an interface, the `http` package allows us to +create for ourselves an incredibly useful and flexible (yet still rather simple) +ecosystem for building web apps with re-usable components, all without breaking +our static checks. 
diff --git a/src/_posts/2015-11-21-happy-trees.md b/src/_posts/2015-11-21-happy-trees.md new file mode 100644 index 0000000..8d36a91 --- /dev/null +++ b/src/_posts/2015-11-21-happy-trees.md @@ -0,0 +1,235 @@ +--- +title: Happy Trees +description: >- + Visualizing a forest of happy trees. +--- + +Source code related to this post is available [here](https://github.com/mediocregopher/happy-tree). + +This project was inspired by [this video](https://www.youtube.com/watch?v=_DpzAvb3Vk4), +which you should watch first in order to really understand what's going on. + +My inspiration came from his noting that happification could be done on numbers +in bases other than 10. I immediately thought of hexadecimal, base-16, since I'm +a programmer and that's what I think of. I also was trying to think of how one +would graphically represent a large happification tree, when I realized that +hexadecimal numbers are colors, and colors graphically represent things nicely! + +## Colors + +Colors to computers are represented using 3 bytes, encompassing red, green, and +blue. Each byte is represented by two hexadecimal digits, and they are appended +together. For example `FF0000` represents maximum red (`FF`) added to no green +and no blue. `FF5500` represents maximum red (`FF`), some green (`55`) and no +blue (`00`), which when added together results in kind of an orange color. + +## Happifying colors + +In base 10, happifying a number is done by splitting its digits, squaring each +one individually, and adding the resulting numbers. The principle works the same +for hexadecimal numbers: + +``` +A4F +A*A + 4*4 + F*F +64 + 10 + E1 +155 // 341 in decimal +``` + +So if all colors are 6-digit hexadecimal numbers, they can be happified easily! + +``` +FF5500 +F*F + F*F + 5*5 + 5*5 + 0*0 + 0*0 +E1 + E1 + 19 + 19 + 0 + 0 +0001F4 +``` + +So `FF5500` (an orangish color) happifies to `0001F4` (a darker blue). Since +order of digits doesn't matter, `5F50F0` also happifies to `0001F4`. 
From this +fact, we can make a tree (hence the happification tree). I can do this process +on every color from `000000` (black) to `FFFFFF` (white), so I will! + +## Representing the tree + +So I know I can represent the tree using color, but there's more to decide on +than that. The easy way to represent a tree would be to simply draw a literal +tree graph, with a circle for each color and lines pointing to its parent and +children. But this is boring, and also if I want to represent *all* colors the +resulting image would be enormous and/or unreadable. + +I decided on using a hollow, multi-level pie-chart. Using the example +of `000002`, it would look something like this: + +![An example of a partial multi-level pie chart](/img/happy-tree/partial.png) + +The inner arc represents the color `000002`. The second arc represents the 15 +different colors which happify into `000002`, each of them may also have their +own outer arc of numbers which happify to them, and so on. + +This representation is nice because a) It looks cool and b) it allows the +melancoils of the hexadecimals to be placed around the happification tree +(numbers which happify into `000001`), which is convenient. It's also somewhat +easier to code than a circle/branch based tree diagram. + +An important feature I had to implement was proportional slice sizes. If I were +to give each child of a color an equal size on that arc's edge the image would +simply not work. Some branches of the tree are extremely deep, while others are +very shallow. If all were given the same space, those deep branches wouldn't +even be representable by a single pixel's width, and would simply fail to show +up. So I implemented proportional slice sizes, where the size of every slice is +determined to be proportional to how many total (recursively) children it has. +You can see this in the above example, where the second level arc is largely +comprised of one giant slice, with many smaller slices taking up the end. 
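The proportional slice sizing described above can be sketched in a few lines. This is not the code from the linked repository, just a minimal illustration; the `node` and `arc` types and the `weight` and `childArcs` functions are hypothetical names. One assumption worth flagging: the weight of a node here counts the node itself plus all of its descendants, so that a leaf still gets a non-zero slice.

```go
package main

import "fmt"

// node is a hypothetical tree node: a color plus the colors which happify
// into it.
type node struct {
	color    uint32
	children []*node
}

// weight counts the node itself plus all of its (recursive) children.
// Counting the node itself is an assumption made so leaves get some width.
func weight(n *node) int {
	w := 1
	for _, c := range n.children {
		w += weight(c)
	}
	return w
}

// arc is one slice of a ring, in degrees.
type arc struct {
	color      uint32
	start, end float64
}

// childArcs splits the parent's arc [start, end) among its children, giving
// each child a share proportional to its weight.
func childArcs(parent *node, start, end float64) []arc {
	total := 0
	for _, c := range parent.children {
		total += weight(c)
	}
	arcs := make([]arc, 0, len(parent.children))
	at := start
	for _, c := range parent.children {
		span := (end - start) * float64(weight(c)) / float64(total)
		arcs = append(arcs, arc{c.color, at, at + span})
		at += span
	}
	return arcs
}

func main() {
	deep := &node{color: 0x0A, children: []*node{{color: 0x0B}, {color: 0x0C}}}
	shallow := &node{color: 0x0D}
	root := &node{color: 0x02, children: []*node{deep, shallow}}

	for _, a := range childArcs(root, 0, 360) {
		fmt.Printf("%06X: %.0f to %.0f degrees\n", a.color, a.start, a.end)
	}
}
```

Here the deep branch (weight 3) gets three quarters of the ring and the lone leaf (weight 1) gets the rest, which is the effect visible in the example image.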
+ +## First attempt + +My first attempt resulted in this image (click for 5000x5000 version): + +[![Result of first attempt](/img/happy-tree/happy-tree-atmp1-small.png)](/img/happy-tree/happy-tree-atmp1.png) + +The first thing you'll notice is that it looks pretty neat. + +The second thing you'll notice is that there's actually only one melancoil in +the 6-digit hexadecimal number set. The innermost black circle is `000000` which +only happifies to itself, and nothing else will happify to it (sad `000000`). +The second circle represents `000001`, and all of its runty children. And +finally the melancoil, comprised of: + +``` +00000D -> 0000A9 -> 0000B5 -> 000092 -> 000055 -> 000032 -> ... +``` + +The final thing you'll notice (or maybe it was the first, since it's really +obvious) is that it's very blue. Non-blue colors are really only represented as +leaves on their trees and don't ever really have any children of their own, so +the blue and black sections take up vastly more space. + +This makes sense. The number which should generate the largest happification +result, `FFFFFF`, only results in `000546`, which is primarily blue. So in effect +all colors happify to some shade of blue. + +This might have been it, technically this is the happification tree and the +melancoil of 6 digit hexadecimal numbers represented as colors. But it's also +boring, and I wanted to do better. + +## Second attempt + +The root of the problem is that the definition of "happification" I used +resulted in not diverse enough results. I wanted something which would give me +numbers where any of the digits could be anything. Something more random. + +I considered using a hash instead, like md5, but that has its own problems. +There's no guarantee that any number would actually reach `000001`, which isn't +required but it's a nice feature that I wanted. It also would be unlikely that +there would be any melancoils that weren't absolutely gigantic. 
+ +I ended up redefining what it meant to happify a hexadecimal number. Instead of +adding all the digits up, I first split up the red, green, and blue digits into +their own numbers, happified those numbers, and finally reassembled the results +back into a single number. For example: + +``` +FF5500 +FF, 55, 00 +F*F + F*F, 5*5 + 5*5, 0*0 + 0*0 +1C2, 32, 00 +C23200 +``` + +I drop that 1 on the `1C2`, because it has no place in this system. Sorry 1. + +Simply replacing that function resulted in this image (click for 5000x5000 version): + +[![Result of second attempt](/img/happy-tree/happy-tree-atmp2-small.png)](/img/happy-tree/happy-tree-atmp2.png) + +The first thing you notice is that it's so colorful! So that goal was achieved. + +The second thing you notice is that there are *significantly* more melancoils. +Hundreds, even. Here's a couple of the melancoils (each on its own line): + +``` +00000D -> 0000A9 -> 0000B5 -> 000092 -> 000055 -> 000032 -> ... +000D0D -> 00A9A9 -> 00B5B5 -> 009292 -> 005555 -> 003232 -> ... +0D0D0D -> A9A9A9 -> B5B5B5 -> 929292 -> 555555 -> 323232 -> ... +0D0D32 -> A9A90D -> B5B5A9 -> 9292B5 -> 555592 -> 323255 -> ... +... +``` + +And so on. You'll notice the first melancoil listed is the same as the one from +the first attempt. You'll also notice that the same numbers from that +melancoil are "re-used" in the rest of them as well. The second coil listed is +the same as the first, just with the numbers repeated in the 3rd and 4th digits. +The third coil has those numbers repeated once more in the 1st and 2nd digits. +The final coil is the same numbers, but with the 5th and 6th digits offset one +place in the rotation. + +The rest of the melancoils in this attempt work out to just be every conceivable +iteration of the above. This is simply a property of the algorithm chosen, and +there's not a whole lot we can do about it. + +## Third attempt + +After talking with [Mr. 
Marco](/members/#marcopolo) about the previous attempts +I got an idea that would lead me towards more attempts. The main issue I was +having in coming up with new happification algorithms was figuring out what to +do about getting a number greater than `FFFFFF`. Dropping the leading digits +just seemed.... lame. + +One solution I came up with was to simply happify again. And again, and again. +Until I got a number less than or equal to `FFFFFF`. + +With this new plan, I could increase the power by which I'm raising each +individual digit, and drop the strategy from the second attempt of splitting the +number into three parts. In the first attempt I was doing happification to the +power of 2, but what if I wanted to happify to the power of 6? It would look +something like this (starting with the number `34BEEF`): + +``` +34BEEF +3^6 + 4^6 + B^6 + E^6 + E^6 + F^6 +2D9 + 1000 + 1B0829 + 72E440 + 72E440 + ADCEA1 +1AEB223 + +1AEB223 is greater than FFFFFF, so we happify again + +1^6 + A^6 + E^6 + B^6 + 2^6 + 2^6 + 3^6 +1 + F4240 + 72E440 + 1B0829 + 40 + 40 + 2D9 +9D3203 +``` + +So `34BEEF` happifies to `9D3203`, when happifying to the power of 6. + +As mentioned before the first attempt in this blog was the 2nd power tree, +here's the trees for the 3rd, 4th, 5th, and 6th powers (each image is a link to +a larger version): + +3rd power: +[![Third attempt, 3rd power](/img/happy-tree/happy-tree-atmp3-pow3-small.png)](/img/happy-tree/happy-tree-atmp3-pow3.png) + +4th power: +[![Third attempt, 4th power](/img/happy-tree/happy-tree-atmp3-pow4-small.png)](/img/happy-tree/happy-tree-atmp3-pow4.png) + +5th power: +[![Third attempt, 5th power](/img/happy-tree/happy-tree-atmp3-pow5-small.png)](/img/happy-tree/happy-tree-atmp3-pow5.png) + +6th power: +[![Third attempt, 6th power](/img/happy-tree/happy-tree-atmp3-pow6-small.png)](/img/happy-tree/happy-tree-atmp3-pow6.png) + +A couple things to note: + +* 3-5 are still very blue. 
It's not till the 6th power that the distribution + becomes random enough to become very colorful. + +* Some powers have more coils than others. Power of 3 has a lot, and actually a + lot of them aren't coils, but single narcissistic numbers. Narcissistic + numbers are those which happify to themselves. `000000` and `000001` are + narcissistic numbers in all powers, power of 3 has quite a few more. + +* 4 looks super cool. + +Using unsigned 64-bit integers I could theoretically go up to the power of 15. +But I hit a roadblock at power of 7, in that there's actually a melancoil which +occurs whose members are all greater than `FFFFFF`. This means that my strategy +of repeating happifying until I get under `FFFFFF` doesn't work for any numbers +which lead into that coil. diff --git a/src/_posts/2017-09-06-brian-bars.md b/src/_posts/2017-09-06-brian-bars.md new file mode 100644 index 0000000..2c56272 --- /dev/null +++ b/src/_posts/2017-09-06-brian-bars.md @@ -0,0 +1,105 @@ +--- +title: Brian Bars +description: >- + Cheap and easy to make, healthy, vegan, high-carb, high-protein. "The Good + Stuff". +updated: 2018-01-18 +--- + +It actually blows my mind it's been 4 years since I used this blog. It was +previously a tech blog, but then I started putting all my tech-related posts on +[the cryptic blog](https://cryptic.io). As of now this is a lifestyle/travel +blog. The me of 4 years ago would be horrified. + +Now I just have to come up with a lifestyle and do some traveling. + +## Recipe + +This isn't a real recipe because I'm not going to preface it with my entire +fucking life story. Let's talk about the food. + +Brian bars: + +* Are like Clif Bars, but with the simplicity of ingredients that Larabars have. +* Are easy to make, only needing a food processor (I use a magic bullet) and a + stovetop oven. +* Keep for a long time and don't really need refrigerating (but don't mind it + neither) +* Are paleo, vegan, gluten-free, free-range, grass-fed, whatever... 
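The third attempt's rule (raise each hex digit to a power, sum, and repeat while the result is above `FFFFFF`) can be sketched like this. It is an illustration rather than the code from the linked repository; `happifyOnce`, `happifyPow`, and the `maxRounds` cutoff are names and choices made up for this example. The cutoff is there precisely because of the power-of-7 problem: a number which falls into a coil whose members are all above `FFFFFF` would otherwise loop forever.

```go
package main

import "fmt"

// happifyOnce raises each base-16 digit of n to the given power and sums the
// results. With pow = 2 this is the happification from the first attempt.
func happifyOnce(n, pow uint64) uint64 {
	var sum uint64
	for ; n > 0; n /= 16 {
		digit := n % 16
		term := uint64(1)
		for i := uint64(0); i < pow; i++ {
			term *= digit
		}
		sum += term
	}
	return sum
}

// happifyPow keeps happifying until the result fits in 6 hex digits. It gives
// up after maxRounds rounds and reports false in that case.
func happifyPow(n, pow uint64, maxRounds int) (uint64, bool) {
	for i := 0; i < maxRounds; i++ {
		n = happifyOnce(n, pow)
		if n <= 0xFFFFFF {
			return n, true
		}
	}
	return n, false
}

func main() {
	first, _ := happifyPow(0xFF5500, 2, 100)
	fmt.Printf("%06X\n", first) // 0001F4, as in the first attempt

	sixth, ok := happifyPow(0x34BEEF, 6, 100)
	fmt.Printf("%06X %v\n", sixth, ok) // 9D3203 true, as in the worked example
}
```

Both printed values match the worked examples earlier in the post, which is a decent sanity check that the digit loop is doing the right thing.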
+* Are really really filling. +* Are named after me, deal with it. + +I've worked on this recipe for a bit, trying to make it workable, and will +probably keep adjusting it (and this post) as time goes on. + +### Ingredients + +Nuts and seeds. Most of this recipe is nuts and seeds. Here's the ones I used: + +* 1 cup almonds +* 1 cup peanuts +* 1 cup walnuts +* 1 cup coconut flakes/shavings/whatever +* 1/2 cup flax seeds +* 1/2 cup sesame seeds + +For all of those above it doesn't _really_ matter what nuts/seeds you use, it's +all gonna get ground up anyway. So whatever's cheap works fine. Also, avoid +salt-added ones if you can. + +The other ingredients are: + +* 1 cup raisins/currants +* 1.5 lbs of pitted dates (no added sugar! you don't need it!) +* 2 cups oats + +### Grind up the nuts + +Throw the nuts into the food processor and grind them into a powder. Then throw +that powder into a bowl along with the seeds, coconuts, raisins, and oats, and +mix em good. + +I don't _completely_ grind up the nuts, instead leaving some chunks in it here +and there, but you do you. + +### Prepare the dates + +This is the harder part, and is what took me a couple tries to get right. The +best strategy I've found is to steam the dates a bit over a stove to soften +them. Then, about a cup at a time, you can throw them in the food processor and +turn them into a paste. You may have to add a little water if your processor is +having trouble. + +Once processed you can add the dates to the mix from before and stir it all up. +It'll end up looking something like cookie dough. Except unlike cookie dough +it's completely safe to eat and maybe sorta healthy. + +### Bake it, Finish it + +Put the dough stuff in a pan of some sort, flatten it out, and stick it in the +oven at like 250 or 300 for a few hours. You're trying to cook out the water you +added earlier when you steamed the dates, as well as whatever little moisture +the dates had in the first place. 
+ +Once thoroughly baked you can stick the pan in the fridge to cool and keep, +and/or cut it up into individual bars. Keep in mind that the bars are super +filling and allow for pretty small portions. Wrap em in foil or plastic wrap and +take them to-go, or keep them around for a snack. Or both. Or whatever you want +to do, it's your food. + +### Cleanup + +Dates are simultaneously magical and the most annoying thing to work with, so +there's cleanup problems you may run into with them: + +Protip #1: When cleaning your processed date slime off of your cooking utensils +I'd recommend just letting them soak in water for a while. Dry-ish date slime +will stick to everything, while soaked date slime will come right off. + +Protip #2: Apparently if you want ants, dates are a great way to get ants. My +apartment has never had an ant problem until 3 hours after I made a batch of +these and didn't wipe down my counter enough. I'm still dealing with the ants. +Apparently there's environmentally friendly ant poisons where the ants happily +carry the poison back into the nest and the whole nest eats it and dies. Which +feels kinda mean in some way, but is also pretty clever and they're just ants +anyway so fuck it. diff --git a/src/_posts/2018-10-25-rethinking-identity.md b/src/_posts/2018-10-25-rethinking-identity.md new file mode 100644 index 0000000..d3520d7 --- /dev/null +++ b/src/_posts/2018-10-25-rethinking-identity.md @@ -0,0 +1,292 @@ +--- +title: Rethinking Identity +description: >- + A more useful way of thinking about identity on the internet, and using that + to build a service which makes our online life better. +--- + +In my view, the major social media platforms (Facebook, Twitter, Instagram, +etc...) are broken. They worked well at small scales, but billions of people are +now exposed to them, and [Murphy's Law][murphy] has come into effect. 
The weak +points in the platforms have been found and exploited, to the point where +they're barely usable for interacting with anyone you don't already know in +person. + +[murphy]: https://en.wikipedia.org/wiki/Murphy%27s_law + +On the other hand, social media, at its core, is a powerful tool that humans +have developed, and it's not one to be thrown away lightly (if it can be thrown +away at all). It's worthwhile to try and fix it. So that's what this post is +about. + +A lot of moaning and groaning has already been done on how social media is toxic +for the average person. But the average person isn't doing anything more than +receiving and reacting to their environment. If that environment is toxic, the +person in it becomes so as well. It's certainly possible to filter the toxicity +out, and use a platform to your own benefit, but that takes work on the user's +part. It would be nice to think that people will do more than follow the path of +least resistance, but at scale that's simply not how reality is, and people +shouldn't be expected to do that work. + +To identify what has become toxic about the platforms, first we need to identify +what a non-toxic platform would look like. + +The ideal definition for social media is to give people a place to socialize +with friends, family, and the rest of the world. Defining "socialize" is tricky, +and probably an exercise only a socially awkward person who doesn't do enough +socializing would undertake. "Expressing one's feelings, knowledge, and +experiences to other people, and receiving theirs in turn" feels like a good +approximation. A platform where true socializing was the only activity would be +ideal. 
+ +Here are some trends on our social media which have nothing to do with +socializing: artificially boosted follower numbers on Instagram to obtain +product sponsors, shills in Reddit comments boosting a product or company, +Russian trolls on Twitter spreading propaganda, trolls everywhere being dicks +and switching IPs when they get banned, and [that basketball president whose +wife used burner Twitter accounts to trash talk players][president]. + +[president]: https://www.nytimes.com/2018/06/07/sports/bryan-colangelo-sixers-wife.html + +These are all examples of how anonymity can be abused on social media. I want +to say up front that I'm _not_ against anonymity on the internet, and that I +think we can have our cake and eat it too. But we _should_ acknowledge the +direct and indirect problems anonymity causes. We can't trust that anyone on +social media is being honest about who they are and what their motivation is. +This problem extends outside of social media too, to Amazon product reviews (and +basically any other review system), online polls and raffles, multiplayer games, +and surely many other cases. + +## Identity + +To fix social media, and other large swaths of the internet, we need to rethink +identity. This process started for me a long time ago, when I watched [this TED +talk][identity], which discusses ways in which we misunderstand identity. +Crucially, David Birch points out that identity is not a name, it's more +fundamental than that. + +[identity]: https://www.ted.com/talks/david_birch_identity_without_a_name + +In the context of online platforms, where a user creates an account which +identifies them in some way, identity breaks down into 3 distinct problems +which are often conflated: + +* Authentication: Is this identity owned by this person? +* Differentiation: Is this identity unique to this person? +* Authorization: Is this identity allowed to do X? + +For internet platform developers, authentication has been given the full focus. 
+Blog posts, articles, guides, and services abound which deal with properly +hashing and checking passwords, two factor authentication, proper account +recovery procedure, etc... While authentication is not a 100% solved problem, +it's had the most work done on it, and the problems which this post deals with +are not affected by it. + +The problem which should instead be focused on is differentiation. + +## Differentiation + +I want to make very clear, once more, that I am _not_ in favor of de-anonymizing +the web, and doing so is not what I'm proposing. + +Differentiation is without a doubt the most difficult identity problem to solve. +It's not even clear that it's solvable offline. Take this situation: you are in +a room, and you are told that one person is going to walk in, then leave, then +another person will do the same. These two persons may or may not be the same +person. You're allowed to do anything you like to each person (with their +consent) in order to determine if they are the same person or not. + +For the vast, vast majority of cases you can simply look with your eyeballs and +see if they are different people. But this will not work 100% of the time. +Identical twins are an obvious example of two persons looking like one, but a +malicious actor with a disguise might be one person posing as two. Biometrics +like fingerprints, iris scanning, and DNA testing fail for many reasons (the +identical twin case being one). You could attempt to give the first a unique +marking on their skin, but who's to say they don't have a solvent, which can +clean that marking off, waiting right outside the door? + +The solutions and refutations can continue on pedantically for some time, but +the point is that there is likely not a 100% solution, and even the 90% +solutions require significant investment. Differentiation is a hard problem, +which most developers don't want to solve. 
Most are fine with surrogates like +checking that an email or phone number is unique to the platform, but these +aren't enough to stop a dedicated individual or organization. + +### Roll Your Own Differentiation + +If a platform wants to roll their own solution to the differentiation problem, a +proper solution, it might look something like this: + +* Submit an image of your passport, or other government issued ID. This would + have to be checked against the appropriate government agency to ensure the + ID is legitimate. + +* Submit an image of your face, alongside a written note containing a code given + by the platform. Software to detect manipulated images would need to be + employed, as well as reverse image searching to ensure the image isn't being + reused. + +* Once completed, all data needs to be hashed/fingerprinted and then destroyed, + so sensitive data isn't sitting around on servers, but can still be checked + against future users signing up for the platform. + +* A dedicated support team would be needed to handle edge-cases and mistakes. + +None of these is trivial, nor would I trust an up-and-coming platform which is +being bootstrapped out of a basement to implement any of them correctly. +Additionally, going through with this process would be a _giant_ point of +friction for a user creating a new account; they likely would go use a different +platform instead, which didn't have all this nonsense required. + +### Differentiation as a Service + +This is the crux of this post. + +Instead of each platform rolling their own differentiation, what if there was a +service for it. Users would still have to go through the hassle described above, +but only once forever, and on a more trustable site. Then platforms, no matter +what stage of development they're at, could use that service to ensure that +their community of users is free from the problems of fake accounts and trolls. 
+ +This is what the service would look like: + +* A user would have to, at some point, have gone through the steps above to + create an account on the differentiation-as-a-service (DaaS) platform. This + account would have the normal authentication mechanisms that most platforms + do (password, two-factor, etc...). + +* When creating an account on a new platform, the user would log in to their DaaS + account (similar to the common "login with Google/Facebook/Twitter" buttons). + +* The DaaS then returns an opaque token, an effectively random string which + uniquely identifies that user, to the platform. The platform can then check in + its own user database for any other users using that token, and know if the + user already has an account. All of this happens without any identifying + information being passed to the platform. + +Similar to how many sites outsource to Cloudflare to handle DDoS protection, +which is better handled en masse by people familiar with the problem, the DaaS +allows for outsourcing the problem of differentiation. Users are more likely to +trust an established DaaS service than a random website they're signing up for. +And signing up for a DaaS is a one-time event, so if enough platforms are using +the DaaS it could become worthwhile for them to do so. + +Finally, since the DaaS also handles authentication, a platform could outsource +that aspect of identity management to it as well. This is optional for the +platform, but for smaller platforms which are just starting up it might be +worthwhile to save that development time. + +### Traits of a Successful DaaS + +It's possible for me to imagine a world where use of DaaS' is common, but +bridging the gap between that world and this one is not as obvious. Still, I +think it's necessary if the internet is to ever evolve past being, primarily, +a home for trolls. 
There are a number of traits of an up-and-coming DaaS which +would aid it in being accepted by the internet: + +* **Patience**: there is a critical mass of users and platforms using DaaS' + where it becomes more advantageous for platforms to use the DaaS than not. + Until then, the DaaS and platforms using it need to take deliberate but small + steps. For example: making DaaS usage optional for platform users, and giving + their accounts special marks to indicate they're "authentic" (like Twitter's + blue checkmark); giving those users' activity higher weight in algorithms; + allowing others to filter out activity of non-"authentic" users; etc... These + are all preliminary steps which can be taken which encourage but don't require + platform users to use a DaaS. + +* **User-friendly**: most likely the platforms using a DaaS are what are going + to be paying the bills. A successful DaaS will need to remember that, no + matter where the money comes from, if the users aren't happy they'll stop + using the DaaS, and platforms will be forced to switch to a different one or + stop using them altogether. User-friendliness means more than a nice + interface; it means actually caring for the users' interests, taking their + privacy and security seriously, and in all other aspects being on their side. + In that same vein, competition is important, and so... + +* **No country/government affiliation**: If the DaaS was to be run by a + government agency it would have no incentive to provide a good user + experience, since the users aren't paying the bills (they might not even be in + that country). A DaaS shouldn't be exclusive to any one government or country + anyway. Perhaps it starts out that way, to get off the ground, but ultimately + the internet is a global institution, and is healthiest when it's connecting + individuals _around the world_. A successful DaaS will reach beyond borders + and try to connect everyone. 
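To make the platform's side of the opaque token flow from "Differentiation as a Service" a bit more concrete, here's a minimal sketch in Go. Everything in it is hypothetical: the `Platform` type, the `SignUp` method, and the idea that the token arrives as a plain string are assumptions made for illustration, not any real DaaS API. The only point is that the platform keys accounts by token and never sees anything else about the person.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyRegistered is returned when a DaaS token is already tied to an
// account on this platform.
var ErrAlreadyRegistered = errors.New("token already has an account")

// Platform is a hypothetical platform's user store. Accounts are keyed by the
// opaque token handed back by the (equally hypothetical) DaaS.
type Platform struct {
	accountsByToken map[string]string
}

// NewPlatform returns a Platform with no accounts.
func NewPlatform() *Platform {
	return &Platform{accountsByToken: map[string]string{}}
}

// SignUp creates an account for the person behind token, unless that person
// already has one. No identifying information is needed to make that call.
func (p *Platform) SignUp(token, username string) error {
	if _, ok := p.accountsByToken[token]; ok {
		return ErrAlreadyRegistered
	}
	p.accountsByToken[token] = username
	return nil
}

func main() {
	p := NewPlatform()
	fmt.Println(p.SignUp("tok-abc", "alice"))            // first account: allowed
	fmt.Println(p.SignUp("tok-abc", "alice-sockpuppet")) // same person: rejected
	fmt.Println(p.SignUp("tok-def", "bob"))              // different person: allowed
}
```

A real platform would keep this in its user database rather than an in-memory map, but the check itself wouldn't need to be much more than this.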
+ +Obviously actually starting a DaaS would be a huge undertaking, and would +require proper management and good developers and all that, but such things +apply to most services. + +## Authorization + +The final aspect of identity management, which I haven't talked about yet, is +authorization. This aspect deals with what a particular identity is allowed to +do. For example, is an identity allowed to claim they have a particular name, or +are from a particular place, or are of a particular age? Other things like +administration and moderation privileges also fall under authorization, but they +are generally defined and managed within a platform. + +A DaaS has the potential to help with authorization as well, though with a giant +caveat. If a DaaS were to not fingerprint and destroy the user's data, like +their name and birthday and whatnot, but instead store them, then the following +use-case could also be implemented: + +* A platform wants to know if a user is above a certain age, let's say. It asks + the DaaS for that information. + +* The DaaS asks the user, OAuth style, whether the user is ok with giving the + platform that information. + +* If so, the platform is given that information. + +This is a tricky situation. It adds a lot of liability for the user, since their +raw data will be stored with the DaaS, ripe for hacking. It also places a lot of +trust with the DaaS to be responsible with users' data and not go giving it out +willy-nilly to others, and instead to only give out the bare-minimum that the +user allows. Since the user is not the DaaS' direct customer, this might be too +much to ask. Nevertheless, it's a use-case which is worth thinking about. + +## Dapps + +The idea of decentralized applications, or dapps, has begun to gain traction. +While not mainstream yet, I think they have potential, and it's necessary to +discuss how a DaaS would operate in a world where the internet is no longer +hosted in central datacenters. 
+ +Consider an Ethereum-based dapp. If a user were to register one ethereum address +(which are really public keys) with their DaaS account, the following use-case +could be implemented: + +* A charity dapp has an ethereum contract, which receives a call from an + ethereum address asking for money. The dapp wants to ensure every person it + sends money to hasn't received any that day. + +* The DaaS has a separate ethereum contract it manages, where it stores all + addresses which have been registered to a user. There is no need to keep any + other user information in the contract. + +* The charity dapp's contract calls the DaaS' contract, asking it if the address + is one of its addresses. If so, and if the charity contract hasn't given to + that address yet today, it can send money to that address. + +There would perhaps need to be some mechanism by which a user could change their +address, which would be complex since that address might be in use by a dapp +already, but it's likely a solvable problem. + +A charity dapp is a bit of a silly example; ideally with a charity dapp there'd +also be some mechanism to ensure a person actually _needs_ the money. But +there's other dapp ideas which would become feasible, due to the inability of a +person to impersonate many people, if DaaS use becomes normal. + +## Why Did I Write This? + +Perhaps you've gotten this far and are asking: "Clearly you've thought about +this a lot, why don't you make this yourself and make some phat stacks of cash +with a startup?" The answer is that this project would need to be started and +run by serious people, who can be dedicated and thorough and responsible. I'm +not sure I'm one of those people; I get distracted easily. But I would like to +see this idea tried, and so I've written this up thinking maybe someone else +would take the reins. + +I'm not asking for equity or anything, if you want to try; it's a free idea for +the taking. 
But if it turns out to be a bazillion dollar Good Idea™, I won't say +no to a donation... diff --git a/src/_posts/2018-11-12-viz-1.md b/src/_posts/2018-11-12-viz-1.md new file mode 100644 index 0000000..8fd9fd9 --- /dev/null +++ b/src/_posts/2018-11-12-viz-1.md @@ -0,0 +1,54 @@ +--- +title: >- + Visualization 1 +description: >- + Using clojurescript and quil to generate interesting visuals +series: viz +git_repo: https://github.com/mediocregopher/viz.git +git_commit: v1 +--- + +First I want to apologize if you've seen this already, I originally had this up +on my normal website, but I've decided to instead consolidate all my work to my +blog. + +This is the first of a series of visualization posts I intend to work on, each +building from the previous one. + +<script src="/assets/viz/1/goog/base.js"></script> +<script src="/assets/viz/1/cljs_deps.js"></script> +<script>goog.require("viz.core");</script> +<p align="center"><canvas id="viz"></canvas></p> + +This visualization follows a few simple rules: + +* Any point can only be occupied by a single node. A point may be alive (filled) + or dead (empty). + +* On every tick each live point picks from 0 to N new points to spawn, where N is + the number of empty adjacent points to it. If it picks 0, it becomes dead. + +* Each line indicates the parent of a point. Lines have an arbitrary lifetime of + a few ticks, and occupy the points they connect (so new points may not spawn + on top of a line). + +* When a dead point has no lines it is cleaned up, and its point is no longer + occupied. + +The resulting behavior is somewhere between [Conway's Game of +Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) and white noise. +Though each point operates independently, they tend to move together in groups. +When two groups collide head on they tend to cancel each other out, killing most +of both. When they meet while both heading in a common direction they tend to +peacefully merge towards that direction. 
+ +Sometimes their world becomes so cluttered there's hardly room to move. +Sometimes a major coincidence of events leads to multiple groups canceling each +other at once, opening up the world and allowing for an explosion of new growth. + +Some groups spiral about a single point, sustaining themselves and defending +from outside groups in the same movement. This doesn't last for very long. + +The performance of this visualization is not very optimized, and will probably +eat up your CPU like nothing else. Most of the slowness comes from drawing the +lines; since there's so many individual small ones it's quite cumbersome to do. diff --git a/src/_posts/2018-11-12-viz-2.md b/src/_posts/2018-11-12-viz-2.md new file mode 100644 index 0000000..c3e342e --- /dev/null +++ b/src/_posts/2018-11-12-viz-2.md @@ -0,0 +1,49 @@ +--- +title: >- + Visualization 2 +description: >- + Now in glorious technicolor! +series: viz +git_repo: https://github.com/mediocregopher/viz.git +git_commit: v2 +--- + + +<script src="/assets/viz/2/goog/base.js"></script> +<script src="/assets/viz/2/cljs_deps.js"></script> +<script>goog.require("viz.core");</script> +<p align="center"><canvas id="viz"></canvas></p> + +This visualization builds on the previous. Structurally the cartesian grid has +been turned into an isometric one, but this is more of an environmental change +than a behavioral one. + +Behavioral changes which were made: + +* When a live point is deciding its next spawn points, it first sorts the set of + empty adjacent points from closest-to-the-center to farthest. It then chooses + a number `n` between `0` and `N` (where `N` is the sorted set's size) and + spawns new points from the first `n` points of the sorted set. `n` is chosen + based on: + + * The live point's linear distance from the center. + + * A random multiplier. + +* Each point is spawned with an attached color, where the color chosen is a + slightly different hue than its parent. 
The change is deterministic, so all + child points of the same generation have the same color. + +The second change is purely cosmetic, but does create a mesmerizing effect. The +first change alters the behavior dramatically. Only the points which compete for +the center are able to reproduce, but by the same token are more likely to be +starved out by other points doing the same. + +In the previous visualization the points moved around in groups aimlessly. Now +the groups are all competing for the same thing, the center. As a result they +congregate and are able to be viewed as a larger whole. + +The constant churn of the whole takes many forms, from a spiral in the center, +to waves crashing against each other, to outright chaos, to random purges of +nearly all points. Each form lasts for only a few seconds before giving way to +another. diff --git a/src/_posts/2019-08-02-program-structure-and-composability.md b/src/_posts/2019-08-02-program-structure-and-composability.md new file mode 100644 index 0000000..b44c534 --- /dev/null +++ b/src/_posts/2019-08-02-program-structure-and-composability.md @@ -0,0 +1,587 @@ +--- +title: >- + Program Structure and Composability +description: >- + Discussing the nature of program structure, the problems presented by + complex structures, and a pattern that helps in solving those problems. +--- + +## Part 0: Introduction + +This post is focused on a concept I call “program structure,” which I will try +to shed some light on before discussing complex program structures. I will then +discuss why complex structures can be problematic to deal with, and will finally +discuss a pattern for dealing with those problems. + +My background is as a backend engineer working on large projects that have had +many moving parts; most had multiple programs interacting with each other, used +many different databases in various contexts, and faced large amounts of load +from millions of users. 
Most of this post will be framed from my perspective, +and will present problems in the way I have experienced them. I believe, +however, that the concepts and problems I discuss here are applicable to many +other domains, and I hope those with a foot in both backend systems and a second +domain can help to translate the ideas between the two. + +Also note that I will be using Go as my example language, but none of the +concepts discussed here are specific to Go. To that end, I’ve decided to favor +readable code over “correct” code, and so have elided things that most gophers +hold near-and-dear, such as error checking and proper documentation, in order to +make the code as accessible as possible to non-gophers as well. As with before, +I trust that someone with a foot in Go and another language can help me +translate between the two. + +## Part 1: Program Structure + +In this section I will discuss the difference between directory and program +structure, show how global state is antithetical to compartmentalization (and +therefore good program structure), and finally discuss a more effective way to +think about program structure. + +### Directory Structure + +For a long time, I thought about program structure in terms of the hierarchy +present in the filesystem. In my mind, a program’s structure looked like this: + +``` +// The directory structure of a project called gobdns. +src/ + config/ + dns/ + http/ + ips/ + persist/ + repl/ + snapshot/ + main.go +``` + +What I grew to learn was that this conflation of “program structure” with +“directory structure” is ultimately unhelpful. While it can’t be denied that +every program has a directory structure (and if not, it ought to), this does not +mean that the way the program looks in a filesystem in any way corresponds to +how it looks in our mind’s eye. + +The most notable way to show this is to consider a library package. 
Here is the +structure of a simple web-app which uses redis (my favorite database) as a +backend: + +``` +src/ + redis/ + http/ + main.go +``` + +If I were to ask you, based on that directory structure, what the program does +in the most abstract terms, you might say something like: “The program +establishes an http server that listens for requests. It also establishes a +connection to the redis server. The program then interacts with redis in +different ways based on the http requests that are received on the server.” + +And that would be a good guess. Here’s a diagram that depicts the program +structure, wherein the root node, `main.go`, takes in requests from `http` and +processes them using `redis`. + +{% include image.html + dir="program-structure" file="diag1.jpg" width=519 + descr="Example 1" + %} + +This is certainly a viable guess for how a program with that directory +structure operates, but consider another answer: “A component of the program +called `server` establishes an http server that listens for requests. `server` +also establishes a connection to a redis server. `server` then interacts with +that redis connection in different ways based on the http requests that are +received on the http server. Additionally, `server` tracks statistics about +these interactions and makes them available to other components. The root +component of the program establishes a connection to a second redis server, and +stores those statistics in that redis server.” Here’s another diagram to depict +_that_ program. + +{% include image.html + dir="program-structure" file="diag2.jpg" width=712 + descr="Example 2" + %} + +The directory structure could apply to either description; `redis` is just a +library which allows for interaction with a redis server, but it doesn’t +specify _which_ or _how many_ servers. However, those are extremely important +factors that are definitely reflected in our concept of the program’s +structure, and not in the directory structure. 
**What the directory structure +reflects are the different _kinds_ of components available to use, but it does +not reflect how a program will use those components.** + + +### Global State vs Compartmentalization + +The directory-centric view of structure often leads to the use of global +singletons to manage access to external resources like RPC servers and +databases. In examples 1 and 2 the `redis` library might contain code which +looks something like this: + +```go +// A mapping of connection names to redis connections. +var globalConns = map[string]*RedisConn{} + +func Get(name string) *RedisConn { + if globalConns[name] == nil { + globalConns[name] = makeRedisConnection(name) + } + return globalConns[name] +} +``` + +Even though this pattern would work, it breaks with our conception of the +program structure in more complex cases like example 2. Rather than the `redis` +component being owned by the `server` component, which actually uses it, it +would be practically owned by _all_ components, since all are able to use it. +Compartmentalization has been broken, and can only be held together through +sheer human discipline. + +**This is the problem with all global state. It is shareable among all +components of a program, and so is accountable to none of them.** One must look +at an entire codebase to understand how a globally held component is used, +which might not even be possible for a large codebase. Therefore, the +maintainers of these shared components rely entirely on the discipline of their +fellow coders when making changes, usually discovering where that discipline +broke down once the changes have been pushed live. + +Global state also makes it easier for disparate programs/components to share +datastores for completely unrelated tasks. 
In example 2, rather than creating a +new redis instance for the root component’s statistics storage, the coder might +have instead said, “well, there’s already a redis instance available, I’ll just +use that.” And so, compartmentalization would have been broken further. Perhaps +the two instances _could_ be coalesced into the same instance for the sake of +resource efficiency, but that decision would be better made at runtime via the +configuration of the program, rather than being hardcoded into the code. + +From the perspective of team management, global state-based patterns do nothing +except slow teams down. The person/team responsible for maintaining the central +library in which shared components live (`redis`, in the above examples) +becomes the bottleneck for creating new instances for new components, which +will further lead to re-using existing instances rather than creating new ones, +further breaking compartmentalization. Additionally the person/team responsible +for the central library, rather than the team using it, often finds themselves +as the maintainers of the shared resource. + +### Component Structure + +So what does proper program structure look like? In my mind the structure of a +program is a hierarchy of components, or, in other words, a tree. The leaf +nodes of the tree are almost _always_ IO related components, e.g., database +connections, RPC server frameworks or clients, message queue consumers, etc. +The non-leaf nodes will _generally_ be components that bring together the +functionalities of their children in some useful way, though they may also have +some IO functionality of their own. + +Let's look at an even more complex structure, still only using the `redis` and +`http` component types: + +{% include image.html + dir="program-structure" file="diag3.jpg" width=729 + descr="Example 3" + %} + +This component structure contains the addition of the `debug` component. 
+Clearly the `http` and `redis` components are reusable in different contexts, +but for this example the `debug` endpoint is as well. It creates a separate +http server that can be queried to perform runtime debugging of the program, +and can be tacked onto virtually any program. The `rest-api` component is +specific to this program and is therefore not reusable. Let’s dive into it a +bit to see how it might be implemented: + +```go +// RestAPI is very much not thread-safe, hopefully it doesn't have to handle +// more than one request at once. +type RestAPI struct { + redisConn *redis.RedisConn + httpSrv *http.Server + + // Statistics exported for other components to see + RequestCount int + FooRequestCount int + BarRequestCount int +} + +func NewRestAPI() *RestAPI { + r := new(RestAPI) + r.redisConn = redis.NewConn("127.0.0.1:6379") + + // mux will route requests to different handlers based on their URL path. + mux := http.NewServeMux() + mux.HandleFunc("/foo", r.fooHandler) + mux.HandleFunc("/bar", r.barHandler) + r.httpSrv = http.NewServer(mux) + + // Listen for requests and serve them in the background. + go r.httpSrv.Listen(":8000") + + return r +} + +func (r *RestAPI) fooHandler(rw http.ResponseWriter, req *http.Request) { + r.redisConn.Command("INCR", "fooKey") + r.RequestCount++ + r.FooRequestCount++ +} + +func (r *RestAPI) barHandler(rw http.ResponseWriter, req *http.Request) { + r.redisConn.Command("INCR", "barKey") + r.RequestCount++ + r.BarRequestCount++ +} +``` + + +In that snippet `rest-api` coalesced `http` and `redis` into a simple REST-like +api using pre-made library components. 
`main.go`, the root component, does much +the same: + +```go +func main() { + // Create debug server and start listening in the background + debugSrv := debug.NewServer() + + // Set up the RestAPI, this will automatically start listening + restAPI := NewRestAPI() + + // Create another redis connection and use it to store statistics + statsRedisConn := redis.NewConn("127.0.0.1:6380") + for { + time.Sleep(1 * time.Second) + statsRedisConn.Command("SET", "numReqs", restAPI.RequestCount) + statsRedisConn.Command("SET", "numFooReqs", restAPI.FooRequestCount) + statsRedisConn.Command("SET", "numBarReqs", restAPI.BarRequestCount) + } +} +``` + +One thing that is clearly missing in this program is proper configuration, +whether from command-line or environment variables, etc. As it stands, all +configuration parameters, such as the redis addresses and http listen +addresses, are hardcoded. Proper configuration actually ends up being somewhat +difficult, as the ideal case would be for each component to set up its own +configuration variables without its parent needing to be aware. For example, +`redis` could set up `addr` and `pool-size` parameters. The problem is that there +are two `redis` components in the program, and their parameters would therefore +conflict with each other. An elegant solution to this problem is discussed in +the next section. + +## Part 2: Components, Configuration, and Runtime + +The key to the configuration problem is to recognize that, even if there are +two of the same component in a program, they can’t occupy the same place in the +program’s structure. In the above example, there are two `http` components: one +under `rest-api` and the other under `debug`. Because the structure is +represented as a tree of components, the “path” of any node in the tree +uniquely represents it in the structure. 
For example, the two `http` components +in the previous example have these paths: + +``` +root -> rest-api -> http +root -> debug -> http +``` + +If each component were to know its place in the component tree, then it would +easily be able to ensure that its configuration and initialization didn’t +conflict with other components of the same type. If the `http` component sets +up a command-line parameter to know what address to listen on, the two `http` +components in that program would set up: + +``` +--rest-api-listen-addr +--debug-listen-addr +``` + +So how can we enable each component to know its path in the component structure? +To answer this, we’ll have to take a detour through a type, called `Component`. + +### Component and Configuration + +The `Component` type is a made-up type (though you’ll be able to find an +implementation of it at the end of this post). It has a single primary purpose, +and that is to convey the program’s structure to new components. + +To see how this is done, let's look at a couple of `Component`'s methods: + +```go +// Package mcmp + +// New returns a new Component which has no parents or children. It is therefore +// the root component of a component hierarchy. +func New() *Component + +// Child returns a new child of the called upon Component. +func (*Component) Child(name string) *Component + +// Path returns the Component's path in the component hierarchy. It will return +// an empty slice if the Component is the root component. +func (*Component) Path() []string +``` + +`Child` is used to create a new `Component`, corresponding to a new child node +in the component structure, and `Path` is used to retrieve the path of any +`Component` within that structure. For the sake of keeping the examples simple, +let’s pretend these functions have been implemented in a package called `mcmp`. 
+Here’s an example of how `Component` might be used in the `redis` component’s +code: + +```go +// Package redis + +func NewConn(cmp *mcmp.Component, defaultAddr string) *RedisConn { + cmp = cmp.Child("redis") + paramPrefix := strings.Join(cmp.Path(), "-") + + addrParam := flag.String(paramPrefix+"-addr", defaultAddr, "Address of redis instance to connect to") + // finish setup + + return redisConn +} +``` + +In our above example, the two `redis` components' parameters would be: + +``` +// This first parameter is for the stats redis, whose parent is the root and +// therefore doesn't have a prefix. Perhaps stats should be broken into its own +// component in order to fix this. +--redis-addr +--rest-api-redis-addr +``` + +`Component` definitely makes it easier to instantiate multiple redis components +in our program, since it allows them to know their place in the component +structure. + +Having to construct the prefix for the parameters ourselves is pretty annoying, +so let’s introduce a new package, `mcfg`, which acts like `flag` but is aware +of `Component`. Then `redis.NewConn` is reduced to: + +```go +// Package redis + +func NewConn(cmp *mcmp.Component, defaultAddr string) *RedisConn { + cmp = cmp.Child("redis") + addrParam := mcfg.String(cmp, "addr", defaultAddr, "Address of redis instance to connect to") + // finish setup + + return redisConn +} +``` + +Easy-peasy. + +#### But What About Parse? + +Sharp-eyed gophers will notice that there is a key piece missing: When is +`flag.Parse`, or its `mcfg` counterpart, called? When does `addrParam` actually +get populated? It can’t happen inside `redis.NewConn` because there might be +other components after `redis.NewConn` that want to set up parameters. To +illustrate the problem, let’s look at a simple program that wants to set up two +`redis` components: + +```go +func main() { + // Create the root Component, an empty Component. + cmp := mcmp.New() + + // Create the Components for two sub-components, foo and bar. 
+ cmpFoo := cmp.Child("foo") + cmpBar := cmp.Child("bar") + + // Now we want to try to create a redis sub-component for each component. + + // This will set up the parameter "--foo-redis-addr", but bar hasn't had a + // chance to set up its corresponding parameter, so the command-line can't + // be parsed yet. + fooRedis := redis.NewConn(cmpFoo, "127.0.0.1:6379") + + // This will set up the parameter "--bar-redis-addr", but, as mentioned + // before, redis.NewConn can't parse command-line. + barRedis := redis.NewConn(cmpBar, "127.0.0.1:6379") + + // It is only after all components have been instantiated that the + // command-line arguments can be parsed + mcfg.Parse() +} +``` + +While this solves our argument parsing problem, fooRedis and barRedis are not +usable yet because the actual connections have not been made. This is a classic +chicken and the egg problem. The func `redis.NewConn` needs to make a connection +which it cannot do until _after_ `mcfg.Parse` is called, but `mcfg.Parse` cannot +be called until after `redis.NewConn` has returned. We will solve this problem +in the next section. + +### Instantiation vs Initialization + +Let’s break down `redis.NewConn` into two phases: instantiation and +initialization. Instantiation refers to creating the component on the component +structure and having it declare what it needs in order to initialize (e.g., +configuration parameters). During instantiation, nothing external to the +program is performed; no IO, no reading of the command-line, no logging, etc. +All that’s happened is that the empty template of a `redis` component has been +created. + +Initialization is the phase during which the template is filled in. +Configuration parameters are read, startup actions like the creation of database +connections are performed, and logging is output for informational and debugging +purposes. 
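To make the two phases concrete, here is a minimal, self-contained sketch. It does not use `mcmp` or `mcfg`; the `params` type and the `instConn` and `initialize` names are made up for this illustration, and a plain map stands in for the command line. The only point is the ordering: every component declares its parameter first, parsing happens once, and only then does anything get initialized.

```go
package main

import "fmt"

// params is a hypothetical stand-in for a flag set: it maps each declared
// parameter name to the location where its value is stored.
type params map[string]*string

// declare registers a parameter with a default value and returns a pointer
// whose target is overwritten if parse sees a value for that name.
func (p params) declare(name, def string) *string {
	v := def
	p[name] = &v
	return &v
}

// parse stands in for command-line parsing: it fills in every declared
// parameter for which a value was given and ignores unknown names.
func (p params) parse(args map[string]string) {
	for name, val := range args {
		if ptr, ok := p[name]; ok {
			*ptr = val
		}
	}
}

// conn is an assumed, simplified redis connection. addr stays empty until
// initialize has been called.
type conn struct {
	addrParam *string
	addr      string
}

// instConn is the instantiation phase: it only declares what the component
// needs. No IO is performed and nothing is read.
func instConn(p params, prefix, defaultAddr string) *conn {
	return &conn{addrParam: p.declare(prefix+"-redis-addr", defaultAddr)}
}

// initialize is the initialization phase: it reads the parsed parameter. A
// real component would open its connection here.
func (c *conn) initialize() {
	c.addr = *c.addrParam
}

func main() {
	p := params{}

	// Phase 1: every component is instantiated and declares its parameters.
	foo := instConn(p, "foo", "127.0.0.1:6379")
	bar := instConn(p, "bar", "127.0.0.1:6379")

	// Parsing happens once, after all declarations are in.
	p.parse(map[string]string{"bar-redis-addr": "10.0.0.2:6379"})

	// Phase 2: components are initialized with their final configuration.
	foo.initialize()
	bar.initialize()

	fmt.Println(foo.addr, bar.addr)
}
```

In this sketch the caller has to remember to call `initialize` on every component by hand, which gets tedious as the number of components grows.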
+ +The key to making effective use of this dichotomy is to allow _all_ components +to instantiate themselves before they initialize themselves. By doing this we +can ensure, for example, that all components have had the chance to declare +their configuration parameters before configuration parsing is done. + +So let’s modify `redis.NewConn` so that it follows this dichotomy. It makes +sense to leave instantiation-related code where it is, but we need a mechanism +by which we can declare initialization code before actually calling it. For +this, I will introduce the idea of a “hook.” + +#### But First: Augment Component + +In order to support hooks, however, `Component` will need to be augmented with +a few new methods. Right now, it can only carry with it information about the +component structure, but here we will add the ability to carry arbitrary +key/value information as well: + +```go +// Package mcmp + +// SetValue sets the given key to the given value on the Component, overwriting +// any previous value for that key. +func (*Component) SetValue(key, value interface{}) + +// Value returns the value which has been set for the given key, or nil if the +// key was never set. +func (*Component) Value(key interface{}) interface{} + +// Children returns the Component's children in the order they were created. +func (*Component) Children() []*Component +``` + +The final method allows us to, starting at the root `Component`, traverse the +component structure and interact with each `Component`’s key/value store. This +will be useful for implementing hooks. + +#### Hooks + +A hook is simply a function that will run later. We will declare a new package, +calling it `mrun`, and say that it has two new functions: + +```go +// Package mrun + +// InitHook registers the given hook to the given Component. +func InitHook(cmp *mcmp.Component, hook func()) + +// Init runs all hooks registered using InitHook. Hooks are run in the order +// they were registered. 
+func Init(cmp *mcmp.Component) +``` + +With these two functions, we are able to defer the initialization phase of +startup by using the same `Components` we were passing around for the purpose +of denoting component structure. + +Now, with these few extra pieces of functionality in place, let’s reconsider the +most recent example, and make a program that creates two redis components which +exist independently of each other: + +```go +// Package redis + +// NOTE that NewConn has been renamed to InstConn, to reflect that the returned +// *RedisConn is merely instantiated, not initialized. + +func InstConn(cmp *mcmp.Component, defaultAddr string) *RedisConn { + cmp = cmp.Child("redis") + + // We instantiate an empty RedisConn instance and parameters for it. Neither + // has been initialized yet. They will remain empty until initialization has + // occurred. + redisConn := new(RedisConn) + addrParam := mcfg.String(cmp, "addr", defaultAddr, "Address of redis instance to connect to") + + mrun.InitHook(cmp, func() { + // This hook will run after parameter initialization has happened, and + // so addrParam will be usable. Once this hook has run, redisConn will be + // usable as well. + *redisConn = makeRedisConnection(*addrParam) + }) + + // Now that cmp has had configuration parameters and initialization hooks + // set into it, return the empty redisConn instance back to the parent. + return redisConn +} +``` + +```go +// Package main + +func main() { + // Create the root Component, an empty Component. + cmp := mcmp.New() + + // Create the Components for two sub-components, foo and bar. + cmpFoo := cmp.Child("foo") + cmpBar := cmp.Child("bar") + + // Add redis components to each of the foo and bar sub-components. 
+ redisFoo := redis.InstConn(cmpFoo, "127.0.0.1:6379") + redisBar := redis.InstConn(cmpBar, "127.0.0.1:6379") + + // Parse will descend into the Component and all of its children, + // discovering all registered configuration parameters and filling them from + // the command-line. + mcfg.Parse(cmp) + + // Now that configuration parameters have been initialized, run the Init + // hooks for all Components. + mrun.Init(cmp) + + // At this point the redis components have been fully initialized and may be + // used. For this example we'll copy all keys from one to the other. + keys := redisFoo.Command("KEYS", "*") + for i := range keys { + val := redisFoo.Command("GET", keys[i]) + redisBar.Command("SET", keys[i], val) + } +} +``` + +## Conclusion + +While the examples given here are fairly simplistic, the pattern itself is quite +powerful. Codebases naturally accumulate small, domain-specific behaviors and +optimizations over time, especially around the IO components of the program. +Databases are used with specific options that an organization finds useful, +logging is performed in particular places, metrics are counted around certain +pieces of code, etc. + +By programming with component structure in mind, we are able to keep these +optimizations while also keeping the clarity and compartmentalization of the +code intact. We can keep our code flexible and configurable, while also +re-usable and testable. Also, the simplicity of the tools involved means they +can be extended and retrofitted for nearly any situation or use-case. + +Overall, this is a powerful pattern that I’ve found myself unable to do without +once I began using it. 
+ +### Implementation + +As a final note, you can find an example implementation of the packages +described in this post here: + +* [mcmp](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mcmp) +* [mcfg](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mcfg) +* [mrun](https://godoc.org/github.com/mediocregopher/mediocre-go-lib/mrun) + +The packages are not stable and are likely to change frequently. You’ll also +find that they have been extended quite a bit from the simple descriptions found +here, based on what I’ve found useful as I’ve implemented programs using +component structures. With these two points in mind, I would encourage you to +look and take whatever functionality you find useful for yourself, and not use +the packages directly. The core pieces are not different from what has been +described in this post. diff --git a/src/_posts/2020-04-26-trading-in-the-rain.md b/src/_posts/2020-04-26-trading-in-the-rain.md new file mode 100644 index 0000000..3a31a95 --- /dev/null +++ b/src/_posts/2020-04-26-trading-in-the-rain.md @@ -0,0 +1,55 @@ +--- +title: >- + Trading in the Rain +description: >- + All those... gains... will be lost like... tears... 
+--- + +<!-- MIDI.js --> +<!-- polyfill --> +<script src="/assets/trading-in-the-rain/MIDI.js/inc/shim/Base64.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/inc/shim/Base64binary.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/inc/shim/WebAudioAPI.js" type="text/javascript"></script> +<!-- MIDI.js package --> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/audioDetect.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/gm.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/loader.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/plugin.audiotag.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/plugin.webaudio.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/midi/plugin.webmidi.js" type="text/javascript"></script> +<!-- utils --> +<script src="/assets/trading-in-the-rain/MIDI.js/js/util/dom_request_xhr.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MIDI.js/js/util/dom_request_script.js" type="text/javascript"></script> +<!-- / MIDI.js --> + +<script src="/assets/trading-in-the-rain/Distributor.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/MusicBox.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/RainCanvas.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/CW.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/SeriesComposer.js" type="text/javascript"></script> +<script src="/assets/trading-in-the-rain/main.js" type="text/javascript"></script> + + +<div id="tradingInRainModal"> +For each pair listed below, live trade data will be pulled down from the +<a href="https://docs.cryptowat.ch/websocket-api/">Cryptowat.ch 
Websocket +API</a> and used to generate musical rain drops. The price of each trade +determines both the musical note and position of the rain drop on the screen, +while the volume of each trade determines how long the note is held and how big +the rain drop is. + +<p id="markets">Pairs to be generated, by color:<br/><br/></p> + +<button id="button" onclick="run()">Click Here to Begin</button> +<p id="progress"></p> + +<script type="text/javascript"> + fillMarketP(); + if (window.addEventListener) window.addEventListener("load", autorun, false); + else if (window.attachEvent) window.attachEvent("onload", autorun); + else window.onload = autorun; +</script> +</div> + + +<canvas id="rainCanvas" style=""></canvas> diff --git a/src/_posts/2020-05-30-denver-protests.md b/src/_posts/2020-05-30-denver-protests.md new file mode 100644 index 0000000..710987f --- /dev/null +++ b/src/_posts/2020-05-30-denver-protests.md @@ -0,0 +1,161 @@ +--- +title: >- + Denver Protests +description: >- + Craziness +--- + +# Saturday, May 30th + +We went to the May 30th protest at Civic Center Park. We were there for a few +hours during the day, leaving around 4pm. I would describe the character of the +protest as being energetic, angry, but contained. A huge crowd moved in and +around Civic Center, chanting and being rowdy, but clearly was being led. + +After a last hurrah at the pavilion it seemed that the organized event was +"over". We stayed a while longer, and eventually headed back home. I don't feel +that people really left the park at the same time we did; mostly everyone just +dispersed around the park and found somewhere to keep hanging out. + +Tonight there has been an 8pm curfew. The police lined up on the north side of +the park, armored and clearly ready for action. We watched all of this on the +live news stations, gritting our teeth through the commentary of their reporters. 
+As the police stood there, the clock counting down to 8, the protesters grew +more and more irritated. They taunted the police, and formed a line of their +own. The braver (or more dramatic) protesters walked around in the no-man's land +between them, occasionally earning themselves some teargas. + +The police began pushing forward a little just before 8, but began pushing in +earnest just after 8, after the howling. They would advance, wait, advance, wait +again. An armada of police cars, ambulances, and fire trucks followed the line as +it advanced. + +The police did not give the protesters anywhere to go except into Capitol Hill, +southeast of Civic Center Park. We watched as a huge crowd marched past the +front of our house, chanting their call and response: "What's his name?" "GEORGE +FLOYD". The feeling wasn't of violence still, just anger. Indignant at a curfew +aimed at quelling a movement, the protesters simply kept moving. The police were +never far behind. + +We sat on our front stoop with our neighbors and watched the night unfold. I +don't think a single person in our building or the buildings to the left and +right of us hadn't gone to protest today in some capacity. We came back from our +various outings and sat out front, watching the crowds and patrolling up and +down the street to keep tabs on things. + +Around 9pm the fires started. We saw them on the news, and in person. They were +generally dumpster fires, generally placed such that they were away from +buildings, clearly being done more to be annoying than to accomplish anything +specific. A very large set of fires was started a block south of us, in the +middle of the street. The fire department was there within a few minutes to put +those out, before moving on. + +From the corner of my eye, sitting back on the stoop, I noticed our neighbors +running into their backyard. We ran after them, and they told us there was a +dumpster fire in our alley. 
They were running with fire extinguishers, and we +ran inside to grab some of our own. By the time we got to the backyard the fire +was only smouldering, and the fire department was coming down the alley. We +scurried back into the backyard. A few minutes later I peeked my head around the +corner, into the alley, to see what was happening. I was greeted by at least two +police in riot gear, guarding the dumpster as the fire department worked. They +saw me but didn't move, and I quickly retreated back to the yard. + +Talking to our neighbor later we found out she had seen a group of about 10 +people back there, and watched them jump the fence into another backyard in +order to escape the alley. She thinks they, or some subset of them, started the +fire. She looked one in the eye, she says, and didn't get the impression they +were trying to cause damage, just to make a statement. + +The fires stopped not long after that, it seems. We're pretty sure the fire +trucks were just driving up and down the main roads, looking into alleys and +stopping all fires they could find. In all this time the police didn't do much. +They would hold a line, but never chase anyone. Even now, as I write this around +midnight, people are still out, meandering around in small groups, and police +are present but not really doing anything. + +It's hard to get a good view of everything though. All we have is livestreams on +youtube to go on at this point. There's a couple intrepid amateur reporters out +there, getting into the crowds and streaming events as they happen. Right now +we're watching people moving down Lincoln towards Civic Center Park, some of +them trying to smash windows of buildings as they go. + +The violence of these protests is going to be the major story of tonight, I know +that already. That I know of there's been 3 police injured, some broken +windows, and quite a bit of graffiti. 
I do believe the tactic of pushing +everyone into Cap Hill had the desired effect of reducing looting (again, as far +as I can tell so far), but at the expense of those who live here who have to +endure latent tear gas, dumpster fires, and sirens all through the night. + +Even now, at midnight, from what I can see from my porch and from these live +streams, the protesters are not violent. At worst they are guilty of a lot of +loitering. The graffiti, the smashed windows, the injured officers, all of these +things will be held up as examples of the anarchy and violence inherent to the +protesters. But I don't think that's an honest picture. The vast, vast majority +of those out right now are civilly disobeying an unjust curfew, trying to keep +the energy of the movement alive. + +My thoughts about these things are complicated. When turning a corner on the +street I'm far more afraid to see the police than to see other protesters. The +fires have been annoying, and stupid, and unhelpful, but were never threatening. +The violence is stupid, though I don't shed many tears for a looted Chili's or +Papa Johns. The police have actually shown more restraint than I expected in all +of this, though funneling the protest into a residential neighborhood was an +incredibly stupid move. Could the protesters not have just stayed in the park? +Yes, the park would likely have been turned into an encampment, but it was +already heading into that direction due to Covid-19. Overall, this night didn't +need to be so hard, but Denver handled this well. + +But, it's only 1am, and the night has a long way to go. Things could still get +worse. Even now I'm watching people trying to break into the supreme court +building. Civic Center Park appears to be very populated again, and the police +are very present there again. It's possible I may eat my words. + +# Monday, June 1st + +Yesterday was quite a bit more tame than the craziness Saturday. 
I woke up +Sunday morning feeling antsy, and rode my bike around to see the damage. I had a +long conversation with a homeless man named Gary in Civic Center Park. He was +pissed, and had a lot to say about the "suburban kids" destroying the park he +and many others live in, causing it to be shut down and tear gassed. The +protesters saw it as a game, according to him, but it was life and death for the +homeless; three of his guys got beat up in the street, and neither police nor +protesters stopped it. + +Many people had shown up to the park early to help clean it up. Apart from the +graffiti, which was also in the process of being cleaned, it was hard to tell +anything had actually happened. Gary had some words about them as well, that +they were only there for the gram and some pats on the back, but once they left +his life would be back as it was. I could feel that, but I also appreciated that +people were cognizant that damage was being done and were willing to do +something about it. + +I rode around 16th Street Mall, down Colfax, and back up 13th, looking to see if +anything had happened. For the most part there was no damage, save the graffiti. +A Mediterranean restaurant got its windows smashed, as well as the Office Depot. +The restaurant was unfortunate, Office Depot will be ok. + +The protest yesterday was much more peaceful. The cops were nowhere to be found +when curfew hit, but did eventually show up when the protest moved down Colfax. +They had lined the streets around their precinct building there, but for the +most part the protesters just kept walking. This is when the "violence" started. +The cops moved into the street, forming a line across Colfax behind the +protesters. Police cars and vans started moving. As the protest turned back, +presumably to head back to the capitol lawn, it ran into the riot line. + +Predictably, everyone scattered. 
The cat-and-mouse game had begun, which meant +dumpster fires, broken windows, tear gas, and all the rest. Watching the whole +thing it was extremely clear to us, though not the newscasters, unfortunately, +that if the police hadn't moved out into Colfax nothing would have ever +happened. Instead, the newscasters lamented that people were bringing things +like helmets, gas masks, traffic cones, shields, etc., and so were clearly not there +"for the right reasons". + +The thing that the newscasters couldn't seem to grasp was that the police's +attempts to control these situations are what catalyze them in the first +place. These are protests _against_ the police; they cannot take place under the +terms the police choose. If the police were not here setting terms, but instead +working with the peaceful protesters (the vast, vast majority) to quell the +violence, no one would be here with helmets, gas masks, traffic cones, +shields... But instead the protesters feel they need to protect themselves in +order to be heard, and the police feel they have to exercise their power to +maintain control, and so the situation degrades. diff --git a/src/_posts/2020-07-07-viz-3.md b/src/_posts/2020-07-07-viz-3.md new file mode 100644 index 0000000..f56dbb6 --- /dev/null +++ b/src/_posts/2020-07-07-viz-3.md @@ -0,0 +1,154 @@ +--- +title: >- + Visualization 3 +description: >- + All the pixels. +series: viz +--- + +<canvas id="canvas" style="padding-bottom: 2rem;"></canvas> + +This visualization is built from the ground up. On every frame a random set of +pixels is chosen. Each chosen pixel calculates the average of its color and the +color of a random neighbor. Some random color drift is added in as well. It +replaces its own color with that calculated color. + +Choosing a neighbor is done using the "asteroid rule", i.e., a pixel at the very +top row is considered to be the neighbor of the pixel on the bottom row of the +same column. 
+ +Without the asteroid rule the pixels would all eventually converge into a single +uniform color, generally a light blue, due to the colors at the edge, the reds, +being quickly averaged away. With the asteroid rule in place the canvas has no +edges, thus no position on the canvas is favored and balance can be maintained. + +<script type="text/javascript"> +let rectSize = 12; + +function randn(n) { + return Math.floor(Math.random() * n); +} + +let canvas = document.getElementById("canvas"); +canvas.width = window.innerWidth - (window.innerWidth % rectSize); +canvas.height = window.innerHeight- (window.innerHeight % rectSize); +let ctx = canvas.getContext("2d"); + +let w = canvas.width / rectSize; +let h = canvas.height / rectSize; + +let matrices = new Array(2); +matrices[0] = new Array(w); +matrices[1] = new Array(w); +for (let x = 0; x < w; x++) { + matrices[0][x] = new Array(h); + matrices[1][x] = new Array(h); + for (let y = 0; y < h; y++) { + let el = { + h: 360 * (x / w), + s: "100%", + l: "50%", + }; + matrices[0][x][y] = el; + matrices[1][x][y] = el; + } +} + +// draw initial canvas, from here on out only individual rectangles will be +// filled as they get updated. 
+for (let x = 0; x < w; x++) { + for (let y = 0; y < h; y++) { + let el = matrices[0][x][y]; + ctx.fillStyle = `hsl(${el.h}, ${el.s}, ${el.l})`; + ctx.fillRect(x * rectSize, y * rectSize, rectSize, rectSize); + } +} + + +let requestAnimationFrame = + window.requestAnimationFrame || + window.mozRequestAnimationFrame || + window.webkitRequestAnimationFrame || + window.msRequestAnimationFrame; + +let neighbors = [ + [-1, -1], [0, -1], [1, -1], + [-1, 0], [1, 0], + [-1, 1], [0, 1], [1, 1], +]; + +function randNeighborAsteroid(matrix, x, y) { + let neighborCoord = neighbors[randn(neighbors.length)]; + let neighborX = x+neighborCoord[0]; + let neighborY = y+neighborCoord[1]; + neighborX = (neighborX + w) % w; + neighborY = (neighborY + h) % h; + return matrix[neighborX][neighborY]; +} + +function randNeighbor(matrix, x, y) { + while (true) { + let neighborCoord = neighbors[randn(neighbors.length)]; + let neighborX = x+neighborCoord[0]; + let neighborY = y+neighborCoord[1]; + if (neighborX < 0 || neighborX >= w || neighborY < 0 || neighborY >= h) { + continue; + } + return matrix[neighborX][neighborY]; + } +} + +let drift = 10; +function genChildH(elA, elB) { + // set the two h values, h1 <= h2 + let h1 = elA.h; + let h2 = elB.h; + if (h1 > h2) { + h1 = elB.h; + h2 = elA.h; + } + + // diff must be between 0 (inclusive) and 360 (exclusive). If it's greater + // than 180 then it's not the shortest path around, that must be the other + // way around the circle. 
+ let hChild; + let diff = h2 - h1; + if (diff > 180) { + diff = 360 - diff; + hChild = h2 + (diff / 2); + } else { + hChild = h1 + (diff / 2); + } + + hChild += (Math.random() * drift * 2) - drift; + hChild = (hChild + 360) % 360; + return hChild; +} + +let tick = 0; +function doTick() { + tick++; + let currI = tick % 2; + let curr = matrices[currI]; + let lastI = (tick - 1) % 2; + let last = matrices[lastI]; + + for (let i = 0; i < (w * h / 2); i++) { + let x = randn(w); + let y = randn(h); + if (curr[x][y].lastTick == tick) continue; + + let neighbor = randNeighborAsteroid(last, x, y); + curr[x][y].h = genChildH(curr[x][y], neighbor); + curr[x][y].lastTick = tick; + ctx.fillStyle = `hsl(${curr[x][y].h}, ${curr[x][y].s}, ${curr[x][y].l})`; + ctx.fillRect(x * rectSize, y * rectSize, rectSize, rectSize); + } + + matrices[currI] = curr; + requestAnimationFrame(doTick); +} + +requestAnimationFrame(doTick); + +</script> diff --git a/src/_posts/2020-11-16-component-oriented-programming.md b/src/_posts/2020-11-16-component-oriented-programming.md new file mode 100644 index 0000000..3400090 --- /dev/null +++ b/src/_posts/2020-11-16-component-oriented-programming.md @@ -0,0 +1,352 @@ +--- +title: >- + Component-Oriented Programming +description: >- + A concise description of. +--- + +[A previous post in this +blog](/2019/08/02/program-structure-and-composability.html) focused on a +framework developed to make designing component-based programs easier. In +retrospect, the proposed pattern/framework was over-engineered. This post +attempts to present the same ideas in a more distilled form, as a simple +programming pattern and without the unnecessary framework. + +## Components + +Many languages, libraries, and patterns make use of a concept called a +"component," but in each case the meaning of "component" might be slightly +different. Therefore, to begin talking about components, it is necessary to first +describe what is meant by "component" in this post. 
+ +For the purposes of this post, the properties of components include the +following. + + 1... **Abstract**: A component is an interface consisting of one or more +methods. + + 1a... A function might be considered a single-method component +_if_ the language supports first-class functions. + + 1b... A component, being an interface, may have one or more +implementations. Generally, there will be a primary implementation, which is +used during a program's runtime, and secondary "mock" implementations, which are +only used when testing other components. + + 2... **Instantiatable**: An instance of a component, given some set of +parameters, can be instantiated as a standalone entity. More than one of the +same component can be instantiated, as needed. + + 3... **Composable**: A component may be used as a parameter of another +component's instantiation. This would make it a child component of the one being +instantiated (the parent). + + 4... **Pure**: A component may not use mutable global variables (i.e., +singletons) or impure global functions (e.g., system calls). It may only use +constants and variables/components given to it during instantiation. + + 5... **Ephemeral**: A component may have a specific method used to clean +up all resources that it's holding (e.g., network connections, file handles, +language-specific lightweight threads, etc.). + + 5a... This cleanup method should _not_ clean up any child +components given as instantiation parameters. + + 5b... This cleanup method should not return until the +component's cleanup is complete. + + 5c... A component should not be cleaned up until all its +parent components are cleaned up. + +Components are composed together to create component-oriented programs. This is +done by passing components as parameters to other components during +instantiation. The `main` procedure of the program is responsible for +instantiating and composing the components of the program. + +## Example + +It's easier to show than to tell. 
This section posits a simple program and then +describes how it would be implemented in a component-oriented way. The program +chooses a random number and exposes an HTTP interface that allows users to try +and guess that number. The following are requirements of the program: + +* A guess consists of a name that identifies the user performing the guess and + the number that is being guessed; + +* A score is kept for each user who has performed a guess; + +* Upon an incorrect guess, the user should be informed of whether they guessed + too high or too low, and 1 point should be deducted from their score; + +* Upon a correct guess, the program should pick a new random number against + which to check subsequent guesses, and 1000 points should be added to the + user's score; + +* The HTTP interface should have two endpoints: one for users to submit guesses, + and another that lists out user scores from highest to lowest; + +* Scores should be saved to disk so they survive program restarts. + +It seems clear that there will be two major areas of functionality for our +program: score-keeping and user interaction via HTTP. Each of these can be +encapsulated into components called `scoreboard` and `httpHandlers`, +respectively. + +`scoreboard` will need to interact with a filesystem component to save/restore +scores (because it can't use system calls directly; see property 4). It would be +wasteful for `scoreboard` to save the scores to disk on every score update, so +instead it will do so every 5 seconds. A time component will be required to +support this. + +`httpHandlers` will be choosing the random number which is being guessed, and +will therefore need a component that produces random numbers. `httpHandlers` +will also be recording score changes to `scoreboard`, so it will need access to +`scoreboard`. 
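
As a rough illustration of the `scoreboard` design described above, the following is a minimal sketch and not the linked example implementation. The `fileStore` interface, the `memStore` mock, and the method names are all hypothetical, and the periodic save driven by a time component is left out to keep the sketch short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileStore is a hypothetical filesystem component: the interface scoreboard
// depends on instead of making system calls directly (property 4).
type fileStore interface {
	Write(data []byte) error
	Read() ([]byte, error)
}

// memStore is a mock implementation of fileStore, of the kind used when
// testing other components. A primary implementation would wrap a real file.
type memStore struct{ data []byte }

func (m *memStore) Write(data []byte) error { m.data = data; return nil }
func (m *memStore) Read() ([]byte, error)   { return m.data, nil }

// scoreboard keeps per-user scores and persists them through its fileStore.
type scoreboard struct {
	store  fileStore
	scores map[string]int
}

// newScoreboard instantiates a scoreboard, restoring any scores that were
// previously saved to the given store.
func newScoreboard(store fileStore) (*scoreboard, error) {
	sb := &scoreboard{store: store, scores: map[string]int{}}
	data, err := store.Read()
	if err != nil {
		return nil, err
	}
	if len(data) > 0 {
		if err := json.Unmarshal(data, &sb.scores); err != nil {
			return nil, err
		}
	}
	return sb, nil
}

// add applies a (possibly negative) point change to a user's score.
func (sb *scoreboard) add(user string, points int) { sb.scores[user] += points }

// save writes the current scores to the store as JSON.
func (sb *scoreboard) save() error {
	data, err := json.Marshal(sb.scores)
	if err != nil {
		return err
	}
	return sb.store.Write(data)
}

func main() {
	store := &memStore{}
	sb, err := newScoreboard(store)
	if err != nil {
		panic(err)
	}
	sb.add("alice", 1000) // correct guess
	sb.add("alice", -1)   // incorrect guess
	if err := sb.save(); err != nil {
		panic(err)
	}

	// A second instance given the same store sees the saved scores.
	restored, err := newScoreboard(store)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.scores["alice"]) // prints 999
}
```

Because `scoreboard` only ever sees the `fileStore` interface, `main` decides whether it gets a real file or an in-memory mock.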
+ +The example implementation will be written in go, which makes differentiating +HTTP handler functionality from the actual HTTP server quite easy; thus, there +will be an `httpServer` component that uses `httpHandlers`. + +Finally, a `logger` component will be used in various places to log useful +information during runtime. + +[The example implementation can be found +here.](/assets/component-oriented-design/v1/main.html) While most of it can be +skimmed, it is recommended to at least read through the `main` function to see +how components are composed together. Note that `main` is where all components +are instantiated, and that all components take in their child components as +part of their instantiation. + +## DAG + +One way to look at a component-oriented program is as a directed acyclic graph +(DAG), where each node in the graph represents a component, and each edge +indicates that one component depends upon another component for instantiation. +For the previous program, it's quite easy to construct such a DAG just by +looking at `main`, as in the following: + +``` +net.Listener rand.Rand os.File + ^ ^ ^ + | | | + httpServer --> httpHandlers --> scoreboard --> time.Ticker + | | | + +---------------+---------------+--> log.Logger +``` + +Note that all the leaves of the DAG (i.e., nodes with no children) describe the +points where the program meets the operating system via system calls. The leaves +are, in essence, the program's interface with the outside world. + +While it's not necessary to actually draw out the DAG for every program one +writes, it can be helpful to at least think about the program's structure in +these terms. + +## Benefits + +Looking at the previous example implementation, one would be forgiven for having +the immediate reaction of "This seems like a lot of extra work for little gain. +Why can't I just make the system calls where I need to, and not bother with +wrapping them in interfaces and all these other rules?"
+ +The following sections will answer that concern by showing the benefits gained +by following a component-oriented pattern. + +### Testing + +Testing is important, that much is being assumed. + +A distinction to be made with testing is between unit and non-unit tests. Unit +tests are those for which there are no requirements for the environment outside +the test, such as the existence of global variables, running databases, +filesystems, or network services. Unit tests do not interact with the world +outside the testing procedure, but instead use mocks in place of the +functionality that would be expected by that world. + +Unit tests are important because they are faster to run and more consistent than +non-unit tests. Unit tests also force the programmer to consider different +possible states of a component's dependencies during the mocking process. + +Unit tests are often not employed by programmers, because they are difficult to +implement for code that does not expose any way to swap out dependencies for +mocks of those dependencies. The primary culprit of this difficulty is the +direct usage of singletons and impure global functions. For component-oriented +programs, all components inherently allow for the swapping out of any +dependencies via their instantiation parameters, so there's no extra effort +needed to support unit tests. + +[Tests for the example implementation can be found +here.](/assets/component-oriented-design/v1/main_test.html) Note that all +dependencies of each component being tested are mocked/stubbed next to them. + +### Configuration + +Practically all programs require some level of runtime configuration. This may +take the form of command-line arguments, environment variables, configuration +files, etc. + +For a component-oriented program, all components are instantiated in the same +place, `main`, so it's very easy to expose any arbitrary parameter to the user +via configuration. 
For any component that is affected by a configurable +parameter, that component merely needs to take an instantiation parameter for +that configurable parameter; `main` can connect the two together. This accounts +for the unit testing of a component with different configurations, while still +allowing for the configuration of any arbitrary internal functionality. + +For more complex configuration systems, it is also possible to implement a +`configuration` component that wraps whatever configuration-related +functionality is needed, which other components use as a sub-component. The +effect is the same. + +To demonstrate how configuration works in a component-oriented program, the +example program's requirements will be augmented to include the following: + +* The point change values for both correct and incorrect guesses (currently + hardcoded at 1000 and 1, respectively) should be configurable on the + command-line; + +* The save file's path, HTTP listen address, and save interval should all be + configurable on the command-line. + +[The new implementation, with newly configurable parameters, can be found +here.](/assets/component-oriented-design/v2/main.html) Most of the program has +remained the same, and all unit tests from before remain valid. The primary +difference is that `scoreboard` takes in two new parameters for the point change +values, and configuration is set up inside `main` using the `flags` package. + +### Setup/Runtime/Cleanup + +A program can be split into three stages: setup, runtime, and cleanup. Setup is +the stage during which the internal state is assembled to make runtime possible. +Runtime is the stage during which a program's actual function is being +performed. Cleanup is the stage during which the runtime stops and internal +state is disassembled. + +A graceful (i.e., reliably correct) setup is quite natural to accomplish for +most. 
On the other hand, a graceful cleanup is, unfortunately, not a programmer's +first concern (if it is a concern at all). + +When building reliable and correct programs, a graceful cleanup is as important +as a graceful setup and runtime. A program is still running while it is being +cleaned up, and it's possibly still acting on the outside world. Shouldn't +it behave correctly during that time? + +Achieving a graceful setup and cleanup with components is quite simple. + +During setup, a single-threaded procedure (`main`) first constructs the leaf +components, then the components that take those leaves as parameters, then the +components that take _those_ as parameters, and so on, until the component DAG +is fully constructed. + +At this point, the program's runtime has begun. + +Once the runtime is over, signified by a process signal or some other mechanism, +it's only necessary to call each component's cleanup method (if any; see +property 5) in the reverse of the order in which the components were +instantiated. This order is inherently deterministic, as the components were +instantiated by a single-threaded procedure. + +Inherent to this pattern is the fact that each component will certainly be +cleaned up before any of its child components, as its child components must have +been instantiated first, and a component will not clean up child components +given as parameters (properties 5a and 5c). Therefore, the pattern avoids +use-after-cleanup situations. + +To demonstrate a graceful cleanup in a component-oriented program, the example +program's requirements will be augmented to include the following: + +* The program will terminate itself upon an interrupt signal; + +* During termination (cleanup), the program will save the latest set of scores + to disk one final time. 
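
The reverse-order cleanup described above could be sketched roughly as follows. This is an illustration only, not the linked implementation: the `cleaner` interface, the `named` stand-in component, and the `cleanupAll` helper are hypothetical names, and signal handling is omitted.

```go
package main

import "fmt"

// cleaner is a hypothetical interface for components that have a cleanup
// method (property 5).
type cleaner interface {
	Cleanup() error
}

// named is a stand-in component that records the order in which cleanup
// methods are called.
type named struct {
	name string
	log  *[]string
}

func (n named) Cleanup() error {
	*n.log = append(*n.log, n.name)
	return nil
}

// cleanupAll calls Cleanup on each component in the reverse of the order
// given. It keeps going after a failure and returns the first error seen.
func cleanupAll(components []cleaner) error {
	var firstErr error
	for i := len(components) - 1; i >= 0; i-- {
		if err := components[i].Cleanup(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	var order []string

	// Instantiation order: leaves first, then the components that use them.
	components := []cleaner{
		named{name: "logger", log: &order},
		named{name: "scoreboard", log: &order},
		named{name: "httpHandlers", log: &order},
		named{name: "httpServer", log: &order},
	}

	// At the end of runtime, clean up in reverse instantiation order.
	if err := cleanupAll(components); err != nil {
		panic(err)
	}
	fmt.Println(order) // prints [httpServer httpHandlers scoreboard logger]
}
```

Since a parent is always instantiated after its children, walking the slice backwards cleans up every parent before any component it depends on.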
+ +[The new implementation that accounts for these new requirements can be found +here.](/assets/component-oriented-design/v3/main.html) For this example, go's +`defer` feature could have been used instead, which would have been even +cleaner, but was omitted for the sake of those using other languages. + + +## Conclusion + +The component pattern helps make programs more reliable with only a small amount +of extra effort incurred. In fact, most of the pattern has to do with +establishing sensible abstractions around global functionality and remembering +certain idioms for how those abstractions should be composed together, something +most of us already do to some extent anyway. + +While beneficial in many ways, component-oriented programming is merely a tool +that can be applied in many cases. It is certain that there are cases where it +is not the right tool for the job, so apply it deliberately and intelligently. + +## Criticisms/Questions + +In lieu of a FAQ, I will attempt to premeditate questions and criticisms of the +component-oriented programming pattern laid out in this post. + +**This seems like a lot of extra work.** + +Building reliable programs is a lot of work, just as building a +reliable _anything_ is a lot of work. Many of us work in an industry that likes +to balance reliability (sometimes referred to by the more specious "quality") +with malleability and deliverability, which naturally leads to skepticism of any +suggestions requiring more time spent on reliability. This is not necessarily a +bad thing, it's just how the industry functions. + +All that said, a pattern need not be followed perfectly to be worthwhile, and +the amount of extra work incurred by it can be decided based on practical +considerations. I merely maintain that code which is (mostly) component-oriented +is easier to maintain in the long run, even if it might be harder to get off the +ground initially. 
+ +**My language makes this difficult.** + +I don't know of any language which makes this pattern particularly easier than +others, so, unfortunately, we're all in the same boat to some extent (though I +recognize that some languages, or their ecosystems, make it more difficult than +others). It seems to me that this pattern shouldn't be unbearably difficult for +anyone to implement in any language either, however, as the only language +feature required is abstract typing. + +It would be nice to one day see a language that explicitly supports this +pattern by baking the component properties in as compiler-checked rules. + +**My `main` is too big** + +There's no law saying all component construction needs to happen in `main`, +that's just the most sensible place for it. If there are large sections of your +program that are independent of each other, then they could each have their own +construction functions that `main` then calls. + +Other questions that are worth asking include: Can my program be split up +into multiple programs? Can the responsibilities of any of my components be +refactored to reduce the overall complexity of the component DAG? Can the +instantiation of any components be moved within their parent's +instantiation function? + +(This last suggestion may seem to be disallowed, but is fine as long as the +parent's instantiation function remains pure.) + +**Won't this result in over-abstraction?** + +Abstraction is a necessary tool in a programmer's toolkit, there is simply no +way around it. The only questions are "how much?" and "where?" + +The use of this pattern does not affect how those questions are answered, in my +opinion, but instead aims to more clearly delineate the relationships and +interactions between the different abstracted types once they've been +established using other methods. Over-abstraction is possible and avoidable +regardless of which language, pattern, or framework is being used.
+ +**Does CoP conflict with object-oriented or functional programming?** + +I don't think so. OoP languages will have abstract types as part of their core +feature-set; most difficulties are going to be with deliberately _not_ using +other features of an OoP language, and with imported libraries in the language +perhaps making life inconvenient by not following CoP (specifically regarding +cleanup and the use of singletons). + +For functional programming, it may well be that, depending on the language, CoP +is technically being used, as functional languages are already generally +antagonistic toward globals and impure functions, which is most of the battle. +If anything, the transition from functional to component-oriented programming +will generally be an organizational task. diff --git a/src/_posts/2021-01-01-new-year-new-resolution.md b/src/_posts/2021-01-01-new-year-new-resolution.md new file mode 100644 index 0000000..8e9edc7 --- /dev/null +++ b/src/_posts/2021-01-01-new-year-new-resolution.md @@ -0,0 +1,50 @@ +--- +title: >- + New Year, New Resolution +description: >- + This blog is about to get some action. +--- + +At this point I'm fairly well known amongst friends and family for my new year's +resolutions, to the point that earlier this month a friend of mine asked me +"What's it going to be this year?". In the past I've done things like no +chocolate, no fast food, no added sugar (see a theme?), and no social media. +They've all been of the "I won't do this" sort, because it's a lot easier to +stop doing something than to start doing something new. Doing something new +inherently means _also_ not doing something else; there's only so many hours in +the day, after all. + +## This Year + +This year I'm going to shake things up, I'm going to do something new. My +resolution is to have published 52 posts on this blog by Jan 1, 2022, 00:00 UTC. +Only one post per day can count towards the 52. A post must be "substantial" to +count towards the 52.
A non-substantial post would be something like the 100 +word essay about my weekend that I wrote in first grade, which went something +like "My weekend was really really really ('really' 96 more times) really really +boring". + +Other than that, it's pretty open-ended. + +## Why + +My hope is that I'll get more efficient at writing these things. Usually I take +a lot of time to craft a post, weeks in some cases. I really appreciate those of +you that have taken the time to read them, but to be frank the time commitment +just isn't worth it. With practice I can hopefully learn what exactly I have to +say that others are interested in, and then go back to spending a lot of time +crafting the things being said. + +Another part of this is going to be learning how to market myself properly, +something I've always been reticent to do. Our world is filled with people +shouting into the void of the internet, each with their own reasons for wanting +to be heard. Does it need another? Probably not. But here I am. I guess what I'm +really going to be doing is learning _why_ I want to do this; I know I want to +have others read what I write, but is it possible that that desire isn't +entirely selfish? Is it ok if it is? + +Once I'm comfortable with why I'm doing this it will, hopefully, be easier to +figure out a marketing avenue I feel comfortable with putting a lot of energy +towards. There must be at least _one_... + +So consider this #1, world. Only 51 to go. diff --git a/src/_posts/2021-01-09-ginger.md b/src/_posts/2021-01-09-ginger.md new file mode 100644 index 0000000..3a97d7f --- /dev/null +++ b/src/_posts/2021-01-09-ginger.md @@ -0,0 +1,352 @@ +--- +title: >- + Ginger +description: >- + Yes, it does exist. +--- + +This post is about a programming language that's been bouncing around in my head +for a _long_ time. I've tried to actually implement the language three or more +times now, but everytime I get stuck or run out of steam. 
It doesn't help that +everytime I try again the form of the language changes significantly. But all +throughout the name of the language has always been "Ginger". It's a good name. + +In the last few years the form of the language has somewhat solidified in my +head, so in lieu of actually working on it I'm going to talk about what it +currently looks like. + +## Abstract Syntax Lists + +_In the beginning_ there was assembly. Well, really in the beginning there were +punchcards, and probably something even more esoteric before that, but it was +all effectively the same thing: a list of commands the computer would execute +sequentially, with the ability to jump to odd places in the sequence depending +on conditions at runtime. For the purpose of this post, we'll call this class of +languages "abstract syntax list" (ASL) languages. + +Here's a hello world program in my favorite ASL language, brainfuck: + +``` +++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.++ ++.------.--------.>>+.>++. +``` + +(If you've never seen brainfuck, it's deliberately unintelligible. But it _is_ +an ASL, each character representing a single command, executed by the brainfuck +runtime from left to right.) + +ASLs did the job at the time, but luckily we've mostly moved on past them. + +## Abstract Syntax Trees + +Eventually programmers upgraded to C-like languages. Rather than a sequence of +commands, these languages were syntactically represented by an "abstract syntax +tree" (AST). Rather than executing commands in essentially the same order they +are written, an AST language compiler reads the syntax into a tree of syntax +nodes. What it then does with the tree is language dependent. + +Here's a program which outputs all numbers from 0 to 9 to stdout, written in +(slightly non-idiomatic) Go: + +```go +i := 0 +for { + if i == 10 { + break + } + fmt.Println(i) + i++ +} +``` + +When the Go compiler sees this, it's going to first parse the syntax into an +AST. 
The AST might look something like this: + +``` +(root) + |-(:=) + | |-(i) + | |-(0) + | + |-(for) + |-(if) + | |-(==) + | | |-(i) + | | |-(10) + | | + | |-(break) + | + |-(fmt.Println) + | |-(i) + | + |-(++) + |-(i) +``` + +Each of the non-leaf nodes in the tree represents an operation, and the children +of the node represent the arguments to that operation, if any. From here the +compiler traverses the tree depth-first in order to turn each operation it finds +into the appropriate machine code. + +There's a sub-class of AST languages called the LISP ("LISt Processor") +languages. In a LISP language the AST is represented using lists of elements, +where the first element in each list denotes the operation and the rest of the +elements in the list (if any) represent the arguments. Traditionally each list +is represented using parenthesis. For example `(+ 1 1)` represents adding 1 and +1 together. + +As a more complex example, here's how to print numbers 0 through 9 to stdout +using my favorite (and, honestly, only) LISP, Clojure: + +```clj +(doseq + [n (range 10)] + (println n)) +``` + +Much smaller, but the idea is there. In LISPs there is no differentiation +between the syntax, the AST, and the language's data structures; they are all +one and the same. For this reason LISPs generally have very powerful macro +support, wherein one uses code written in the language to transform code written +in that same language. With macros users can extend a language's functionality +to support nearly anything they need to, but because macro generation happens +_before_ compilation they can still reap the benefits of compiler optimizations. + +### AST Pitfalls + +The ASL (assembly) is essentially just a thin layer of human readability on top +of raw CPU instructions. It does nothing in the way of representing code in the +way that humans actually think about it (relationships of types, flow of data, +encapsulation of behavior). 
The AST is a step towards expressing code in human +terms, but it isn't quite there in my opinion. Let me show why by revisiting the +Go example above: + +```go +i := 0 +for { + if i > 9 { + break + } + fmt.Println(i) + i++ +} +``` + +When I understand this code I don't understand it in terms of its syntax. I +understand it in terms of what it _does_. And what it does is this: + +* with a number starting at 0, start a loop. +* if the number is greater than 9, stop the loop. +* otherwise, print the number. +* add one to the number. +* go to start of loop. + +This behavior could be further abstracted into the original problem statement, +"it prints numbers 0 through 9 to stdout", but that's too general, as there +are different ways for that to be accomplished. The Clojure example first +defines a list of numbers 0 through 9 and then iterates over that, rather than +looping over a single number. These differences are important when understanding +what code is doing. + +So what's the problem? My problem with ASTs is that the syntax I've written down +does _not_ reflect the structure of the code or the flow of data which is in my +head. In the AST representation if you want to follow the flow of data (a single +number) you _have_ to understand the semantic meaning of `i` and `:=`; the AST +structure itself does not convey how data is being moved or modified. +Essentially, there's an extra implicit transformation that must be done to +understand the code in human terms. + +## Ginger: An Abstract Syntax Graph Language + +In my view the next step is towards using graphs rather than trees for +representing our code. A graph has the benefit of being able to reference +"backwards" into itself, where a tree cannot, and so can represent the flow of +data much more directly. + +I would like Ginger to be an ASG language where the language is the graph, +similar to a LISP. But what does this look like exactly? 
Well, I have a good +idea about what the graph _structure_ will be like and how it will function, but +the syntax is something I haven't bothered much with yet. Representing graph +structures in a text file is a problem to be tackled all on its own. For this +post we'll use a made-up, overly verbose, and probably non-usable syntax, but +hopefully it will convey the graph structure well enough. + +### Nodes, Edges, and Tuples + +All graphs have nodes, where each node contains a value. A single unique value +can only have a single node in a graph. Nodes are connected by edges, where +edges have a direction and can contain a value themselves. + +In the context of Ginger, a node represents a value as expected, and the value +on an edge represents an operation to take on that value. For example: + +``` +5 -incr-> n +``` + +`5` and `n` are both nodes in the graph, with an edge going from `5` to `n` that +has the value `incr`. When it comes time to interpret the graph we say that the +value of `n` can be calculated by giving `5` as the input to the operation +`incr` (increment). In other words, the value of `n` is `6`. + +What about operations which have more than one input value? For this Ginger +introduces the tuple to its graph type. A tuple is like a node, except that it's +anonymous, which allows more than one to exist within the same graph, as they do +not share the same value. For the purposes of this blog post we'll represent +tuples like this: + +``` +1 -> } -add-> t +2 -> } +``` + +`t`'s value is the result of passing a tuple of two values, `1` and `2`, as +inputs to the operation `add`. In other words, the value of `t` is `3`. + +For the syntax being described in this post we allow that a single contiguous +graph can be represented as multiple related sections. This can be done because +each node's value is unique, so when the same value is used in disparate +sections we can merge the two sections on that value. 
For example, the following +two graphs are exactly equivalent (note the parentheses wrapping the graph which +has been split): + +``` +1 -> } -add-> t -incr-> tt +2 -> } +``` + +``` +( + 1 -> } -add-> t + 2 -> } + + t -incr-> tt +) +``` + +(`tt` is `4` in both cases.) + +A tuple with only one input edge, a 1-tuple, is a no-op, semantically, but can +be useful structurally to chain multiple operations together without defining +new value names. In the above example the `t` value can be eliminated using a +1-tuple. + +``` +1 -> } -add-> } -incr-> tt +2 -> } +``` + +When an integer is used as an operation on a tuple value then the effect is to +output the value in the tuple at that index. For example: + +``` +1 -> } -0-> } -incr-> t +2 -> } +``` + +(`t` is `2`.) + +### Operations + +When a value sits on an edge it is used as an operation on the input of that +edge. Some operations will no doubt be builtin, like `add`, but users should be +able to define their own operations. This can be done using the `in` and `out` +special values. When a graph is used as an operation it is scanned for both `in` +and `out` values. `in` is set to the input value of the operation, and the value +of `out` is used as the output of the operation. + +Here we will define the `incr` operation and then use it. Note that we set the +`incr` value to be an entire sub-graph which represents the operation's body. + +``` +( in -> } -add-> out + 1 -> } ) -> incr + +5 -incr-> n +``` + +(`n` is `6`.) + +The output of an operation may itself be a tuple. Here's an implementation and +usage of `double-incr`, which increments two values at once. + +``` +( in -0-> } -incr-> } -> out + } + in -1-> } -incr-> } ) -> double-incr + +1 -> } -double-incr-> t -add-> tt +2 -> } +``` + +(`t` is a 2-tuple with values `2` and `3`; `tt` is `5`.) + +### Conditionals + +The conditional is a bit weird, and I'm not totally settled on it yet. For now +we'll use this.
The `if` operation expects as an input a 2-tuple whose first +value is a boolean and whose second value will be passed along. The `if` +operation is special in that it has _two_ output edges. The first will be taken +if the boolean is true, the second if the boolean is false. The second value in +the input tuple, the one to be passed along, is used as the input to whichever +branch is taken. + +Here is an implementation and usage of `max`, which takes two numbers and +outputs the greater of the two. Note that the `if` operation has two output +edges, but our syntax doesn't represent that very cleanly. + +``` +( in -gt-> } -if-> } -0-> out + in -> } -> } -1-> out ) -> max + +1 -> } -max-> t +2 -> } +``` + +(`t` is `2`.) + +It would be simple enough to create a `switch` macro on top of `if`, to allow +for multiple conditionals to be tested at once. + +### Loops + +Loops are tricky, and I have two thoughts about how they might be accomplished. +One is to literally draw an edge from the right end of the graph back to the +left, at the point where the loop should occur, as that's conceptually what's +happening. But representing that in a text file is difficult. For now I'll +introduce the special `recur` value, and leave this whole section as TBD. + +`recur` is a cousin of `in` and `out`, in that it's a special value and not an +operation. It takes whatever value it's set to and calls the current operation +with that as input. As an example, here is our now classic 0 through 9 printer +(assume `println` outputs whatever it was input): + +``` +// incr-1 is an operation which takes a 2-tuple and returns the same 2-tuple +// with the first element incremented. +( in -0-> } -incr-> } -> out + in -1-> } ) -> incr-1 + +( in -eq-> } -if-> out + in -> } -> } -0-> } -println-> } -incr-1-> } -> recur ) -> print-range + +0 -> } -print-range-> } +10 -> } +``` + +## Next Steps + +This post is long enough, and I think gives at least a basic idea of what I'm +going for.
The syntax presented here is _extremely_ rudimentary, and is almost +definitely not what any final version of the syntax would look like. But the +general idea behind the structure is sound, I think. + +I have a lot of further ideas for Ginger I haven't presented here. Hopefully as +time goes on and I work on the language more some of those ideas can start +taking a more concrete shape and I can write about them. + +The next thing I need to do for Ginger is to implement (again) the graph type +for it, since the last one I implemented didn't include tuples. Maybe I can +extend it instead of re-writing it. After that it will be time to really buckle +down and figure out a syntax. Once a syntax is established then it's time to +start on the compiler! diff --git a/src/_posts/2021-01-14-the-web.md b/src/_posts/2021-01-14-the-web.md new file mode 100644 index 0000000..4d47a57 --- /dev/null +++ b/src/_posts/2021-01-14-the-web.md @@ -0,0 +1,239 @@ +--- +title: >- + The Web +description: >- + What is it good for? +--- + +With the recent crisis in the US's democratic process, there's been much abuzz +in the world about social media's undoubted role in the whole debacle. The +extent to which the algorithms of Facebook, Twitter, Youtube, TikTok, etc, have +played a role in the radicalization of large segments of the world's population +is one popular topic. Another is the tactics those same companies are now +employing to try and euthanize the monster they made so much ad money in +creating. + +I don't want to talk about any of that; there is more to the web than +social media. I want to talk about what the web could be, and to do that I want +to first talk about what it has been. + +## Web 1.0 + +In the 1950's computers were generally owned by large organizations like +companies, universities, and governments. They were used to compute and manage +large amounts of data, and each existed independently of the other. 
+ +In the 60's protocols began to be developed which would allow them to +communicate over large distances, and thereby share resources (both +computational and informational). + +The funding of ARPANET by the US DoD led to the initial versions of the TCP/IP +protocol in the 70's, still used today as the backbone of virtually all internet +communication. Email also came about from ARPANET around this time. + +The 80s saw the growth of the internet across the world, as ARPANET gave way to +NSFNET. It was during this time that the domain name system we use today was +developed. At this point the internet use was still mostly for large +non-commercial organizations; there was little commercial footprint, and little +private access. The first commercially available ISP, which allowed access to +the internet from private homes via dialup, wasn't launched until 1989. + +And so we find ourselves in the year 1989, when Tim Berners-Lee (TBL) first +proposed the World-Wide Web (WWW, or "the web"). You can find the original +proposal, which is surprisingly short and non-technical, +[here](https://www.w3.org/Proposal.html). + +From reading TBL's proposal it's clear that what he was after was some mechanism +for hosting information on his machine in such a way that others could find and +view it, without it needing to be explicitly sent to them. He includes the +following under the "Applications" header: + +> The application of a universal hypertext system, once in place, will cover +> many areas such as document registration, on-line help, project documentation, +> news schemes and so on. + +But out of such a humble scope grew one of the most powerful forces of the 21st +century. By the end of 1990 TBL had written the first HTML/HTTP browser and +server. By the end of 1994 sites like IMDB, Yahoo, and Bianca's Smut Shack were +live and being accessed by consumers. The web grew that fast. 
+ +In my view the characteristic of the web which catalyzed its adoption so quickly +was the place-ness of it. The web is not just a protocol for transferring +information, like email, but instead is a _place_ where that information lives. +Any one place could be freely linked to any other place, and so complex and +interesting relations could be formed between people and ideas. The +contributions people make on the web can reverberate farther than they would or +could in any other medium precisely because those contributions aren't tied to +some one-off event or a deteriorating piece of physical infrastructure, but are +instead given a home which is both permanent and everywhere. + +The other advantage of the web, at the time, was its simplicity. HTML was so +simple it was basically human-readable. A basic HTTP server could be implemented +as a hobby project by anyone in any language. Hosting your own website was a +relatively straightforward task which anyone with a computer and an ISP could +undertake. + +This was the environment early adopters of the web found themselves in. + +## Web 2.0 + +The infamous dot-com bust took place in 2001. I don't believe this was a failure +inherent in the principles of the web itself, but was instead a product of +people investing in a technology they didn't fully understand. The web, as it +was then, wasn't really designed with money-making in mind. It certainly allowed +for it, but that wasn't the use-case being addressed. + +But of course, in this world we live in, if there's money to be made, it will +certainly be made. + +By 2003 the phrase "Web 2.0" started popping up. I remember this. To me "Web +2.0" meant a new aesthetic on the web, complete with bubble buttons and centered +fixed-width paragraph boxes. But what "Web 2.0" actually signified wasn't related +to any new technology or aesthetic. It was a new strategy for how companies +could enable use of the web by non-expert users, i.e.
users who don't have the +inclination or means to host their own website. Web 2.0 was a strategy for +giving everyone a _place_ of their own on the web. + +"Web 2.0" was merely a label given to a movement which had already been in +motion for years. I think the following Wikipedia excerpt describes this period +best: + + +> In 2004, the term ["Web 2.0"] began its rise in popularity when O'Reilly Media +and MediaLive hosted the first Web 2.0 conference. In their opening remarks, +John Battelle and Tim O'Reilly outlined their definition of the "Web as +Platform", where software applications are built upon the Web as opposed to upon +the desktop. The unique aspect of this migration, they argued, is that +"customers are building your business for you". They argued that the +activities of users generating content (in the form of ideas, text, videos, or +pictures) could be "harnessed" to create value. + + +In other words, Web 2.0 turned the place-ness of the web into a commodity. +Rather than expect everyone to host, or arrange for the hosting, of their own +corner of the web, the technologists would do it for them for "free"! This +coincided with the increasing complexity of the underlying technology of the +web; websites grew to be flashy, interactive, and stateful applications which +_did_ things rather than be places which _held_ things. The idea of a hyperlink, +upon which the success of the web had been founded, became merely an +implementation detail. + +And so the walled gardens began to be built. Myspace was founded in 2003, +Facebook opened to the public in 2006, Digg (the precursor to reddit) was +launched in 2004, Flickr launched in 2004 (and was bought by Yahoo in 2005), +Google bought Blogger in 2003, and Twitter launched in 2006. In effect this +period both opened the web up to everyone and established the way we still use +it today. + +It's upon these foundations that current events unfold. 
We have platforms whose +only incentive is towards capturing new users and holding their attention, to +the exclusion of other platforms, so they can be advertised to. Users are +enticed in because they are being offered a place on the web, a place of their +own to express themselves from, in order to find out the worth of their +expressions to the rest of the world. But they aren't expressing to the world at +large, they are expressing to a social media platform, a business, and so only +the most lucrative of voices are heard. + +So much for not wanting to talk about social media. + +## Web 3.0 + +The new hot topic in crypto and hacker circles is "Web 3.0", or the +decentralized web (dweb). The idea is that we can have all the good of the +current web (the accessibility, utility, permanency, etc) without all the bad +(the centralized platforms, censorship, advertising, etc). The way forward to +this utopian dream is by building decentralized applications (dApps). + +dApps are constructed in a way where all the users of an application help to +host all the stateful content of that application. If I, as a user, post an +image to a dApp, the idea is that other users of that same dApp would lend their +meager computer resources to ensure my image is never forgotten, and in turn I +would lend mine for theirs. + +In practice building successful dApps is enormously difficult for many reasons, +and really I'm not sure there _are_ any successful ones (to date). While I +support the general sentiment behind them, I sometimes wonder about the +efficacy. What people want from the web is a place they can call their own, a +place from which they can express themselves and share their contributions with +others with all the speed and pervasiveness that the internet offers. A dApp is +just another walled garden with specific capabilities; it offers only free +hosting, not free expression. + +## Web 2.0b + +I'm not here solely to complain (just mostly). 
+ +Thinking back to Web 1.0, and specifically to the turning point between 1.0 and +2.0, I'd like to propose that maybe we made a wrong turn. The issue at hand was +that hosting one's own site was still too much of a technical burden, and the +direction we went was towards having businesses host it for us. Perhaps there +was another way. + +What are the specific difficulties with hosting one's own site? Here are the +ones I can think of: + +* Bad tooling: basically none of the tools you're required to use (web server, + TLS, DNS, your home router) are designed for the average person. + +* Egregiously complex languages: making a site which looks half decent and can + do the things you want requires a _lot_ of knowledge about the underlying + languages (CSS, HTML, JavaScript, and whatever your server is written in). + +* Single point-of-failure: if your machine is off, your site is down. + +* Security: it's important to stay ahead of the hackers, but it takes time to + do so. + +* Hostile environment: this is separate from security, and includes difficulties + like dynamic home IPs and bad ISP policies (such as asymmetric upload/download + speeds). + +These are each separate avenues of attack. + +Bad tooling is a result of the fact that devs generally build technology for +themselves or their fellow devs, and only build for others when they're being +paid to do it. This is merely an attitude problem. + +Complex languages are really a sub-category of bad tooling. The consensus seems +to be that the average person isn't interested in or capable of working in +HTML/CSS/JS. This may be true today, but it wasn't always. Most of my friends in +middle and high school had both the interest and the capability to create +the most heinous MySpace pages the world has ever seen, using nothing but CSS +generators and scraps of shitty JS they found lying around. So what changed? The +tools we use to build those pages did.
+ +A hostile environment is not something any individual can do anything about, but +in the capitalist system we exist in we can at least hold in faith the idea that +eventually we customers will get what we want. It may take a long time, but all +monopolies break eventually, and someone will eventually sell us the internet +access we're asking for. If all other pieces are in place I think we'll have +enough people asking to make a difference. + +For single point-of-failure we have to grant that more than one person will be +involved, since the vast majority of people aren't going to be able to keep one +machine online consistently, let alone two or more machines. But I think we all +know at least one person who could keep a machine online with some reliability, +and they probably know a couple of other people who could do so as well. What +I'm proposing is that, rather than building tools for global decentralization, +we need tools for local decentralization, aka federation. We can make it +possible for a group of people to have their presence managed by a subset of +themselves. Those with the ability could help to host the online presence of +their family, friends, churches, etc, if given the right tools. + +Security is the hard one, but also in many ways isn't. What most people want +from the web is a place from which to express themselves. Expression doesn't +take much more than a static page, usually, and there's not much attacking one +can do against a static page. Additionally, we've already established that +there's going to be at least a _couple_ of technically minded people involved in +hosting this thing. + +So those are the ideas I'd like to build towards. First among them is +that we need tools which can help people help each other host their content, and +on top of that foundation a new web can be built which values honest expression +rather than the lucrative madness which our current algorithms love so much.
+ +This project was already somewhat started by +[Cryptorado](https://github.com/Cryptorado-Community/Cryptorado-Node) while I +was a regular attendee, but since COVID started my attendance has fallen off. +Hopefully one day it can resume. In the meantime I'm going to be working on +setting up these tools for myself, and seeing how far I can get.