-rw-r--r-- | erlang-tcp-socket-pull-pattern.md | 133
-rw-r--r-- | goplus.md | 29
2 files changed, 102 insertions, 60 deletions
diff --git a/erlang-tcp-socket-pull-pattern.md b/erlang-tcp-socket-pull-pattern.md
index dca91e3..419d005 100644
--- a/erlang-tcp-socket-pull-pattern.md
+++ b/erlang-tcp-socket-pull-pattern.md
@@ -1,46 +1,62 @@
 # Erlang, tcp sockets, and active true

-If you don't know erlang then [you're missing out](http://learnyousomeerlang.com/content).
-If you do know erlang, you've probably at some point done something with tcp sockets. Erlang's
-highly concurrent model of execution lends itself well to server programs where a high number
-of active connections is desired. Each thread can autonomously handle its single client,
-greatly simplifying the logic of the whole application while still retaining
-[great performance characteristics](http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1).
+If you don't know erlang then [you're missing out][0]. If you do know erlang,
+you've probably at some point done something with tcp sockets. Erlang's highly
+concurrent model of execution lends itself well to server programs where a high
+number of active connections is desired. Each thread can autonomously handle its
+single client, greatly simplifying the logic of the whole application while
+still retaining [great performance characteristics][1].

 # Background

-For an erlang thread which owns a single socket there are three different ways to receive data
-off of that socket. These all revolve around the `active` [setopts](http://www.erlang.org/doc/man/inet.html#setopts-2)
-flag. A socket can be set to one of:
-
-* `{active,false}` - All data must be obtained through [recv/2](http://www.erlang.org/doc/man/gen_tcp.html#recv-2)
-                     calls. This amounts to syncronous socket reading.
-* `{active,true}` - All data on the socket gets sent to the controlling thread as a normal erlang
-                    message. It is the thread's responsibility to keep up with the buffered data
-                    in the message queue. This amounts to asyncronous socket reading.
-* `{active,once}` - When set the socket is placed in `{active,true}` for a single packet. That
-                    is, once set the thread can expect a single message to be sent to when data
-                    comes in. To receive any more data off of the socket the socket must either
-                    be read from using [recv/2](http://www.erlang.org/doc/man/gen_tcp.html#recv-2)
-                    or be put in `{active,once}` or `{active,true}`.
+For an erlang thread which owns a single socket there are three different ways
+to receive data off of that socket. These all revolve around the `active`
+[setopts][2] flag. A socket can be set to one of:
+
+* `{active,false}` - All data must be obtained through [recv/2][3] calls. This
+                     amounts to syncronous socket reading.
+
+* `{active,true}` - All data on the socket gets sent to the controlling thread
+                    as a normal erlang message. It is the thread's
+                    responsibility to keep up with the buffered data in the
+                    message queue. This amounts to asyncronous socket reading.
+
+* `{active,once}` - When set the socket is placed in `{active,true}` for a
+                    single packet. That is, once set the thread can expect a
+                    single message to be sent to when data comes in. To receive
+                    any more data off of the socket the socket must either be
+                    read from using [recv/2][3] or be put in `{active,once}` or
+                    `{active,true}`.

 # Which to use?

-Many (most?) tutorials advocate using `{active,once}` in your application [0][1][2]. This has to do with usability and
-security. When in `{active,true}` it's possible for a client to flood the connection faster than the receiving process
-will process those messages, potentially eating up a lot of memory in the VM. However, if you want to be able to receive
-both tcp data messages as well as other messages from other erlang processes at the same time you can't use `{active,false}`.
-So `{active,once}` is generally preferred because it deals with both of these problems quite well.
+Many (most?) tutorials advocate using `{active,once}` in your application
+\[0]\[1]\[2]. This has to do with usability and security. When in `{active,true}`
+it's possible for a client to flood the connection faster than the receiving
+process will process those messages, potentially eating up a lot of memory in
+the VM. However, if you want to be able to receive both tcp data messages as
+well as other messages from other erlang processes at the same time you can't
+use `{active,false}`. So `{active,once}` is generally preferred because it
+deals with both of these problems quite well.

 # Why not to use `{active,once}`

-Here's what your classic `{active,once}` enabled tcp socket implementation will probably look like:
+Here's what your classic `{active,once}` enabled tcp socket implementation will
+probably look like:

 ```erlang
 -module(tcp_test).
 -compile(export_all).

--define(TCP_OPTS, [binary, {packet, raw}, {nodelay,true}, {active, false}, {reuseaddr, true}, {keepalive,true}, {backlog,500}]).
+-define(TCP_OPTS, [
+    binary,
+    {packet, raw},
+    {nodelay,true},
+    {active, false},
+    {reuseaddr, true},
+    {keepalive,true},
+    {backlog,500}
+]).

 %Start listening
 listen(Port) ->
@@ -66,15 +82,16 @@ read_loop(Socket) ->
     end.
 ```

-This code isn't actually usable for a production system; it doesn't even spawn a new process for the new socket. But that's not
-the point I'm making. If I run it with `tcp_test:listen(8000)`, and in other window do:
+This code isn't actually usable for a production system; it doesn't even spawn a
+new process for the new socket. But that's not the point I'm making. If I run it
+with `tcp_test:listen(8000)`, and in other window do:

 ```bash
 while [ 1 ]; do echo "aloha"; done | nc localhost 8000
 ```

-We'll be flooding the the server with data pretty well. Using [eprof](http://www.erlang.org/doc/man/eprof.html) we can get an idea
-of how our code performs, and where the hang-ups are:
+We'll be flooding the the server with data pretty well. Using [eprof][4] we can
+get an idea of how our code performs, and where the hang-ups are:

 ```erlang
 1> eprof:start().
@@ -111,18 +128,30 @@ inet:setopts/2 12303598 5.72 4533863 [ 0.37]
 erlang:port_control/3 12303600 77.13 61085040 [ 4.96]
 ```

-eprof shows us where our process is spending the majority of its time. The `%` column indicates percentage of time the process spent
-during profiling inside any function. We can pretty clearly see that the vast majority of time was spent inside `erlang:port_control/3`,
-the BIF that `inet:setopts/2` uses to switch the socket to `{active,once}` mode. Amongst the calls which were called on every loop,
-it takes up by far the most amount of time. In addition all of those other calls are also related to `inet:setopts/2`.
+eprof shows us where our process is spending the majority of its time. The `%`
+column indicates percentage of time the process spent during profiling inside
+any function. We can pretty clearly see that the vast majority of time was spent
+inside `erlang:port_control/3`, the BIF that `inet:setopts/2` uses to switch the
+socket to `{active,once}` mode. Amongst the calls which were called on every
+loop, it takes up by far the most amount of time. In addition all of those other
+calls are also related to `inet:setopts/2`.

-I'm gonna rewrite our little listen server to use `{active,true}`, and we'll do it all again:
+I'm gonna rewrite our little listen server to use `{active,true}`, and we'll do
+it all again:

 ```erlang
 -module(tcp_test).
 -compile(export_all).

--define(TCP_OPTS, [binary, {packet, raw}, {nodelay,true}, {active, false}, {reuseaddr, true}, {keepalive,true}, {backlog,500}]).
+-define(TCP_OPTS, [
+    binary,
+    {packet, raw},
+    {nodelay,true},
+    {active, false},
+    {reuseaddr, true},
+    {keepalive,true},
+    {backlog,500}
+]).

 %Start listening
 listen(Port) ->
@@ -194,20 +223,30 @@ erlang:port_control/3 3 0.00 59 [ 19.67]
 tcp_test:read_loop/1 20716370 100.00 12187488 [ 0.59]
 ```

-This time our process spent almost no time at all (according to eprof, 0%) fiddling with the socket opts.
-Instead it spent all of its time in the read_loop doing the work we actually want to be doing.
+This time our process spent almost no time at all (according to eprof, 0%)
+fiddling with the socket opts. Instead it spent all of its time in the
+read_loop doing the work we actually want to be doing.

 # So what does this mean?

-I'm by no means advocating never using `{active,once}`. The security concern is still a completely valid concern and one
-that `{active,once}` mitigates quite well. I'm simply pointing out that this mitigation has some fairly serious performance
-implications which have the potential to bite you if you're not careful, especially in cases where a socket is going to be
-receiving a large amount of traffic.
+I'm by no means advocating never using `{active,once}`. The security concern is
+still a completely valid concern and one that `{active,once}` mitigates quite
+well. I'm simply pointing out that this mitigation has some fairly serious
+performance implications which have the potential to bite you if you're not
+careful, especially in cases where a socket is going to be receiving a large
+amount of traffic.

 # Meta

-These tests were done using R15B03, but I've done similar ones in R14 and found similar results. I have not tested R16.
+These tests were done using R15B03, but I've done similar ones in R14 and found
+similar results. I have not tested R16.
+
+* \[0] http://learnyousomeerlang.com/buckets-of-sockets
+* \[1] http://www.erlang.org/doc/man/gen_tcp.html#examples
+* \[2] http://erlycoder.com/25/erlang-tcp-server-tcp-client-sockets-with-gen_tcp

-* [0] http://learnyousomeerlang.com/buckets-of-sockets
-* [1] http://www.erlang.org/doc/man/gen_tcp.html#examples
-* [2] http://erlycoder.com/25/erlang-tcp-server-tcp-client-sockets-with-gen_tcp
+[0]: http://learnyousomeerlang.com/content
+[1]: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
+[2]: http://www.erlang.org/doc/man/inet.html#setopts-2
+[3]: http://www.erlang.org/doc/man/gen_tcp.html#recv-2
+[4]: http://www.erlang.org/doc/man/eprof.html

@@ -1,16 +1,19 @@
 # Go and project root

-Compared to other languages go has some strange behavior regarding its project root settings. If you
-import a library called `somelib`, go will look for a `src/somelib` folder in all of the folders in
-the `$GOPATH` environment variable. This works nicely for globally installed packages, but it makes
-encapsulating a project with a specific version, or modified version, rather tedious. Whenever you go
-to work on this project you'll have to add its path to your `$GOPATH`, or add the path permanently,
-which could break other projects which may use a different version of `somelib`.
-
-My solution is in the form of a simple script I'm calling go+. go+ will search in currrent directory
-and all of its parents for a file called `GOPROJROOT`. If it finds that file in a directory, it
-prepends that directory's absolute path to your `$GOPATH` and stops the search. Regardless of whether
-or not `GOPROJROOT` was found go+ will passthrough all arguments to the actual go call. The
+Compared to other languages go has some strange behavior regarding its project
+root settings. If you import a library called `somelib`, go will look for a
+`src/somelib` folder in all of the folders in the `$GOPATH` environment
+variable. This works nicely for globally installed packages, but it makes
+encapsulating a project with a specific version, or modified version, rather
+tedious. Whenever you go to work on this project you'll have to add its path to
+your `$GOPATH`, or add the path permanently, which could break other projects
+which may use a different version of `somelib`.
+
+My solution is in the form of a simple script I'm calling go+. go+ will search
+in currrent directory and all of its parents for a file called `GOPROJROOT`. If
+it finds that file in a directory, it prepends that directory's absolute path to
+your `$GOPATH` and stops the search. Regardless of whether or not `GOPROJROOT`
+was found go+ will passthrough all arguments to the actual go call. The
 modification to `$GOPATH` will only last the duration of the call.

 As an example, consider the following:
@@ -23,8 +26,8 @@ As an example, consider the following:
 /hello.go
 ```

-If `hello.go` depends on `somelib`, as long as you run go+ from `/tmp/hello` or one of its children
-your project will still compile
+If `hello.go` depends on `somelib`, as long as you run go+ from `/tmp/hello` or
+one of its children your project will still compile

 Here is the source code for go+: