Four Days of Go

By Evan Miller

April 21, 2015 (Japanese translation)

Part of my work involves the mild reverse-engineering of binary file formats. I say “mild” because usually other people do all of the actual work; I just have to figure out what an extra flag field or two means, and I then take as much credit as possible for the discovery on my blog.

To see what’s in the guts of a binary file, I use a hex editor, though even my favorite one is a bit of a chore to use. When I’m trying to figure out a file format, I want to mark it up with my hypotheses about what various bytes may mean, but currently there aren’t any hex editors that will let me do that. My workflow at present is to print out the hex representation of a binary file onto physical sheets of paper, and then mark them up with a ball-point pen that I received last year at a conference about technology.

To save a few trees, and to ensure that my Conference Pen Collection remains in pristine condition for a future eBay auction, I decided to write my own hex editor suited for reverse-engineering tasks. I’ve had a hex-editor name picked out for a while now (Hecate: The Hex Editor From Hell), as well as a color palette and appropriate thematic iconography (think Dante’s Inferno meets Scorsese’s Taxi Driver).

I also had some visual ideas for the program worked out, but before I could get serious about tinkering, I realized I needed to choose a development platform. I use three platforms on a regular basis (OS X, a terminal, and the World Wide Web), so I decided to organize a three-way imaginary cage fight between them, i.e. construct a list of pros and cons for each.

I know OS X pretty well at this point, and I thought about writing the program in Swift. However, I wanted to make Hecate cross-platform and open-source, so that other people could contribute to the project without me having to pay them. A browser version could make sense, but I’d rather not spend my time running a Hex Editing Web Service, nor do I want my users babysitting a local Node.js instance or whatever on their computer.

I spent a few minutes considering a cross-platform C++ toolkit such as Qt. Then the police arrived and told me in a calming manner to put down the hunting rifle, so that left me with the last (and original) computing platform: the terminal.

Apparently terminal applications have been experiencing something of a retro chic Renaissance, driven by the California New Wave of systems programming languages. I still enjoy programming in old-school, Jersey-style, scorched-earth C, and gave some thought to writing Hecate in it, but I was assured by several GitHub pages that the ncurses library is a horrible macro-infested mess, and decided to explore other options.

There’s actually a new terminal library written in C called termbox; that was my first choice, but then I saw the author mention that the Go version of the library had more features. More features, of course, are always a good thing to have, especially in a library you’ve never used before, so I thought, what the hay, let’s learn a new programming language.

Hello, Go: First impressions

When I program I usually think in C, that is, as I type I try to think about the C code that’s actually being executed when the program runs. I tend to prefer languages where I have a reasonable chance of understanding what will be executed; but I also appreciate being able to throw caution to the garbage collector and bang out code in a hurry when the occasion calls for it.

Go code will look at least a little familiar to C programmers. It carries over C’s primitive types, as well as its semantics with regard to values and pointers. From the start I had a decent mental model of how things are passed to functions in Go, and under what circumstances the caller can expect data to change. Like C, Go shuns classes in favor of plain structs, but then lets you make code more object-oriented with interfaces, which I’ll discuss in a minute.

First let’s talk about the basic syntax. Go is statically typed with type inference, which saves some typing, and it splits out the declaration/initialization and assignment operators into := and =, like this:

    my_counter := 1 // an exciting new variable

    my_counter = 2 // update the variable

    my_counter := 3 // this produces an error

Although when I started I wasn’t accustomed to the = versus := distinction, I began to like it as a way to catch editing errors. Functions can have multiple return values, but the rules for multiple assignment feel a little odd; the left hand side of a multiple assignment is allowed to have a mix of declared and undeclared variables, but you need to use := when there is at least one undeclared variable on the left. That is to say:

    my_counter := 1  // an exciting new variable

    my_counter, _ = update_counter(my_counter) // OK

    my_counter, _ := update_counter(my_counter) // not OK

    // The following line is OK. Even though my_counter already exists, 
    // error is a new variable, so := is appropriate
    my_counter, error := update_counter(my_counter)

To me it seems a little arbitrary that the := should somehow dominate the =, and it also introduces room for the very bugs that := was supposed to prevent (e.g. I might think I am declaring my_counter in two places).
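
To make the worry concrete, here is a minimal sketch. The helper updateCounter and the two wrapper functions are hypothetical names invented for illustration. In the same scope, := quietly reuses the existing counter; inside a nested block, the identical line declares a brand-new counter that shadows the outer one.

```go
package main

import "fmt"

// updateCounter is a hypothetical helper that returns the next count.
func updateCounter(c int) (int, error) {
	return c + 1, nil
}

// sameScope: counter already exists in this scope, so := only declares err
// and assigns to the existing counter.
func sameScope() int {
	counter := 1
	counter, err := updateCounter(counter)
	if err != nil {
		return -1
	}
	return counter
}

// nestedScope: inside the block, := declares a NEW counter that shadows the
// outer one, so the outer counter is never updated.
func nestedScope() int {
	counter := 1
	if true {
		counter, err := updateCounter(counter)
		if err != nil {
			return -1
		}
		_ = counter // the inner counter is 2, but it vanishes at the closing brace
	}
	return counter
}

func main() {
	fmt.Println(sameScope())   // prints 2
	fmt.Println(nestedScope()) // prints 1
}
```

Both functions contain the same := line; only the surrounding braces decide whether it updates or shadows.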

I think a more logical syntax would be to have the number of colons equal to the number of new left-hand variables (::= for a double declaration, :::= for a triple declaration, etc.), but I guess the language designers couldn’t find my phone number at the critical moment during Go’s research and development phase.

Go has eliminated some traditional keywords in favor of overloading if and for. Go’s for can be used in place of C’s for, while, and while(1), and there is a two-statement version of if that I just found out about yesterday. I suppose these consolidations technically make the language simpler, but they also make code slightly harder to talk about. When looking at someone’s C code, I can say “use a while loop here instead of a for loop”, but with Go I would have to say “use a zero-statement for loop instead of a three-statement for loop”. It is possible, however, that the Go team has developed a set of secret signifiers to distinguish these constructions in everyday conversation and not told me about them. (Did you know that the $_ variable in Perl is called “it”? I read that in a book. If I remember correctly the name comes from a Stephen King novel.)
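
For reference, here is a small sketch of the forms in question. The function names are made up for illustration; the loop and if shapes are the point.

```go
package main

import (
	"fmt"
	"strconv"
)

// sumBelow uses the three-statement form, like C's for loop.
func sumBelow(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
	}
	return total
}

// countDigits uses the condition-only form, like C's while loop.
func countDigits(n int) int {
	digits := 0
	for n > 0 {
		n /= 10
		digits++
	}
	return digits
}

// nextPowerOfTwo uses the bare form, like C's while(1).
func nextPowerOfTwo(limit int) int {
	p := 1
	for {
		if p > limit {
			return p
		}
		p *= 2
	}
}

// isPositiveInt uses the two-statement if: n and err exist only inside it.
func isPositiveInt(s string) bool {
	if n, err := strconv.Atoi(s); err == nil && n > 0 {
		return true
	}
	return false
}

func main() {
	fmt.Println(sumBelow(5))          // prints 10
	fmt.Println(countDigits(2015))    // prints 4
	fmt.Println(nextPowerOfTwo(100))  // prints 128
	fmt.Println(isPositiveInt("42"))  // prints true
	fmt.Println(isPositiveInt("-7"))  // prints false
}
```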

Go has also eliminated the ternary operator, and, for reasons that appear to be political, does not have integer Min and integer Max functions. From what I can gather on the mailing list threads, the language designers are against polymorphism, as well as adding letters to function names, so unlike the standard C library which operates on float, double, and long double, as well as int and long where appropriate (e.g. absolute value), the Go standard math library operates only on float64. Since there’s no implicit casting to floats, this is rather annoying if you’re using integers, such as when you are counting things. It also makes Go somewhat less useful for heavy number-crunching where you might want single- or extended-precision versions of floating-point functions.
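
In practice this means either writing the integer helpers yourself or taking a round trip through float64. The sketch below shows both; maxInt, absInt, and maxViaFloat are hypothetical names of my own, and it reflects the standard library as described above, so later Go releases may differ.

```go
package main

import (
	"fmt"
	"math"
)

// maxInt is a hand-rolled helper; with no ternary operator it needs an if.
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// absInt is the integer absolute value that math.Abs does not provide.
func absInt(n int) int {
	if n < 0 {
		return -n
	}
	return n
}

// maxViaFloat makes the explicit conversions that math.Max demands,
// since Go will not convert int to float64 implicitly.
func maxViaFloat(a, b int) int {
	return int(math.Max(float64(a), float64(b)))
}

func main() {
	fmt.Println(maxInt(3, 7))      // prints 7
	fmt.Println(absInt(-12))       // prints 12
	fmt.Println(maxViaFloat(3, 7)) // prints 7
}
```

The float64 detour is only exact for integers that fit in a float64 mantissa, which is one more reason to write the two-line helper.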

(Incidentally, the only language I know that gets polymorphism right for dealing with multiple kinds of mathematical objects is Julia — though last time I checked it was still lacking long double / float80 support.)

By the way, if anyone who works on the Go math library is reading this, there are a few important functions missing.

The rest of the standard library looks good to me so far — I like the design of the string-formatting library, and the Unicode support is excellent. rune is an odd way to name your character type, but I suppose they wanted to avoid confusion with C’s 8-bit char. (In English usage, rune refers specifically to a character from a medieval Germanic alphabet, or a glyph believed to have magical powers. While some people might object to the character type having mystical connotations in Go, I fully support all references to medieval texts and/or the occult in programming languages.)
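
As a small illustration of the rune type, the sketch below counts bytes versus runes in a string that begins with an actual runic letter (U+16B1, written as an escape so the source stays ASCII). The function name byteAndRuneCounts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts returns the length of s in bytes and in runes.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "\u16b1une" // a runic letter followed by "une"

	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Println(nBytes, nRunes) // prints 6 4

	// Ranging over a string yields runes, with i as the byte offset of each.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
}
```

The runic letter takes three bytes in UTF-8, so the byte offsets printed by the loop jump from 0 to 3.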

Go is “OO-ish” with its use of interfaces — interfaces are basically duck typing for your structs (as well as other types, because, well, just because). I had some trouble at first understanding how to get going with interfaces and pointers. You can write methods that act on WhateverYouWant — and an interface is just an assertion that WhateverYouWant has methods for X, Y, and Z. It wasn’t really clear to me whether methods should be acting on values or pointers. Go sort of leaves you to your own devices here.

At first I wrote my methods on the values, which seemed like the normal, American thing to do. The problem of course is that when passed to methods, values are copies of the original data, so you can’t do any OO-style mutations on the data. So instead methods should operate on the pointers, right?
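
Here is a minimal sketch of the difference, using a hypothetical RaceCar type with one method of each kind.

```go
package main

import "fmt"

// RaceCar is a hypothetical struct with a single mutable field.
type RaceCar struct {
	Miles int
}

// DriveByValue has a value receiver: c is a copy, so the caller's
// RaceCar is unchanged when the method returns.
func (c RaceCar) DriveByValue(miles int) {
	c.Miles += miles
}

// DriveByPointer has a pointer receiver: it mutates the caller's RaceCar.
func (c *RaceCar) DriveByPointer(miles int) {
	c.Miles += miles
}

func main() {
	car := RaceCar{}

	car.DriveByValue(100)
	fmt.Println(car.Miles) // prints 0

	car.DriveByPointer(100)
	fmt.Println(car.Miles) // prints 100
}
```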

This is where things get a little bit tricky if you’re accustomed to subclassing. If your methods operate on pointers, then your interface applies to the pointer, not to the struct (value). So if in Java you had a Car with RaceCar and GetawayCar as subclasses, in Go you’ll have an interface Car, which is implemented not by RaceCar and GetawayCar but by their pointer types, *RaceCar and *GetawayCar.

This creates some friction when you’re trying to manage your car collection. For example, if you want an array with values of type Car, you need an array of pointers, which means you need separate storage for the actual RaceCar and GetawayCar values, either on the stack with a temporary variable or on the heap with calls to new. The design of interfaces is consistent, and I generally like it, but it had me scratching my head for a while as I got up to speed with all the pointers to my expensive and dangerous automobiles.
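
The car collection might be sketched as follows. The method names Drive and Odometer, the helper driveAll, and the double-mileage getaway route are all invented for illustration.

```go
package main

import "fmt"

// Car is satisfied by any type with these two methods.
type Car interface {
	Drive(miles int)
	Odometer() int
}

type RaceCar struct{ miles int }

func (c *RaceCar) Drive(miles int) { c.miles += miles }
func (c *RaceCar) Odometer() int   { return c.miles }

type GetawayCar struct{ miles int }

// A getaway car takes the long way around to lose the cops.
func (c *GetawayCar) Drive(miles int) { c.miles += 2 * miles }
func (c *GetawayCar) Odometer() int   { return c.miles }

// driveAll drives every car and returns the combined odometer reading.
func driveAll(garage []Car, miles int) int {
	total := 0
	for _, c := range garage {
		c.Drive(miles)
		total += c.Odometer()
	}
	return total
}

func main() {
	racer := RaceCar{}         // separate storage in a named variable
	getaway := new(GetawayCar) // separate storage via new

	// The slice holds pointers, because the methods live on the pointer types.
	// Writing []Car{racer} here would not compile.
	garage := []Car{&racer, getaway}

	fmt.Println(driveAll(garage, 10)) // prints 30
	fmt.Println(racer.Odometer())     // prints 10
}
```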

Go is garbage-collected. I personally think Swift/Objective-C-style Automatic Reference Counting would have been a better choice for a statically typed systems language, since it gives you the brevity benefits without the GC pauses. I’m sure this has been argued to death elsewhere, so I’ll save my GC rant for a very boring dinner party.

One of Go’s major selling points is its concurrency support. I have not yet played with its concurrency features, cutely called goroutines. My impression from the description is that while goroutines are an advancement over vanilla C and C++, Go lacks a good story for handling programmer errors in a concurrent environment. Normal errors are bubbled up as values, but if there’s a programmer error (e.g., index out of range), the program panics and shuts down.

For single-threaded programs, this is a reasonable strategy, but it doesn’t play very well with Go’s concurrency model. If a goroutine panics, either you take down the whole program, or you recover — but then your shared memory may be left in an inconsistent state. That is, Go assumes programmers will not make any mistakes in the recovery process — which is not a very good assumption, since it was programmer error that brought about the panic in the first place. As far as I know, the only language that really gets this right is Erlang, which is designed around shared-nothing processes, and thus programmer errors are properly contained inside the processes where they occur.
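
Having not played with goroutines yet, I offer only a minimal sketch of the failure mode, with every name (ledger, transfer, runTransfer) invented for illustration. A goroutine updates shared state in two steps, hits an out-of-range index between them, and recovers; the program lives on, but the ledger is left half-updated.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is hypothetical shared state; a complete transfer sets both fields.
type ledger struct {
	debited  bool
	credited bool
}

// transfer updates the ledger in two steps, with a programmer error
// (a possibly out-of-range index) sitting between them.
func transfer(l *ledger, amounts []int, i int) {
	l.debited = true
	_ = amounts[i] // panics when i is out of range
	l.credited = true
}

// runTransfer runs transfer in a goroutine, recovers from any panic,
// and reports whether a panic was recovered.
func runTransfer(l *ledger, amounts []int, i int) (recovered bool) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				recovered = true // the program survives, the ledger may not
			}
		}()
		transfer(l, amounts, i)
	}()
	wg.Wait()
	return recovered
}

func main() {
	l := &ledger{}
	recovered := runTransfer(l, []int{10, 20}, 5)
	fmt.Println(recovered, l.debited, l.credited) // prints true true false
}
```

Nothing in the recover handler knows that credited was supposed to follow debited; restoring consistency is left to the same programmer who wrote the bad index.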

(It’s also worth mentioning that you can get a Go-style M:N concurrency model in C by using Apple’s libdispatch. In conjunction with block syntax, it’s a fairly nice solution, though like Go, it’s not robust to programmer error.)

I had previously read about Go’s refusal to compile programs with unused import statements, but I didn’t really believe it until, well, I couldn’t compile a Go program that contained an unused import statement. (The same goes for unused variables.) The Go FAQ gets a bit pedantic on this point — explaining to you why it’s for your own good — but in practice, it makes the language less fun to tinker with. I prefer to try things out and get them working, then go back later and clean things up. Go basically forces you to have clean code all along, which is a bit like forcing a scientist to wipe down the workbench and rinse all the beakers between every experiment, or forcing a writer to run the spell checker after every cigarette. It sounds like good practice, but it comes with a cost, and it’s a decision that’s probably best left to the person it immediately affects, rather than to the tool designers.
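
The usual escape hatch, as far as I can tell from Effective Go, is the blank identifier. Below is a minimal sketch; the function tinker and its scratch variable are hypothetical, and the os import is there only to stand in for a package I am not using yet.

```go
package main

import (
	"fmt"
	_ "os" // blank import: kept around for later tinkering without a compile error
)

// tinker keeps a half-finished experiment compiling.
func tinker() int {
	scratch := 42 // a variable I am not ready to use yet
	_ = scratch   // remove this line and the compiler rejects the unused variable
	return 7
}

func main() {
	fmt.Println(tinker()) // prints 7
}
```

It works, though sprinkling underscores around a scratch file is its own kind of workbench wiping.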

As an aside, I personally would like to see a version of Go called “Sloppy Go” that will only compile programs that contain at least one unused import and several unused variables, and maybe an unmatched parenthesis, just to ensure that the programmer still knows how to have fun.

I was trying to think of why the Go designers thought it was such a good idea to refuse to compile programs with unused variables. I have a theory, and will take a detour here into what I believe to be the psychological foundations of the Go programming language. I call it the Autistic Gopher Hypothesis.

The Autistic Gopher Hypothesis

I didn’t mention the very first impression I had of Go. On the Go homepage, there is a gopher — the language mascot — facing you. But he’s looking to the left.

Other times he’s looking to the right.

Even when he’s looking in your direction, it’s like he’s looking slightly upwards, perhaps at your toupée.

There was always something a little unsettling to me about the Go gopher. He’s always moving around, and never quite makes eye contact with the viewer. Compared to the Go gopher, a devil looks downright approachable. Even a penguin looks warm:

The Go gopher doesn’t look dangerous per se, but doesn’t he seem a little… odd? He faces you head-on as if he wants your attention and approval, but he’s not engaging you, and certainly not listening to you. If I had to guess, I’d say the Go gopher suffers from a mild form of autism.

I get the same feeling about the Go language. It feels like it is designed by an obsessive personality — obsessed with build times in particular, but also having an obsession with detail, someone who rarely makes mistakes when writing code, who generally will not run code until it appears to be complete and correct.

Normally I’d appreciate these qualities in a compiler writer, but I feel that the designer went too far, to the point of being antisocial, i.e. attempting to impose arbitrary rules on the language users. I imagine that this person is tired of dealing with warning-riddled code produced by colleagues — code full of unused variables and imports, slow-building code that takes up the designer’s precious time — and has decided to exert control over the type of code written by colleagues not by the normal organizational and political processes (e.g., lobbying for -Wall -Werror on the build server), but by producing a compiler that refuses any input that doesn’t meet the designer’s own exacting standards for computer code. The designer realizes that giving any ground, e.g. having compiler warnings of any kind, creates a potential political battle within the designer’s organization. Thus the designer has circumvented the normal give-and-take over the build server configuration simply by eliminating flags from the compiler.

In other words, Go represents a kind of Machiavellian power play, orchestrated by slow-and-careful programmers who are tired of suffering for the sins of fast-and-loose programmers. The Go documentation refers quite often to intolerable 45-minute build times suffered by the original designers, and I can’t help but imagine them sitting around and seething about all those unused imports from those “other” programmers, that is, the “bad” programmers. Their solution was not to engage and educate those programmers to change their habits, but rather design a new language that the bad programmers would be compelled to use — and tie down the language sufficiently so that “bad” practices, such as a program containing unused variables, were impossible.

Reading Go’s mailing list and documentation, I get a similar sense of refusal-to-engage — the authors are communicative, to be sure, but in a didactic way. They seem tired of hearing people’s ideas, as if they’ve already thought of everything, and the relative success of Go at Google and elsewhere has only led them to turn the volume knob down. Which is a shame, because they’ll probably miss out on some good ideas (including my highly compelling, backwards-incompatible, double-triple-colon-assignment proposal mentioned above).

Under this theory, more of the language choices start to make sense. There is no ternary operator because the language designers were tired of dealing with other people’s use of ternary operators. There is One True Way To Format Code — embodied in gofmt — because the designers were tired of how other people formatted their code. Rather than debate or engage, it was easier to make a new language and shove the new rules onto everyone by coupling it with Very Fast Build Times, a kind of veto-proof Defense Spending Bill in the Congress of computer programming. In this telling, the story of Go is really a tale of revenge, not just against slow builds, but against all kinds of sloppy programming.

Which in my opinion is too bad, because I myself am a sloppy programmer. I love writing sloppy code. Not because I like having sloppy code, or maintaining sloppy code — but because I like to tinker and play with code. I like trying a bunch of different library calls to see exactly what they do. I like trying a bunch of interface ideas and seeing which works best. The faster I can get results from my code, the faster I can understand the problem at hand. For me, writing code is as much about acquiring knowledge as it is about producing something of lasting value. So in the process of writing code, I’ll leave behind a wasteland of fallow variables and futile imports, but I don’t really care, because there’s a good chance I’ll throw away the whole file anyway. Frankly, my unused variables are none of anyone’s damn business but my own.

In that light, although Go is a productive language compared to C, the Go compiler’s overt pedantry is a significant hindrance to trying out ideas with code, and getting one of those errors can be a real buzzkill. I still like writing Go code, but overall I fear that Go has sacrificed the values of fun, exploration, and knowledge-seeking in favor of the language designers’ perceived political needs at their current place of employment.

Up From Below

Despite my misgivings over the absence of Sloppy Go, and the waking nightmares I have about the Go gopher wearing my Peter Pan pajamas and murdering me in my sleep, on the whole I’ve been enjoying my initial experiences with the Go language. I was surprised at how idiot-proof it was to build things — you just type “go build” and almost instantly have a self-contained executable. This does make me wonder how things went so badly with make, makemaker, autoconf, aclocal, and the rest of the Texas Toolchain Massacre.

Termbox, by the way, is a fun library to work with. It gives you a key press handler and an API for putting colored characters at points; that’s pretty much it. If you’re feeling crushed beneath the twin behemoths of browser programming and scrum meetings, termbox is a great way to attempt to resuscitate your dying sense of worldly wonder and recapture your faded feelings of youth. I highly recommend it.

To get my initial groove on with termbox, I made a dumb program that displays all 256 terminal colors. It looks like this:

Once I figured out how to read a file, I had the beginnings of a hex viewer:

And check it out, responsive terminal design:

Go is productive enough that I’ve been enjoying implementing things from scratch like collapsible widgets and navigating a viewport. In order to do evil things like convert raw bytes to floats, I chose to use the “unsafe” package, which made me feel manly, powerful, and highly supportive of private gun ownership. Interfacing with C appears to be straightforward, though I feel like the compiler may want a criminal-background check and 30-day waiting period before letting me use it.
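
For the curious, a bytes-to-float conversion might be sketched like this. It is not Hecate's actual code: floatUnsafe and floatSafe are hypothetical names, the unsafe version assumes at least four bytes and the host's byte order, and the second function shows a standard-library alternative for comparison.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"unsafe"
)

// floatUnsafe reinterprets the first four bytes of b as a float32 in place.
// It assumes len(b) >= 4, and the result depends on the host's byte order.
func floatUnsafe(b []byte) float32 {
	return *(*float32)(unsafe.Pointer(&b[0]))
}

// floatSafe decodes four little-endian bytes using only safe library calls.
func floatSafe(b []byte) float32 {
	return math.Float32frombits(binary.LittleEndian.Uint32(b))
}

func main() {
	raw := []byte{0x00, 0x00, 0x80, 0x3f} // 1.0 as a little-endian IEEE 754 single

	fmt.Println(floatSafe(raw))   // prints 1
	fmt.Println(floatUnsafe(raw)) // matches floatSafe on a little-endian machine
}
```

The safe version is less thrilling but states the byte order explicitly, which matters when the file format and the host disagree.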

For my hex editor, the only real costs compared to C are the garbage collector, which I don’t anticipate will be even the slightest of problems, and the periodic annoyance with the compiler’s draconian stance toward unused variables, which I anticipate will be a cosmic, eternally recurring Groundhog Day of suffering, rue, and lament.

Nonetheless, thus far I’m glad I chose Go over C to implement Hecate: The Hex Editor From Hell. The tradeoff has been worth it, and I’m looking forward to continuing development next weekend and beyond. It’s been great fun to discover terminal programming, which is a welcome relief from worrying about embedded fonts and Retina displays and Apple Watch WebKit and whatnot. Who knows, maybe one day I’ll actually use Hecate to reverse-engineer another flag field in that binary file, and proceed to take complete credit for it on my blog.

[ Update, 5/7/2015: the source code for Hecate is now available on GitHub. ]
