A Review of Perl 6

By Evan Miller

August 13, 2017


“Man is amazing, but he is not a masterpiece,” he said, keeping his eyes fixed on the glass case. “Perhaps the artist was a little mad…”

—Joseph Conrad, Lord Jim


Perl 6 reminds me of my uncle’s Ph.D. dissertation. For the first five years, we would ask (around Christmas) when he’d be finished. Putting less and less stock in his answers as the years wore on, and lacking any evidence that the dissertation, in any form, actually existed, we made the completion date a kind of running joke, a familial version of Boston’s Big Dig or New York’s Second Avenue subway line. But after enough self-imposed deadlines came and went, we stopped joking about it. After enough years, we stopped asking.

In case you missed it, and I think everyone did, Perl 6 was released to the world a year and a half ago — on Christmas of 2015, incidentally — after an effort nominally spanning almost sixteen years. (I think, but am not certain, my uncle’s dissertation was also completed at some point.) The market for new programming languages, as I write this in 2017, is competitive but not impenetrable — but like a freshly minted Ph.D., no one seems to be sure of Perl 6’s market prospects, and like a just-turned-in dissertation, no one seems to know whether the fruit of many years’ labor is actually worth a damn.

It doesn’t help that the purveyors of Perl 6 provide few hints as to what you should actually do with the language, besides the facile answer of whatever you want. Perl 6 is multi-paradigm, maybe omni-paradigm; it claims to support object-oriented programming, functional programming, aspect-oriented programming, array programming, and (good old) procedural programming. It’s a new language, and not just a cleaned-up version of Perl 5, any more than English is German minus the umlauts. Knowledge of previous versions, alas, won’t get you very far. By the same token, prejudices regarding the preceding incarnations don’t necessarily hold today’s water.

What follows is my attempt to provide the world with an honest if incomplete assessment of the programming world’s whitest of elephants, Perl 6. I was motivated by the following thought: every sysadmin knows at least a little Perl, but I do not know anyone who knows any Perl 6, nor do I know anyone who knows anyone who knows any Perl 6. I’ve never read anything about Perl 6 that was longer than a punchline. I could equally imagine a world where Perl 6 is a heap of rubbish, and one where it’s the Hope Diamond of language design. No one seems to know; no one would know. Perl 6 is a register that has been written to, still waiting to be read.

After a number of false starts (by me) over the years, I’ve finally managed to spend enough time with the language to say, with confidence, that I’m familiar with it. I’ve read a cornucopia of blog posts, devoured documentation, and written a small (500-line) Perl 6 library with cursory test coverage. I still have no idea what it’s like to use Perl 6 for a year, or even a proper work week, so this review will be less of a definitive testimonial, and more like one of those mom-tries-Linux reaction videos. It’s likely that I’m missing some things.

Perl 6 lacks a strong “hook”, and it’s also kind of slow. It’s therefore difficult, but not impossible, to pique the reader’s interest with a hold-my-beer demonstration. (If you’d like one, see Why I’m Learning Perl 6.) Instead, I’ll start with small examples.

Numbers

Perl 6 is easy enough to install from the Perl 6 website; unlike Perl 5, it ships with a passable shell:

$ perl6
To exit type 'exit' or '^D'
>

If you’re interested in version numbers (which, if any Perl 6 people are reading this, should probably be displayed in the shell greeting), just run perl6 --version:

$ perl6 --version
This is Rakudo version 2017.04.3 built on MoarVM version 2017.04-53-g66c6dda
implementing Perl 6.c.

Perl 6 really consists of a language specification (Perl 6.c; the c is for Christmas), a compiler (called Rakudo), and a virtual machine (called MoarVM). There’s a reason for the multiplicity of monikers: the Perl 6 folks believe no implementation ought to be privileged over the others.1

Tutorials and teasers for Perl 6 often begin with what, at first glance, looks like the world’s easiest math problem:

> 0.1 + 0.2 == 0.3
True

Most languages evaluate this expression as false, and the tutorials go on to explain the difference between floating-point and exact arithmetic. (Perl 6 treats decimal literals such as 0.1 as rationals, that is, as a quotient of two integers, rather than as inexact floating-point numbers.)
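
To see the rational representation directly, you can ask the sum for its numerator and denominator. A minimal sketch; the printed values assume, as I believe is the case, that Rat arithmetic keeps fractions in lowest terms:

```raku
# 0.1 and 0.2 are Rat literals, so their sum is the exact fraction 3/10.
my $sum = 0.1 + 0.2;

say $sum.WHAT;         # (Rat)
say $sum.numerator;    # 3
say $sum.denominator;  # 10
say $sum == 0.3;       # True
```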

Perl 6 is object-oriented, and you can see the class of any object with the .WHAT method:

> (10).WHAT
(Int)
> (0.1).WHAT
(Rat)
> (1e-1).WHAT
(Num)

Int is an integer, Rat is a rational (exact fraction), and Num is a double-precision float. If you want the “traditional” floating-point behavior, just make a Num out of each number before adding them:

> Num(0.1) + Num(0.2) == Num(0.3)
False

Scientific notation (such as 1e-1 for 1×10⁻¹) always produces a floating-point Num in Perl 6. For reasons I don’t understand, however, they sometimes seem to add up as rationals, rather than floating-point:

> 0.1e0 + 0.2e0 == 0.3e0
False
> 1e-1 + 2e-1 == 3e-1
True

Maybe someone can explain the discrepancy to me in a polite email. Anyway, I wholeheartedly endorse Perl 6’s use of rational numbers for exact arithmetic, and I think more languages could benefit from a similar implementation. For reference, Julia and Haskell have built-in rationals, constructed (respectively) with the // and % operators; most other languages have them as a library, if at all, without support for literals.

The overall math support in Perl 6 is quite good, with complex numbers and popular constants built right into the language. Here is Euler’s identity, correct to 15 decimal places at any rate:

> e**(pi*i)
-1+1.22464679914735e-16i

Speaking of which, if you’re frequently dealing with floating-point numbers, Perl 6 has a handy approximately-equal operator, =~=, which by default treats two numbers as equal when they agree to within 10⁻¹⁵:

> e**(pi*i) + 1 =~= 0
True
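
The tolerance is stored in the dynamic variable $*TOLERANCE; as I read the documentation, it is applied relative to the larger operand, except when one side is zero. A small sketch of loosening it for the duration of a single block:

```raku
# With the default tolerance (1e-15), these two values are not "equal".
say 1.0 =~= 1.0001;         # False

{
    # Dynamic variables can be rebound for the extent of a block.
    my $*TOLERANCE = 1e-3;
    say 1.0 =~= 1.0001;     # True: the difference is well under 0.1%
}
```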

Perl 6 has Unicode everywhere, so if you wish you can also use π in place of pi (and, cutely, superscripts to exponentiate to an integer). However, the Perl 6 shell doesn’t seem to be completely multibyte-aware, so backspace through Unicode expressions at your own peril. The non-interactive compiler works with UTF-8 just fine.
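
A quick illustration of the Unicode spellings, written in a script file rather than typed into the shell:

```raku
# π is a built-in alias for pi; superscript digits raise to an integer power.
say π == pi;   # True
say 2¹⁰;       # 1024
```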

Be aware that the choice of numeric types can have a significant impact on performance. For example, here’s a small script to compute the 999,999th term of a harmonic series:

my $num = 1;
my $total = 0;

while $num < 1_000_000 {
    $total += 1/$num;
    $num++;
}

say $total;

That takes about 4.4 seconds on my machine. But change 1/$num to 1e0/$num and it takes only 1.75 seconds — in the first case, Perl 6 constructs a rational to represent 1/$num, but goes straight to the floating-point representation with 1e0/$num.

Another oddity crops up if you inspect $total closely. It begins life as an Int, becomes a Rat in the first loop iteration, and is converted to a Num around the 46th iteration. There’s some subtle type trickery going on behind the scenes, and one senses the benign presence of Larry Wall, whose skulduggery with strings and numbers in the original version of Perl made it such a hit with programmers weary of scanf and sprintf.
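
The switch to Num appears to be Perl 6 protecting itself: as I understand it, a Rat whose denominator no longer fits in 64 bits degrades to a Num. The arbitrary-precision FatRat type stays exact instead, presumably at an even steeper cost in speed. A sketch with a shorter series (100 terms is already enough to trigger the switch):

```raku
# Plain Rat arithmetic: the harmonic sum's denominator outgrows 64 bits,
# so the running total silently degrades to a floating-point Num.
my $total = 0;
$total += 1 / $_ for 1..100;
say $total.WHAT;                  # (Num)

# FatRat arithmetic stays exact, however large the denominator grows.
my $exact = FatRat.new(0, 1);
$exact += FatRat.new(1, $_) for 1..100;
say $exact.WHAT;                  # (FatRat)
say $exact.denominator > 2**64;   # True
```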

In my small test here, the “fast” version of the Perl 6 script is still about four times slower than Perl Classic, and 500 times slower (!) than C. The Perl 6 people often point to the speed improvements they’ve made since the initial Christmas release, so perhaps the speed situation will improve with time. Overall, when it comes to numerical computing, I guess you could say that while Perl 6 is perfectly capable of heavy lifting, the lifting crew isn’t in any apparent hurry.

If you were beginning to worry that there was only one way to compute a harmonic series in Perl 6, you can compute the same thing with this hieroglyphic one-liner:

[+] (1..^1e6).map: 1/*

I won’t trace the mechanics of that 22-byte code snippet here, but I will take this moment to note that among his other accomplishments, Larry Wall, the creator of Perl and designer of Perl 6, is a two-time winner of the International Obfuscated C Code Contest.
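
For readers who want a rough decoding anyway: [+] is a reduction metaoperator that folds + over a list, 1..^1e6 is a range that excludes its upper endpoint, and 1/* is a Whatever Star closure of the kind discussed under Functions below. This sketch compares the one-liner against a plain loop over a smaller range:

```raku
# Short form: reduce the mapped range 1..999 with +.
my $short = [+] (1..^1000).map(1/*);

# Long form: the same additions, in the same order, as an explicit loop.
my $long = 0;
for 1..999 -> $n {
    $long += 1 / $n;
}

say $short;
say $short =~= $long;   # True
```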

One final observation about Perl 6 and math: although Perl 6 has all the usual functions from math.h, it could certainly use a few more.

Strings and Regexes

Perl 6’s string support, and Unicode support in particular, is the best in the business. Unlike Go or JavaScript, Perl 6 is aware of graphemes and combining codepoints, and unlike Python, Swift, or Elixir, Perl 6 can access arbitrary graphemes by position in constant time, rather than iterating over every grapheme to access a particular one.2 Unicode identifiers are allowed, and so are custom Unicode operators. Perl 6 has a number of built-in Unicode operators (with ASCII fallbacks) for things like set operations and equality tests.
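
A small illustration of the grapheme/codepoint distinction, assuming .chars counts graphemes and .codes counts codepoints (my reading of the documentation). I picked x plus a combining accent because no precomposed character exists for it, so normalization can’t merge the pair:

```raku
# "x" followed by U+0301 COMBINING ACUTE ACCENT: two codepoints, one grapheme.
my $s = "x\x[0301]yz";

say $s.chars;               # 3 (graphemes)
say $s.codes;               # 4 (codepoints)
say $s.substr(0, 1).codes;  # 2 (the first grapheme keeps its accent)
```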

Remember Euler’s identity, with the approximation test? You can write it like this, if you like:

> e**(π×i) + 1 ≅ 0
True

Perl 6 allows any form of quotation construct recognized by the Unicode standard, so the following are equivalent:

「Man is amazing, but he is not a masterpiece」
“Man is amazing, but he is not a masterpiece”
"Man is amazing, but he is not a masterpiece"

The Perl 6 philosophy, as expressed by Larry Wall in interviews, is that Unicode symbols may be difficult to type now, but technology will improve and in the future you’ll be happy to have expressive Unicode defining your source code. I’m personally satisfied with the ASCII variants — in Perl 6 parlance, they’re called Texas symbols (because everything’s bigger, get it?) — but maybe some people will find Unicode easier on the eyes. It’s worth noting that there are rumors of e-Ink keyboards in upcoming Apple hardware, so perhaps Larry Wall is ahead of this particular curve.

Back in the late 1980s, Perl introduced the programming world to regular expressions, or at least the non-stack-based kind that people might actually want to use to capture and print stuff. Perl 6 keeps Perl’s regexes with a few changes: extended Unicode support, relocated modifiers, some reassigned symbols, and new whitespace rules designed for legibility. They feel to me more modern, with more room to breathe. I’ll give a few examples.

Character classes use <[]> instead of [], and ranges use .. instead of -. So:

[a-zA-Z0-9]

is now:

<[ a..z A..Z 0..9 ]>

Capturing groups still use parentheses, but non-capturing groups now use plain square brackets. Whitespace inside a regex is now ignored, so literal spaces must be made explicit. So this:

(Male|Female) (?:Cat|Dog)

is now:

(Male || Female) ' ' [Cat || Dog]

Modifiers (i for case-insensitive is the most important one) can appear inside a regex when preceded by a colon. For example, this Perl 5 regex:

(Male|Female) (?:[Cc][Aa][Tt]|[Dd][Oo][Gg])

would be translated to Perl 6 as:

(Male || Female) ' ' [:i cat || dog]
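
To check that the translation actually matches, here is a smart match against the same input used in the grammar section below; $m[0] is the capturing group:

```raku
# Whitespace in the pattern is ignored, so the literal space is quoted.
my $m = "Male CAT" ~~ / (Male || Female) ' ' [:i cat || dog] /;

say ?$m;      # True
say ~$m[0];   # Male
say ~$m;      # Male CAT
```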

\d now matches Unicode digits such as ᭕ (a Balinese five) and ๓ (a Thai three). \w matches digits and anything Unicode considers to be a letter, including Ж and Θ. \h matches any Unicode-defined horizontal whitespace character; if you want to match only the space on grandpa’s ASCII typewriter, you have your choice of \c[SPACE], \x[20], or ' '.
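
A few spot checks of those classes, using \x[…] escapes for characters that are hard to type (U+0E53 is THAI DIGIT THREE, U+1B55 is BALINESE DIGIT FIVE, U+00A0 is NO-BREAK SPACE):

```raku
say so "\x[0E53]" ~~ /^ \d $/;   # True
say so "\x[1B55]" ~~ /^ \d $/;   # True
say so "Ж" ~~ /^ \w $/;          # True
say so "\x[A0]" ~~ /^ \h $/;     # True: horizontal whitespace
say so "\x[A0]" ~~ /^ ' ' $/;    # False: ' ' is only U+0020
```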

Sets of characters can be identified by traditional Perl groupings such as alnum or xdigit, or using Unicode General Categories such as Lu (uppercase letter) or Sm (math symbol):

<alnum> # traditional character class name, now in angle brackets
<:Lu> # Unicode General Category, note the colon

Additional sets can be computed by defining ranges or by combining sets with set operators such as union, intersection, and difference.

<alnum - [0..9]> # Alphanumerics, excluding 0 through 9
<alnum + :Sm> # Alphanumerics, plus Unicode-defined math symbols
<alpha & xdigit> # Alphabetical hexadecimal digits; same as <[ a..f A..F ]>
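
For a runnable sketch, I’ve stuck to forms I’m confident about, General Categories and the bracketed-class difference <[…]-[…]>, combined with .comb to pull out every match:

```raku
# Unicode General Category Lu: uppercase letters.
say "Hello World".comb(/<:Lu>/).join;             # HW

# Difference of two enumerated classes: lowercase consonants.
say "programming".comb(/<[a..z]-[aeiou]>/).join;  # prgrmmng

# General Category Sm: math symbols, ASCII or not.
say "a+b=c≤d".comb(/<:Sm>/).join;                 # +=≤
```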

If I dealt with Unicode text files on a regular basis, I’d likely feel ☺︎ with these features.

Grammars

The Perl 6 feature I was most excited to read about — in fact the initial reason I was drawn to Perl 6, aside from morbid curiosity — is the inclusion of grammars in the language. A good way to ease into grammars is to refactor a regular expression, for example like this:

my regex sex { "Male" || "Female" };
my regex species { :i "cat" || "dog" };

if "Male CAT" ~~ /<sex> \s <species>/ {
    say "Sex: " ~ $/<sex>;
    say "Species: " ~ $/<species>.lc;
}

(Tilde is Perl 6’s string concatenation operator; double-tilde is “smart match”, which in this context applies a regular expression.)

Next you can group regexes into a grammar block, providing a special regex named TOP, indicating where the parsing should begin:

grammar Animal {
    regex sex { "Male" || "Female" }
    regex species { :i "cat" || "dog" }

    regex TOP {
        <sex> \s <species>
    }
}

if my $result = Animal.parse("Male CAT") {
    say "Sex: " ~ $result<sex>;
    say "Species: " ~ $result<species>;
}

Because named regexes (or rules) can refer to each other, and to themselves, the Perl 6 grammar engine can be used to build recursive-descent parsers. I use Ragel for a number of parsing tasks, so I liked the idea of having a language that has something Ragel-ish built in — but I’m afraid I won’t be abandoning Ragel anytime soon, and won’t be migrating to Perl 6, at least not on account of its grammar engine.

The problem with Perl 6’s grammar engine — and perhaps this shortcoming could be addressed in a future version of the language — is that if parsing fails for any reason, the parse method simply returns Nil. There is no indication of any kind about where the failure occurred; so for any kind of non-trivial input, grammars are a real pain to debug. A single misplaced character kills the whole parse, and the coroner’s report is blank.
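
Here is the complaint in miniature, with the Animal grammar redeclared so the snippet runs alone. The second half is a crude workaround of my own, not an official debugging technique: subparse doesn’t require consuming the whole input, and its :rule argument starts from a named rule, so you can probe which prefix still parses:

```raku
grammar Animal {
    regex sex     { "Male" || "Female" }
    regex species { :i "cat" || "dog" }
    regex TOP     { <sex> \s <species> }
}

# A failed parse gives no indication of where things went wrong.
my $full = Animal.parse("Male COW");
say so $full;                  # False

# Probe a single rule against the start of the input.
my $prefix = Animal.subparse("Male COW", :rule<sex>);
say $prefix.to;                # 4: "Male" matched, so the trouble is later
```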

The Perl 6 compiler is written using the Perl 6 grammar engine, so I was wondering how the Perl 6 parser produces its line numbers and error messages. I started poking around the language’s grammar file, and like a kid rifling through his parents’ bedside table, I regret that I ever had the idea.

It’s bad enough that the Perl 6 grammar file is a foam party of intermingled grammar, code, and error messages, but here’s the cherry on top: most rules in the Perl 6 grammar file have an extra condition that, instead of matching something, throws an exception, and the exception handler picks apart the internal state of the parser to figure out how much input was processed before the exception was thrown. It works, sort of, but it’s a kludgy technique to replicate if all you want are some line numbers. Traditional parser-generators do a much better job handling non-matching input.

This shortcoming explains, in part, why Perl 6’s error messages are sometimes useless if you have a syntax error in your source code. Many of Perl 6’s error messages are helpful and good, but for example, if you forget to close a quote, Perl 6 will tell you that there’s an error in the last line of the file — when of course the error is more likely near the opening quote, which could be anywhere in the file. Good luck finding it, as Perl 6’s exception-throwing parser provides nary a clue.

As a long-time Ragel addict, I really wanted to like Perl 6’s grammar engine, but the lack of decent failure handling means I can’t endorse it except as a refactorable regex machine. Perl 6 could have been an interesting platform for prototyping new programming languages, for example, but as soon as you take a look at the Perl 6 grammar file, you’ll want to close the drawer and go back to a more innocent time.

A Note on Identifiers

Let’s get something out of the way: Perl 6 identifiers can contain dashes.

my $why-hello-there = "Why, hello there!";

Perl 6ers call this “kebab case”; I’m told on good authority that Lispers use the same case convention, but they never really had a meat-related name for it.

Kebab case is the preferred case convention in Perl 6’s standard library, including for function and variable names (e.g. is-prime($a-big-number)). Because variables in Perl 6 almost always start with a sigil (dollar sign, at sign, or percent sign), the convention usually doesn't create any ambiguity with subtraction operations. It’s therefore not a terrible choice for Perl 6, and is at least consistent with Larry Wall’s bet on Unicode, and his bold vision that one day we will be able to type hyphens and minus signs into our computers as separate characters.
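
As I understand the identifier rules, a hyphen (or apostrophe) is only part of a name when a letter follows it, which is why $n-1 stays a subtraction. A sketch:

```raku
my $n = 10;
my $n-squared = $n ** 2;   # one identifier: the hyphen is followed by a letter

say $n-squared;            # 100
say $n-1;                  # 9: a hyphen followed by a digit means subtraction
say is-prime(97);          # True
```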

Most programmers that I show Perl 6 to instantly recoil at the kebab case, as if there were human flesh hanging off of it. I’ve personally grown to like it. It gives a screen full of Perl 6 code a distinct visual identity, and even if you dislike the aesthetic, you have to admit it’s easier on the Shift finger than either CamelCase or underscores.

Of course, any salutary effects on the health of your left pinky will likely be washed away by Perl 6’s $@%&! variable sigils.

Arrays and hashes

Perl 5 had arrays, indicated by @ and indexed by integer, and hashes, preceded by % and keyed by string; on these blocks, the church of Perl was built.

Perl 6 has the same two data structures, but with a number of modifications. A noticeable syntactic change is that $array[0] is now written @array[0], that is, value accesses use the data structure’s sigil (at-sign or percent sign), rather than the scalar sigil (dollar sign).

An important semantic difference is that arrays and hashes can be parameterized with a value type; for example,

my Int @array;
my Str %hash;

declares an array that stores only integers, and a hash that stores only strings. The hash keys are strings by default, so you don’t need to parameterize them.
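
A small sketch of the type constraint at work; try converts the exception into an undefined result so the script can keep going:

```raku
my Int @numbers = 1, 2, 3;
@numbers.push(4);                          # fine: 4 is an Int

# A Str violates the Int constraint, so the push throws.
my $pushed = try { @numbers.push("five"); True };
say $pushed // "rejected";                 # rejected
say @numbers[3];                           # 4
```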

Literal arrays are written with square brackets:

@array = [99, 100, 101];

Literal hashes can be written in a few different ways. The following are equivalent:

%( minutes => 3, seconds => 45 )
%( "minutes" => 3, "seconds" => 45 )
%( "minutes", 3, "seconds", 45 )
%( :minutes(3), :seconds(45) )
%( :3minutes, :45seconds )

The last form feels like a hack, but I guess it’s preferable to the pineapple upside-down ontology of Rails methods like 3.minutes or 45.seconds.

Braces can be used in place of %(…) to construct a hash, except when the braces contain the default variable, $_, in which case Perl 6 gets confused and thinks a block is being declared. The documentation discourages braces for hashes for this reason, but then, braces are the default representation when Perl 6 prints a hash, so I’m getting mixed messages about what to use. In the tradition of Perl, there is indeed more than one way to do it, but in this case, I’m still looking for a satisfactory explanation as to why. It seems to me Perl 6 should have just endorsed one set of hash delimiters, outlawed the other, and saved everyone countless StackOverflow questions tagged #newbie.

Hash members are accessed as either %hash<key> or %hash{"key"}. A cute feature is that hashes can look up arrays of keys, so %hash<key1 key2> returns the values for both "key1" and "key2".
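
The three lookup styles side by side:

```raku
my %duration = minutes => 3, seconds => 45;

say %duration<minutes>;           # 3
say %duration{"seconds"};         # 45
say %duration<minutes seconds>;   # (3 45)
```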

If you’re not satisfied by string keys, it’s possible to declare a different type of key inside of braces:

my %hash{Int};

Or create a hash keyed by arbitrary objects with colon-prefixed braces:

my %hash := :{ 100 => "One hundred",
               200 => "Two hundred",
               $(pi) => "Three point one four one five nine" }

For the record, that’s the only literal syntax for creating object hashes; the weird $_ prohibition applies here, too. And note the $() syntax above, which is necessary to prevent Perl 6 from interpreting pi as a string. For all the rhetoric about Perl 6 being more consistent than Perl 5, it sure has a lot of little rules.
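
A sketch showing that both kinds of hash keep their keys’ types; the variable names are mine:

```raku
# Int-keyed hash: keys stay Ints instead of becoming strings.
my %by-number{Int};
%by-number{100} = "One hundred";
say %by-number.keys[0].WHAT;    # (Int)

# Object hash, as above; pi is looked up as a Num, not the string "pi".
my %objects := :{ 100 => "One hundred",
                  $(pi) => "Three point one four one five nine" };
say %objects{pi};               # Three point one four one five nine
say %objects{100};              # One hundred
```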

Lists

Technically, Perl 5 had a third data structure, the list. Lists could be assigned to arrays or to hashes — list values were separated by commas and surrounded by parentheses — but were themselves immutable, and could not be explicitly bound to variables.

Lists are still present (and immutable) in Perl 6, but in a departure from the predecessor language, Perl 6 lets you bind lists to variables for later re-use, using the bind-to (:=) operator:

my @list := (1, "One", pi);

Assigning to an item in the list can result in an error:

@list[0] = 2
# Cannot modify an immutable Int (1)

Note that this error message indicates that you can’t modify the Int, not that you can’t modify the List. If the list contains a variable, you can modify the contents of the variable, without changing the actual list (which still contains the variable):

my $value = 10;
my @list2 := (1, 2, $value);

@list2[0] = pi; # Error: Cannot modify an immutable Int
@list2[1] = pi; # Error: Cannot modify an immutable Int
@list2[2] = pi; # Works

say @list2; # (1 2 3.14159265358979)
say $value; # 3.14159265358979

Weird, huh? In a sense, because lists themselves are assumed immutable, value assignments are just re-routed to a particular list member. In this way, list assignment follows somewhat naturally:

my $value1 = 1;
my $value2 = 2;

($value1, $value2) = ($value2, $value1);

Don’t get too excited though, as this assignment won’t behave as you’d expect:

my $value1 = 1;
my $value2 = 2;
my $value3 = 3;

(($value1, $value2), $value3) = ((100, 200), 300);

For reasons that aren’t clear to me, the lefthand side is flattened before assignment, and is equivalent to this:

($value1, $value2, $value3) = ((100, 200), 300);

Thus in both cases $value1 gets assigned (100, 200), $value2 gets 300, and $value3 gets nothing. There’s a separate colon prefix for doing a destructuring assignment (note that bind-to := is required in this usage):

:(($value1, $value2), $value3) := ((100, 200), 300);

In the above line, $value1 gets 100, $value2 is 200, and $value3 becomes 300. I don’t understand the logic of making a syntactic distinction between list assignment and full destructuring assignment — and silently flattening the lefthand list structure in the first case — but there you have it.

Lists can contain other lists, and also arrays and hashes. Arrays and hashes can contain arrays and hashes (this is still Perl), and can also contain lists.

The last thing I’ll note about lists is that they can be unrolled with the special pipe operator:

my @list := (2, 3, 4);

say (1,  @list, 5); # "(1 (2 3 4) 5)"
say (1, |@list, 5); # "(1 2 3 4 5)"

This comes in handy when calling functions. But we’ll talk about that after we talk about something that has been bothering me.

Itemization

If you’re not especially interested in the gory details of how lists differ from arrays, feel free to skip this section.

An array is a list, but items within an array (or hash) behave differently from items within a list:

my @array = (1, 2, 3);
my @list := (1, 2, 3);

say @array.WHAT; # "(Array)"
say @list.WHAT; # "(List)"

say @array[0].WHAT; # "(Int)"
say @list[0].WHAT;  # "(Int)"

say @array[0].VAR.WHAT; # "(Scalar)"
say @list[0].VAR.WHAT;  # "(Int)"

Arrays have an extra, hidden layer of indirection, which the documentation calls itemization. Unlike the list, the array is really full of transparent containers which point to the actual values and which can be accessed through the .VAR method. Assigning to an array member really modifies the container, whereas modifying a list member attempts to manipulate the item itself.

However, lists can also have this layer of indirection, on a per-member basis, using the special dollar function, $(…). You still can’t modify the list, but itemized members are protected from list-flattening operations.

Itemization is a confusing part of an already complex language — most of the time, methods pass straight through the container, but because itemized members of a list are not flattened by List.flat, when you return a nested list from a function, you are expected to be aware of which members are itemized, and which are not, based on what you expect the caller to expect when the caller calls List.flat on the returned value.
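
The flattening difference is easy to see with List.flat; the itemized sublist survives as a single element:

```raku
my @plain    := (1, (2, 3), 4);
my @itemized := (1, $(2, 3), 4);

say @plain.flat;             # (1 2 3 4)
say @itemized.flat;          # (1 (2 3) 4)
say @itemized.flat.elems;    # 3
```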

Got that?

The documentation provides five heuristics for deciding when items in a list returned from a function should be itemized; after dozens of readings, the heuristics are still mostly mumbo-jumbo to me, and so I think I’ll be sticking to simple lists and objects and arrays of objects.

Functions

Perl 6 might have my favorite function-dispatching mechanism of any language I’ve used; it’s certainly the most flexible.

Function parameters can be named or positional, though a single parameter can’t be both. Named parameters are preceded with a colon:

sub triangle-area(:$base, :$height) {
    return 0.5 * $base * $height;
}

triangle-area( height => 3, base => 4 ); # 6
triangle-area(4, 3); # error, "Too many positionals passed"

Signatures can destructure their arguments, a popular feature in functional languages:

sub first-item([$head, *@tail]) {
    return $head;
}

my @array = (99, 100, 101);
first-item(@array); # 99

If you miss the old Perl 5 convention of putting all of your parameters into an array, you can use what’s called a slurpy:

sub add-em-up(**@numbers) {
    return @numbers.sum;
}

add-em-up(1, 2, 3); # returns 6
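
The double asterisk in **@numbers makes a non-flattening slurpy; a single asterisk flattens any nested lists it receives. A sketch of the difference, with sub names of my own invention:

```raku
sub count-flat(*@items)    { @items.elems }   # flattens nested lists
sub count-nested(**@items) { @items.elems }   # keeps them intact

say count-flat((1, 2), 3);     # 3
say count-nested((1, 2), 3);   # 2
```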

The pipe operator, mentioned in the previous section, can also be used to pass a regular list as an argument list:

my @args := (3, 4, 5);
 
add-em-up(|@args); # equivalent to add-em-up(3, 4, 5)

Hashes destructure:

sub volume(%( :length($x), :width($y), :depth($z) )) {
    return $x * $y * $z;
}

my %dimensions = %(depth => 2, length => 10, width => 5);
volume(%dimensions); # 100

And can be unrolled to named parameters with the pipe operator:

sub triangle-area(:$base, :$height) {
    return 0.5 * $base * $height;
}

my %dims = %( base => 4, height => 5 );
triangle-area( %dims); # Error: Too many positionals passed
triangle-area(|%dims); # Works

Multiple-dispatch occurs not just on types, like Julia and C++, but also on values, as in Elixir and Haskell. There are often faster ways to compute a factorial, but:

multi sub factorial(0) { 1 } # "return" and semicolon are optional, btw
multi sub factorial(Int $n where * > 0) { $n * samewith($n-1) }

factorial(4);   # 24
factorial(4.0); # type error

Note the use of samewith in the second clause; this is a feature that I pined for in Erlang, a keyword to perform recursion so that you could rename a function without having to change all the recursive calls within it. It’s a welcome feature in Perl 6.

Candidate clauses are evaluated in order, but note that you might get unexpected results defining multiple-dispatch functions inside the Perl 6 shell. Each clause seems to only be aware of the clauses that preceded it, so for example switching the two clauses above results in a run-time error. But switching the order is fine if you’re using the regular compiler.3
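
A sketch of dispatch on both types and values, with the general Int candidate deliberately declared first. My understanding is that when two candidates have the same nominal type, the one with a value or where constraint is tried first, so declaration order doesn’t decide the outcome here (in a compiled script, at least):

```raku
# The general Int candidate comes first on purpose.
multi sub describe(Int $n) { "the integer $n" }
multi sub describe(0)      { "zero" }
multi sub describe(Str $s) { "the string '$s'" }

say describe(0);       # zero
say describe(42);      # the integer 42
say describe("cat");   # the string 'cat'
```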

In a couple of places thus far, I’ve used a “Whatever Star”, which turns an expression into a closure, with the star standing in for the closure’s argument:

my $add-two = * + 2;
say $add-two(pi); # 5.14159265358979

The Whatever Star is the skeleton key to a number of strange-looking Perl 6 expressions, such as the following (which, incidentally, strengthen the case for Unicode operators):

(1, 2, 3).map: * * 2    # multiply by 2; first * is Whatever, second is multiply
(1, 2, 3).map: * ** 2   # raise Whatever to power of 2

I’ve avoided the Whatever Star because, in addition to making Perl 6 look like a lineal descendant of brainfuck, it is governed by rules that are too subtle for my understanding. All of the following work, and do the same thing (compute the square root of a list of numbers):

(1, 4, 9).map: *.sqrt
(1, 4, 9).map: { $_.sqrt }
(1, 4, 9).map: { sqrt($_) }

This doesn’t work:

(1, 4, 9).map: sqrt(*)

I’m sure there’s a valid explanation for this discrepancy somewhere on the Internet. In cases like these I like to say that the monads must be acting up again, and move on.
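
My understanding, hedged: the star is only turned into a closure when it is an operand of an operator or the invocant of a method call, so sqrt(*) just hands the Whatever object itself to sqrt. Passing the function by name with the & sigil sidesteps the issue:

```raku
say (1, 4, 9).map(*.sqrt);    # (1 2 3)
say (1, 4, 9).map(&sqrt);     # (1 2 3)
```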

NativeCall

The most pleasant set of surprises for me with Perl 6 function-calling — in fact some of the more pleasant surprises in all of Perl 6 — is the nearly frictionless interfacing with C libraries.

To get a feel for the C interop, I decided to use Perl 6 to wrap up one of my favorite C libraries, libxlsxwriter. The C library, ironically, began life as a Perl 5 library, and was later ported to C, but, alas, never to Perl 6. As the name implies, libxlsxwriter can be used to generate XML-based Excel documents.

(Here’s the code I came up with, if you want to follow along at home: XLSX::Writer, a Perl 6 library for generating Excel spreadsheets; pardon the SEO keywords.)

The NativeCall module in Perl 6 unlocks all kinds of dark magic for calling C code. Declaring a Perl 6 class to be an opaque C pointer is simple:

use NativeCall;

unit class Writer::Workbook is repr('CPointer');

Declaring a C function that returns an opaque pointer object is a matter of:

sub workbook_new(Str is encoded('utf8'))
    returns Writer::Workbook is native("xlsxwriter") {...}

NativeCall automatically converts Perl 6 strings to C strings, in the encoding of one’s choosing (via is encoded(…)). The three dots between the braces above are typed literally (they are Perl 6’s stub, or “yada”, operator), but a Whatever Star can be used in their place (I’m not sure why); anyway, the workbook_new routine is now ready to be called from any method. For example, if you make a new method in the Writer::Workbook class:

method new(Str $path) { workbook_new($path) }

Then a workbook can be opened in a script with:

use Writer::Workbook;

my $workbook = Writer::Workbook.new("/path/to/workbook.xlsx");

The wrapping process quickly becomes a kind of three-step waltz: declare the C function as a subroutine, make a method for a nice OO interface, and add a few lines to the test script. One-two-three, one-two-three, one-two-three…

If the C library requires knowledge of the C struct, mirroring the data layout is fairly simple. For example, this C struct:

typedef struct lxw_datetime {
    int year;
    int month;
    int day;
    int hour;
    int min;
    double sec;
} lxw_datetime;

Becomes:

class Writer::CDateTime is repr('CStruct') is export {
    has int32 $.year;
    has int32 $.month;
    has int32 $.day;
    has int32 $.hour;
    has int32 $.min;
    has num64 $.sec; # alignment is ok!
}

I couldn’t figure out whether there’s a platform-specific C int type in NativeCall (Rust and Julia provide this), but int is 32 bits on all modern computers, so I didn’t lose any sleep over it.

One particularly pleasant surprise was that NativeCall uses the correct alignment rules to pad out data structures; in the example above, the sec field begins at byte 24 in both the C and Perl 6 structures, even though only 20 bytes (five 4-byte integers) precede it. Bullet dodged. The C struct is then initialized as a regular Perl 6 object:
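
NativeCall’s nativesizeof function offers a quick sanity check on the layout. A sketch, assuming a typical 64-bit platform, where a double inside a struct is aligned to 8 bytes; the class name here is shortened for the example:

```raku
use NativeCall;

# Same layout as Writer::CDateTime above.
class CDateTime is repr('CStruct') {
    has int32 $.year;
    has int32 $.month;
    has int32 $.day;
    has int32 $.hour;
    has int32 $.min;
    has num64 $.sec;    # starts at offset 24 after 4 bytes of padding
}

# 20 bytes of ints + 4 bytes of padding + an 8-byte double.
say nativesizeof(CDateTime);   # 32
```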

my $now = DateTime.now;
my $dt = Writer::CDateTime.new(:year($now.year), :month($now.month),
                               :day($now.day), :hour($now.hour),
                               :min($now.minute), :sec($now.second.Num));

Here’s where things get interesting. NativeCall always passes pointers to structs, so you don’t have to muck around with pointer types in the Perl 6 code. Thus this C function:

void workbook_set_creation_datetime(lxw_workbook *workbook,
                                    lxw_datetime *datetime);

Would be declared nicely and neatly in Perl 6:

sub workbook_set_creation_datetime(Writer::Workbook, Writer::CDateTime)
    is native("xlsxwriter") {...}

And wrapped in a Writer::Workbook method thus:

method set-creation-datetime(Dateish $date) {
    my $dt = Writer::CDateTime.new(:year($date.year), :month($date.month),
                                   :day($date.day), :hour($date.hour),
                                   :min($date.minute), :sec($date.second.Num));
    workbook_set_creation_datetime(self, $dt)
}

Then called from a script:

my $wb = Writer::Workbook.new("/path/to/workbook.xlsx");
$wb.set-creation-datetime(DateTime.now);

Most of the type mappings between C and Perl 6 are automatic and straightforward. The only exception I encountered was the num64 type, which requires a Num (floating-point) value and won’t accept other Numeric types such as Int or Rat. But the .Num conversion method is an easy workaround.

I’ll mention one other language feature that makes wrapping a C library pleasant, Perl 6’s enum construct:

enum LibraryError <LXW_NO_ERROR
                   LXW_ERROR_MEMORY_MALLOC_FAILED
                   LXW_ERROR_CREATING_XLSX_FILE
                   LXW_ERROR_CREATING_TMPFILE>; # many more in the real library

Each value of the enum is really a key-value pair, beginning with the value zero. A nicety is that if a C function returns an integer, the Perl 6 side can easily access the symbolic representation:

sub some_c_function() returns int32 is native("something") {...}

my Int $code = some_c_function();
say LibraryError($code) # LXW_NO_ERROR, etc
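For comparison, here is a sketch of what the same lookup takes on the C side: a name table maintained by hand and indexed by the enum value. The type library_error and the function library_error_name are hypothetical, and only the four values shown above are mirrored.

```c
#include <stddef.h>  /* size_t */

/* Abridged, hypothetical mirror of the error codes listed above.
   Like the Perl 6 enum, C numbers enumerators from zero by default. */
typedef enum {
    LXW_NO_ERROR,                   /* 0 */
    LXW_ERROR_MEMORY_MALLOC_FAILED, /* 1 */
    LXW_ERROR_CREATING_XLSX_FILE,   /* 2 */
    LXW_ERROR_CREATING_TMPFILE      /* 3 */
} library_error;

/* C has no built-in way back from a number to a symbol, so this
   table has to be kept in sync with the enum by hand. */
static const char *const library_error_names[] = {
    [LXW_NO_ERROR]                   = "LXW_NO_ERROR",
    [LXW_ERROR_MEMORY_MALLOC_FAILED] = "LXW_ERROR_MEMORY_MALLOC_FAILED",
    [LXW_ERROR_CREATING_XLSX_FILE]   = "LXW_ERROR_CREATING_XLSX_FILE",
    [LXW_ERROR_CREATING_TMPFILE]     = "LXW_ERROR_CREATING_TMPFILE",
};

/* Maps an integer code back to its symbolic name, or "unknown". */
const char *library_error_name(int code) {
    size_t n = sizeof library_error_names / sizeof library_error_names[0];
    if (code < 0 || (size_t)code >= n)
        return "unknown";
    return library_error_names[code];
}
```

The Perl 6 enum gives you the equivalent of both the enum and the table from a single declaration, which is what makes LibraryError($code) so convenient.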

Enums can start with a value other than zero:

enum Script <<:superscript(1) subscript>>;

Or have specified values:

enum Color ( black => 0x1000000, blue => 0x0000FF,
             brown => 0x800000, cyan => 0x00FFFF);

I hesitate to suggest expanding the language any further — even a tiny suggestion at this point may be like the mortal mint that did in Mr. Creosote in The Meaning of Life — but one feature Perl 6 might borrow from Go is incrementing the enum value via bit-shift, i.e. Go’s too-clever-by-half 1 << iota construct.
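The pattern in question is the one C programmers write out by hand for bit flags. The sketch below uses made-up flag names and shows the sequence of values (1, 2, 4, 8) that Go's 1 << iota generates automatically.

```c
/* Illustrative flag names, not taken from any library. C has no iota,
   so each shift is written out explicitly. */
enum text_style {
    STYLE_BOLD      = 1 << 0,  /* 1 */
    STYLE_ITALIC    = 1 << 1,  /* 2 */
    STYLE_UNDERLINE = 1 << 2,  /* 4 */
    STYLE_STRIKE    = 1 << 3   /* 8 */
};

/* Each flag owns one bit, so flags combine with | and are tested with &. */
int has_style(unsigned styles, enum text_style flag) {
    return (styles & (unsigned)flag) != 0;
}
```

For example, STYLE_BOLD | STYLE_UNDERLINE sets two bits, and has_style reports both as present while leaving STYLE_ITALIC absent.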

My only real qualm with Perl 6’s enum feature is that enum symbols appear to be exported into the declaring scope, where they must be unique, or else you get warnings and a strange error. I’d prefer that the symbols stay inside the enum and be accessed only via the enum name (LibraryError::none, Color::black, etc.). Right now, if the Color and LibraryError enums are declared in the same scope, then Color::none and LibraryError::none create a conflict. So you have to either use C-style globally unique names (LIBRARY_ERROR_NONE, LIBRARY_COLOR_NONE, …), which doesn’t feel very Perlish to me, or else put each enum in its own file and then import it, which is pretty inconvenient.
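C itself behaves the same way, which is where the prefixed-name convention comes from. In this minimal sketch the enum names are illustrative, and the prefixes follow the ones in the paragraph above.

```c
/* In C, enumerators from every enum share the enclosing scope, so two
   enums cannot both declare NONE:
       enum color { NONE, BLACK };
       enum library_error { NONE, MALLOC_FAILED };   -- compile error: NONE redeclared
   The usual workaround is a per-enum prefix on every enumerator. */
enum library_color { LIBRARY_COLOR_NONE, LIBRARY_COLOR_BLACK };
enum library_error { LIBRARY_ERROR_NONE, LIBRARY_ERROR_MALLOC_FAILED };
```

With the prefixes, both NONE values coexist, and each enum keeps its own numbering from zero.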

Nonetheless, wrapping up a C library in Perl 6 is a painless experience, even an addictive one. You can even pass a Perl 6 function or closure as a C callback, using the native types (int32 and all that) directly in your Perl 6 code. However, this feature is less useful than it could be, because you can’t get a Perl 6 object back from a void * context pointer, which is how many C APIs pass caller state to their callbacks. Another limitation is that NativeCall does not support passing or returning C structs by value; of course, there’s a hack for that.
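To make the context-pointer limitation concrete, here is a sketch of the kind of API in question, with hypothetical names (for_each_row, row_stats, collect_stats). The library stores an opaque void * and hands it back to every callback invocation, and native C code recovers its own object with a single cast.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Hypothetical C API in the common style: the caller registers a callback
   plus an opaque pointer, and the library passes that pointer back. */
typedef int (*row_callback)(size_t row, const char *value, void *user_data);

int for_each_row(const char *const *rows, size_t nrows,
                 row_callback cb, void *user_data) {
    for (size_t i = 0; i < nrows; i++) {
        int rc = cb(i, rows[i], user_data);
        if (rc != 0)
            return rc;  /* a non-zero return stops the iteration */
    }
    return 0;
}

/* Caller-side state that the library never inspects. */
typedef struct {
    size_t rows_seen;
    size_t total_chars;
} row_stats;

/* In C, recovering the caller's object is one cast from void *;
   this is the step a Perl 6 callback has no way to perform. */
int collect_stats(size_t row, const char *value, void *user_data) {
    row_stats *stats = user_data;
    (void)row;
    stats->rows_seen++;
    stats->total_chars += strlen(value);
    return 0;
}
```

A Perl 6 callback handed the same void * receives only an address, with nothing to map it back to the Perl 6 object that was passed in.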

Concurrency

Perl 6 has a number of abstractions for dealing with concurrency — promises, channels, and something called supplies — but I think it’s more helpful to start with a picture of the Perl 6 execution model so you can see what these abstractions are really doing and what their limitations are.

Perl 6 runs on a virtual machine, MoarVM, which has a thread scheduler, a garbage collector, and an event loop. Unlike the C implementations of Python and Ruby, MoarVM has no global interpreter lock, so Perl 6 threads can execute in parallel inside a single process. As of this writing, Perl 6 threads map directly to OS threads, but this behavior will change in the upcoming “Diwali” (Perl 6.d) release, which will support a degree of M:N thread multiplexing. Perl 6 threads have their own storage, but are able to access each other’s objects, so some care must be taken to avoid race conditions. A lock mechanism is provided, but is discouraged in favor of the language’s message-passing abstractions.

There’s a fixed upper limit on the number of OS threads that MoarVM will use; this number defaults to 16, but can be changed with the environment variable RAKUDO_MAX_THREADS. If your code attempts to start more than that number of background tasks using threads, execution may deadlock.

The garbage collector is generational, and halts all execution across all threads while it runs (i.e. it is stop-the-world). The garbage collector itself is multi-threaded. It will “run till done”, so it doesn’t have the same time guarantees as, for example, Go’s garbage collector.

To handle asynchronous I/O, MoarVM uses libuv, the “secret sauce” in Node.js; I haven’t tried it, but juggling thousands of concurrent network connections in a single-threaded Perl 6 server shouldn’t present any insurmountable challenges.

With all of that out of the way, let’s talk about Perl 6’s core concurrency abstractions: promises, channels, and supplies.

The start keyword kicks off code on another thread, assuming a thread is available; it’s a shortcut to Promise.start, and should be met with an await:

my $task = start { 
    say "Hello from {$*THREAD.id}";
}

say "Ahoy from {$*THREAD.id}";
await $task;

If you run the above code, you’ll see that the greetings are initiated from different threads, and that the messages may appear in either order. Note that in Perl 6.c, execution continues on the same thread after an await, but it might continue on a different thread in future versions.

To see the thread pool in action, try starting more than 16 tasks:

my @p = (1..32).map: { start { say "Sleeping on {$*THREAD.id}"; sleep 5; } };
await @p;

You should see the resulting messages in two batches: sixteen messages immediately, using every thread in the thread pool, followed (after 5 seconds) by sixteen more messages, re-using those same threads. You can imagine how code might unexpectedly deadlock if some code is waiting for a task to start, but there are no available threads in the thread pool. Future versions of Perl 6 are supposed to alleviate some of these thread-exhaustion problems by moving Perl 6 threads across OS threads if the Perl 6 thread is waiting on a message from a channel; this will bring the implementation closer to the concurrency model of Erlang, Go, or .NET.
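A back-of-the-envelope model explains the two batches. Assume N tasks that each sleep for the same time t, a pool of P threads (16 by default, per RAKUDO_MAX_THREADS), and no scheduling overhead. Under those assumptions the tasks run in ⌈N/P⌉ waves, so the wall-clock time of the example is roughly:

$$
T \approx \left\lceil \frac{N}{P} \right\rceil t = \left\lceil \frac{32}{16} \right\rceil \times 5\,\mathrm{s} = 10\,\mathrm{s}
$$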

Channels, since we’re talking about them, exist for thread communication in Perl 6. The language includes the react/whenever keywords for receiving messages on a channel.

my $channel = Channel.new;

my $p = start {
    react {
        whenever $channel { say "Hello from {$*THREAD.id}! You said \"$_\""; }
    }
};

my $message = "Ahoy";
say "This is {$*THREAD.id}. Let's send \"$message\" to our buddy (twice).";

$channel.send($message);
$channel.send($message);
$channel.close; # mandatory

await $p;

react is a blocking construct, and only returns when every channel that it’s listening to has been closed, or when done is explicitly called from within react. In the above example, failure to close the channel (by commenting out the line marked # mandatory) will result in the start block never returning, and so the modified program will hang on the call to await.
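The reason the close is mandatory is easier to see in a lower-level sketch. The C code below is not how MoarVM implements Channel. It is a minimal pthreads analogue with hypothetical names (chan, chan_send, chan_recv, chan_close, listener) that follows the same contract: the receiving loop, like react, ends only after the channel has been closed and drained. Drop the chan_close call and pthread_join waits forever, just as the modified Perl 6 program hangs in await.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAN_CAP 16  /* fixed capacity, arbitrary for this sketch */

/* A minimal closable channel of strings guarded by a mutex and condvar. */
typedef struct {
    const char *items[CHAN_CAP];
    size_t head, count;
    bool closed;
    pthread_mutex_t mu;
    pthread_cond_t ready;
} chan;

void chan_init(chan *c) {
    c->head = 0;
    c->count = 0;
    c->closed = false;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->ready, NULL);
}

/* Non-blocking send: fails if the channel is closed or full. */
bool chan_send(chan *c, const char *msg) {
    pthread_mutex_lock(&c->mu);
    bool ok = !c->closed && c->count < CHAN_CAP;
    if (ok) {
        c->items[(c->head + c->count) % CHAN_CAP] = msg;
        c->count++;
        pthread_cond_signal(&c->ready);
    }
    pthread_mutex_unlock(&c->mu);
    return ok;
}

/* Closing wakes every waiting receiver so it can notice the end. */
void chan_close(chan *c) {
    pthread_mutex_lock(&c->mu);
    c->closed = true;
    pthread_cond_broadcast(&c->ready);
    pthread_mutex_unlock(&c->mu);
}

/* Blocks until a message arrives. Returns false only when the channel
   is closed AND drained; without a close, this waits forever. */
bool chan_recv(chan *c, const char **out) {
    pthread_mutex_lock(&c->mu);
    while (c->count == 0 && !c->closed)
        pthread_cond_wait(&c->ready, &c->mu);
    bool ok = c->count > 0;
    if (ok) {
        *out = c->items[c->head];
        c->head = (c->head + 1) % CHAN_CAP;
        c->count--;
    }
    pthread_mutex_unlock(&c->mu);
    return ok;
}

/* The counterpart of the react/whenever block: receive until close. */
typedef struct {
    chan *c;
    size_t received;
} listener_state;

void *listener(void *arg) {
    listener_state *st = arg;
    const char *msg;
    while (chan_recv(st->c, &msg))
        st->received++;
    return NULL;
}
```

One simplification to note: a Perl 6 Channel is unbounded, whereas this chan_send refuses new messages once its fixed buffer is full rather than growing or blocking.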

Channels in Perl 6 are similar to Go’s channels, or message-passing in Erlang, with important differences. Go’s channels are typed, and Erlang’s messages can be pattern-matched; listeners on Perl 6 channels just have to accept whatever is sent. Go and Erlang both have a timeout mechanism on the receiving end; Perl 6 does not. (I imagine as the Perl 6 community gains experience with concurrent and distributed systems, they’ll demand similar features in their language.)

The final bit of abstraction for handling concurrency in Perl 6 is called the supply. A Supplier object emits (via the emit method) a sequence of values to its Supply object; the Supply can have one or more listeners, called taps. Taps are not objects, but rather callback functions that are executed once for each supplied value.

By default, the taps are called on the thread that calls Supplier.emit. For example:

my $supplier = Supplier.new;
my $supply = $supplier.Supply; 

$supply.tap({ say "Tap #1 got \"$_\" on {$*THREAD.id}" });
$supply.tap({ say "Tap #2 got \"$_\" on {$*THREAD.id}" });

say "Sending values from {$*THREAD.id}...";
$supplier.emit("Hello");

If you run that code, you’ll see all the action happens on a single thread. Supplies don’t introduce concurrency; they’re more of a curtain for hiding it. For example, a supplier that emits values as it receives messages from a channel (say, inside a react block) would ensure that application logic occurred in a predictable order and on a predictable thread.
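Stripped of the Perl 6 machinery, this default behavior is the classic observer pattern. The C sketch below uses hypothetical names (supplier, supplier_tap, supplier_emit, counting_tap) and is a simplification, not a description of Rakudo's internals. It shows why no concurrency appears: emit is a loop of ordinary function calls, so every tap has already run, on the emitting thread, by the time emit returns.

```c
#include <stddef.h>  /* size_t */

#define MAX_TAPS 8  /* arbitrary limit for the sketch */

typedef void (*tap_fn)(const char *value, void *ctx);

/* A supplier is just a list of callbacks and their contexts. */
typedef struct {
    tap_fn fns[MAX_TAPS];
    void *ctxs[MAX_TAPS];
    size_t ntaps;
} supplier;

/* Registers a tap; returns 0 on success, -1 if the table is full. */
int supplier_tap(supplier *s, tap_fn fn, void *ctx) {
    if (s->ntaps == MAX_TAPS)
        return -1;
    s->fns[s->ntaps] = fn;
    s->ctxs[s->ntaps] = ctx;
    s->ntaps++;
    return 0;
}

/* Emitting calls every tap in registration order on the caller's own
   thread; there is no queue and no worker, so nothing runs concurrently. */
void supplier_emit(supplier *s, const char *value) {
    for (size_t i = 0; i < s->ntaps; i++)
        s->fns[i](value, s->ctxs[i]);
}

/* Example tap: counts the values it has seen via its context pointer. */
void counting_tap(const char *value, void *ctx) {
    (void)value;
    ++*(int *)ctx;
}
```

Whichever thread calls supplier_emit is the thread the taps run on, which is why a supplier fed from a single react block gives predictable ordering.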

One of the more interesting applications for supplies is receiving data from external programs. In Perl 5, receiving data from both the standard out stream and standard error stream is a headache; in Perl 6, they each get their own supply:

my $proc = Proc::Async.new(:r, 'echo',
    'Man is amazing, but he is not a masterpiece');

$proc.stdout.tap({ say "OUT: $_"; });
$proc.stderr.tap({ say "ERR: $_"; });

await $proc.start;

Incidentally, the Proc::Async class also has write and close-stdin methods to facilitate two-way interactions with external programs.4

UNIX signals can be accessed from a supply, too:

signal(SIGINT).tap( { say "Perhaps the artist was a little mad..."; exit 0 } )

Note that in both this example and the previous one, because emit is called from behind the proscenium, these particular callbacks will execute on a different thread from the main program.

Concurrency isn’t easy in any language, but Perl 6 seems to have a decent set of abstractions for writing complicated programs without introducing race conditions. However, the threading model of these abstractions is not obvious at first glance, and seems to be a common source of confusion on Stack Overflow. I also didn’t see much guidance for exception-handling in concurrent code, which is a common source of bugs and deadlocks outside of the shared-nothing Erlang diaspora.

Finally, I’m not totally sold on the rationale for supplies; their banner use-cases (signal handling and stdout/stderr processing) don’t really require sequential execution of different callbacks, and because the emit happens deep in the standard library, the callbacks end up executing on a Mystery Thread as far as the main application is concerned.5

I suspect these Supply use-cases would be better served with channels capable of broadcasting messages to multiple subscribers. For example, the following hypothetical API is more verbose, but would give more clarity about which code is executing on which thread:

my $proc = Proc::Async.new(:r, 'echo',
    'Man is amazing, but he is not a masterpiece');

# Get channels for the external process's output
my $out = $proc.stdout-channel;
my $err = $proc.stderr-channel;

# Start a worker thread that listens on the channels
my $worker = start {
    react {
        whenever $out { say "OUT: $_"; } # Print stdout on the worker
        whenever $err { say "ERR: $_"; } # And also stderr on the worker
    }
};

# Run the external program
await $proc.start;

# Wait for the worker to finish
await $worker;

Right now channels pick a listener at random to deliver a message to, so even if the data streams were delivered over channels, as in the fake API above, you wouldn’t be able to plug in multiple receivers. The current 1:1 channel design fits well with worker-thread designs, but not so well with event listeners.

So What?

There’s more, as you might guess, to Perl 6 — I’ve mostly left out its object model, class hierarchy, array-programming features, and control-flow constructs, but I think I’ve covered enough territory to offer up a general perspective on the language and run-time. To put the burning question bluntly, what is Perl 6 good for?

Perl 6’s math support is excellent — complex and rational numbers are handled seamlessly — but I’m afraid Perl 6 is just too slow for serious scientific number-crunching. To keep up with the Joneses it would need either a major investment in native libraries, like Python’s Numpy, or a built-in JIT compiler like Julia’s. Supposedly there’s a JIT in the works, but it appears to have been in the works for a number of years, so I’m not holding my breath.

Perl 6’s string handling, in particular the Unicode support, is unparalleled. I like the new regex syntax. However, the grammar engine is a bit of a disappointment, and despite my initial enthusiasm, it won’t be replacing parser-generators, at least not in my toolkit. Compiling Perl 6 code and processing text files from disk are relatively slow operations, and neither is helped by the non-instantaneous VM start time. Perl 6 will likely have a hard time replacing Perl 5 in the world’s UNIX pipelines, at least in its present form.

Perl 6 is better at concurrency than Ruby and Python. But for high-throughput, low-latency applications, the current release is significantly behind Go, Erlang, or .NET. It will start to catch up with the 6.d “Diwali” release, which introduces M:N thread multiplexing, but the VM will still need some serious battle-testing and battle-tuning before it can claim to compete with the other platforms.

Testing is easy — just put test scripts in the t/ directory and follow the Test module docs — but I’m still not sure if there’s a preferred method for writing function and module documentation. People seem to just put everything in README.md, and I didn’t see anything to extract an API from source code in the manner of pydoc or javadoc.

In light of the foregoing, I have a hard time justifying Perl 6 in its current incarnation for “real work”. Then again, I never thought Ruby made any sense as a server technology, so anything can happen, I guess.

I like writing Perl 6 programs. The generics, type-checking, and OO model inspire more confidence than the Little Rascals Clubhouse attitude of Perl 5, while stopping considerably short of the slavery-is-freedom mentality of Rust or Haskell. Perl 6’s functional features let me distill logic to its essentials, which I like. It’s easy to write Perl 6 code that is difficult to read, but I personally prefer to be treated like a responsible adult rather than, for instance, a teenager trapped in the father-knows-best house of Go.

Still, the language remains distressingly permissive; out-of-bounds array accesses just return Nil, and these two lines of code:

my ($one, $two, $three);
($one, $two, $three) = (1, 2);        # $three is left undefined
($one, $two, $three) = (1, 2, 3, 4);  # the 4 is silently discarded

each compile and run just fine. I would be much more comfortable with a compiler that voiced at least mild concern in these cases, since there’s likely a bug in the code.

Most things in Perl 6 are well-named, but the reliance on sigils and such makes many operators — and there are many, many operators in Perl 6 — hard to Google. Sometimes I feel like I’m trying to look up a swear word from a Beetle Bailey comic strip. It’s a language where a single typed character can convey a world of significance; I personally enjoy the language’s concise aesthetic, but I also enjoy algebra problems more than most people who consider themselves to be psychologically fit. Programmers who prefer things to be spelled out with modifier keywords might be put off by Perl 6’s load-bearing asterisks and pipe symbols and so forth.

The Perl 6 language is large, and has a number of trap doors. Learning can at times feel like taking a tour of Willy Wonka’s Chocolate Factory, where every room seems to have new treats with strange effects. But the factory floor-plan, if viewed at sufficient distance, is basically coherent, and even predictable: I found myself on more than a few occasions correctly guessing syntactic constructs and how to do this-or-that. I’m particularly fond of Perl 6’s function dispatching, which feels like Erlang with the addition of named parameters, varargs, and built-in type checking. I’ll probably start using Perl 6 for slice-and-dice system scripts, the same way I use Ragel for fleshing out grammars or Julia for exploring math problems.

So what is Perl 6 best for?

Perhaps it’s not the most exciting to everyone else, but the most immediate application in my mind is gluing together C libraries and writing command-line wrappers for C libraries. For the ReadStat library that I maintain, I’d be more than happy to replace the Hieronymus Bosch hellscape in the bin/ directory with nice clean native-interfacing Perl 6 scripts. (If the dream is to be realized, I’ll need the aforementioned improvements to NativeCall, passing structs by value and being able to convert context pointers to Perl 6 objects.)

This will sound stupid, but if full-screen terminal programs become popular again, Perl 6’s concurrency and strong systems support would make it an excellent choice. I would port Hecate: The Hex Editor From Hell in a heartbeat, and I’d love to see Perl 6 mail clients, text editors, and so on. (The Terminally Insane Programming movement is sure to catch fire any day now.)

I didn’t mention this before, but Perl 6 is exceedingly clever when it comes to making command-line tools. If this is your file:

# script.pl6
sub MAIN(Str $input-file, Int :$lines = 10) {
    say "blah";
}

Running the script without any arguments gives:

$ perl6 script.pl6
Usage:
  script.pl6 [--lines=<Int>] <input-file>

Made you smile, right? (Perl 6: The Ruby on Rails of command-line tools.)

There are a number of surprises in Perl 6; some, like these command-line cushions, tickled my fancy, but others, like the rules about $_ inside hash braces or anything having to do with itemization, leave me dumbfounded.

A few notes about creature comforts in Perl 6. The compiler’s error messages are usually adequate, but sometimes they’re incorrect, or irrelevant. The REPL is useful, but needs better support for multiline copy-paste and backspacing through multibyte characters. The default Vim highlighter is slow, and the vim-perl6 package has an annoying highlighting bug that’s been sitting open on GitHub for almost a year. Travis CI supports Perl 6, but it will download and compile the language each time, which adds several minutes to each test run. It’s not the Wild West, exactly, but it does feel like a mom-and-pop hotel where you have to jiggle the toilet handle, and maybe smack the side of the TV screen until the picture clears up.

If It’s So Great, How Come I Never Heard Of It?

Languages are inseparable from their communities, so some parting words about that. I found myself, in late-night moments of bewilderment, on the #perl6 IRC channel, asking for help with the finer points and darker corners of the language. The responses were usually helpful, and always courteous. At least one of the causes of my confusion turned out to be a bug in the VM, which was quickly fixed.6

Most of the discussions on the IRC channel are technical, but if you lurk long enough it’s hard to miss a current of frustration that Perl 6 isn’t more widely adopted. There’s a kind of righteous indignation that the language is very good — so dammit (I’m projecting), when will people start taking us seriously?

I’ve spent long enough in the software business to have more than a passing familiarity with that feeling. There’s nothing quite so demoralizing as spending a year or two on something — or fifteen years in the case of Perl 6 — and being unable to secure so much as five minutes of a potential user’s time.

The explanation, of course, is banal: if you’re a geek, writing correct computer code is a lot easier than reprogramming people’s perceptions.

I don’t feel qualified to offer up marketing suggestions to the Perl 6 people, but in honor of their butterfly mascot — a winsome sprite named Camelia that belies a sprawling netherworld of complexity — I’ll part ways by pointing them to the great butterfly collector of Western literature, the character Stein from Joseph Conrad’s Lord Jim.

Stein (right), occasional conscience of Jim (left), in the 1965 film adaptation.
(Columbia Pictures)

Stein, the story goes, spent his youth in a late 19th-century life of adventure, sailing the high seas, battling bandits, living as a native in the East Indies, and all that. The turning point for his character is a story where he is set on by seven would-be assassins, drops three of them with successive shots from his revolver, and proceeds to chase a butterfly through a field. He finally catches it with the help of his hat:

Flop! I got him! When I got up I shook like a leaf with excitement, and when I opened these beautiful wings and made sure what a rare and so extraordinary perfect specimen I had, my head went round and my legs became so weak with emotion that I had to sit on the ground. I had greatly desired to possess myself of a specimen of that species when collecting for the professor. I took long journeys and underwent great privations; I had dreamed of him in my sleep, and here suddenly I had him in my fingers—for myself!

Lord Jim, ch. 20

As I re-read that passage, I can’t help but think of Larry Wall, a man who certainly “took long journeys and underwent great privations” in pursuit of his dream, a redesigned Perl.7 I think of my dissertation-writing uncle, too, though I’m afraid to ask if he ever finished the thing. For his own part, after a certain age, Stein stopped pursuing wealth and fame and adventure in order to cultivate and cherish the iridescent beauty of his butterflies.

I wonder sometimes if the software profession could use a few more butterflies — not as inputs into commercial empires, but as specimens to study, appreciate, and admire in the lamp-lit silence of wood-shelved rooms.8

Is Perl 6 a masterpiece? I’m not sure what instructions I’d give to the foreman of that particular jury, but Perl 6 is certainly alone in the audacity of its ambitions and sheer complexity of the language. It might be the sort of thing that’s impossible to judge without using it for at least a year, or half-decade. I’m afraid I’ll have to save my Final Judgment for a future article, but in the meantime I have enough reason to keep a stable version of Perl 6 rattling around the /usr/local curio cabinet.

Will Perl 6 conquer the world? These things are hard to predict, but in a certain sense — I’m speaking to frustrated members of the IRC channel now — it doesn’t matter. Perl 6 is one-of-a-kind; no one can argue with that. Maybe it’s enough, once you’ve captured something rare and beautiful in this uncertain world, to sit on the ground and hold it in your fingers a little while. “Only one specimen like this they have in your London,” said the Bavarian-born Stein to Lord Jim’s narrator, hand hovering over a glass-encased butterfly of bronze and white and yellow. “And then — no more. To my small native town this my collection I shall bequeath. Something of me. The best.”


Notes

  1. I think the rest of us can agree that these are rather uninspired names, and liable to cause confusion. The compiler should probably be called perl6c, the virtual machine should be PerlVM, and the language should be called something else. I’ve always thought Surf would be a great name for a programming language, and hereby willingly donate it to the Perl 6 project for the betterment of programmingkind.

  2. The constant-time algorithm comes with a cost, though. Graphemes are stored internally as 32-bit integers; this address space is large enough to contain all legal codepoint combinations, but string sizes in memory tend to be inflated by a factor of 2–4 compared to other languages.

  3. I’ll take this moment to note another bug with the Perl 6 shell, which is that it only seems to accept the first line of a copy-paste operation.

  4. Maybe not the most eye-popping features; but, for example, Erlang still can’t half-close a pipe.

  5. It’s possible to specify the thread by providing your own scheduler object to the Supply.schedule-on method.

  6. The only thing that could have made the experience better would be if the Perl 6 powers-that-be moved from their ancient Request Tracker, which uses a third-party authentication service run by a guy named Bjørn, over to GitHub Issues, like everybody else. (Bjørn, I salute you.)

  7. The Parable of the Pearl of Great Price, for which the original version of Perl was named, also comes to mind; I suspect for this reason that a Perl 6 name change will never receive BDFL approval, even if an unambiguously great name like Surf is proposed.

  8. A few more butterfly collectors, too, dispensing bits of wisdom at critical moments in our collective plot.


You’re reading evanmiller.org, a random collection of math, tech, and musings.