"Screw it, I'll make my own!"

The story of a new programming language

As… very few people know, for the past couple of years I have been designing a new programming language called Earl Grey. I did not design it as a test of my capabilities, or as material for an academic paper, or any of that. I made it so that I could use it, and use it I have, intensively, to the point that virtually all my programming projects at the moment involve my language. To this day I am not sure whether it was a sane or wise thing to do, but I believe it gives me a rather unique perspective to share – one of the few experiences I can think of that (curiously enough) is both empowering and isolating.

The motivation

If you ask me why I made a programming language, I could justify it in a lot of ways: point out its strengths, what I think it does better than the others, and so on. But I don't think that's really the driving force. As I see it, that driving force is, basically, a kind of conceit. A typical programmer will learn one or several well-established languages, depending on what they aim to achieve, and will adapt their way of thinking to fit these tools as best they can. But perhaps you don't want to adapt. You don't like any of the tools you can find because they are never exactly the way you want them; it is as though they don't quite fit your brain. If you won't adapt to the language, the only alternative is to adapt the language to you. And if you are like me, it becomes a bit of an obsession, an itch you just have to scratch.

Of course, this can be a good thing: your preferred tool may very well end up being preferred by many others. On the other hand, it could be mere idiosyncrasy: an idea that for you is game-changing, but that most people are indifferent to. And when you think about it, that is the most likely outcome. Simply look at the humongous number of programming languages that exist, most of them half-done or abandoned, others excellent but largely ignored. Take languages like Oz or Icon, which implement novel and powerful concepts that are probably better than any you've come up with – and yet I imagine few people have even heard of them. In fact, most of the languages that are used in practice are basically cookie-cutter clones of each other, with some minor variations and a few ideas filched from decade-old research.

I don't mean to sound bitter about it… well, of course I'm a little bitter about it. But it's normal. Learning a new language is a large commitment that few people are willing to make, and that cost penalizes difference: people seek tweaks on a familiar formula rather than completely new paradigms. Pragmatic issues such as libraries and tooling support also play a large role, since they are central to most people's workflows. You really have to fill a pressing need, just at the right moment, for a significant number of people to invest themselves – and it is not necessarily your own need you would be filling. In the end, the only sure way to gain a decent following for a language is to have the support of a large corporation, but then again, that is true of almost anything.

I don't have such backing, of course, and I have no social networking skills either (sucks to be schizoid). But here's the thing: the more I thought about this, the more I realized I didn't care. The itch was still there, and it wouldn't go away. I'd been having ideas about programming languages for the better part of a decade and I was still mostly using Python, sometimes JavaScript. They felt crippling, not powerful enough. I dabbled in Scheme and Racket a bit too, but to be honest, I just don't like s-expressions. So after some experimentation I came up with something I called o-expressions, which (in my opinion) combine the advantages of conventional syntax and s-expressions. I had several other ideas, for instance about how powerful pattern matching facilities could be integrated into the language, about ad-hoc structures and exceptions, and so on.

But I felt I had waited long enough and I needed to do something about it. I needed to, so that instead of writing some Python script and thinking "I wish I could abstract this pattern with a macro or a custom operator", or instead of writing something in Racket or Clojure and wishing for more syntax, I could use a language that I like and that I have control over. I needed to do it for myself, regardless of whether anyone else would want anything to do with it.

Compromise

Grand visions are always dangerous, because of how difficult they are to satisfy. A programming language designer is, by nature, going to be much harder to satisfy than anyone else – most of the time, other people are already satisfied with what exists. And there are a lot of things I want from my ideal language; they roughly fall into three categories:

  1. Ideas that are clear in my mind, like o-expressions, hygienic macros, pattern matching, ad-hoc structures, and so on (see the sketch after this list).
  2. Ideas that I think have potential, but whose details I haven't worked out: gradual typing, user-space program optimization, built-in fine-grained serialization of data and code (including support for distributed computing and code signing), and so on.
  3. Things that require thousands of man-hours, like a sizable package ecosystem, widespread editor/IDE support, and a thriving community.
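
To give a flavour of that first category, here is roughly how pattern matching and ad-hoc structures play together (a sketch, not a tutorial – the details shifted more than once during development):

;; #tag{...} builds an ad-hoc structure; the same syntax
;; takes it apart again inside a pattern.
area(shape) =
   match shape:
      #circle{r} -> 3.14159 * r * r
      #rectangle{w, h} -> w * h

area(#circle{2})        ;; => 12.56636
area(#rectangle{3, 4})  ;; => 12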

The problem, of course, is that trying to satisfy that specification is basically insane. Just working out all the ideas in the second category would take years and might never reach a satisfactory conclusion, and during all that time I would have to keep using existing languages. That wasn't acceptable: it dawned on me that if it was either a perfect language or nothing at all, then I would have nothing at all, plainly said. So it became a matter of choosing a particular focus, and that focus had to be the ideas I was most certain about.

Now, I needed a language that would be productive for me, a language to make applications in. If for some reason I needed to parse XML, serve static files over HTTP, and so on, it would be nice if I didn't have to sink hours into implementing these things. In other words, there was no getting around the fact that my language had to be compatible with some existing runtime. That is why I chose to compile Earl Grey to JavaScript. I don't like JavaScript, but it is ubiquitous – it runs in every browser – and it has an immense ecosystem. I also decided not to change its semantics significantly: I had to ensure smooth interoperation, so that I could easily use any existing JavaScript package from Earl Grey, and JavaScript users could just as easily import packages written in Earl Grey.
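
In practice, that means pulling an npm package into a project is a one-liner, and calling it looks no different from calling Earl Grey code. A sketch of the kind of thing I was aiming for, with express standing in for any JavaScript library:

;; Import a JavaScript package and use it directly.
require: express

app = express()
app.get("/", (req, res) -> res.send("Hello from Earl Grey"))
app.listen(8080)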

It wasn't easy to drop so many interesting ideas, but I didn't really have a choice if I wanted to get anything done – and in the end, if I succeeded within the limited bounds I had set, it could only embolden my future projects. Once Earl Grey was done, I could use it to make another language and perhaps take greater risks with it. Furthermore, I would certainly make mistakes, and experience is often much more formative than brooding on theory. After you've messed something up, you don't really have much choice but to understand it.

Getting to work

I started by bootstrapping Earl Grey, meaning that I would write its compiler in itself. That has a few advantages: compilers are usually large and complex applications, so writing one is a good, thorough test of the language's abilities. It also forces you to deal with the language you have created from the outset: you have to experience its flaws, which is good, since you are in a great position to fix them as soon as they become evident. Plus, you naturally end up adapting the language so that it is good at writing compilers, at which point it is practically going to write itself.

The first weeks of language development are blissful. New feature after new feature is implemented, each of them making the compiler simpler or more beautiful. If code seems forced or inelegant, that can be addressed with clever new features. New keywords and operators can be added or tweaked. I added significant indentation at some point, cleaned out all the braces, and it felt great. I added pattern matching, replaced countless if/then/else statements, and that felt great too. These are good times. You never have to fight the language, because you have complete dominion over it; it cannot fight you, it has to bend to your will. Frustration turns to elation as you spin a feature out of it.

But after a while, the language will turn into "good enough". It has to, in order to be worthwhile. That's when you start developing other projects using it. I did quite a few:

All in all, these add up to about 30,000 lines of Earl Grey.

I started work on a web application, too. That particular project showed me just how soul-numbingly horrible JavaScript's callback hell was – and, even though promises were clearly better, how awkward their syntax remained. Earl Grey had no special support for asynchronous programming, which made it just as bad as JavaScript in that respect. This prompted me to add generators and async/await keywords to Earl Grey. And so I did.
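
The difference is easy to show. In the sketch below, fetch-json stands in for any promise-returning function (a made-up name, purely for illustration):

;; Before: explicit promise chaining.
get-user-name(id) =
   fetch-json("/users/" + id).then(user -> user.name)

;; After: the same logic reads sequentially.
async get-user-name(id) =
   user = await fetch-json("/users/" + id)
   user.name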

There's something liberating about this, you know: seeing that something is lacking, that some code is significantly more awkward than it should be… and then fixing it yourself. No need to wait for the ECMAScript standard to address the problem, or for PEP 492 to be accepted into mainstream Python (that is a done thing now), and so forth. You just… do it. Of course, it's no small time investment, but once it works it is quite rewarding.

Can you always do it, though? Can you always adapt?

Inertia

There is a bit of a catch-22 in language design: the more a language is used, the clearer it becomes which parts of it are problematic and should change, but the harder it gets to actually change them. To hone a language you must use it, yet applications require that a language's features remain stable, robust, set in stone – and therefore as imperfect as they were at that moment. Furthermore, the longer you wait to fix a flaw or plug a hole, the more likely it becomes that something grows to depend on it. The options you do not explicitly and insistently keep open tend to close and seal themselves before you know it.

Having very few users besides oneself helps, since you don't have to worry too much about breaking libraries and applications other than your own. For a long time, Earl Grey's sole and most important application was the compiler itself. But even then, there came moments when I thought something ought to be changed, yet doing so was nearly impossible.

I made several changes to Earl Grey that were awkward to implement, and stopped short of making others because of how complicated they would have been:

It feels strange that this would happen with only one user, but it makes sense. You can only ever change features that are never used, or used seldom enough that it is reasonable to change every existing occurrence. In some cases it may be nigh impossible to identify what would have to be adapted, and then things must be set in stone.

I was aware that Earl Grey was necessarily going to be imperfect, but it was interesting to see that even features I could easily have added in theory were becoming impossible. So it was "worse" than I thought. I imagine it must be possible to foresee or prevent these issues to some extent, but mostly through good static analysis, which makes dynamic languages more vulnerable to the problem. I think Rust did it right, with a long period of language design and refinement guided by application – though the language's static nature and Mozilla's resources and support certainly helped.

About idiosyncrasy

Idiosyncrasy is that which is peculiar to you: thoughts and ideas that you find interesting, seductive, an improvement over the state of the art, but that other people don't perceive as such. If you get the chance to receive feedback on something you made, and you take care to prod and listen, you will find out some interesting things.

I posted about Earl Grey on Hacker News a year or so ago to get some feedback. That stressed me out, but the feedback was mostly positive (yay!). Some people tried out the language and opened a few issues. By far the greatest amount of discussion, next to jokes about tea, concerned one of Earl Grey's most insignificant features: variable names can contain dashes. For instance, the-value = 123 is equivalent to theValue = 123. Oh boy. Personally I like it a lot, and I am not alone in this: dashes in identifiers just look good. Nonetheless it sparked a fierce bout of bikeshedding. I reckon this is the essence of programming language debate: simple issues about which everyone can, and will, have an opinion. Not that this is an original thought; it is old wisdom. But now, when I see bikeshedding elsewhere, I have to wonder how the author feels about that ocean of pointless bickering spreading around the footnotes of their work. Dry amusement, perhaps.

The thing with bikeshedding is that if you observe it objectively, it becomes clear that the main factor in anyone's preference is whatever they are used to, and that the reasons for that preference are usually (although not always) reverse-engineered from there. I experimented a lot before Earl Grey. For instance, I tried every combination of brackets for function calls (f(x), f[x], f{x}), data structures, grouping and blocks. There always came a point where whatever I was currently doing felt more natural, nicer than the alternatives. And I am willing to bet that if people forced themselves to omit or use semicolons, to use significant indentation, or to use s-expressions when they never had before, they would (often, but not always) eventually end up preferring it to what they did before. Of course, this is mostly true of cosmetic aspects, but even for important distinctions such as dynamic versus static typing, I am not under the impression that most people truly have a good grasp of the tradeoffs involved – and if they don't, then their preference isn't grounded any better than their preference for semicolons and whatnot.

One of the little cosmetic things in Earl Grey that feels natural to me is the function declaration syntax:

square(x) = x * x

For some reason I thought this was clearly nicer than the alternatives: cleaner, more mathematical. I was a bit surprised to find out that most of the other people trying out my language preferred an alternative notation, using the anonymous function syntax (argument -> expression):

square = x -> x * x

I thought that was interesting. And you know what else I find natural? Writing subtraction as x - y instead of x-y. The first version of Earl Grey did not allow dashes in variable names, but as it turned out, save for one or two occurrences, I always put spaces on each side of the subtraction operator. So it was quite natural for me to think that dashes in variable names would be nice – after all, how could they ever be confused with subtraction?
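
In other words, the rule costs nothing in practice:

x-y      ;; a single identifier named x-y
x - y    ;; subtraction: the spaces make it an operator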

Perhaps those who questioned that choice had a different idea about what is natural to write. But they would probably get used to my way, just like I could get used to theirs.

Things I learned

There are many things I learned along this journey. Some of them I already suspected, others are somewhat obvious, but they are all worth listing.

The first thing is that programming languages are massive. Packages and libraries, editor and IDE support, tutorials and documentation, questions and answers on Google and StackOverflow – all these things accrete and gravitate around languages. And although the field of programming language design is concerned with a language's intrinsic aspects – its syntax, semantics, performance profile, type system, and so on – the fact is that when the time comes to write complex applications, these aren't the things that matter most. If you want to create a language that you can use quickly and productively, you have to compromise accordingly: the most important compromise is compatibility with as many existing libraries as possible, and, at a minimum, with the editors and tools you plan to use.

The things you feel most passionately about are often the least important. Well, that's not entirely fair – they can be pretty important, but not as much as you think. For example, I cared a lot about "purity": making sure that every operator or syntactic construct has one and only one meaning (for instance, I believe one shouldn't use braces to delimit both data structures and blocks, because those are two incompatible semantics). But I never think about purity of design when I actually code. If purity is compromised to make some common task easier to write, that will make me happier whenever I perform that task, but when will I ever think about how ugly the compromise is? Basically never, that's when.

It is never done. There are always moments when the code you are writing feels awkward or imperfect, as if the language struggled to express some concept elegantly. You can never figure all of this out in advance, because you can't predict how the features of your language will interact with each other. Some of these combinations will be unique – issues that can only happen in your language (which is good, since it means it isn't a bland copy of everything else). One has to resist the temptation to fix every single issue and thereby create bloat. My strategy at the moment is to hold off on implementing a feature until I can recall a dozen times I wished for it.

It is isolating. I like to do things alone, so that doesn't bother me much. But it is still unsettling when you feel you can't participate in discussions about programming languages without indulging in self-promotion, and when you can't really relate to most people's experiences with mainstream languages. There is also a shortage of toys to play with: I can use Emacs because I wrote earl-mode myself, but I don't have Vim syntax definitions, so I can't use that editor productively. I didn't have Atom definitions either, until someone helpfully wrote some. In any case it is clear that the language limits one's choice of tools and the support one can get using them.

Writing Earl Grey was an enlightening experience, and it still is, since I am not fully done designing it, fixing bugs, and improving it in any way I can. It has become stable, with some parts true to my original vision, others not. It has taught me a few things.

I must say I'm still itching to make a new programming language, or more than one. It is a different itch, though: since I feel I already have something nice going on, I could write languages that are experimental or limited in scope – maybe one language for each idea I have, which would save me the trouble of properly integrating the ideas with each other before knowing whether they are any good.

I would write them in Earl Grey, of course.

Olivier Breuleux January 29, 2016