(A cheesy homepage for Justin Collins)
Much nicer closure arguments in Brat

I am just so excited right now. I was watching Dave Thomas’ (not the Wendy’s guy) keynote talk from RubyConf 2008, in which he proposes several “forks” of Ruby. At about 40 minutes in, he discusses having a fork of Ruby that has real closures instead of blocks. I thought to myself, “Self, that sounds a lot like Brat.” Five seconds later, he mentioned the problem with passing in multiple closure literals to a function: the comma is ugly.

some_method { do_stuff }, { 1 + 2 }

I agree! I think this is an ugly issue in Brat right now.

But then he proposed an awesome solution: if two closures are next to each other in the argument list, you don’t need a comma! Now you can do this:

some_method { do_stuff } { 1 + 2 }

This took approximately 2 minutes to implement for Brat.

Now instead of

while { x < 1 },
    { 
        p x
        x = x + 1 
    }

You can do

while { x < 1 } { 
        p x
        x = x + 1 
    }

I think this is awesome and makes Brat a lot more attractive.

In fact, I went totally crazy so now you can do

x = 101
true? x > 100
      { p "> 100!" }
      { p "<= 100!" }

Just be careful when using bare variables:

true? a { p "truth!" } { p "lies!" }

This is parsed into:

true?(a({ p "truth!" }, { p "lies!" }))

This is so cool…now to update all of Brat’s docs with this syntax. :)


What I have enjoyed lately

Lately, there has been a particular activity which I consistently look forward to with excitement. That is writing libraries for my own programming language. Even if it is just wrapping existing libraries, there is something really cool about enabling a language to do “real world” type things, such as communicating with other processes or generating HTML or making a full-blown GUI.


Operator Precedence and the Shunting Yard Algorithm

Previously, I had pretty much avoided implementing any kind of operator precedence for Brat and covered up my reluctance by saying that was just how it was. So binary operations were strictly left-to-right affairs. But that makes things like 1 + 3 * 4 very disappointing. Since I keep feeling like Brat should at least pretend to be a real language, I decided I really needed to do something about this.

What is so hard about it? It doesn’t seem difficult at first. It seems like there ought to be a nice recursive solution to the problem. And there probably is, but if you are using a typical recursive descent parser, this recursion will not be possible.

There are a few ways to go at this issue, but I wanted the simplest and easiest to deal with. A while ago, I had been thinking about this and ran across a reference to the shunting yard algorithm. At the time, I couldn’t really see how it was related. This time around, I looked at it a bit more closely.

The purpose of the algorithm is basically to convert equations which use infix notation (typical math notation with operators in between their arguments) into Reverse Polish Notation. Why would you want to do that? Well, because RPN is explicit about which order operations are evaluated in. It is also pretty straight-forward to implement (which is the important part, really.)

Let’s take a look at the two main parsing rules, written in Treetop, which deal with binary operations. I have elided the related code:

  rule binary_operation
        lhs:binary_operation_chain rhs:expression { ... }
  end

  rule binary_operation_chain
        (lhs:binary_lhs_expression ~space op:operator ~space)+  { ... }
   end

The first rule will match any binary operations. The second rule is a helper, which matches a list of binary operations, all except for the far right expression. There may have been some explicit reason why I separated these before, but it remains convenient in the code to do so.

The second rule will essentially return a list of alternating values and operators, just what the shunting yard algorithm needs.(Well, actually it’s simpler than the algorithm needs.) This list will be fed into the algorithm, which will spit out a new list of values and operators, now in the proper RPN form.

The parser then “evaluates” the RPN expression. It essentially runs through the algorithm for evaluating RPN, outputting code to evaluate expressions and using temporary values. In other words, the RPN algorithm is not explicitly emitted in the code, but the intent is in the order that things are evaluated.

The whole thing ends up being about 50 lines of code, which is far less than I thought I would be looking at for this.

Anyhow, just wanted to share. I’m pretty excited that Brat has this feature now, especially since it can be implemented in such a flexible way. Operator precedence is stored in a hash table which just associates operators with an integer representing its precedence level, so it is quite simple to add more operators (operators not in the table are just given the lowest precedence, though, so arbitrary operators are still completely supported).


Squish-ins in Brat

As I mentioned before, there is no real reason or goal behind developing Brat, I just saw potion and decided I wanted to be cool, too.

What that means is that features in Brat are sort of based on two things: 1) How easy is it to implement in Neko and 2) Is it something I expect to have because I am spoiled by Ruby.

For example, closures/functions in Brat started off being identical to Neko’s, because obviously that was the easiest thing to do. And I hadn’t planned on taking the ‘prototyping’ approach, but it was easy and is likely the brattiest way to go.

Anyhow, when I was doing stuff for array objects, I was thinking I should really copy Ruby’s approach, which is to have an enumerable module and then mix it in to array and hash and so on. But, of course, that would mean implementing mixins somehow. I knew I didn’t want to add in any new things (like modules) and it’s all about the lowest resistance, so I created squish-ins.

I have no idea if this concept has been used before (probably), but essentially you can take an object and copy all the methods of another object into it.

a = new
b = new
a.something = { p "do something" }
b.squish a
b.something  #prints "do something" 

Note that I said it “copies” the methods from the other object. This is and isn’t true. Squishing in another object’s methods does not create copies of them, only references. However, it is a reference to the object’s actual methods at the time of the squishing. In other words, this is a copying action rather than an inheritance or subclassing action. If the squished-in object’s methods change, the…squisher(?) will still reference the old methods.

a = new
b = new
a.something = { p "do something" }
b.squish a
b.something
a.something = { p "now do something else" }
b.something   #still prints "do something" 

So we get the code-sharing of mixins but not the inheritance properties. For now, I’m fine with that. Of course, it is actually not that difficult to do both:

a = new
b = new
a.something = { p "do something" }
b.something = { a.something }
a.something = { p "now do something else" }
b.something   #prints "now do something else" 

I am not sure which is preferable.


First C Extension Done (sorta)

I’ve been working on Brat quite a bit, and I remembered that I wanted to add in BigNum support. There is a GNU library for this called gmplib which seemed fairly straightforward. In fact, it was considerably more straightforward than connecting it up with Neko.

By the way, I realized I have never done anything in C. My undergraduate courses started off with C++, but I have never had to use plain old C.

Anyhow, this project required wired up three things: gmplib to Neko, then Neko to Brat. gmplib to Neko required creating, manipulating, and wrapping up C code and values using Neko’s FFI. Then I needed to replace all the math code I had written for Brat (in Neko). The last step (unfinished as of yet) will be to seamlessly switch between using Neko’s number values and the structures used by gmplib. There is no point in wasting time with BigNums when you are doing 1 + 1.

Neko FFI

First of all, everything passed in or out of Neko will be a value. This is a Neko datatype, which basically contains a kind and the data itself. There are several kinds that are predefined (basically, a kind is a type), but you will likely need to make your own to pass C values in and out of Neko. This is what the top of my number.c file looks like:
#include <stdio.h>
#include <neko.h>
#include <gmp.h>

DEFINE_KIND(k_mint);
DEFINE_KIND(k_mfloat);

I include the proper header files and then define two kinds, one for integers and another for floats.

I then made a couple functions (one for integers, one for floats) to allocate and wrap up the proper structures for me. I think this is probably the way it would typically look when using Neko’s FFI:
value new_float() {
        mpf_t * new_float;
        new_float = (mpf_t*)alloc(sizeof(mpf_t));
        mpf_init(*new_float);
        value store = alloc_abstract(k_mfloat, new_float);
        val_gc(store, free_float);
        return store;
}
The gmplib structures need to be cleared when you are done with them, so I have a function which is used as a ‘finalizer’, which is assigned as shown above. When the Neko value is garbage collected, it first calls this method:
void free_float(value floatval) {
        mpf_clear(val_data(floatval));
}

This function demonstrates another item: getting the data out of a value. This is performed using val_data, which returns whatever data you have wrapped up in the Neko value (using alloc_abstract).

Most of the functions I wrapped up are pretty straightfoward:
value float_add(value float1, value float2) {
        value result = new_float();
        mpf_add(val_data(result), val_data(float1), val_data(float2));
        return result;
}
To make these functions available to Neko, all you have to do is declare them to be ‘primitives’ and provide the number of parameters:
DEFINE_PRIM(num_add, 2);
Note that all values returned to Neko need to be converted into Neko values, even things like ints and booleans. This is fairly simple, though, as the API provides functions for these, as well as for checking the types of values passed in from Neko. For example, here is a simple boolean check:
value is_int(value num) {
        return alloc_bool(val_is_kind(num, k_mint));
}

Once I caught the hang of it, wrapping up the library functions wasn’t too hard. But good thing I have a fairly large test suite to make sure I’m doing everything right! This is the first project where I have used tests like this, but I believe I understand why people are stressing tests so much. It’s amazing how much they help, both in finding bugs and in providing some peace of mind.


More about Brat

Now, I won’t claim that Brat is a well-planned, coherent language, but I have tried to keep a few things in mind.

1. Common things are shorter, simpler

For example, in Python you pass a function as a parameter by removing parentheses (meth), but invoke it using parentheses (meth()). In Ruby, this is a very weird process and makes named callbacks a bit of a pain (easier to pass a block which calls the method instead). But I invoke methods more often than I pass them as parameters, so a bare variable name means to invoke the method, while an arrow (->meth) means to use it as a value.

Another example. In Ruby, you use puts and gets to print a line (plus newline) and get a line (with the newline attached). In Java, this is System.out.println and…well, input is more of a pain. In Brat, you use p and g, and g strips the newline by default. The less common case (wanting to print without a newline) is just print.

Yet another example. If you want to do string replacement in Ruby, you use gsub to replace all matches, and sub to replace just the first. But I have pretty much never used sub, so in Brat sub replaces all matches and sub_first just does the first (and is clearer about its semantics, too.)

2. As few special cases as possible

In Brat, there are only objects and functions (and I was seriously considering making functions be objects, too, but it was too much of a pain and ruined some things). That is it. There are no keywords, and only a few “special” symbols, like ->. Instead of having separate literals for arrays and hash tables, they share the [] syntax. Operators work like in Ruby, where they are actually converted to method calls, but in Brat all operators are treated equally. Unfortunately, this means math is a little bit of a pain. I’m still thinking about how to make that better.

3. Provide as much flexibility as possible

This is the reasoning for how binary (and unary) operators work: you can define pretty much any operator you want, as long as it only contains symbols. And there are no keywords you might run into conflicts with (although you might overwrite default built-in functions/methods.)

Objects in Brat are not really constrained in any way. You can even share or swap methods between objects. It uses prototyping instead of static (or fairly static) class hierarchies, which I am still getting used to.

4. Worry about performance later

I mean, seriously. I am not going to worry about this until there is more than one person using the language.


Actual Brat Release

I’ve been working on my little language a bit more lately, instead of doing work I really should be doing.

Now that Brat is working pretty well, I thought I would make a release. It’s basically just a snapshot of the current Subversion repository, but I know people would rather download a tar file than use SVN to check something out. It is only for Linux at the moment, but maybe I will get it working on Windows sometime. It should work with Ruby 1.8.6, 1.8.7, and 1.9.

Alright, on to language stuff. I recently changed the scoping rules to be more what a person might expect: any scope can access its outer scopes (rather than before, which used the Neko default of copying the outer scope). This makes the toplevel variables something like globals, except not quite.

Example:
a = { x }
x = 1
p a   #Error - x was not defined when a was
b = { x }
p b   #Prints '1'
c = { x = x + 1 }
c
p b   #Prints '2'

And recursive functions no longer need to be attached to an object:

rec = { x | false? x < 1, { p x; rec(x - 1) } }
rec 10

Bad news is that my current approach does not do a good job of releasing memory. But that is a problem for another day.

Of course, many bugs large and small have been fixed along the way. I hope to at least get this language to the point where I use it regularly. I think it is an interesting and probably odd mix of object-oriented and functional programming.

Also, I set up a discussion group in case Brat generates any attention.


Does this mean..?

Does this mean Brat will never be taken seriously?

a_!?-*+^&@1~\\><$ = new
a_!?-*+^&@1~\\><$.-!_+~%~+_!- = {d0~!@><?&&<>\/+-*^&% | d0~!@><?&&<>\/+-*^&%}
a_!?-*+^&@1~\\><$.@==||------> = { a_!?-*+^&@1~\\><$ }
a_!?-*+^&@1~\\><$.<------||==@ = { ->@==||------> }
@==||------>a_!?-*+^&@1~\\><$ -!_+~%~+_!- a_!?-*+^&@1~\\><$.<------||==@

More Brattiness: Interactive Brat

Pretty much any language these days has to have some kind of interactive mode for playing around and testing things out. So I have been considering for a while how to make an interactive interpreter for Brat.

Now, Neko comes with a little interactive console itself (nekoc -console), but of course that is for Neko, not for Brat. Frustratingly, Neko has a lot fewer ways to inspect itself and do dynamic stuff like evals than you might think. For example, there is no way (that I know of) to grab a binding from somewhere and use it.

Since Neko (and therefore Brat) is compiled to bytecode, I wasn’t sure there would be a way to get this all to work. On top of that, Neko modules are each their own little world, with only “exported” values being available outside.

Long story short, I was getting really discouraged while considering how I could possibly accomplish this. Trying to understand even how the Neko console worked was annoying, because most of the Neko stuff is written in C or NekoML.

But I figured out a way!

However, let me whin- I mean, explain one other frustration. When I translate Brat to Neko, I use a ton of local variables. I do this because I figure they are more likely to be garbage collected and less likely to mysteriously screw things up. But the way the Neko console maintains “state” is by copying over global variables between evaluations. So I wasn’t even thinking about that as a viable solution.

So what did I do? (Seriously, get to the point, right?)

First of all, I needed a way to input Brat code without saving it to a file, and then a way to emit Neko code without saving that to a file. This I accomplished the first by reading in Brat code from standard input. From there, it is easy enough to interpret it and do whatever with it.

Then, I needed a way to interpret the code. This I accomplished by mangling the Neko console code to fit my needs. I was able to create a new executable which reads in Neko code from the standard input and evaluates it.

But, wait. What about those local variables? That was easy enough. Simple replacement was sufficient, and, as a bonus, variables that can remain local without trouble are not touched.

When the interpreter first starts up, it sets the same things as are at the top of a regular Brat program: importing and setting core functions and objects. However, instead of executing the code within a method on base_object (like normal), I simply set this to be base_object and everything works.

The two programs (the brat script and the neko_console) communicate via pipes. It all seems to work fairly well.

I even have a very simplistic “unfinished line” detector so things like functions can span multiple lines.


Brat syntax change

I realize that Brat owes a lot to Smalltalk, so I was looking over the Smalltalk wikipedia page.

I noticed that the syntax for blocks (functions) looked like this:
[ :param | ... ]
That made me think about Brat’s function syntax, which looked like this:
{ | param | ... }

Why is the Brat syntax like that? Well, because I am used to Ruby, where that is what blocks look like. But I noticed something. In Ruby, there are two vertical bars around the parameters. Why? I assume because of parsing issues, especially when you consider that curly braces are also used for hash literals. But, in Brat, square braces are used for arrays and hashes, curly braces are only used for functions. That means the first vertical bar is completely superfluous!

So I changed it to look more like Smalltalk, thereby saving a character and reducing syntactic clutter:
{ param | ... }

It does look a little unbalanced to me, with only that one vertical bar there, but I think that is just because I am used to reading Ruby. I am sure it will grow on me.

Note: This does introduce some ambiguity when you consider that ’|’ can also be a binary operator. In the situation where this could be ambiguous, the formal parameter interpretation has higher priority.

number.| = {rhs| my + rhs }
x = 1
f1 = { 1 | x }  #this is not ambiguous, numbers cannot be formal parameters
p f1  #=> 2
f2 = { x | x }
p f2 5  #=> 5
f3 = {| x | x }
p f3  #=> 2

How are you?

I was messing around, seeing what silly things could be done using Brat, and I came up with several variations of this “How are you?” program.

Here’s one:
  How = {|x| p "How", x[0], x[1] }
  are = {|x| [" are "] + x }
  you? = ["you?"]
  I = {|x| p "I" , x[0], x[1], x[2] }
  am = {|x, y| x + y }
  fine = [" fine, "]
  thank = {|x| ["thank"] + x }
  you! = [" you!"]

  How are you?
  I am fine, thank you!

Pretty nifty, I thought.

Then I thought, “Well, I’m only using a question mark and an exclamation mark, I can probably replicate this in Ruby, too.”

This is the 2-minute direct translation I came up with:

  def How x
        print "How", x[0], x[1], "\n" 
  end

  def are x
        [" are "] + x
  end

  def you?
        ["you?"]
  end

  def I x
        print "I", x[0], x[1], x[2], "\n" 
  end

  def am x, y
        x + y
  end

  def fine
        [" fine, "]
  end

  def thank x
        ["thank"] + x
  end

  def you!
        [" you!"]
  end

  How are you?
  I am fine, (thank you!)

I was a little disappointed that Ruby has to be furtive about how it says thank you, but it doesn’t parse for me otherwise.


My own little language

I think pretty much every programmer (every computer scientist?) dreams of one day writing their own totally awesome programming language. I know I have wanted to do it for a long time. But it’s such a pain…I mean, you have to write grammars and parsers (lexers, too, maybe) and intermediate languages and compilers and maybe a virtual machine to run it on…it’s quite a lot of work and headache.

But then I saw _why had all of a sudden written his own little language called Potion and I became really jealous. I wanted my own little language, too! It wasn’t fair :(

So I caught the bug. Seriously. The language implementor bug. The stars had aligned just right, though. I knew about PEGs from a class I took last year and Alessandro’s Ometa. I had heard of TreeTop, which is a PEG generator for/in Ruby. After a quick look around I saw the Neko VM, which provides a high-level language for…well, implementing languages. Then it compiles down to its own byte-code and runs on its own little VM.

With these tools in hand (and my vague memory of a compilers course), I have been working feverishly on a little language I am calling Brat. Why “Brat”? Because it’s a language for people who don’t like their language getting in the way of doing whatever they want.

So far, I have some basics implemented, like objects and methods. It’s sort of an object-oriented prototyping dynamically typed something or other. I guess I don’t really know.