(A cheesy homepage for Justin Collins)
Suspicous of Gemcutter stats

Recently, the main Ruby Gems hosting has moved from RubyForge to Gemcutter. This is pretty cool, and I was excited to publish my first gem there (the JRuby gdbm library I’ve mentioned before).

I get excited imagining that people actual use my stuff, so I’ve been tracking the downloads reported by Gemcutter. I seem to recall there being a problem a little while ago with numbers being off, but I thought that had been solved. In any case, I was surprised by the number of downloads (up to 137 now). It’s not very high for most things, I guess, but for a library for a specific dbm for a specific Ruby implementation, it seems suspicious.

My current theory is that people are installing it on accident, perhaps thinking that they need it to use gdbm in the main Ruby implementation (or perhaps others?). Fortunately, I’ve checked it out, and it appears that even if you install it this way, it will not interfere with the gdbm library in Ruby’s standard library. This is likely because the gem paths are later in the search path. Of course, if Ruby’s standard gdbm is not working for you, then my gem probably won’t either, unless you are using it with JRuby, as intended.

In retrospect, perhaps I should have named the gem a little bit better, to avoid confusion…


Ruby links

Here’s some links to interesting Ruby information. I intend to update it periodically.

Ruby Doc – Best site for looking up the core Ruby API and the standard libraries.

AllGems – Documentation for most Ruby gems.

Ruby Forum – Mirrors of the main Ruby mailing lists. Some are read-only.

Gemcutter – The new gem hosting site.

RubyForge – The old Ruby project hosting site. It is slowly being phased out.

Try Ruby – The new Try Ruby page, where you can go through an interactive tutorial directly in your browser.

Ruby Inside – Popular Ruby blog, with frequent articles.

Ruby Flow – Community-submitted Ruby links.

Ruby Trends – View and vote for trends in Ruby software, books, and more.

Ruby Pulse – Screencasts of Ruby software and libraries.

Ruby Subreddit – Reddit site for Ruby. Not incredibly active.

Ruby Stack Overflow – Ruby questions on Stack Overlow.

Confreaks – Not strictly Ruby, but videos from many Ruby events are available for streaming.


Much nicer closure arguments in Brat

I am just so excited right now. I was watching Dave Thomas’ (not the Wendy’s guy) keynote talk from RubyConf 2008, in which he proposes several “forks” of Ruby. At about 40 minutes in, he discusses having a fork of Ruby that has real closures instead of blocks. I thought to myself, “Self, that sounds a lot like Brat.” Five seconds later, he mentioned the problem with passing in multiple closure literals to a function: the comma is ugly.

some_method { do_stuff }, { 1 + 2 }

I agree! I think this is an ugly issue in Brat right now.

But then he proposed an awesome solution: if two closures are next to each other in the argument list, you don’t need a comma! Now you can do this:

some_method { do_stuff } { 1 + 2 }

This took approximately 2 minutes to implement for Brat.

Now instead of

while { x < 1 },
    { 
        p x
        x = x + 1 
    }

You can do

while { x < 1 } { 
        p x
        x = x + 1 
    }

I think this is awesome and makes Brat a lot more attractive.

In fact, I went totally crazy so now you can do

x = 101
true? x > 100
      { p "> 100!" }
      { p "<= 100!" }

Just be careful when using bare variables:

true? a { p "truth!" } { p "lies!" }

This is parsed into:

true?(a({ p "truth!" }, { p "lies!" }))

This is so cool…now to update all of Brat’s docs with this syntax. :)


gdbm for JRuby

Motivation

(Warning: boring story ahead. Skip to next section if you want.)

Quite a while ago, I thought it would be kind of nice if my Ruby MUD server would run on JRuby, since JRuby might be better at dealing with long-running server processes. Unfortunately, at the time, EventMachine did not have a Java version, so I decided to wait for that to work.

A while later, there was some kind of Java release for EventMachine. I tried it out, but then ran into another problem: gdbm is provided as a C extension for the standard Ruby distribution. So, I gave up again.

Then, along came Ruby FFI and I thought, “Hm…wonder if I could use that…” and then I gave up because I am not very good at C, even just trying to read it.

THEN, I saw this blog post about the next version of JRuby (1.4) and decided maybe I should do something about it, after all. So I did.

ffi-gdbm

The result is a gdbm library which should be fully compatible with the standard C extension distributed with Ruby, including reading and writing gdbm files which can be read/written by either implementation.

Requirements

Although I had hoped to make this library compatible with any Ruby FFI implementation, unfortunately it only works with JRuby, and it needs at least version 1.4. Perhaps this will change as the other Ruby FFI implementations get fancier.

Installation

You can checkout ffi-gdbm on GitHub. All you actually need is the gdbm.rb file.

Alternatively, you can install it as a gem:

jgem install gdbm --source http://gemcutter.org

Make sure you require "rubygems" if you do use the gem version.

Usage

You can use this library just as you would the standard library that comes with Ruby. See here for documentation.

Aftermath

Unfortunately, the Java version of EventMachine is still incomplete, so it’s a bit of letdown, although kams does work at least nominally using it. Hopefully, EventMachine will one day be completely stable under JRuby.


What I have enjoyed lately

Lately, there has been a particular activity which I consistently look forward to with excitement. That is writing libraries for my own programming language. Even if it is just wrapping existing libraries, there is something really cool about enabling a language to do “real world” type things, such as communicating with other processes or generating HTML or making a full-blown GUI.


Announcing the Fledgling Languages List

Since I have started working on my own little programming language, I have repeatedly made attempts to take a look around at other little languages that people may be working on. However, either my Google-fu is weak or it is just difficult to find these little guys.

Part of the difficulty is that most people do not want to see your little half-finished, half-working language. A recent discussion on Reddit showed that people can be very reactionary to such languages. Now, there is Esolang, but that site is more focused on languages that are purposefully unusable.

Thus, I have created a new site to be the home of fledgling languages, appropriately entitled The Fledgling Languages List. Anyone is welcome to submit their own language or languages they know about for listing. Each language has its own little page with a description and people can leave comments or feedback for each language.

I am very open to suggestions or feedback concerning the site, especially if something is broken.


Operator Precedence and the Shunting Yard Algorithm

Previously, I had pretty much avoided implementing any kind of operator precedence for Brat and covered up my reluctance by saying that was just how it was. So binary operations were strictly left-to-right affairs. But that makes things like 1 + 3 * 4 very disappointing. Since I keep feeling like Brat should at least pretend to be a real language, I decided I really needed to do something about this.

What is so hard about it? It doesn’t seem difficult at first. It seems like there ought to be a nice recursive solution to the problem. And there probably is, but if you are using a typical recursive descent parser, this recursion will not be possible.

There are a few ways to go at this issue, but I wanted the simplest and easiest to deal with. A while ago, I had been thinking about this and ran across a reference to the shunting yard algorithm. At the time, I couldn’t really see how it was related. This time around, I looked at it a bit more closely.

The purpose of the algorithm is basically to convert equations which use infix notation (typical math notation with operators in between their arguments) into Reverse Polish Notation. Why would you want to do that? Well, because RPN is explicit about which order operations are evaluated in. It is also pretty straight-forward to implement (which is the important part, really.)

Let’s take a look at the two main parsing rules, written in Treetop, which deal with binary operations. I have elided the related code:

  rule binary_operation
        lhs:binary_operation_chain rhs:expression { ... }
  end

  rule binary_operation_chain
        (lhs:binary_lhs_expression ~space op:operator ~space)+  { ... }
   end

The first rule will match any binary operations. The second rule is a helper, which matches a list of binary operations, all except for the far right expression. There may have been some explicit reason why I separated these before, but it remains convenient in the code to do so.

The second rule will essentially return a list of alternating values and operators, just what the shunting yard algorithm needs.(Well, actually it’s simpler than the algorithm needs.) This list will be fed into the algorithm, which will spit out a new list of values and operators, now in the proper RPN form.

The parser then “evaluates” the RPN expression. It essentially runs through the algorithm for evaluating RPN, outputting code to evaluate expressions and using temporary values. In other words, the RPN algorithm is not explicitly emitted in the code, but the intent is in the order that things are evaluated.

The whole thing ends up being about 50 lines of code, which is far less than I thought I would be looking at for this.

Anyhow, just wanted to share. I’m pretty excited that Brat has this feature now, especially since it can be implemented in such a flexible way. Operator precedence is stored in a hash table which just associates operators with an integer representing its precedence level, so it is quite simple to add more operators (operators not in the table are just given the lowest precedence, though, so arbitrary operators are still completely supported).


My weird project setup

When I started with kams, it was just going to be my own little MUD server that I puttered around with. So I set up a little Subversion repository and went at it. That repository slowly turned into my main server for the game. That way, I could do version control and keep a backup copy of everything. That worked pretty well for quite a while, until the decision was made to release some code. To do so, I created a branch, stripped out all of the stuff I didn’t want publicly released, and then released that code. Well, that is the way it was supposed to work. Actually, I had to fix things along the way as I was preparing the release code. This meant copying/merging back and forth between the branch and the trunk versions of the code.

This became annoying.

Now that I have been using Git for a while, I was hoping to do something so that I can continue my current path (keeping my private stuff private, but in the Subversion repository), but also maintaining an on-going public code repository.

To compound the issue, I (very foolishly?) do most of the development work on the live server. (Keep in mind, no one is actually using the live server, so this really does not matter.)

This is what I came up with. First, I created a GitHub repository for the public code. Then I copied my latest branch into the Git repository, committed it. Next, I created a new branch, deleted all the files, and then did an svn checkout to the new branch, and finally committed the private files to that Git branch.

Now what I can do is make code updates on my trunk code (kept in Subversion), then do an svn update on the private Git branch, then do a Git commit of the specific changes I want put in the public repository, then do a git cherry-pick of that commit into my master branch. Finally, a git push puts it all up on Github.

This works in reverse, as well. Changes made in the Git repository can be propagated to the private branch, from there to the Subversion repository of the trunk.

Works fairly well, actually, and it will make it considerably easier to create future releases. Yay.


Squish-ins in Brat

As I mentioned before, there is no real reason or goal behind developing Brat, I just saw potion and decided I wanted to be cool, too.

What that means is that features in Brat are sort of based on two things: 1) How easy is it to implement in Neko and 2) Is it something I expect to have because I am spoiled by Ruby.

For example, closures/functions in Brat started off being identical to Neko’s, because obviously that was the easiest thing to do. And I hadn’t planned on taking the ‘prototyping’ approach, but it was easy and is likely the brattiest way to go.

Anyhow, when I was doing stuff for array objects, I was thinking I should really copy Ruby’s approach, which is to have an enumerable module and then mix it in to array and hash and so on. But, of course, that would mean implementing mixins somehow. I knew I didn’t want to add in any new things (like modules) and it’s all about the lowest resistance, so I created squish-ins.

I have no idea if this concept has been used before (probably), but essentially you can take an object and copy all the methods of another object into it.

a = new
b = new
a.something = { p "do something" }
b.squish a
b.something  #prints "do something" 

Note that I said it “copies” the methods from the other object. This is and isn’t true. Squishing in another object’s methods does not create copies of them, only references. However, it is a reference to the object’s actual methods at the time of the squishing. In other words, this is a copying action rather than an inheritance or subclassing action. If the squished-in object’s methods change, the…squisher(?) will still reference the old methods.

a = new
b = new
a.something = { p "do something" }
b.squish a
b.something
a.something = { p "now do something else" }
b.something   #still prints "do something" 

So we get the code-sharing of mixins but not the inheritance properties. For now, I’m fine with that. Of course, it is actually not that difficult to do both:

a = new
b = new
a.something = { p "do something" }
b.something = { a.something }
a.something = { p "now do something else" }
b.something   #prints "now do something else" 

I am not sure which is preferable.


First C Extension Done (sorta)

I’ve been working on Brat quite a bit, and I remembered that I wanted to add in BigNum support. There is a GNU library for this called gmplib which seemed fairly straightforward. In fact, it was considerably more straightforward than connecting it up with Neko.

By the way, I realized I have never done anything in C. My undergraduate courses started off with C++, but I have never had to use plain old C.

Anyhow, this project required wired up three things: gmplib to Neko, then Neko to Brat. gmplib to Neko required creating, manipulating, and wrapping up C code and values using Neko’s FFI. Then I needed to replace all the math code I had written for Brat (in Neko). The last step (unfinished as of yet) will be to seamlessly switch between using Neko’s number values and the structures used by gmplib. There is no point in wasting time with BigNums when you are doing 1 + 1.

Neko FFI

First of all, everything passed in or out of Neko will be a value. This is a Neko datatype, which basically contains a kind and the data itself. There are several kinds that are predefined (basically, a kind is a type), but you will likely need to make your own to pass C values in and out of Neko. This is what the top of my number.c file looks like:
#include <stdio.h>
#include <neko.h>
#include <gmp.h>

DEFINE_KIND(k_mint);
DEFINE_KIND(k_mfloat);

I include the proper header files and then define two kinds, one for integers and another for floats.

I then made a couple functions (one for integers, one for floats) to allocate and wrap up the proper structures for me. I think this is probably the way it would typically look when using Neko’s FFI:
value new_float() {
        mpf_t * new_float;
        new_float = (mpf_t*)alloc(sizeof(mpf_t));
        mpf_init(*new_float);
        value store = alloc_abstract(k_mfloat, new_float);
        val_gc(store, free_float);
        return store;
}
The gmplib structures need to be cleared when you are done with them, so I have a function which is used as a ‘finalizer’, which is assigned as shown above. When the Neko value is garbage collected, it first calls this method:
void free_float(value floatval) {
        mpf_clear(val_data(floatval));
}

This function demonstrates another item: getting the data out of a value. This is performed using val_data, which returns whatever data you have wrapped up in the Neko value (using alloc_abstract).

Most of the functions I wrapped up are pretty straightfoward:
value float_add(value float1, value float2) {
        value result = new_float();
        mpf_add(val_data(result), val_data(float1), val_data(float2));
        return result;
}
To make these functions available to Neko, all you have to do is declare them to be ‘primitives’ and provide the number of parameters:
DEFINE_PRIM(num_add, 2);
Note that all values returned to Neko need to be converted into Neko values, even things like ints and booleans. This is fairly simple, though, as the API provides functions for these, as well as for checking the types of values passed in from Neko. For example, here is a simple boolean check:
value is_int(value num) {
        return alloc_bool(val_is_kind(num, k_mint));
}

Once I caught the hang of it, wrapping up the library functions wasn’t too hard. But good thing I have a fairly large test suite to make sure I’m doing everything right! This is the first project where I have used tests like this, but I believe I understand why people are stressing tests so much. It’s amazing how much they help, both in finding bugs and in providing some peace of mind.


More about Brat

Now, I won’t claim that Brat is a well-planned, coherent language, but I have tried to keep a few things in mind.

1. Common things are shorter, simpler

For example, in Python you pass a function as a parameter by removing parentheses (meth), but invoke it using parentheses (meth()). In Ruby, this is a very weird process and makes named callbacks a bit of a pain (easier to pass a block which calls the method instead). But I invoke methods more often than I pass them as parameters, so a bare variable name means to invoke the method, while an arrow (->meth) means to use it as a value.

Another example. In Ruby, you use puts and gets to print a line (plus newline) and get a line (with the newline attached). In Java, this is System.out.println and…well, input is more of a pain. In Brat, you use p and g, and g strips the newline by default. The less common case (wanting to print without a newline) is just print.

Yet another example. If you want to do string replacement in Ruby, you use gsub to replace all matches, and sub to replace just the first. But I have pretty much never used sub, so in Brat sub replaces all matches and sub_first just does the first (and is clearer about its semantics, too.)

2. As few special cases as possible

In Brat, there are only objects and functions (and I was seriously considering making functions be objects, too, but it was too much of a pain and ruined some things). That is it. There are no keywords, and only a few “special” symbols, like ->. Instead of having separate literals for arrays and hash tables, they share the [] syntax. Operators work like in Ruby, where they are actually converted to method calls, but in Brat all operators are treated equally. Unfortunately, this means math is a little bit of a pain. I’m still thinking about how to make that better.

3. Provide as much flexibility as possible

This is the reasoning for how binary (and unary) operators work: you can define pretty much any operator you want, as long as it only contains symbols. And there are no keywords you might run into conflicts with (although you might overwrite default built-in functions/methods.)

Objects in Brat are not really constrained in any way. You can even share or swap methods between objects. It uses prototyping instead of static (or fairly static) class hierarchies, which I am still getting used to.

4. Worry about performance later

I mean, seriously. I am not going to worry about this until there is more than one person using the language.


Actual Brat Release

I’ve been working on my little language a bit more lately, instead of doing work I really should be doing.

Now that Brat is working pretty well, I thought I would make a release. It’s basically just a snapshot of the current Subversion repository, but I know people would rather download a tar file than use SVN to check something out. It is only for Linux at the moment, but maybe I will get it working on Windows sometime. It should work with Ruby 1.8.6, 1.8.7, and 1.9.

Alright, on to language stuff. I recently changed the scoping rules to be more what a person might expect: any scope can access its outer scopes (rather than before, which used the Neko default of copying the outer scope). This makes the toplevel variables something like globals, except not quite.

Example:
a = { x }
x = 1
p a   #Error - x was not defined when a was
b = { x }
p b   #Prints '1'
c = { x = x + 1 }
c
p b   #Prints '2'

And recursive functions no longer need to be attached to an object:

rec = { x | false? x < 1, { p x; rec(x - 1) } }
rec 10

Bad news is that my current approach does not do a good job of releasing memory. But that is a problem for another day.

Of course, many bugs large and small have been fixed along the way. I hope to at least get this language to the point where I use it regularly. I think it is an interesting and probably odd mix of object-oriented and functional programming.

Also, I set up a discussion group in case Brat generates any attention.


Does this mean..?

Does this mean Brat will never be taken seriously?

a_!?-*+^&@1~\\><$ = new
a_!?-*+^&@1~\\><$.-!_+~%~+_!- = {d0~!@><?&&<>\/+-*^&% | d0~!@><?&&<>\/+-*^&%}
a_!?-*+^&@1~\\><$.@==||------> = { a_!?-*+^&@1~\\><$ }
a_!?-*+^&@1~\\><$.<------||==@ = { ->@==||------> }
@==||------>a_!?-*+^&@1~\\><$ -!_+~%~+_!- a_!?-*+^&@1~\\><$.<------||==@

Muscle Memory?

This is weird. Every time I try to type gosu, my fingers type gosub. Especially weird since it has probably been a decade since I did anything in BASIC.

Apparently, BASIC really does mutilate the mind.


More Brattiness: Interactive Brat

Pretty much any language these days has to have some kind of interactive mode for playing around and testing things out. So I have been considering for a while how to make an interactive interpreter for Brat.

Now, Neko comes with a little interactive console itself (nekoc -console), but of course that is for Neko, not for Brat. Frustratingly, Neko has a lot fewer ways to inspect itself and do dynamic stuff like evals than you might think. For example, there is no way (that I know of) to grab a binding from somewhere and use it.

Since Neko (and therefore Brat) is compiled to bytecode, I wasn’t sure there would be a way to get this all to work. On top of that, Neko modules are each their own little world, with only “exported” values being available outside.

Long story short, I was getting really discouraged while considering how I could possibly accomplish this. Trying to understand even how the Neko console worked was annoying, because most of the Neko stuff is written in C or NekoML.

But I figured out a way!

However, let me whin- I mean, explain one other frustration. When I translate Brat to Neko, I use a ton of local variables. I do this because I figure they are more likely to be garbage collected and less likely to mysteriously screw things up. But the way the Neko console maintains “state” is by copying over global variables between evaluations. So I wasn’t even thinking about that as a viable solution.

So what did I do? (Seriously, get to the point, right?)

First of all, I needed a way to input Brat code without saving it to a file, and then a way to emit Neko code without saving that to a file. This I accomplished the first by reading in Brat code from standard input. From there, it is easy enough to interpret it and do whatever with it.

Then, I needed a way to interpret the code. This I accomplished by mangling the Neko console code to fit my needs. I was able to create a new executable which reads in Neko code from the standard input and evaluates it.

But, wait. What about those local variables? That was easy enough. Simple replacement was sufficient, and, as a bonus, variables that can remain local without trouble are not touched.

When the interpreter first starts up, it sets the same things as are at the top of a regular Brat program: importing and setting core functions and objects. However, instead of executing the code within a method on base_object (like normal), I simply set this to be base_object and everything works.

The two programs (the brat script and the neko_console) communicate via pipes. It all seems to work fairly well.

I even have a very simplistic “unfinished line” detector so things like functions can span multiple lines.