Notes on creating a DSL

The first rule of creating a DSL is DON’T invent your own language. As a wise colleague of mine said: “We create DSLs intending to make people happy and end up making them very unhappy”.

However, the second rule of creating a DSL is to treat ALL development as the creation of a DSL! Libraries such as AssertJ or RestAssured in the testing space show how useful and readable DSL-like code can be, because they push the user to think in the problem space rather than the implementation space. In general, make your libraries work like a mini DSL.
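As a rough illustration of what "DSL-like" means for a library, here is a small fluent assertion helper in Python, loosely modelled on the way AssertJ chains calls such as `assertThat(list).isNotEmpty().hasSize(2)`. The `expect` function and `Expect` class are hypothetical names for this sketch, not part of any real library. The point is that each method returns the same object, so the test reads as a sentence about the problem rather than a series of `if` statements.

```python
class Expect:
    """Hypothetical fluent assertion helper, in the spirit of AssertJ's assertThat(...)."""

    def __init__(self, actual):
        self.actual = actual

    def is_not_empty(self):
        if not self.actual:
            raise AssertionError(f"expected {self.actual!r} to be non-empty")
        return self  # returning self is what allows the chained, sentence-like style

    def has_size(self, size):
        if len(self.actual) != size:
            raise AssertionError(f"expected size {size} but was {len(self.actual)}")
        return self

    def contains(self, *items):
        missing = [item for item in items if item not in self.actual]
        if missing:
            raise AssertionError(f"expected {self.actual!r} to contain {missing!r}")
        return self


def expect(actual):
    """Entry point that makes call sites read like a statement of intent."""
    return Expect(actual)


# Reads as a description of the expected state, not of how it is checked.
expect(["alice", "bob"]).is_not_empty().has_size(2).contains("bob")
```

No text is parsed here: the "language" is just method names and chaining, so ordinary tooling (autocomplete, refactoring, debuggers) keeps working.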

In this discussion, however, I’m referring to the worst thing you can do: creating a new Domain Specific Language where users write text and something turns that text into runtime behaviour.

Making a new language is hard!

Who wants to spend their time defining language grammars, or writing a compiler, parser or interpreter? Sure, ANTLR can help you with this, but that’s only half of it. Once you’ve created your language, you have to teach people how to use it and deal with all the complex edge cases around its syntax and what they may mean to its consumers.
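To make the "text in, behaviour out" problem concrete, here is a deliberately tiny interpreter for a hypothetical three-command language (`set`, `add`, `print`). The command names, comment syntax and error format are all invented for this sketch. Even at this size, the interpreter has to make decisions about blank lines, comments, unknown variables and malformed numbers.

```python
class DslError(Exception):
    """Raised with a line number so users can find their mistake."""


def _to_int(text: str, line_no: int) -> int:
    try:
        return int(text)
    except ValueError:
        raise DslError(f"line {line_no}: expected a whole number, got {text!r}") from None


def run(script: str) -> list[str]:
    """Interpret a hypothetical DSL with three commands: set, add and print."""
    variables: dict[str, int] = {}
    output: list[str] = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # decision 1: '#' starts a comment
        if not line:
            continue  # decision 2: blank lines are allowed
        command, *args = line.split()
        if command == "set" and len(args) == 2:
            variables[args[0]] = _to_int(args[1], line_no)
        elif command in ("add", "print") and args and args[0] not in variables:
            # decision 3: using a variable before 'set' is an error, not zero
            raise DslError(f"line {line_no}: unknown variable {args[0]!r}")
        elif command == "add" and len(args) == 2:
            variables[args[0]] += _to_int(args[1], line_no)
        elif command == "print" and len(args) == 1:
            output.append(str(variables[args[0]]))
        else:
            raise DslError(f"line {line_no}: cannot understand {raw.strip()!r}")
    return output
```

For example, `run("set total 10\nadd total 5\nprint total")` returns `["15"]`. Every one of the commented decisions is something users will eventually trip over, and something you will need to document and teach.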

You only need to look at Perl to find out what happens when a well-meaning language creator unleashes options on the general public.

Using existing tools can be long-winded and distracting

That said, it’s much easier to express things in problem space than in a poorly fitting implementation space. SQL queries are a very good fit for pulling data from tables and would be more long-winded and harder to understand if they were coded in, say, Java. Every problem has its commonalities, assumptions and shortcuts, and if they can be baked into something which gives an easy framework for expressing intent, rather than uninteresting detail, then that’s a very powerful tool.
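As a small comparison, the sketch below computes the same result twice over some made-up order data: once as a SQL query (using Python's built-in `sqlite3` module with an in-memory database) and once as a hand-written loop. Python stands in for Java here to keep the example short. The SQL version states what is wanted (totals per customer, over 10, largest first), while the loop spells out how to build it.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, order amount).
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 5.0)]


def totals_with_sql(rows):
    """Problem space: describe the result and let SQL work out the steps."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            HAVING SUM(amount) > 10
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def totals_by_hand(rows):
    """Implementation space: accumulate, filter and sort explicitly."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    result = [(customer, total) for customer, total in totals.items() if total > 10]
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result
```

Both functions return `[("alice", 50.0), ("bob", 12.5)]` for the sample data; the difference is how much incidental detail the reader has to wade through to see the intent.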

People can Google existing stuff

If you invent something, you’re going to have to teach it to people. Teaching syntax and grammar is the easy part. What about all the different pitfalls? The tips for doing it more cleanly? Recipes for common tasks? The fact that folks can easily go onto Stack Overflow with Python or TypeScript issues means that those languages keep growing and their user bases stay engaged.

If you create something you’re going to have to go very public with it and provide a lot of initial training, or you’re going to find that it gets limited by people’s own understanding of what you’ve made.

Documentation is an admission of failure

Producing a load of documentation seems a good idea. Producing none is definitely a bad idea. However, watch out for sentences that begin “Note, when you’re doing this, watch out for…”. They are a sign that there are hidden surprises in the thing you’ve created: unexpected rules you need to follow, or things you must never do.

A lot of documentation is a warning to stop people falling off the cliff you’ve accidentally left in the implementation. While you may reasonably keep the scope of your DSL lightweight and not promise the world, your users may not be so easy to convince.

Where possible, spend more time rounding off what the language will do for the user than explaining that it can’t do it. The more predictable and unsurprising the system is, the more the users will do with it.

Starting with the familiar

There is a strong argument for extending something commonplace rather than creating something new. For example, in one project I worked on, we had a function evaluation syntax that was very similar to JavaScript in its form. In the end, I migrated it to use a JavaScript parser, making it agree (as far as it was implemented) with the rules of JavaScript, because that meant you could take someone with programming knowledge and get them using it with minimal explanation.

The challenge of starting with the familiar is that people get foxed when they can’t do ALL of the familiar. This is where it’s wise to consider extending the scope to give them as much as they’d reasonably expect: in the above case, I found myself adding things like + and - to the expressions simply because it was harder to explain why they wouldn’t work when most of the machinery for them was already there.
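The same approach can be sketched in Python, with the standard library's `ast` module standing in for the JavaScript parser described above (this is an analogous technique, not the original project's code). The existing parser handles tokenising, precedence, parentheses and string escaping exactly as Python users already expect; the DSL only decides which syntax nodes it accepts. The `FUNCTIONS` whitelist and the `evaluate` name are hypothetical.

```python
import ast
import operator

# Hypothetical whitelist of functions the DSL exposes to users.
FUNCTIONS = {"max": max, "min": min, "round": round}

# Once the parser exists, supporting + and - is one table entry each.
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub}


def evaluate(expression: str, variables: dict):
    """Evaluate a restricted subset of Python expression syntax."""
    tree = ast.parse(expression, mode="eval")  # the existing parser does the hard work
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise NameError(f"unknown variable {node.id!r}")
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, variables)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        args = [_eval(arg, variables) for arg in node.args]
        return FUNCTIONS[node.func.id](*args)
    # Anything else is valid Python but outside the DSL's subset.
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")
```

For example, `evaluate("max(a, b) - 1", {"a": 3, "b": 7})` gives `6`. Multiplication is deliberately left out, which is exactly the kind of gap users will notice: since `*` already parses, it costs less to add `ast.Mult` to the table than to explain its absence.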

The advantage of using something familiar is that you get a lot of tooling for free. Just pasting your DSL script into an editor like Atom or Notepad++ might give you syntax highlighting, and you can easily bring something like the Ace editor into an HTML front end to give your users syntax highlighting without much setup. You can do this for a completely novel DSL too, but the closer you are to something existing, the less work there is.

Know why

Parsers, editors and their ilk are hard to get right. Training users in something new can be slow, and expressing yourself clearly in a new language is hard even for that language’s creators. But if there is a definite way of making something easier by adopting a DSL, then focus on that and build the language around that central vision.

In summary

Modelling the problem space and cutting out distraction is a nice thing to do for the consumers of your system, framework or library. If you can make it require a minimum of learning, be as predictable as possible and have as much tooling support as possible, you can make them happy. Always consider other options before committing.
