Chapter 7. DSL infrastructure with Rhino DSL
In this chapter
- Understanding DSL infrastructure
- Rhino DSL structure
- Common DSL idioms
- Batch compilation and caching
By now, you’ve heard about Rhino DSL several times. I keep saying that it’s a library that makes building DSLs easier, but I’ve never gone into the details. This chapter will address that omission.
We’ll go over Rhino DSL in detail and see what it has to offer and under what circumstances you might want to roll your own DSL infrastructure instead. This chapter isn’t intended to replace API documentation; it’s intended to review what a DSL infrastructure should offer and how Rhino DSL measures up.
To be clear, you don’t need to use Rhino DSL to build DSLs in Boo. Rhino DSL is merely an aggregation of idioms that I have found useful across many DSL examples.
Before we get into Rhino DSL, let’s consider what we want from a DSL infrastructure, and why we need one in the first place. There is a set of problems we need to resolve in order to build production-quality DSLs, including the following:
- Dealing with the compiler directly is awkward. It involves a fair amount of work, which needs to be done for each DSL you build. Many DSL implementations share common idioms (as discussed in chapters 4 and 5), and there is little sense in duplicating them all over the place.
- Compiling scripts time after time is inefficient. Caching reduces compilation costs, but caching comes with its own set of problems. To begin with, you need to perform cache invalidation and recompile scripts that have been changed.
- Compiling each script individually is costly in terms of performance. Compilation costs can be significantly reduced if you compile many files at once, instead of doing them one by one. This also helps to reduce the number of loaded assemblies in the AppDomain, which reduces memory consumption.
None of those problems are particularly difficult to resolve. Rhino DSL does so, and it’s a tiny library (not even two thousand lines of code, at the time of writing).
A DSL infrastructure also needs to handle some of the things we talked about in chapter 5, such as ordering DSL scripts and managing which of them run, and when.
Here are the main requirements that a DSL infrastructure should meet:
- Codify common DSL idioms so you don’t have to keep rewriting them
- Handle caching of DSL scripts
- Abstract the compiler bootstrapping
- Batch compile DSLs
- Manage ordering and script discovery
- Not harm the DSL’s extensibility
Rhino DSL is the result of several years’ experience building DSLs and dealing with these issues. It isn’t a masterpiece of programming, but it can save you a lot of time. I suggest that you use Rhino DSL instead of rolling your own infrastructure, at least while you are getting started building DSLs.
Rhino DSL is an active project
Rhino DSL is updated regularly. Most of these updates are either bug fixes or enhancements to support the more advanced scenarios (which are less generically applicable).
As a result, this chapter covers most of Rhino DSL, but it doesn’t cover everything. I don’t cover the parts that are of interest only to a small minority of language implementers.
Before we get into the nitty-gritty details, we should take an overall look at the structure of Rhino DSL.
Rhino DSL is composed of two important classes, both of which you’ve probably familiarized yourself with by now: DslEngine and DslFactory. You can see both of them in figure 7.1.
The DslFactory is the external-facing interface, used by clients of the DSL, and it uses the DslEngine to perform all the DSL-specific work. The DslEngine is an abstract class that DSL authors are expected to derive from. It provides the standard services out of the box, but it allows you to override most of them as needed. Between them, these two classes provide most of the infrastructure services we’ve talked about so far in this chapter.
There are implementations for IDslEngineStorage and IDslEngineCache as well, which deal with less common scenarios. We’ll look at them in detail later in this chapter.
Aside from those classes, Rhino DSL includes a few others that codify common DSL idioms (implicit base class, script references, and so on).
The end result of using Rhino DSL is that you can get a DSL out the door quickly and extend it as your needs grow.
Let’s focus on each class in turn, starting with the DslFactory.
The DslFactory contains all the DSL infrastructure logic that’s common to all DSLs, regardless of their implementation. There isn’t much of that, though. The DslFactory mainly contains logic for managing the DSL engines, handling batch compilations, and performing cache invalidation.
Listing 7.1 shows a typical usage of a DslFactory.
As you can see in the code comments, you’re supposed to create the DslFactory once, and only once. The DslFactory also manages the compilation cache for each DslEngine, so keeping only one of those around ensures that you don’t recompile unnecessarily.
Next, the DslEngine instances are registered in the factory, along with their associated implicit base classes, and from then on you can request a DSL instance by name.
At that point, the DslFactory asks the DslEngine to compile the script (it’s more complex than that, but we’ll discuss it in section 7.4), create an instance of it, and then return it to the caller.
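In outline, the pattern looks something like the following sketch. The type names (AuthorizationRule, AuthorizationDslEngine) are placeholders, and the exact Rhino DSL signatures may differ slightly from the version you’re using:

```csharp
using Rhino.DSL;

public static class AuthorizationDsl
{
    // Create the factory once, typically at application startup, so its
    // compilation cache is shared by every request for a DSL instance.
    private static readonly DslFactory factory = new DslFactory
    {
        BaseDirectory = @"C:\scripts\authorization"
    };

    public static void Initialize()
    {
        // Register an engine (sketched below) for the DSL's implicit base
        // class; both names are placeholders.
        factory.Register<AuthorizationRule>(new AuthorizationDslEngine());
    }

    public static AuthorizationRule GetRule(string scriptName)
    {
        // The factory compiles the script (or pulls it from the cache),
        // creates an instance, and hands it back.
        return factory.Create<AuthorizationRule>(scriptName);
    }
}
```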
Orchestrating the DSL engine is the main job of the DSL factory, so this is a good time to look at DSL engines.
The DslEngine class is where most of the action happens. It’s structured so you can override specific functionality without having to take on too big a burden. The DslEngine class contains the default (and most common) implementation, and it performs most of its work in virtual methods, so you can override and modify the behavior at specific points.
The most important extension point, which we’ve already seen, is the CustomizeCompiler method. This method allows us to modify the compiler pipeline, modify the compiler parameters, and in general set up the compiler infrastructure that we want.
Listing 7.2 shows a typical use of CustomizeCompiler.
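In sketch form, such an override looks like this. The base class, method name, and namespace are placeholders for your own DSL, and the exact signatures may differ slightly between Rhino DSL versions:

```csharp
using Boo.Lang.Compiler;
using Rhino.DSL;

public class AuthorizationDslEngine : DslEngine
{
    protected override void CustomizeCompiler(
        BooCompiler compiler, CompilerPipeline pipeline, string[] urls)
    {
        // Use late-bound (duck-typed) semantics whenever early binding isn't possible.
        compiler.Parameters.Ducky = true;

        // Step 0 is the parser, so the implicit base class step becomes the
        // second step on the pipeline.
        pipeline.Insert(1, new ImplicitBaseClassCompilerStep(
            typeof(AuthorizationRule),  // the implicit base class (placeholder)
            "CheckAuthorization",       // the method that receives the script body (placeholder)
            "MyApp.Security"));         // namespaces to auto-import (placeholder)

        // Script references are handled by the next step on the pipeline.
        pipeline.Insert(2, new AutoReferenceFilesCompilerStep());
    }
}
```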
The override does three things:

1. It tells the compiler to use late-bound semantics when it can’t use early-bound ones (this is what Ducky = true means).
2. It registers the ImplicitBaseClassCompilerStep as the second step on the pipeline (the first step is parsing the code into the AST).
3. It registers the AutoReferenceFilesCompilerStep as the third step on the pipeline, which adds support for script references.
With that, our job of writing the DSL is more or less done. We may need to do some additional things in the implicit base class, or introduce an AST macro or attribute, but the main work in creating our DSL is done.
Table 7.1 lists the other methods that you can override to provide additional functionality for your DSL. Those are less commonly used, though.
Table 7.1. The less-common extension points that the DslEngine provides
Method | Purpose | Notes
---|---|---
Compile | Allows you to completely modify the compilation process | Generally it’s better to use the CustomizeCompiler method. |
CreateCompilerException | Handles compilation errors and adds additional information or guidance | The default implementation throws an exception with the full compiler output. |
CreateInstance | Creates an instance of the DSL | This is useful if you want to create a DSL instance by using a special factory, or by using an IoC container. |
Other extension points relate to the DslEngine local cache, change notifications when a script is changed, the location where the scripts are stored, and so on. In order to keep those concerns outside the code that builds the DSLs, they are split into two interfaces: IDslEngineStorage and IDslEngineCache.
IDslEngineStorage handles everything that relates to the storage of scripts: enumerating them, sending notifications on changes, and retrieving script content from storage. The default implementation of IDslEngineStorage is FileSystemDslEngineStorage, which is what we’ve used so far.
IDslEngineCache holds the compilation results of scripts. Its default implementation is an in-memory cache linking each script URL (the path to the actual script file) to the compiled script type generated from that script. An interesting extension would be a persistent cache that compiles a script once and keeps the result around, surviving application restarts, until the script changes.
Figure 7.2 shows the class diagrams of IDslEngineStorage and IDslEngineCache.
Overriding CustomizeCompiler is usually all the extensibility you need, but let’s look at a more involved example: extending IDslEngineStorage to create a script storage system based on an XML file.
For this example, we aren’t interested in the language of the DSL; we’re interested in its surroundings. We’ll create a DSL system whose scripts aren’t located on a filesystem, but in an XML file instead.
Note
Storing the scripts in an XML file is the simplest way to show the full range of the DSL extensibility, but more interesting applications would use source control-based script storage, or database storage. Unfortunately, those approaches are significantly more complex, and aren’t appropriate as simple examples.
We’ll take the Authorization DSL and make it store its scripts in an XML file. Listing 7.3 shows the structure of that XML storage file.
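A sketch of one possible shape for that file follows; the element and attribute names here are illustrative. What matters is that each rule carries a name, the operation it guards, and the Boo script as its content:

```xml
<!-- Illustrative structure only. Each rule has a name, the operation it
     guards, and the Boo authorization script as its text content. -->
<rules>
  <rule name="administrators can always login"
        operation="/account/login">
    <!-- the Boo authorization script goes here -->
  </rule>
</rules>
```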
Now that we have the structure of the XML file, let’s analyze the XmlFileDslEngineStorage class. Listing 7.4 shows the code for the class, minus some methods that we’ll examine in the following discussion.
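In sketch form, with the member signatures reconstructed from the discussion that follows (the real IDslEngineStorage interface has a few more members than shown here, so treat this as an outline rather than a drop-in implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using Boo.Lang.Compiler;
using Boo.Lang.Compiler.IO;
using Rhino.DSL;

public class XmlFileDslEngineStorage : IDslEngineStorage
{
    private readonly XDocument document;

    public XmlFileDslEngineStorage(string xmlFilePath)
    {
        // The rules file is small, so we simply load it all up front.
        document = XDocument.Load(xmlFilePath);
    }

    // We don't support change notification for the XML file in this example,
    // so this is intentionally a no-op.
    public void NotifyOnChange(IEnumerable<string> urls, Action<string> changed)
    {
    }

    // The rule name doubles as the name of the generated type (see figure 7.3
    // for the amusing consequences).
    public string GetTypeNameFromUrl(string url)
    {
        return url;
    }

    // GetMatchingUrlsIn, CreateInput, and IsUrlIncludedIn appear in the
    // sketches that follow.
}
```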
The NotifyOnChange method should call the action delegate when any of the URLs that were passed have changed, but we aren’t supporting this, so we’ll ignore it. A more thorough implementation would watch the XML file for changes and autoload on change, but that isn’t necessary for our example.
The GetTypeNameFromUrl method is useful for cases where the type name and the URL are different. This is the single case where the DslEngine and the IDslEngineStorage need to work in concert.
Now let’s look at the methods we ignored in listing 7.4. Listing 7.5 shows the GetMatchingUrlsIn method.
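A sketch of that method, continuing the class above:

```csharp
// The incoming URL is the operation name; we rewrite it to the canonical URL
// (the rule name) and return every rule as part of the "directory".
public string[] GetMatchingUrlsIn(string parentPath, ref string url)
{
    string operation = url;

    // Canonicalize: the first rule we find for the requested operation gives
    // us the URL the rest of the infrastructure will use.
    foreach (XElement rule in document.Root.Elements("rule"))
    {
        if ((string)rule.Attribute("operation") == operation)
        {
            url = (string)rule.Attribute("name");
            break;
        }
    }

    // Every rule in the file is a candidate for the batch compilation.
    return document.Root
        .Elements("rule")
        .Select(rule => (string)rule.Attribute("name"))
        .ToArray();
}
```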
There’s nothing particularly interesting here, except that the URL is passed in as a reference parameter and gets set to the name of the first rule we find for the requested operation. Why? Doing so gives us the canonical URL. Canonicalization is a common problem with paths, because there are many ways to refer to the same file. The canonical form is the one the IDslEngineStorage returns, and it’s guaranteed to appear in the list of URLs that comes back. Without it, we might pass a URL in, get a list of matching URLs from the method, and have no way to tell which of them corresponds to our original one.
In this case, the original URL is the operation name, but the canonical URL is the name of the rule. The URL goes in with the value “/account/login” and comes out with the value “administrators can always login”.
Listing 7.6 shows the CreateInput method, which extracts the code from the XML document.
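A sketch, again continuing the class above (StringInput comes from Boo.Lang.Compiler.IO):

```csharp
public ICompilerInput CreateInput(string url)
{
    // By this point the URL is the canonical one: the rule name.
    XElement rule = document.Root
        .Elements("rule")
        .First(r => (string)r.Attribute("name") == url);

    // Naming the input after the rule means compiler errors point back at
    // the right rule and location.
    return new StringInput(url, rule.Value);
}
```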
In this method, we extract the content of the rule and return a StringInput with the URL of the rule as the name and the text of the rule as the content. This ensures that if there are errors in the script, we’ll get good error messages back, with a pointer to the right location.
Last (and probably also least), listing 7.7 shows the IsUrlIncludedIn method.
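A sketch of that check, continuing the class above (the exact parameter list is an assumption based on how the method is used, so verify it against the interface):

```csharp
public bool IsUrlIncludedIn(string[] urls, string parentPath, string url)
{
    // The batch is only worth compiling if it contains the script we were
    // actually asked to execute.
    return Array.IndexOf(urls, url) >= 0;
}
```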
This is a trivial implementation, but the method is important. In batching scenarios (discussed further in section 7.4), it’s common to do a batch compilation of the whole directory, even if the script that you’re searching for isn’t there. This is simply an optimization—the directory was accessed, so it might as well be compiled, because other scripts from that directory are likely to be accessed soon. To avoid that scenario, the IsUrlIncludedIn method checks that the batch to be compiled contains the script we want to execute.
That’s it for our XML-based storage. Now we need to hook it up to the DSL engine. Listing 7.8 shows the code for this.
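In sketch form (the Storage property name follows the DslEngine API as I recall it):

```csharp
using Rhino.DSL;

public class XmlAuthorizationDslEngine : DslEngine
{
    public XmlAuthorizationDslEngine(string xmlFilePath)
    {
        // Swap the default FileSystemDslEngineStorage for our XML-backed one.
        // A real engine would also override CustomizeCompiler, as before.
        Storage = new XmlFileDslEngineStorage(xmlFilePath);
    }
}
```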
As you can see, all it involves is setting the proper implementation in the constructor. We can now run all our code, and it will work against the XML file.
The names of the scripts are also important, since those will be the class names generated for those scripts. Now consider figure 7.3, which shows the compiled output of a few Authorization DSL scripts.
Figure 7.3. The application of the law of unintended consequences: our authorization rules in Reflector

As you can see, we have some strangely named classes. This is valid from the CLR perspective, but not from the perspective of most programming languages. It works, but it’s amusing.
Prefer file-based solutions
Although it’s good that we have the option to store scripts in other mediums, I strongly recommend that you keep to the tried and true method of storing scripts in the filesystem. This offers a couple of important advantages over the other approaches.
First and foremost, it makes it easy to put the scripts in source control and perform all the usual source-control actions on them (such as diffing changes in scripts or merging a development branch into the production branch).
Second, we can debug scripts. I haven’t discussed it so far, but debugging scripts is just an F11 key away. This doesn’t work unless the scripts are compiled from the filesystem, though; otherwise, the debugger has no way to find the script’s source code.
The source-control advantage is the more critical consideration, in my opinion. I strongly prefer to be able to make use of source control without having to jump through hoops, and I have never seen anything that works better than simple text files in a folder hierarchy. It’s the simplest solution, and it’s also the best.
Anyway, we have a whole library to explore yet. Let’s jump directly into the DSL idioms that we get from Rhino DSL.
Most DSL idioms are widely useful across many types of DSLs. The Implicit Base Class pattern, for example, is useful for nearly all DSLs. But after covering the common ground, most DSLs take wildly differing paths.
Rhino DSL contains six reusable idioms (at the time of this writing, at least). You’re probably familiar with most of them by now:
- ImplicitBaseClassCompilerStep
- AutoReferenceFilesCompilerStep
- AutoImportCompilerStep
- UseSymbolsStep
- UnderscoreNamingConventionsToPascalCaseCompilerStep
- GeneratePropertyMacro
These six common idioms are the ones I’ve found most useful across many DSL implementations. Let’s look at them each in turn.
The good old Implicit Base Class is codified as ImplicitBaseClassCompilerStep. We need to insert it into the pipeline (preferably in the second position), and it will move all the code not inside a class into an overridden method in the implicit base class.
Listing 7.9 shows a sample use of this class, taken from the Authorization DSL code.
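The registration follows the same pattern we sketched for CustomizeCompiler earlier; here it is again in isolation (placeholder names):

```csharp
// Inside CustomizeCompiler; base class, method, and namespaces are placeholders.
pipeline.Insert(1, new ImplicitBaseClassCompilerStep(
    typeof(AuthorizationRule),  // move loose code into a class deriving from this
    "CheckAuthorization",       // ...as the body of this overridden method
    "MyApp.Security"));         // namespaces to auto-import into every script
```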
In addition to specifying the base type and the method to move the code to, we can specify namespaces that we want to auto-import.
AutoReferenceFilesCompilerStep supports script references, which we talked about in chapter 5. This class just needs registration in the pipeline to work. Again, it’s best placed near the start of the pipeline.
The code to make this happen is trivial:
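```csharp
// Inside CustomizeCompiler; a sketch of the registration, placed right after
// the implicit base class step.
pipeline.Insert(2, new AutoReferenceFilesCompilerStep());
```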
Once that’s done, the following syntax will cause the referenced script to be compiled and then referenced on the fly:
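```boo
# A sketch of the script-reference syntax from chapter 5; the file name is a placeholder.
import file from "CustomerModel.boo"
```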
Auto-import support is usually handled by the ImplicitBaseClassCompilerStep, but you can also configure it separately, using AutoImportCompilerStep. Listing 7.10 shows how to use this compiler step.
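In sketch form, inside CustomizeCompiler (the referenced type and the namespace are placeholders, and the References.Add call reflects the Boo compiler API as I recall it):

```csharp
// First, reference the assembly that contains the types we want available.
compiler.Parameters.References.Add(typeof(Customer).Assembly);

// Then register the step with the namespaces to import into every script.
pipeline.Insert(2, new AutoImportCompilerStep("MyApp.Model"));
```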
Note that this is a two-stage process. You need to add a reference to the relevant assembly (which is done on the first line of listing 7.10), and then you add the auto-import compiler step and pass it the namespaces that will automatically be imported into all compiled files.
The symbols compiler step is codified as UseSymbolsStep. Symbols represent a nicer, more fluent way of handling string literals.
Consider this snippet,
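```boo
# A sketch (not the book's exact snippet): the same call written with a plain
# string and then with a symbol; requires_role is a placeholder method.
requires_role "administrators"
requires_role @administrators
```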
UseSymbolsStep will convert all identifiers starting with @ to string literals. The difference between the two approaches is syntactic only, but this is often important when you want to make certain parts of a DSL clearer. The @identifier approach makes a clear distinction between strings that you pass and elements of the language.
Getting even better syntax
Boo allows you to have a far more natural syntax, like this:
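```boo
# The same idea with no prefix at all; administrators here is a bare identifier
# that a custom compiler step or macro would have to turn into a string.
requires_role administrators
```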
This will work, but it requires special treatment: an AST macro or compiler step with more context than a generic step offers. A compiler step that transforms all unknown references to strings is easy to write, but it tends to produce ambiguous errors, so I suggest creating one only after careful consideration.
As you’ve probably figured out already, UseSymbolsStep repeats the usage pattern we’ve seen so far:
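```csharp
// Inside CustomizeCompiler; a sketch of the registration, near the start of the pipeline.
pipeline.Insert(2, new UseSymbolsStep());
```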
Like the other compiler steps, it should be registered at the beginning of the pipeline. Usually I recommend clustering all our compiler steps one after another, directly after the parsing step.
The CLR has well-defined naming conventions, and deviations like send_to are annoying. At the same time, send_to is easier to read in DSL code than SendTo. If only there were a way to resolve this automatically ...
Luckily, we have such a way: UnderscoreNamingConventionsToPascalCaseCompilerStep can automatically translate send_to to SendTo. This compiler step makes the transformation for any member call that contains an underscore. Because I never use underscores in my applications, this works fine for me. If you do use underscores in your method or property names, you may need to extend UnderscoreNamingConventionsToPascalCaseCompilerStep to understand your convention.
The process for registering this compiler step is a bit different than all the rest. Unlike the previous steps, we don’t want this one to run as soon as possible. Quite the reverse—we want it to run as late as possible, which means before we start to process the method bodies.
As a result, we register it using the following snippet:
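```csharp
// A sketch: run the step just before Boo processes method bodies.
// ProcessMethodBodiesWithDuckTyping is my recollection of the right marker
// step; verify it against the Boo version you're using.
pipeline.InsertBefore(typeof(ProcessMethodBodiesWithDuckTyping),
    new UnderscoreNamingConventionsToPascalCaseCompilerStep());
```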
We haven’t used it so far, but we’ll make use of it the next time we create a language.
That’s it for compiler steps.
We have one last thing to explore: the GeneratePropertyMacro. It allows us to take a snippet like the following and turn it into a property that returns the value we passed to the macro:
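```boo
# A hypothetical snippet; "target" maps to the macro class shown in the next sketch.
target "payroll_database"
```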
Enabling its use is simple, as you can see in listing 7.11.
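A sketch of such a macro (the class and property names are placeholders):

```csharp
using Rhino.DSL;

// "target" in the DSL maps to this class: the macro name is the class name
// minus the Macro suffix, and the constructor names the property to generate.
public class TargetMacro : GeneratePropertyMacro
{
    public TargetMacro() : base("Target")
    {
    }
}
```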
We create a class that inherits from GeneratePropertyMacro, and we specify in the constructor the property name it needs to generate. The name of the derived class is important, because it’s the name we’ll use in the DSL to refer to this macro (without the Macro suffix).
Those six common idioms are useful across many DSL implementations. For advanced DSLs, you will likely want to add your own custom steps and macros to the mix, and we discussed many of those options in chapter 4. But as you can see, you get quite a bit with what’s “in the box,” so to speak.
Now that we are done discussing what idioms Rhino DSL offers, we need to cover the caching and batch compilation infrastructure. More specifically, we have to understand how they work and why they work as they do.
Compiling code is a costly process, and when you’re creating a DSL, you have to consider ways to reduce this cost. We already talked a bit about that in previous chapters. Now we’ll look at the design of Rhino DSL’s compilation process, and at why it was built in such a way.
To save on compilation costs, we introduce a cache (and cache invalidation policy) so we only compile a script once. But assuming that we have many scripts, we’re still going to pay the compilation cost many times over.
Note
Compiling each script individually will also create many small assemblies. The general recommendation for .NET applications is to prefer a few large assemblies over many small assemblies. Batching helps us reduce the number of assemblies that we compile.
You might wish that you could compile everything once, instead of performing many small compilations. The problem with that, though, is that you run into issues with compilation time when you have large numbers of scripts.
The best solution is a compromise. We want some batching in our compilation, but we don’t want to compile everything at once and pay the high cost of a large compilation.
Note
We’ll assume here that we’re talking about scripts that reside on the filesystem. For scripts stored elsewhere, the concepts are similar, but the implementation depends on the concept of hierarchy in the selected storage mechanism.
When we get a request to execute a certain script, we perform the following operations:
1. Check if the script has already been compiled and exists in the cache.
2. If it’s in the cache, instantiate and return the new instance, and we’re done.
3. If it isn’t in the cache, compile all the scripts in the script directory.
4. Register all the scripts in the cache.
5. Instantiate the compiled script and return the new instance.
The key here is in step number 3. Instead of compiling only the script we’re interested in, we compile all the scripts in the current directory and register them in the cache. This means that we pay the compilation cost once per directory. It also means that we have bigger (and fewer) assemblies. We can now rely on the natural organization of the filesystem to limit the number of scripts in a directory to a reasonable number.
Because we usually place scripts on the filesystem according to some logic, and because we usually access them according to the same logic, this turns out to be a pretty good heuristic to detect which scripts we should compile.
Cache invalidation puts a tiny wrinkle in this pretty scenario, though. When a script changes, we remove it from the cache, but we also note that this is a script that we have already compiled. When a new request comes for this script, we won’t find it in the cache, but we will find it in the list of files that were compiled and then changed. As a result, we won’t perform a batch compilation in this scenario; we’ll compile only the current script. The logic is simple: if we had to recompile the script, we already performed a batch compilation on its directory, so we don’t need to compile the entire directory again. The end result is a tiny assembly that contains the compiled type from the script that was changed.
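In pseudocode, the whole flow, including the invalidation wrinkle, looks roughly like this. This is a conceptual sketch only; the cache, storage, and engine objects and their calls stand in for Rhino DSL’s real members, which also handle locking, storage abstraction, and error reporting:

```csharp
// Conceptual sketch only; not Rhino DSL's actual API.
object GetExecutable(string url)
{
    Type compiledType = cache.Get(url);
    if (compiledType == null)
    {
        string[] batch;
        if (invalidatedUrls.Contains(url))
        {
            // This script was batch-compiled before and then changed:
            // recompile just this one script into its own tiny assembly.
            batch = new[] { url };
        }
        else
        {
            // First request for this directory: batch-compile everything in it.
            batch = storage.GetMatchingUrlsIn(Path.GetDirectoryName(url), ref url);
        }

        foreach (var compiled in engine.Compile(batch))
            cache.Set(compiled.Url, compiled.Type);

        compiledType = cache.Get(url);
    }

    // Hand back a fresh instance of the compiled script.
    return Activator.CreateInstance(compiledType);
}
```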
This process isn’t my own idea. ASP.NET operates in a similar way when it compiles ASPX files. I used the same ideas when the time came to build the compilation and caching infrastructure of Rhino DSL.
This just about wraps things up for Rhino DSL (it’s a tiny library, remember?). We just have one final topic to cover: handling external dependencies and integration with external factories.
Although the default approach of creating new instances of the DSL using the default constructor is fine for simple cases, it gets tricky for more complex situations. For complex DSLs, we need access to the application’s services and infrastructure.
In most applications, this is handled by using either a static gateway (a static class that provides the given service) or by using dependency injection (passing the services to the instance using the constructor or settable properties).
The advantage of static gateways is that the DSL can call them. Listing 7.12 shows an Authorization DSL that makes additional calls to the Authorization static gateway to perform its work.
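A hypothetical fragment in that style (the Authorization gateway and its methods are invented for illustration):

```boo
# Hypothetical: the script reaches out to a static Authorization gateway directly.
if Authorization.IsInRole(Principal, "Administrators"):
    Allow("administrators can always login")
```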
This is easy to build, but I dislike this approach. It tends to make testing awkward (we’ll discuss DSL testing in the next chapter). I much prefer to use dependency injection.
Depending on the infrastructure of our application, we have different ways of handling external dependencies, but the built-in option is to pass the parameters to the constructor. That’s what we did when we built the Quote-Generation DSL, as you can see in listing 7.13.
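In sketch form, the call site looks like this; the type, path, and variable names are placeholders, and as I recall the API, extra arguments to Create are forwarded to the script’s base-class constructor:

```csharp
// quoteRequest is whatever input object the Quote-Generation base class expects.
QuoteGeneratorRule rule = factory.Create<QuoteGeneratorRule>(
    @"quotes\standard.boo",   // the script to execute (placeholder path)
    quoteRequest);            // forwarded to the base-class constructor
```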
Listing 7.13 shows the bare-bones approach to dependency injection, but we may want to use a more advanced technique. The advanced options all essentially amount to overriding the DslEngine CreateInstance method and modifying how we create an instance of our DSL.
Listing 7.14 shows how we could create an instance of a DSL by routing its creation through an IoC container (such as Windsor or StructureMap).
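A sketch of that override, using Windsor as the example container; the CreateInstance signature is from memory, so check it against your Rhino DSL version:

```csharp
using System;
using Castle.Windsor;
using Rhino.DSL;

public class ContainerBackedDslEngine : DslEngine
{
    private readonly IWindsorContainer container;

    public ContainerBackedDslEngine(IWindsorContainer container)
    {
        this.container = container;
    }

    public override object CreateInstance(Type type, params object[] parametersForConstructor)
    {
        // Let the container build the compiled DSL type and satisfy its
        // dependencies, instead of using Activator.CreateInstance directly.
        return container.Resolve(type);
    }
}
```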
Now all the DSL dependencies can be satisfied by the container, instead of having to be manually supplied.
Note
The type we created in listing 7.14 was not previously registered in the container (we just compiled it, after all). The IoC container needs to support creating instances of unregistered types, but most of them do.
Note that although it works, directly using the application services from the DSL is an approach you should consider carefully. IoC containers aren’t often written with an eye toward their use in DSLs, and they may allow leakage of programming concerns into the language.
It’s generally better to handle the application services inside the DSL base class, and then to expose methods that properly match the style of the DSL. This also helps significantly when you need to create a new version of the DSL, and you need to change those services. If you have a facade layer, your job is that much easier. We’ll talk about this more in chapter 9.
In this chapter, we looked at the requirements of a DSL infrastructure and saw how they’re implemented by Rhino DSL. We looked at caching and batching, and at how those work together to produce a well-performing system.
We explored the extensibility options that Rhino DSL offers and we wrote a DSL engine storage class that could load scripts from an XML file, as an example of how to deal with non-filesystem-based storage (databases, source control, and so on). And last, but not least, we discussed the issue of providing dependencies to our DSL scripts.
This chapter is short, but it provides a thorough grounding in the underlying infrastructure we build upon, as well as an outline of the design requirements that led to building it this way.
With this knowledge, we can now start managing DSLs in real-world applications. We’ll look next at how we can integrate our DSLs with test-driven development practices and test both the DSL implementations and the DSLs themselves.