Properly utilizing XslCompiledTransform
Not long ago, we noticed some degradation in performance after we upgraded to .NET 2.0 and migrated to the XslCompiledTransform class from the now obsolete XslTransform class. Our implementation was fairly straightforward, although we hid it behind an interface for easy mocking/testing.
The code looked something like this:

    using System.Xml.Xsl;

    public interface TransformLoader
    {
        XslCompiledTransform Load(string name);
    }

    class XslTransformLoader : TransformLoader
    {
        public XslCompiledTransform Load(string name)
        {
            // Compiles the stylesheet from scratch on every call.
            XslCompiledTransform transform = new XslCompiledTransform();
            transform.Load(name);
            return transform;
        }
    }

This is a pretty standard implementation. But after JetBrains dotTrace showed that the majority of the time was being spent in the Load method, we started doing some research. We had mistakenly assumed that XslCompiledTransform was smart enough to detect whether a stylesheet had already been compiled and, if so, reuse the compiled version. That is not the case: every call to Load compiles the stylesheet again. To use this class effectively, you need to save off the instance for subsequent uses.
To do this, we created a new implementation, the CachedXslTransformLoader, which looks like this:
    using System.Collections.Generic;
    using System.Xml.Xsl;

    class CachedXslTransformLoader : TransformLoader
    {
        // Compiled transforms keyed by stylesheet name. Dictionary is not
        // thread-safe, so add locking if Load can be called concurrently.
        private readonly Dictionary<string, XslCompiledTransform> transforms =
            new Dictionary<string, XslCompiledTransform>();

        public XslCompiledTransform Load(string name)
        {
            XslCompiledTransform transform;
            if (!transforms.TryGetValue(name, out transform))
            {
                // First request for this stylesheet: compile once and cache.
                transform = new XslCompiledTransform();
                transform.Load(name);
                transforms[name] = transform;
            }
            return transform;
        }
    }

When we ran the older XslTransformLoader through a loop of 100 transformations with our XSLT and XML files, the loop took approximately 48 seconds to complete. With the new CachedXslTransformLoader, the exact same loop took only 1.3 seconds.
This is where the performance improvements from the XslCompiledTransform really come to fruition, so make sure that you are saving off the instance of the class. As we saw, the class is not smart enough to determine whether or not the XSL has already been compiled.
By the way, the performance problem we were experiencing went away with this minor change. On another note, it was nice to see this adhere to the open/closed principle. We were able to correct issues in the system by adding new code, not by touching existing/tested code.
Bottom line: when using XslCompiledTransform, make sure to save off the instance and reuse it for maximum performance benefit.
Sometimes the problem is in your tests
Last night I was TDDing a new website that I've been working on and got caught in an interesting predicament: the tests failed, but the production code worked. For the life of me, I could not figure out why my test was failing. It turns out I had missed a tiny piece of documentation on how shoulda works.
By the way, before we get into this, if you are writing Ruby code and writing tests (you are, aren't you?), do yourself a favor and check out the shoulda library. Excellent work from the great folks at thoughtbot.

    context "with valid attributes" do
      setup do
        @user = Factory.create(:user)
        @updated_attributes = Factory.attributes_for(:user)
        put :update, :id => @user.id, :user => @updated_attributes
      end

      should_not_change "User.count"
      should_respond_with :success
      should_redirect_to 'root_url'
    end

For quite some time, every test was passing with the exception of should_not_change "User.count". After consulting the documentation and source code for shoulda, I realized what should_not_change was actually doing.
The should_not_change macro evaluated the User.count statement *PRIOR* to the setup method executing and stored the result in a variable. Then, when the test ran, it evaluated the User.count statement again and compared the two values. Since the Factory.create call in the setup method created a new record in the database, of course User.count would change.
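The ordering problem can be reproduced without shoulda at all. What follows is a toy model under stated assumptions, not shoulda's actual implementation: FakeUserTable and count_changed? are hypothetical names, and the only thing modeled is that the "before" snapshot is taken after any outer setup but before the inner one.

```ruby
# Toy stand-in for the users table; not part of shoulda or Rails.
class FakeUserTable
  attr_reader :count

  def initialize
    @count = 0
  end

  def create
    @count += 1
  end
end

# Runs the outer setup, takes the "before" snapshot, runs the inner setup,
# then reports whether the count differs from the snapshot.
def count_changed?(outer_setup:, inner_setup:)
  table = FakeUserTable.new
  outer_setup.call(table)
  before = table.count
  inner_setup.call(table)
  table.count != before
end

create_user = ->(table) { table.create }
update_user = ->(_table) {} # an update adds no rows

# Single context: the record is created after the snapshot was taken.
flat_changed = count_changed?(outer_setup: ->(_table) {}, inner_setup: create_user)

# Nested contexts: the record is created before the snapshot is taken.
nested_changed = count_changed?(outer_setup: create_user, inner_setup: update_user)
```

In this model the single-context arrangement reports a change and the nested one does not, which is the behavior the nested-context workaround relies on.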
To get around this, I ended up creating a nested context to make the test pass. I don't necessarily like this, but it does get the test to pass and gives me an opportunity to change it if someone has a better solution.

    context "updating User information" do
      setup do
        @user = Factory.create(:user)
        @updated_attributes = Factory.attributes_for(:user)
      end

      context "with valid attributes" do
        setup { put :update, :id => @user.id, :user => @updated_attributes }

        should_not_change "User.count"
        should_respond_with :success
        should_redirect_to 'root_url'
      end
    end

Indeed, sometimes the problem lies in your tests.
puts vs print in Ruby
I discovered something a bit peculiar about the puts and print methods in Ruby. Output from puts seems to be flushed immediately, so it shows up on $stdout right away. Take the code example below:

    5.times {
      puts "."
      sleep 2
    }

This behaves exactly the way you would expect: on each of five iterations it writes a single period to $stdout and then pauses for two seconds. puts also appends a newline character, so each period lands on its own line. print does not append a newline, so it would place all the periods on the same line. However, the code below does not behave the way you would expect.

    5.times {
      print "."
      sleep 2
    }

The code above waits for 10 seconds and then prints all 5 periods. As it turns out, this is because the output is buffered: print adds each period to the $stdout buffer, and with no newline nothing prompts a flush until the loop has finished. The easiest way around this (for a situation like the above) is to set the sync property on $stdout.

    $stdout.sync = true
    5.times {
      print "."
      sleep 2
    }

This tells Ruby not to buffer output written to $stdout, so each write is passed along immediately instead of waiting in a buffer, which is the default in most modern environments. If you find yourself needing a small amount of output to appear on the screen immediately, this is a good technique to use.
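If you would rather not change the sync setting for the whole program, calling flush after each write is another option. The sketch below assumes a hypothetical helper named print_progress; the io and delay parameters are my own additions so that the helper can be pointed at any IO-like object.

```ruby
# Hypothetical helper: writes one dot per step and flushes after each write,
# leaving the stream's sync setting untouched.
def print_progress(steps, io: $stdout, delay: 0)
  steps.times do
    io.print "."
    io.flush # push the buffered dot out right away
    sleep delay if delay > 0
  end
  io.print "\n"
end
```

Calling print_progress(5, delay: 2) should behave like the loop above, with each period appearing as soon as it is written.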
Implementing method_missing
Earlier, I was working on a script that would walk a set of folders and execute a chunk of code against each file it found. The script itself is easy enough to write in plain Ruby. For example:

    Dir.glob("./**/*").each do |f|
      # your ruby code here
    end

However, as routine as this kind of code is for us, I was hoping to hide it behind an API that less technical users could use. To promote reuse, I created a class to provide this functionality.

    class FileWalker
      def root_path=(path)
        @root_path = path
      end

      def each_file_of_type(type, &block)
        Dir.glob("#{@root_path}/**/#{type}").each do |f|
          yield f
        end
      end
    end

The users of this API utilized it in a manner which you would expect (coming from a statically-typed language):
    walker = FileWalker.new
    walker.root_path = "."
    walker.each_file_of_type("*.rb") do |file|
      # your ruby code here
    end

While this covered the functional requirements of what I was hoping to do, I really did not like the way this reads. One, the user of the API was expected to supply the wildcard for the type. Two, did I say that I didn't like the way it reads? What I was really hoping for was an API that worked like this:

    FileWalker.dir "." do
      each_rb_file do |f|
        # your ruby code here
      end
    end

The first thing you’ll notice is that I removed the file specification from the call by using the each_rb_file method. However, I certainly do not want to clutter the FileWalker class with dozens of methods to iterate different file types. Having to add a new method every time I want to iterate a different file type would most certainly violate the open/closed principle.
As I thought about the best way to get both the API I wanted and easy long-term maintenance, I decided to take advantage of a great method that every Ruby object inherits, called method_missing. method_missing is called whenever the receiver does not respond to the method being invoked. It is passed three things: the name of the method (as a symbol), any arguments given to the method, and the block, if one was supplied.
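To make those three pieces concrete, here is a minimal sketch using a hypothetical CallRecorder class (not part of the file walker). It intercepts any method whose name starts with record_, a prefix I picked purely for illustration, and stores what method_missing received.

```ruby
# Hypothetical class used only to show what method_missing receives.
class CallRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def method_missing(name, *args, &block)
    # Let genuinely unknown methods fail with the usual NoMethodError.
    return super unless name.to_s.start_with?("record_")

    # Run the block (if any) with the arguments, then log all three pieces.
    result = block ? block.call(*args) : nil
    @calls << [name, args, result]
    result
  end
end

recorder = CallRecorder.new
recorder.record_sum(1, 2) { |a, b| a + b }
```

After the call, recorder.calls holds the method name as a symbol, the argument list as an array, and the value the block returned.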
Using a little regexp magic, I am able to intercept calls to each_rb_file and delegate them to the earlier each_file_of_type method (which is now private). Take a look:
    class FileWalker
      def initialize(path)
        @root_path = path
      end

      def self.dir(path, &block)
        walker = new(path)
        walker.instance_eval(&block) if block_given?
      end

      def method_missing(method, *args, &block)
        if method.to_s =~ /^each_(.+)_file$/
          each_file_of_type "*.#{Regexp.last_match[1]}", &block
        else
          # Anything else really is missing; raise NoMethodError as usual.
          super
        end
      end

      private

      def each_file_of_type(type, &block)
        Dir.glob("#{@root_path}/**/#{type}").each do |f|
          yield f
        end
      end
    end

So, as you can see, with a little method_missing mojo we now have access to any number of methods that let us retrieve all files with a particular extension. The above script and class will work with any sort of file extension. Calling each_exe_file, each_xml_file, or each_py_file will all function the same way, without adding any new code. The advantages here, as mentioned earlier, are that 1) we get to provide a *readable* API to our consumers, and 2) we conform to the open/closed principle by not having to modify already written and tested code to support a new extension.
If you’ve not looked much at method_missing and Ruby, I encourage you to explore it further. I’ve only scratched the surface of its capabilities. Others have gone much further, including implementing an entire XML DSL using this method.
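One companion worth considering alongside method_missing is respond_to_missing?, which keeps respond_to? honest about the dynamically handled names. The sketch below is one way this might look, not the original implementation: ExtensionWalker and demo_rb_file_names are hypothetical names, and the temporary directory exists only so the example can run anywhere.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical variant of the walker, named differently to avoid clashing
# with the FileWalker class above.
class ExtensionWalker
  PATTERN = /\Aeach_(\w+)_file\z/

  def initialize(root_path)
    @root_path = root_path
  end

  def method_missing(name, *args, &block)
    match = PATTERN.match(name.to_s)
    return super unless match # unknown names still raise NoMethodError

    Dir.glob(File.join(@root_path, "**", "*.#{match[1]}")).sort.each(&block)
  end

  # Keeps respond_to? consistent with what method_missing handles.
  def respond_to_missing?(name, include_private = false)
    PATTERN.match?(name.to_s) || super
  end
end

# Builds a throwaway directory tree and returns the base names of its .rb files.
def demo_rb_file_names
  Dir.mktmpdir do |dir|
    FileUtils.mkdir_p(File.join(dir, "lib"))
    FileUtils.touch(File.join(dir, "a.rb"))
    FileUtils.touch(File.join(dir, "lib", "b.rb"))
    FileUtils.touch(File.join(dir, "notes.txt"))

    names = []
    ExtensionWalker.new(dir).each_rb_file { |f| names << File.basename(f) }
    names
  end
end
```

With this in place, ExtensionWalker.new(".").respond_to?(:each_xml_file) answers true, while a name that does not fit the pattern still answers false and raises NoMethodError when called.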
Perhaps in a subsequent post I’ll dive into how I’d go about implementing this… til then.


