
Archive for February, 2009

Properly utilizing XslCompiledTransform

Not long ago, we noticed a drop in performance after upgrading to .NET 2.0 and migrating from the now-obsolete XslTransform class to XslCompiledTransform. Our implementation was fairly straightforward, although we hid it behind an interface for easy mocking and testing.

The code looked something like this:

using System.Xml.Xsl;

public interface TransformLoader
{
    XslCompiledTransform Load(string name);
}

// Compiles the stylesheet on every call, so each Load() pays the full XSLT compilation cost.
class XslTransformLoader : TransformLoader
{
    public XslCompiledTransform Load(string name)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(name);
        return transform;
    }
}

This is a pretty standard implementation, but after JetBrains dotTrace showed that most of the time was being spent in the Load method, we started doing some research. We had mistakenly assumed that XslCompiledTransform was smart enough to detect that a stylesheet had already been compiled and reuse the compiled version. It is not: every call to Load compiles the stylesheet again. To use this class effectively, you need to keep the loaded instance and reuse it.

To do this, we created a new implementation, the CachedXslTransformLoader, which looks like this:

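using System.Collections.Generic;
using System.Xml.Xsl;

// Compiles each stylesheet once and returns the same instance on later calls.
// A loaded XslCompiledTransform is thread-safe; the Dictionary is not, hence the lock.
class CachedXslTransformLoader : TransformLoader
{
    private readonly object syncRoot = new object();
    private readonly Dictionary<string, XslCompiledTransform> transforms =
        new Dictionary<string, XslCompiledTransform>();

    public XslCompiledTransform Load(string name)
    {
        lock (syncRoot)
        {
            XslCompiledTransform transform;
            if (!transforms.TryGetValue(name, out transform))
            {
                transform = new XslCompiledTransform();
                transform.Load(name);  // the expensive compilation step, now done once per name
                transforms[name] = transform;
            }

            return transform;
        }
    }
}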

When we ran the original XslTransformLoader through a loop of 100 transformations using our XSLT and XML files, the loop took approximately 48 seconds. With the new CachedXslTransformLoader, the exact same loop took only 1.3 seconds.

This caching is where the performance benefits of XslCompiledTransform really pay off.

By the way, this minor change made the performance problem we were experiencing go away. It was also nice to see the fix adhere to the open/closed principle: we corrected the issue by adding new code rather than modifying existing, tested code.

Bottom line: when using XslCompiledTransform, make sure to save off the instance and reuse it for maximum performance benefit.


Sometimes the problem is in your tests

Last night I was TDDing a new website I’ve been working on and got caught in an interesting predicament: the tests failed, but the production code worked. For the life of me, I could not figure out why my test was failing. It turns out I had missed a tiny piece of documentation on how shoulda works.

By the way, before we get into this: if you are writing Ruby code and writing tests (you are, aren't you?), do yourself a favor and check out the shoulda library. Excellent work from the great folks at thoughtbot.

context "with valid attributes" do
  setup do
    @user = Factory.create(:user)
    @updated_attributes = Factory.attributes_for(:user)
 
    put :update, :id => @user.id, :user => @updated_attributes
  end
 
  should_not_change "User.count"
  should_respond_with :success
  should_redirect_to 'root_url'
end

For quite some time, every test was passing with the exception of should_not_change "User.count". After consulting the documentation and source code for shoulda, I realized what should_not_change was actually doing.

The should_not_change macro evaluates the User.count expression *BEFORE* the setup block runs and stores the result. When the test executes, it evaluates User.count again. Since the Factory.create call in the setup block inserts a new user into the database, User.count will of course change.

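To get around this in this particular example, I ended up creating a nested context. This works because shoulda runs the parent context's setup before taking the should_not_change snapshot, so the user already exists when the first count is taken. I don't necessarily like this, but it gets the test to pass and leaves room to change it if someone has a better solution.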

context "updating User information" do
  setup do
    @user = Factory.create(:user)
    @updated_attributes = Factory.attributes_for(:user)
  end
 
  context "with valid attributes" do
    setup { put :update, :id => @user.id, :user => @updated_attributes }
 
    should_not_change "User.count"
    should_respond_with :success
    should_redirect_to 'root_url'
  end
end
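To make the ordering concrete, here is a toy model in plain Ruby. It is not shoulda's code; it only mimics the order described above, assuming that enclosing contexts' setup blocks run before the "before" snapshot and the current context's setup runs after it. USERS and the lambdas are hypothetical stand-ins for the users table, Factory.create, and the put :update request.

```ruby
# Toy model of a "should not change" check; NOT shoulda's implementation.
# parent_setups: setup blocks of enclosing contexts (run before the snapshot)
# setups:        this context's setup blocks (run after the snapshot)
def not_changed?(expression, parent_setups: [], setups: [])
  parent_setups.each(&:call)
  before = expression.call      # the "before" value is captured here
  setups.each(&:call)
  before == expression.call     # compared against the value after setup
end

USERS  = []                     # hypothetical stand-in for the users table
count  = -> { USERS.size }      # like "User.count"
create = -> { USERS << :user }  # like Factory.create(:user)
update = -> { }                 # like put :update; inserts no rows

# Creating the user in the same context lands inside the snapshot window.
p not_changed?(count, setups: [create, update])                  # => false

USERS.clear
# Moving the creation to a parent context keeps it outside the window.
p not_changed?(count, parent_setups: [create], setups: [update]) # => true
```

The model shows why the nested context helps: the row is created before the count is first read, so only the update request sits between the two reads.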

Indeed, sometimes the problem lies in your tests.


puts vs print in ruby

I discovered something a bit peculiar about the puts and print methods in Ruby: output from puts seems to show up on $stdout right away, while output from print may not. Take the code example below:

5.times {
  puts "."
  sleep 2
}

This works exactly the way you would expect. It writes a period to $stdout and then pauses for two seconds, five times over. Because puts appends a newline, each period ends up on its own line. print does not add a newline, so it would place all the periods on the same line. However, the print version below does not behave the way you might expect.

5.times {
  print "."
  sleep 2
}

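The code above waits for 10 seconds and then prints all five periods at once. As it turns out, the output is being buffered. When $stdout is attached to a terminal it is usually line-buffered, so the newline that puts writes pushes each period out immediately, while print's periods sit in the buffer until the script finishes. The easiest way around this (for a situation like the one above) is to set the sync property on $stdout.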

$stdout.sync = true  # write each call through immediately instead of buffering
5.times {
  print "."
  sleep 2
}

Setting sync to true tells $stdout to write output through immediately instead of buffering it, as it normally does for efficiency. If you need a small amount of output, such as a progress indicator, to reach the screen right away, this is a good technique to use.
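Setting STDOUT.sync globally affects the rest of the program. A minimal sketch of a more contained variant is below; with_sync is a hypothetical helper, not part of Ruby's standard library. It turns sync on only for the duration of a block and restores the previous setting afterwards. Calling $stdout.flush right after each print is another way to get the same effect at specific points.

```ruby
# Hypothetical helper: run the block with sync enabled on io, then restore
# whatever sync setting io had before, even if the block raises.
def with_sync(io = $stdout)
  previous = io.sync
  io.sync = true
  begin
    yield
  ensure
    io.sync = previous
  end
end
```

Wrapping the earlier print loop in `with_sync { ... }` shows each period as it is printed, then leaves $stdout as it found it.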


Implementing method_missing

Earlier, I was working on a script that would walk a set of folders and run a chunk of code against each file it found. The script itself is easy enough to write in plain Ruby. For example:

Dir.glob("./**/*").each do |f|
  # your ruby code here
end

However, as routine as this is for us day to day, I wanted to hide it behind an API that less technical users would be able to use. To promote reuse, I created a class to provide this functionality.

class FileWalker
  def root_path=(path)
    @root_path = path
  end
 
  def each_file_of_type(type, &block)
    Dir.glob("#{@root_path}/**/#{type}").each do |f|
      yield f
    end
  end
end

Users of this API called it the way you would expect if you come from a statically-typed language:

walker = FileWalker.new
walker.root_path = "."
walker.each_file_of_type("*.rb") do |file|
  # your ruby code here
end

While this met the functional requirements, I really did not like the way it reads. One, the user of the API had to supply the wildcard for the type. Two, did I say that I didn't like the way it reads? What I was really hoping for was an API that worked like this:

FileWalker.dir "." do
  each_rb_file do |f|
    # your ruby code here
  end
end

The first thing you’ll notice is that I removed the file pattern from the call by folding the extension into the method name, each_rb_file. However, I certainly do not want to clutter the FileWalker class with dozens of methods for different file types. Having to add a new method every time I want to iterate a different file type would most certainly violate the open/closed principle.

As I thought about how to get both the API I wanted and long-term maintainability, I decided to take advantage of a great hook in Ruby called method_missing. Ruby calls method_missing on the receiver whenever it is sent a message that has no matching method. The hook receives three things: the name of the method (as a symbol), any arguments, and the block, if one was given.
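To see the hook in isolation, here is a small sketch that has nothing to do with files; Greeter and its greet_* methods are hypothetical. It also shows two habits worth pairing with method_missing: calling super for names you do not handle, so typos still raise NoMethodError, and defining respond_to_missing? (available since Ruby 1.9.2) so that respond_to? agrees with method_missing.

```ruby
class Greeter
  GREET = /\Agreet_(\w+)\z/

  # Called when a Greeter receives a message it has no method for.
  # name is a Symbol; args and block are whatever the caller passed.
  def method_missing(name, *args, &block)
    if name.to_s =~ GREET
      greeting = "Hello, #{Regexp.last_match[1].capitalize}#{args.first}"
      block ? block.call(greeting) : greeting
    else
      super  # unhandled names still raise NoMethodError
    end
  end

  # Keeps respond_to? consistent with the names method_missing handles.
  def respond_to_missing?(name, include_private = false)
    name.to_s =~ GREET ? true : super
  end
end

g = Greeter.new
puts g.greet_world                  # prints "Hello, World"
puts g.greet_world("!")             # prints "Hello, World!"
g.greet_ruby { |s| puts s.upcase }  # prints "HELLO, RUBY"
puts g.respond_to?(:greet_anyone)   # prints "true"
begin
  g.wave
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"   # prints "NoMethodError: wave"
end
```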

Using a little regexp magic, I am able to intercept calls to each_rb_file and delegate them to the earlier each_file_of_type method (which is now private). Take a look:

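class FileWalker
  def initialize(path)
    @root_path = path
  end
 
  # Evaluate the block with a new walker as self, so the block can call
  # each_*_file methods directly.
  def self.dir(path, &block)
    walker = new(path)
    walker.instance_eval(&block) if block_given?
  end
 
  # Turn calls such as each_rb_file or each_xml_file into a glob for that
  # extension; any other unknown method still raises NoMethodError.
  def method_missing(method, *args, &block)
    if method.to_s =~ /\Aeach_(\w+)_file\z/
      each_file_of_type "*.#{Regexp.last_match[1]}", &block
    else
      super
    end
  end
 
private
  def each_file_of_type(type, &block)
    Dir.glob("#{@root_path}/**/#{type}").each do |f|
      yield f
    end
  end
end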

So, as you can see, with a little method_missing mojo we now have a method for every file extension. The class above works with any extension: each_exe_file, each_xml_file, and each_py_file all behave the same way, without any new code. As mentioned earlier, the advantages are that 1) we provide a *readable* API to our consumers, and 2) we conform to the open/closed principle, since supporting a new extension does not require modifying code that is already written and tested.

If you haven't looked much at method_missing in Ruby, I encourage you to explore it further. I’ve only scratched the surface of its capabilities. Others have gone much further, including implementing an entire XML DSL with it.

Perhaps in a subsequent post I’ll dive into how I’d go about implementing that… until then.

mattberther.com © 2003 - 2010