Helping Nokogiri. Take II

Nokogiri and Pizza, what else can you ask for?

Ok. My fault. Now, let’s go get some work done.

First, if you haven’t done so already, read the previous post about helping Nokogiri, and forget about the script and the memory leak. It seems there are more important issues, so let’s fix those first. Run jruby test/test_jruby.rb from the Nokogiri root. You’ll see a lot of errors (27 for now) and failures (14). Choose one and get it green. After that, send me a pull request.

OK, that sounds simple, but what if the number of errors or failures rises? The rule I use is simple: keep the sum of both numbers going down, and remember that having a failure is better than having an error.
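To make the rule concrete, here is a minimal Ruby sketch. The names Result and progress? are hypothetical, made up for this illustration; nothing like them exists in Nokogiri. It also assumes one particular reading of the rule: a smaller sum always wins, and when the sum stays the same, turning an error into a failure still counts as progress.

```ruby
# Hypothetical helper, not part of Nokogiri: it only illustrates the rule.
# A Result holds the two counters printed at the end of a test run.
Result = Struct.new(:failures, :errors) do
  def total
    failures + errors
  end
end

# Returns true when `after` is an improvement over `before`.
# Assumed reading of the rule:
#   1. a smaller sum of failures and errors is progress;
#   2. with an equal sum, fewer errors is progress (a failure beats an error).
def progress?(before, after)
  return after.total < before.total if after.total != before.total

  after.errors < before.errors
end
```

With the numbers from this post, progress?(Result.new(14, 27), Result.new(15, 26)) is true: the sum stays at 41, but one error became a failure.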

On the other hand, if you take a look at test/test_jruby.rb, you’ll see that not every test is in there. There is a reason for that: even keeping the number of tests low, you get a lot of errors and failures. If that annoys me with just 50 failures, imagine if I had a couple of hundred errors. Once everything is green, I’ll add some more to keep the fun going.

Photo by Paul Johnston.

Do you wanna help us with pure-Java Nokogiri?

First things first, if you wanna help, you’ll need to clone the git repo. Just:

git clone git://
cd nokogiri
git checkout --track -b java origin/java

Install the dependencies. Just:

rake install:deps

Because it uses some native libraries, you’ll need to do that with MRI. Finally, you’ll need to generate some files: just run jruby -S rake java:spec. To get an hprof file, you’ll need to run this script with the following command:

jr -J-Xmx32m -J-XX:+HeapDumpOnOutOfMemoryError nokogiri_doc_frag.rb

-J-Xmx32m limits the heap space to 32 MB, and the other option makes the JVM write an hprof file when an OutOfMemoryError is thrown. After that, you can inspect that file with the profiler you can find in NetBeans.

In the next post, I’ll comment on where I think the problem is.