Smokes your problems, coughs fresh air.

Author: Rowan Rodrik

Rowan is mainly a writer. This blog is a dumping ground for miscellaneous stuff that he just needs to get out of his head. He is way more passionate about the subjects he writes about on Sapiens Habitat: the connections between humans and each other, and between humans and nature, including their own human nature.

If you are dreaming of a holiday in the forests of Drenthe (the Netherlands), look no further than “De Schuilplaats”: a beautiful vacation home, around which Rowan maintains a magnificent ecological garden and a private heather field, brimming with biological diversity.

FlashMQ, a business that Rowan co-founded with Jeroen and Wiebe, offers managed MQTT hosting and related services.

Bypassing smart completion in Bash

Luca Citi, a nice Italian Ubuntu user, just gave me an excellent tip in response to my list of Readline keyboard shortcuts. Modern Linux distributions such as Ubuntu and Gentoo can easily be configured for Bash to use smart completion. With smart completion enabled, instead of just looking among all the available files and directories without discrimination, TAB will be able to more accurately adjust its list of available completions depending on the program for which arguments are being sought.
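On Ubuntu, for instance, enabling it typically amounts to sourcing the system-wide completion rules from your ~/.bashrc. A minimal sketch, assuming the usual /etc/bash_completion location (the exact path varies per distribution and version):

# Load Bash's programmable ("smart") completion rules, if present:
if [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
fi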

An example of smart completion is that completions for the cd command will only include actual directories and no longer any regular files. Luca gave me another good example: completions for the kpdf command will only include files with the .pdf extension.
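Under the hood, such rules are declared with Bash's complete builtin. A directory-only rule for cd, for example, can be set up like this (a minimal sketch; the real bash-completion package defines something more elaborate):

# Offer only directory names when completing arguments to cd:
complete -d cd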

Myself, I’ve been bitten by smart completion a few times when I wanted to complete a command argument to a filename that the smart completion rules didn’t allow. Luca gave me his typical example of such a case: his smart completion configuration includes only entries from the fstab as valid mount points for the mount command. But what if you want to do an ad-hoc mount? Do you just have to type out the full mount point without auto-completion? That’s what I used to think before Luca told me about the Alt+/ shortcut. In Bash, this shortcut acts as the TAB key would without smart completion enabled.
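Alt+/ invokes Readline's complete-filename command, which always completes on filenames and ignores any programmable completion rules. You can verify the binding from within Bash:

$ bind -p | grep complete-filename
"\e/": complete-filename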

Of course, I’ve updated my list of Readline keyboard shortcuts to include Alt+/. Thanks, Luca! 🙂

Web scraping in Ruby: why I had to use scrAPI instead of WWW::Mechanize and Hpricot

Thursday evening: so, I had written myself a nice little script using Aaron Patterson’s WWW::Mechanize and why’s Hpricot to extract some data from a popular web-based airport directory.

Hpricot logo

I was warmed up for Hpricot by the promise of XPath and CSS selector support (and a very cool logo, of course). As a long-time XPath user, I started banging out some crispy XPath expressions until I realized that XPath support was only very partial. I kept trying expressions that should have worked, even bowing down to expressions that, according to the wiki, would work, but with different semantics. Come on guys, either support a standard or plainly ignore it, please! 😡 Because I couldn’t figure out how to integrate why’s fork of the XPath spec into my expressions, I decided to stick with why’s fork of the CSS selectors instead.
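For comparison, this is roughly the style of code I ended up with for Hpricot. A minimal sketch: the URL and selectors are made up for illustration, not taken from the actual airport directory:

require 'rubygems'
require 'open-uri'
require 'hpricot'

doc = Hpricot(open("http://example.com/airports/a"))
# Hpricot's CSS selector search; note that element offsets count from 0:
(doc / "table.airports tr").each do |row|
  cells = row / "td"
  puts cells[0].inner_text unless cells.empty?
end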

Then it was time to execute my code. I had estimated that it would take about two hours to finish downloading and parsing the approximately 10,000 pages containing the data I was interested in. So I executed my script, detached my screen session and went to bed, trusting that I would find a nice, handy CSV file in the morning.

Friday morning, I was disappointed to find that my script had been killed, and I was left wondering what had killed it. I decided to restart the script at the countries starting with the letter B (it had died somewhere halfway through that list). Soon, the script was happily appending data to the existing CSV file again.

Disclaimer: why is a much more prolific Ruby coder than I’ll ever be, so please take my comments with a grain of salt. No, actually, rather take them with a few spoonfuls of salt.

Later, I talked about the spontaneous death of the script with Wiebe. Curious, he looked at the memory usage of my script and saw that it was happily munching away hundreds of megs of memory on our server. And memory usage was still growing! With crucial server processes at risk of running out of memory, and with me having to build an enclosure around the vegetable garden to protect it from a bunch of brawling chickens, Wiebe was friendly enough to drop in and take a look at my spaghetti code to see if he could fix the leak. He couldn’t, because the leak didn’t appear to be in my code: I wasn’t the first to be bugged by a memory leak in Hpricot.

That news didn’t make me very happy, because it implied I had to redo the script using different tools. I knew that WWW::Mechanize had been inspired by the Perl package of the same name, so I started by looking at that. After installing WWW::Mechanize, I explored CPAN’s WWW namespace a bit further and noticed that the Perl crowd also had two other good scrapers at their fingertips: WWW::Extractor and WWW::Scraper. Once again I was reminded that Perl, despite its funky syntax, is still the king of all scripting languages when it comes to the availability of quality modules. 🙁 After a few deep breaths, I set my rusty Perl skills into (slow) motion. Hell, this was supposed to be a quick script. Why was this taking so much time? (Yeah, yeah; cue the jokes about developer incompetence. 😕)

I was almost trampled by a horde of camels, each with a name more syntactically confusing than the last. Just before I was crushed, I came across a reference to a Ruby scraper with decent support for CSS3 selectors: scrAPI. Credit for this discovery goes to the documenters of scRUBYt, a featureful scraper layered on top of WWW::Mechanize. The documentation writers of scRUBYt were friendly enough to help their users by including a link to the competition.

It took me some time to rewrite the script using scrAPI, partially because it was hard to find any documentation that was more comprehensive than a few blog posts and a cheat sheet and less of a hassle than reading the source. But, when Assaf answered my need by pointing me to the online API docs, I was happy.

Another reason why it was hard to migrate from WWW::Mechanize/Hpricot to scrAPI was that Hpricot starts element offsets for XPath predicates and CSS selectors at zero instead of at one, where they should start. And, of course, I had to rid myself of the weird cross between CSS and XPath selectors.

I was surprised that the scrAPI-based script ran about twice as fast as the Hpricot-based script, and that is including a cumulative sleep() time between requests of almost an hour, because the speed during testing made me worry about over-exerting their web server. Knowing that one of Hpricot’s popular features is its speed, this was very unexpected, although I have to admit that Hpricot did fill my memory very quickly.
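To give an idea of what the scrAPI version looked like, here is a minimal sketch of an extractor in scrAPI's style; the URL, selectors and field names are hypothetical, not the actual airport-directory code:

require 'rubygems'
require 'scrapi'
require 'uri'

# A scraper class: each process rule couples a CSS selector to the
# attribute it should fill.
AirportScraper = Scraper.define do
  process "h1.airport-name", :name => :text
  process "span.icao",       :icao => :text
  result :name, :icao
end

airport = AirportScraper.scrape(URI.parse("http://example.com/airports/eham"))
puts "#{airport.name} (#{airport.icao})"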

A domain for the Omega Research Foundation

Popko, my dad, founded the Omega Research Foundation in 1984 (when I was two years old). The Foundation was to serve as a vehicle for his research into cyclical changes within ecological and social systems.

Last week, after 23 years of hibernation, the foundation was geared up again to publish what Popko has been wanting to publish for more than twenty years. So, what happened in between? For the last 15 years, Sicirec happened. Actually, Sicirec is still happening, but, starting last month, I no longer need to spend the majority of my waking hours on it. That’s why I registered the domain omega-research.org and installed a wiki and a weblog under it.

Jorrit is also in on this project. We’re doing this together, the three of us, hoping to involve others from within and around our circle.

Omega Research Weblog · Omega Research Wiki

eps2eps to the rescue when epstopdf complains of no bounding box

PDFLaTeX doesn’t like encapsulated PostScript images. If you want to use .eps files with pdflatex, you can convert these files to PDF using Sebastian Rahtz’ epstopdf and then remove all .eps file extensions from the image locations in your .tex source files. Then, the latex command will look for .eps files and the pdflatex command will look for .pdf, .jpg and .png files.
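In practice, the conversion step is a one-liner per image; a sketch, assuming an image called figure.eps:

# Convert each EPS image once; figure.pdf ends up next to figure.eps:
$ epstopdf figure.eps

# In the .tex source, reference the image without an extension,
# e.g. \includegraphics{figure}, so that latex resolves figure.eps
# and pdflatex resolves figure.pdf.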

The other day, I tried to do just this. But epstopdf complained about the lack of a bounding box in one of my EPS files. Indeed, the conversion finished, but it generated a huge white background with the actual image somewhere in the lower left corner. From the man page:

epstopdf transforms the Encapsulated PostScript file so that it is guaranteed to start at the 0,0 coordinate, and it sets a page size exactly corresponding to the BoundingBox. This means that when Ghostscript renders it, the result needs no cropping, and the PDF MediaBox is correct. The result is piped to Ghostscript and a PDF version written.

If the bounding box is not right, of course, you have problems…

Luckily, while tab-completing from eps to epstopdf, I noticed the eps2eps utility. I thought: what if this utility happens to sanitize the EPS file a bit? A quick look at the man page and a test run later, my hope was confirmed: epstopdf would now generate a nice PDF file without complaining.
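So, for an EPS file with a broken or missing bounding box, the working recipe boils down to:

# Let Ghostscript regenerate the EPS, which gives it a sane BoundingBox,
# then convert the sanitized copy:
$ eps2eps broken.eps fixed.eps
$ epstopdf fixed.eps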

The epstopdf manual page could be amended with: If the bounding box is not right, you might want to try running eps2eps first.

Making flash cards on-line

I’m learning Spanish from a Dutch method called Eso sí. Approaching chapter 10, I noticed that I would benefit from first learning the words introduced in each chapter before starting on the chapter’s text and exercises. From doing some exercises on Spanish learning websites (especially www.studyspanish.com), I noticed that flash cards can be a great help.

When I used to sit behind a Linux terminal, there was always an abundance of open source flash card software only one apt-get or emerge away. But now I’m behind a Windows machine, so I thought I’d better try my luck with some on-line tool to make flash cards.

I first came across The Amazing Flash Card Machine. I registered an account and created a few cards.

http://www.flashcardmachine.com/myFlashCards/

I didn’t find the process of adding cards in the Flash Card Machine very quick or supple, so I went on to the next tool, FlashcardExchange. Registering an account was again pretty straightforward, except that the confirmation mail took ages to arrive, which made me click the (very well presented) resend-confirmation link twice and even change my registration email address, before I noticed, on returning to my desk after a few hours, that all four mails had finally arrived.

I created two card sets using their clean GUI. About that GUI: although clean, it takes a few too many steps to create a new card set or to start studying a card set:

[Screenshots: the five steps of creating a card set at www.flashcardexchange.com/create]

They offer the option to add the contents of multiple card sets to a single Leitner card file, but for that you need to pay a one-time fee of $19.95. I’ve considered shelling out the twenty dollars, because the site has a clean design and offers good import/export features (a must if I’m going to pay money for any service). However, with a GUI that gets in the way of adding cards, I’m going to keep the money where it is.

When I looked a little further, I noticed a pretty cool flashcard wiki anyone can edit, but again, no Leitner card files.

In the end, I returned to open source desktop software. Amazingly, some of it supports Windows, either because it’s written in Java or because the developers feel my pain. Now, next time, I still have to choose among three fine applications.

Scaling bitmap graphics versus scaling vector graphics

Due to some organizational changes, this past December I had to remove the S.A. suffix from the Sicirec logo:

Sicirec logo with “S.A.” - scaled some time earlier
The original logo with the “S.A.” suffix intact.

After removing the S.A. suffix from the vector file in Illustrator’s vector format, I wanted to export the logo to a small PNG again. Annoyingly, though, the PNG—if I wanted Illustrator to respect the correct aspect ratio—could not be the same width as the original PNG if I gave it the same height; it would always be one pixel higher. If, however, I exported it as a huge PNG corresponding to the vector’s original dimensions and scaled it down in The GIMP, the dimensions turned out about the same.

It was then that I noticed that The GIMP’s scaling algorithm is actually very decent. From just looking at the two images below, you need a moment or two to notice that one is a little sharper than the other. Obviously, that’s the Illustrator version.

Sicirec logo without “S.A.” - scaled in The GIMP
Sicirec logo without “S.A.” - scaled in Adobe Illustrator

In the end, though, neither version integrated easily with the complex layout which I had based around the logo image, so I simply opened the existing PNG image in The GIMP and erased the S.A. suffix.

Sicirec logo with the “S.A.” suffix removed in The GIMP
The original PNG after the GIMP treatment.

I still don’t understand why I couldn’t repeat the scaling result of the original image in Illustrator. But, I’ve probably wasted enough time on a rounding issue that isn’t even an issue…

Apache’s ForceType directive overrides AddCharset directives

Yesterday, after uploading a refreshed www.sicirec.org, some character encoding issues popped up because I had converted the website’s content from ISO-8859-1 (Latin 1) to UTF-8. (I wanted to be able to type and paste special characters from PuTTY into VIM without worrying about the particular encoding of each file.)

The Apache HTTPD at InitFour, our webhosting provider, is configured to send ISO-8859-1 by default, while the one on our test server is configured for UTF-8. This caused a bit of a surprise when I uploaded the refreshed website and saw all characters outside the ASCII range mangled on the live website!

I quickly dug into my .htaccess file to add the AddCharset utf-8 .xhtml directive. To my surprise, this didn’t do squat. A lot of fiddling, reloading and researching later, I realized that the following section in my .htaccess file rendered the AddCharset directive irrelevant:

<Files *.xhtml>
ForceType text/html
</Files>

I had to change the ForceType directive to include the charset as a MIME parameter:

<Files *.xhtml>
ForceType 'text/html; charset=UTF-8'
</Files>

Now it all seemed to work. (Except that it didn’t really, because I do some ridiculously complex content negotiation involving a 406 handler in PHP that virtuals the most appropriate variant when no match is found. This script didn’t send a useful Content-Type header. After first adding one to the script, I noticed that AddDefaultCharset is actually allowed in .htaccess context, a discovery which luckily rendered the other hacks unnecessary.)
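For reference, that final, simpler fix amounts to a single line in the .htaccess file:

AddDefaultCharset UTF-8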

Replacing the trunk of a Subversion repository with a feature branch

For the Sicirec website, I use Subversion to track all changes. When working on big changes which take more than a day to implement, I follow the Feature Branches branching pattern. This pattern means that the trunk remains relatively stable and usable for everyday updates while I can climb in a feature branch whenever I want to work on the big new feature(s).

Subversion’s merge tracking is non-existent. This means that, when I climb from branch to trunk and back again a lot, I have to manually keep track of all the changes in trunk/ that I’ve merged into the branch. Each such change, once merged, loses much of its meaningful history unless I painstakingly merge all the commit messages of the patch into the message of the commit that I make after the merge.
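In practice, that manual bookkeeping looks something like this (the revision numbers are made up):

# In the branch working copy: pull in the trunk changes since the last
# merge, then record the merged range by hand in the commit message.
$ svn merge -r 1200:1350 file:///repos/trunk .
$ svn commit -m "Merged /trunk changes r1200:1350 into /branches/my_branch."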

Today, after having maintained a branch for months to keep it somewhat in sync with an ever-changing trunk, I’m at the point of having to merge the branch back into trunk. This is rather nightmarish, because I’m bound to hit all the merge conflicts that I already suffered when merging changes from the trunk into the branch, multiplied some.

To avoid torture, I decided I’d rather just replace the trunk with my feature branch. This is especially attractive because I then retain the history of the branch which is a little more useful to me than the history of the trunk.

I googled around a bit and could find one thread discussing a similar problem. The solution proposed there seemed to involve a few too many steps for my taste, so I did the following:

# From the working copy of my branch:
$ svn del file:///repos/trunk -m "Temporarily deleted trunk."
$ svn mv file:///repos/branches/my_branch file:///repos/trunk -m "Moved /branches/my_branch to /trunk"
$ svn switch file:///repos/trunk  # re-point this working copy at the new trunk

That worked perfectly fine. (Except that I still want automatic merge tracking, dammit!)

Nested hashes derail Rails’ url_for helpers

While working on the Sicirec PostgreSQL database front-end today, I had to pass a lot of nested parameters to a link_to helper in Rails. Software being what it is, this didn’t work.

There are a few patches awaiting acceptance. The most promising of these was part of an open Trac ticket. Because we use Rails as an svn external, applying the patch myself wouldn’t survive deployment unless I created a vendor branch for Rails in our own repository. Hoping that someone had merely forgotten to close the ticket, I first tried upgrading to Rails 1.2.2, which was about time anyway because we were still on the 1.1 branch. The upgrade went fine but didn’t fix the problem.

Next, I tried to integrate the patch by redefining the methods changed by the patch in our lib/ directory. When this didn’t work, I decided to simply do some flattening of the hash myself for this one particular case.

A bit of googling around gave me many clues that the problem had already cost a lot of people a lot of time.

Eventually, I settled on a derivative of some code by Peter Marklund to flatten my hashes:

class Hash
  # Flatten a hash into a flat form suitable for an URL.
  # Accepts as an optional parameter an array of names that pretend to be the ancestor key names.
  #
  # Example 1:
  #
  #   { 'animals' => {
  #       'fish' => { 'legs' => 0, 'sound' => 'Blub' },
  #       'cat'  => { 'legs' => 4, 'sound' => 'Miaow' }
  #   } }.flatten_for_url
  #
  #   # => { 'animals[fish][legs]'  => 0,
  #          'animals[fish][sound]' => 'Blub',
  #          'animals[cat][legs]'   => 4,
  #          'animals[cat][sound]'  => 'Miaow'
  #        }
  #
  # Example 2:
  #
  #   {'color' => 'blue'}.flatten_for_url( %w(world things) )  # => {'world[things][color]' => 'blue'}
  #
  def flatten_for_url(ancestor_names = [])
    flat_hash = Hash.new
 
    each do |key, value|
      names = Array.new(ancestor_names)
      names << key
 
      if value.is_a?(Hash)
        flat_hash.merge!(value.flatten_for_url(names))
      else
        flat_key = names.shift.to_s.dup
        names.each do |name|
          flat_key << "[#{name}]"
        end
        flat_key << "[]" if value.is_a?(Array)
        flat_hash[flat_key] = value
      end
    end
 
    flat_hash
  end
end

As you can see, I turned my code into a single method of the Hash class. It can be used in any url_for-based call, as in the following example (note the explicit parentheses: without them, Ruby would parse the braces as a block):

url_for({
    :controller => 'post',
    :action => 'new',
    'author' => {'name' => 'Rowan', 'gender' => 'm'}
  }.flatten_for_url)
  # => /post/new?author[name]=Rowan&author[gender]=m

Now if only some Rails developer would commit the patch already.

Automatic resizing of an HTML <textarea> element

Today, while improving the Rails GUI for the Sicirec database, I was struck once again by how annoyingly small <textarea>s can be when the user has to type a lot of text.

I had already seen the ideal solution when commenting on Laurelin’s waarbenjij.nu weblog. Although their response box is much too narrow, the height of the box auto-adjusts to the amount of text typed. I decided to borrow their code and amend it slightly for our own use. Differences are:

  • My code works with Opera, but is untested in IE because we don’t feel the need to support IE for an internal application.
  • In our DB, notes will often be shortened, so my code also shrinks the textarea when the text shrinks. The function remembers the original number of rows set in the source and will never shrink past that number.
var userAgentLowerCase = navigator.userAgent.toLowerCase();
 
function resizeTextarea(t) {
  // Remember the number of rows from the markup; never shrink below it.
  if (!t.initialRows) t.initialRows = t.rows;
 
  var lines = t.value.split('\n');
  var rows = 0;
  // Add the extra rows occupied by lines that wrap.
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].length >= t.cols) rows += Math.floor(lines[i].length / t.cols);
  }
 
  rows += lines.length;
 
  // Opera needs a couple of extra rows to avoid a scrollbar.
  if (userAgentLowerCase.indexOf('opera') != -1) rows += 2;
 
  if (rows != t.rows)
    t.rows = (rows < t.initialRows ? t.initialRows : rows);
}

The function can easily be added to the onkeyup and onmouseup event handlers of a <textarea> element as in:

<textarea cols="60" rows="4"
          onkeyup="resizeTextarea(this)"
          onmouseup="resizeTextarea(this)"></textarea>

I didn’t add it inline as in the example, though. I used Ben Nolan’s Behaviour Javascript library to tie things together a little more cleanly.
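With Behaviour, the wiring looks roughly like this (a sketch; the autosize class name is my own invention, not something from the original code):

// Attach resizeTextarea through Behaviour's CSS-selector rules
// instead of inline onkeyup/onmouseup attributes:
Behaviour.register({
  'textarea.autosize': function(el) {
    el.onkeyup   = function() { resizeTextarea(el); };
    el.onmouseup = function() { resizeTextarea(el); };
    resizeTextarea(el);  // also size it once when the page loads
  }
});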


