Smokes your problems, coughs fresh air.

Author: Rowan Rodrik

Rowan is mainly a writer. This blog here is a dumping ground for miscellaneous stuff that he just needs to get out of his head. He is way more passionate about the subjects he writes about on Sapiens Habitat: the connections between humans, each other, and to nature, including their human nature.

If you are dreaming of a holiday in the forests of Drenthe (the Netherlands), look no further than “De Schuilplaats”: a beautiful vacation home, around which Rowan maintains a magnificent ecological garden and a private heather field, brimming with biological diversity.

FlashMQ, which Rowan co-founded with Jeroen and Wiebe, is a business that offers managed MQTT hosting and other services.

Exemplary web design: Qt

Often, I come across websites that have a beautiful design or even just interesting design elements. Instead of continuing to spread these URLs all over my $HOME and my (del.icio.us) bookmarks, I thought I’d start adding them here. Today, I want to start with an entry from ~/jot/exemplary-web-design.txt: Qt

Qt homepage


I’ve actually programmed in Qt a bit in a dark past. Even though I’m not too fond of it (or C++ in general), it’s a very decent toolkit as far as toolkits go. What’s really great, though, is their website (now at qt.nokia.com).

The website logo with integrated slogan is perfectly clear. At the top right, there’s a nice and clear Google Custom Search, below which there’s a language switcher and a cleverly placed contact link.

Then comes the horizontal navigation bar with the tabs. It clearly shows which section you’re in. The homepage has an icon instead of a text, which is a nice touch. Also, the Developer Zone tab has a distinct layout with a big icon. I like this; it makes it clear that Nokia (formerly Trolltech) appreciates its developers (insert Ballmer monkey dance here).

The content area starts with a clear h1 heading and a one-sentence introductory paragraph. Then, four of the sections are highlighted again, each with a short summary of what can be expected in that section.

Below that is another visually distinct area, which highlights the latest news items, events and other recently featured items.

Testimonials from two high-profile projects interrupt the flow of information at this point, before Qt in 2 minutes is presented. Qt in 2 minutes is clearly made to help people who are new to Qt find the right information quickly. It spans seven headings, and JavaScript is used to show only one subsection at a time; you switch subsections by clicking the headings in the menu on the left.

At the end of the content area, there’s a subtle reference to the KDE project and a list of the biggest-name customer logos.

The content area is closed by another horizontal navigation bar. This one links to the sitemap, an accessibility statement, and the contact page again. On the right, it also contains a Nokia logo.

Then in almost invisible print (because it isn’t interesting), there’s the copyright statement and a link to the privacy policy.

Crimson Dark, a sci-fi webcomic

For a while now, I’ve been following Crimson Dark, a very cool, grimy sci-fi webcomic. David Simon, the writer and artist, has created quite the cynical bunch of characters. I love it. The story and dialog are a compliment to my intelligence, and the artwork is nice (and getting better with each new chapter). Pure procrastinator’s poetry, that’s what it is.

To give you a taste of the nice, dark space atmosphere, here is the comic posted on Monday, the 14th of last December:

Crimson Dark comic for December 14, 2009

You’ll want to read from the beginning though.

jQuery plugin for auto-growing textareas

With my big blog redesign, I wanted an auto-growing comment box. In the past, I’ve written a nice auto-resize textarea JavaScript function which does just that, but with jQuery being part of WordPress’s standard equipment these days, I thought it would be cooler to find a nice jQuery plugin to do this.

I added the Auto Growing Textareas plugin by Chrys Bader to my theme. In my header.php:

<?php wp_enqueue_script('jqueryautogrow', get_bloginfo('template_directory').'/jquery.autogrow.js', array('jquery'), '1.2.2') ?>
<script type="text/javascript">
$ = jQuery; /* FIXME: Ugly hack */
jQuery(document).ready(function(){
  jQuery('textarea[name=comment]').autogrow();
});
</script>

However, I noticed that the <textarea> shrank below the original number of rows defined in its rows attribute. (My own function used this attribute as the minimum number of rows.)

While looking for documentation on Chrys Bader’s plugin, I noticed that all the links on the plugin page now redirect to crysbader.com. (Sometimes, I really hate these catch-all redirects! :-x) I also found the Auto Growing Textareas Update plugin by daaz, which is the same plugin with a few fixes: the original project hadn’t been updated since January 12, 2008, and had some issues that needed resolving. Installing the update sounded like a good idea.

Back to the minimum height problem: the plugin’s source file proved a good source of documentation. I learned that it has a minHeight option. I didn’t manage to actually pass that option in JavaScript, though; doing the following didn’t work:

jQuery('textarea[name=comment]').autogrow({minHeight: 8});

Luckily, it defaults to the min-height defined in the element’s CSS, so I could add the following to my stylesheet to stop the auto-shrinkage madness:

#comments textarea
{
  min-height: 8em;
  height: 8em;
}

Linux OCR with Tesseract

I’m scanning old Flor y Fauna newsletters for my Dutch Hardwood Investment Wiki. I need to do this because most of these newsletters, although produced digitally, are available in the Sicirec archive only on paper. The only graphical item these newsletters sport is a simple graphical header, so I want to convert the scans to text and put the text in a wiki article for each newsletter; I don’t want to upload dozens of image-heavy PDFs just to show the original (crappy) layout.

The problem, of course, is that I’m on Linux and I don’t know of any good free, open source OCR programs. I don’t know much at all about OCR to be frank. 😕

Anyway, I’ve found this Linux.com article by Mathis Dirksen-Thedens about doing OCR the hardcore way. The downside of his process is that you have to preprocess each image until you end up with square, borderless chunks of just text. He recommends Tesseract. The Tesseract project brags that their “engine was one of the top 3 engines in the 1995 UNLV Accuracy test”. Wow, impressive! But, wait, there’s more: “Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available.” They’ve actually made me embarrassed for trying to do this with open source software. 🙁

Tesseract, like similar programs such as GOCR and Ocrad, only does line-by-line, word-by-word character recognition, so it helps to have a program that first breaks a page up into graphical elements and simple text blocks.

In that category, OCRopus (Wikipedia) seems very promising, but it’s still in the alpha stage of development. Maybe that’s why it isn’t in Portage yet. Either way, that means I’m not going to try it. Before the 0.4 release, OCRopus supported only Tesseract as a character recognition option, but now Tesseract has been replaced by their own system, although it’s still supported as a plug-in.

gscan2pdf is a GUI program that seems meant to pull many of these tools together, although it doesn’t seem able to break page layouts down into separate text blocks. I’ll have to try it out to judge this better, though. First, I want to return to the command line.

In my case, I’m scanning the newsletters using a Xerox WorkCentre 7232. This machine has a network scanning feature that creates PDFs by default. It can also create (multi-page) TIFFs, which saves me one conversion step. I’m glad it does, because I have yet to find out how to convert PDF to TIFF with ImageMagick without losing so much image detail that I could no longer fairly blame Tesseract for not producing anything useful.

Quite early on, I noticed that Tesseract supports multi-page TIFFs. This is cool. I was less enthusiastic to discover that it doesn’t support the MMR compression used by the Xerox machine (even though I’ve compiled it with the tiff USE flag enabled in Gentoo). Luckily, a simple convert a.tif b.tif seems to produce a b.tif without bothersome compression schemes.

Then, all of a sudden, the Xerox started delivering scans with the wrong rotation. I fixed this, again with ImageMagick:

convert Document005.TIF -transpose teakwood-info-uncompressed-rotated.tif

Now, surely, I would get some kind of result.

tesseract teakwood-info-uncompressed-rotated.tif result -l nld
head result.txt
]AV.LQHVBEKODWB'A' ];I`OI5ABVfll/|V2'V` D2'.LVI`V\IV2IM(2EF=tI°ö52I EB BEKCOIN ·
 
$UIDp9DKGD°
_ GIDqbLOq¤K;GD SOSI2 AIOGLGD‘ KOSIJDGU‘ I¤IKGU‘ bGLöOIS\2 GD
. bLSCD;Iä DI; GD GL IB AGGI pGISDä2;GIIIDö AOOL qG
AGLMGLKGD° DI; $0­JSLIö‘ ID MGqGLISDq äGqLOOäq‘ DON; SIG; GL
ssuqqud pong ­Agu sou suqsns bjsu;sds­ sqqu ms gsm ps;
MGQGLIQDQ ;G IS;GD MGDUGD SSD CO2;SLIGSSU2 ;GSKDO¤;' DG GGL2;G
§OSI2 MG H SI GGLqGL wGIqqGD SIJD MG pGSIö Ow qG NSLK; IU
MOLqGD'

Ok. Maybe not… At this point (ignoring all the other side-tracks), I noticed that although Gliv showed the image with the proper rotation, the GIMP showed it upside down when I imported a page from the multi-page TIFF. Then I realized that I was using -transpose just to please Gliv. Gliv simply doesn’t read the endianness of the file right! Instead of fixing my rotation problem, convert -transpose actually made it worse! From Wikipedia: ‘Every TIFF begins with a 2-byte indicator of byte order: “II” for little endian and “MM” for big endian byte ordering.’ What I should have done is a convert -rotate 270.
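Checking which byte order a scan declares only requires looking at those first two bytes. Here’s a minimal sketch, assuming a head that supports -c (the GNU and BSD versions do); the function name tiff_byte_order is made up for illustration:

```shell
# Hypothetical helper: print the byte order declared in a TIFF header.
# Every TIFF starts with "II" (little endian) or "MM" (big endian).
tiff_byte_order() {
  case "$(head -c 2 "$1")" in
    II) echo "little-endian" ;;
    MM) echo "big-endian" ;;
    *)  echo "not a TIFF: $1" >&2; return 1 ;;
  esac
}
```

For example, tiff_byte_order Document005.TIF should print either little-endian or big-endian for a valid TIFF, and complain on stderr for anything else.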

This was starting to look a lot better, and I hadn’t even removed any borders or headers:

A'
E A K W C) O D M @
Bergum, november 1991
Geachte bosbouwer,
· Een maand eerder dan beloofd sturen we u een nieuwe Teakwood _
Info. Ik was van 14 september tot en met 12 oktober weer op
onze plantages in Costa Rica en heb geconstateerd dat onze
bomen er goed bij staan. Op sommige heuvels blijft de groei
ietsje achter, maar door extra voeding (bemesting) te geven,
_ trekken we dat bij.
Teakwood IV is inmiddels nageplant. Dat wil zeggen dat we de
stekken die niet wilden aanslaan, hebben vervangen door
nieuwe. Teakwood II en III zijn al voor de tweede keer
sx nageplant en doen het uitstekend. Deze keer stuur ik u nog
ïïw eens een foto van Teakwood I, vanuit hetzelfde standpunt als
de vorige foto van jongstleden juni, bij nummerpaal 1. "

(Tesseract processed two pages, by the way, but tried to convince me in its CLI output that it had only processed one.)
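Strung together, the steps that finally worked can be sketched as one small shell function. This is a sketch under assumptions, not the exact commands I ran: the function name ocr_scan and the intermediate file name are made up, the decompression and rotation are folded into a single convert call, -compress None is ImageMagick’s option for writing the TIFF uncompressed (sidestepping the MMR problem), and nld is the Dutch language code used above.

```shell
# Hypothetical helper: decompress and rotate a scanner TIFF, then OCR it.
# Usage: ocr_scan Document005.TIF result   (Tesseract writes result.txt)
ocr_scan() {
  local in=$1
  local out_base=$2
  local tmp="${out_base}-prepared.tif"
  # Write an uncompressed copy, rotated 270 degrees (not transposed,
  # since -transpose mirrors the image along its diagonal).
  convert "$in" -compress None -rotate 270 "$tmp" || return 1
  # Tesseract appends .txt to the output base name itself.
  tesseract "$tmp" "$out_base" -l nld
}
```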

Now, I want to see what the program does if I give it a cleaner image, without scanning artifacts. I would like to use unpaper for this, but it’s masked in Portage, so for now I’ll use the GIMP to make a single-page TIFF, cropped from the original image. (When creating the new image in the GIMP, I had to change the image mode to be indexed, 1 bit black and white, and remove the alpha channel.)

Now, I was getting a better result:

Bergum, november 1991
Geachte bosbouwer,
Een maand eerder dan beloofd sturen we u een nieuwe Teakwood
Info. Ik was van 14 september tot en met 12 oktober weer op
onze plantages in Costa Rica en heb geconstateerd dat onze
bomen er goed bij staan. Op sommige heuvels blijft de groei
ietsje achter, maar door extra voeding (bemesting) te geven,
trekken we dat bij.
Teakwood IV is inmiddels nageplant. Dat wil zeggen dat we de
stekken die niet wilden aanslaan, hebben vervangen door
nieuwe. Teakwood II en III zijn al voor de tweede keer
nageplant en doen het uitstekend. Deze keer stuur ik u nog
eens een foto van Teakwood I, vanuit hetzelfde standpunt als
de vorige foto van jongstleden juni, bij nummerpaal 1. "

My conclusion is that users of open source OCR software must suffer. I’m not going to clean up this post to make it more useful for people who want to do the same as I did, because you shouldn’t want to do the same. You should simply go out and buy or pirate some proprietary piece of OCR software. Really, you should.

Now, I want a massage; my shoulders are stiff.

Structure and Interpretation of Computer Programs

Structure and Interpretation of Computer Programs is a free online MIT textbook that uses Scheme in an attempt to give the reader a general and practical understanding of programming.

Not having read the book myself, I can’t say whether its authors achieved their goal, but that goal is neatly summarized in the Preface:

Our design of this introductory computer-science subject reflects two major concerns. First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute. Second, we believe that the essential material to be addressed by a subject at this level is not the syntax of particular programming-language constructs, nor clever algorithms for computing particular functions efficiently, nor even the mathematical analysis of algorithms and the foundations of computing, but rather the techniques used to control the intellectual complexity of large software systems.

Our goal is that students who complete this subject should have a good feel for the elements of style and the aesthetics of programming. They should have command of the major techniques for controlling complexity in a large system. They should be capable of reading a 50-page-long program, if it is written in an exemplary style. They should know what not to read, and what they need not understand at any moment. They should feel secure about modifying a program, retaining the spirit and style of the original author.

Maybe this is worth a read?

Matriux, a penetration testing and security analysis LiveCD

Last December, someone pointed me to Matriux. In their own words:

It is a fully featured security distribution consisting of a bunch of powerful, open source and free tools that can be used for various purposes including, but not limited to, penetration testing, ethical hacking, system and network administration, cyber forensics investigations, security testing, vulnerability analysis, and much more. It is a distribution designed for security enthusiasts and professionals, although it can be used normally as your default desktop system.

It comes with a wide arsenal of free software tools to do naughty things to your network. I think I should give it a swing and download it some time.

Monitor the progress of Unix commands with Pipe Viewer (pv)

I just stumbled across the following post while trying to find out how to copy text from VIM using XSel without losing the selected text. It introduces Pipe Viewer, a Unix utility which is a kind of cat with a progress bar.

I emerged it (it’s in Gentoo, and in Debian too). It’s very simple to use, but it also lets you do cooler, more complicated things.

# pv emerge.log |gzip >emerge.log.gz
1.24MB 0:00:00 [1.76MB/s] [================>] 100%  
$ pv -cN source access_log | gzip | pv -cN gzip > access_log.gz
   source: 28.7MB 0:00:00 [32.2MB/s] [=====>] 100%            
     gzip: 2.27MB 0:00:00 [2.54MB/s] [ <=>  ]

The first example is easy enough to understand when you mentally substitute pv with cat. The second example is much cooler. It uses the -N flag to give each pv instance a name and the -c flag to make sure that the output of these named meters doesn’t get garbled.
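When pv reads from a pipe instead of a file, it can’t know the total size, so it shows throughput without a percentage. Its -s option supplies the expected size. A minimal sketch, assuming GNU du (for -b, the apparent size in bytes); the function name tar_with_progress is hypothetical, and because tar adds headers, the percentage is only approximate:

```shell
# Hypothetical helper: tar and gzip a directory behind a pv progress bar.
# du -sb (GNU) reports the apparent size, which pv uses as the expected total.
tar_with_progress() {
  local dir=$1
  local size
  size=$(du -sb "$dir" | cut -f1)
  tar cf - "$dir" | pv -s "$size" | gzip > "$dir.tar.gz"
}
```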

Read Peteris Krumins’ article for more cool uses of Pipe Viewer.

XSel, for command-line operations on X selections

Since I first learned that Windowmaker installs two command-line tools, wxcopy and wxpaste, to play around with X selections, I have wanted to be able to make and use X selections from my Bash shell. wxcopy and wxpaste never did what I expected them to do, so I gave up, until I recently learned about all the different X selections.

By default, wxcopy and wxpaste operate on the CUT_BUFFER[n] selections. These are deprecated, which is why I could never make them work: modern applications use only PRIMARY and CLIPBOARD. So, wxcopy is pretty useless (unless it’s used to copy something to paste with wxpaste). With this knowledge, wxpaste does seem useful thanks to its -selection [selection-name] flag, but this doesn’t seem to work; I only get the contents of CUT_BUFFER. This is not how the feature is advertised:

-selection [selection-name]
The data will be copied from the named selection. If cutting from the selection fails, the cutbuffer will be used. The default value for the selection name is PRIMARY.

Enter XSel

Fortunately, there’s XSel by Conrad Parker, a program that made its author passionately hate the ICCCM.

XSel does exactly what it advertises. I’m actually surprised that I never heard of it before. It’s available in Gentoo, Debian and Ubuntu, so it’s a breeze to install.

Among its features are: --append, --follow, --clear, --delete (very weird, but logical if you understand X IPC), --primary, --secondary, --clipboard, --keep, and --exchange. Read the man page for more. It’s an excellent read.

One of the places where I’m going to use this tool is when copy-pasting to and from VIM. I really like how this compares to using :insert or :r!cat</dev/tty and then using the pointer to paste (or (Shift+)Insert with my custom XTerm config). Now, to paste something in VIM, I can simply type:

:r!xsel

I use the following to copy any amount of text from VIM. This works much better than fooling around with the mouse:

:'<,'> !tee >(xsel -i)

The '<,'> range is entered automatically if you press : while in visual (selection) mode. You could enter any range there, or even % to select the whole file. Because >(xsel -i) is Bash process substitution, this only works if Vim’s shell is Bash. To copy to the CLIPBOARD instead of the PRIMARY, use xsel -i -b in the above example.
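The trick in that Vim command is that a filter’s output replaces the range, so the filter has to hand its input back unchanged; tee does that, while the process substitution feeds a copy to xsel. The same idea can be written without process substitution. Below is a hypothetical function, xsel_tee, that copies its input to both PRIMARY and CLIPBOARD via a temporary file. It’s a sketch assuming xsel’s -i, -p and -b flags as described in its man page; to call it from Vim, you would have to save it as a script on your PATH.

```shell
# Hypothetical filter: copy stdin to PRIMARY and CLIPBOARD, then echo it
# back unchanged, so a Vim range filtered through it stays the same.
xsel_tee() {
  local tmp
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  xsel -i -p < "$tmp"   # PRIMARY
  xsel -i -b < "$tmp"   # CLIPBOARD
  cat "$tmp"            # pass the input through
  rm -f "$tmp"
}
```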

If someone knows of a way to make VIM pipe something to a program without replacing the given range with that program’s output, I could simplify this…


© 2024 BigSmoke
