BigSmoke

Smokes your problems, coughs fresh air.


Crackdown on my .US infiltration attempt

Whether it has something to do with the current Terror Alert level or with a renewed surge of isolationism I don’t know, but my foreign ass no longer seems to be welcome below the Dot-US TLD. Never mind that almost all of my visitors are American. Or that my dot-US websites are hosted at US-based NearlyFreeSpeech. Or are my ties to the States sufficient, so that I just need to deliver the proof?

www.bigsmoke.us - awstats - Visitors by country

So, what happened? Yesterday, I got a mail from .US Nexus, forwarded by GoDaddy. The worst part wasn’t that GoDaddy billed me $9.95 for … forwarding a mail to me. What was bad was the mail that they forwarded:

From: “cctldhelp@godaddy.com” <cctldhelp@godaddy.com>
To: bigsmoke@gmail.com
Date: Jul 19, 2007 5:16 PM
Subject: [FWD: {Registry#542-209} .US NEXUS COMPLIANCE BIGSMOKE.US]

Dear Rowan Rodrik van der Molen,

Please see the Nexus Compliance Notice below from Neustar.

Regards,

Domain Services

Subject: {Registry#542-209} .US NEXUS COMPLIANCE BIGSMOKE.US
From: “.US Nexus” <nexus-compliance@neustar.us>
Date: Wed, July 18, 2007 3:46 pm
To: “cctldhelp@godaddy.com” <cctldhelp@godaddy.com>

Dear Go Daddy,

Please send the following verbiage to your customer.

Thank-You
Andrea
Neustar Registries

Dear Rowan,

As you may be aware, in November 2001, the United States Department of Commerce (“DOC”) selected NeuStar, Inc. (“NeuStar”) to be the Administrator of the .US top-level domain (“usTLD”), the official top-level domain for the United States of America. As Administrator of the usTLD, NeuStar has agreed to perform random “spot checks” on registrations in the usTLD to endure that they comply with the usTLD Nexus Requirements which can be found at http://www.neustar.us/policies/docs/ustld_nexus_requirements.pdf (“Nexus Requirements”).

Our records indicate that you are the registrant of the domain name BIGSMOKE.US.

On July 18, 2007, this domain name was selected for Nexus revalidation and confirmation. According to the information you provided with your registration of these Domain Names, you indicated that you qualify under:

Category 1 – You are a US citizen or permanent resident

As part of our verification process, we ask that you provide to us by no later than ten (10) days after the date set forth above, a written response describing how you qualify under the above Nexus category.

In addition, please verify that the name-servers that you have selected to use are also physically located within the United States as required by the Nexus Requirements.

In some instances, we may request additional documentary evidence from you to demonstrate that you meet the Nexus requirements.

You should be aware that if you either (i) do not respond within the ten (10) days, or (ii) are unable to adequately explain or demonstrate through documentary evidence that you meet any of the Nexus Requirements, NeuStar may issue a finding that your entity or organization has failed to meet the Nexus Requirements. Upon such a finding, you will then be given a total of ten (10) days to cure the US Nexus deficiency. If you are able to demonstrate within ten (10) days that your entity or organization has remedied such deficiency, you will be allowed to keep the domain name. If, however, you either (i) do not respond within the ten (10) days of such a finding of noncompliance, or (ii) are unable to proffer evidence demonstration compliance with the Nexus Requirements, the domain name registration will be deleted from the registry database without refund, and the domain name will be placed into the list of available domain names.

Thank you for your cooperation in this matter. Please let us know if you have any questions.

Kind Regards
Andrea
.US Customer Support
___________________________________________

NeuStar
.US America’s Internet Address

Email: support.us@neustar.us
Address: Loudoun Tech Center
46000 Center Oak Plaza
Sterling, VA 20166 USA
Web Site: www.neustar.us
___________________________________________

This transmission (the e-mail and all attachments) is confidential and intended solely for the use of the addressee(s). If you have received this transmission in error, please notify the sender by reply and delete this transmission immediately. Any unauthorized distribution, or copying of this transmission, or misuse or wrongful disclosure of information contained in it, is strictly prohibited. The information contained in this document is provided on an as-is basis and does not constitute a binding legal contract or receipt for services. While this information is believed to be substantially correct, it is not intended to be substituted for appropriate legal counsel.

If you have any questions related to intellectual property rights, copyrights, service marks, whether in common use or legally registered, please contact your legal counsel. No statement made, printed, or otherwise disseminated by NeuStar or any of its employees, contractors, sub-contractors, web site, or interactive voice response system should be considered in any way legal or other advice.

I was left a little confused and hoped that, maybe, Andrea could shed some light on my ignorance.

From: Rowan Rodrik van der Molen <rowan@bigsmoke.us>
To: nexus-compliance@neustar.us
Date: Jul 19, 2007 6:40 PM
Subject: Re: [FWD: {Registry#542-209} .US NEXUS COMPLIANCE BIGSMOKE.US]

Dear Andrea,

When registering my bigsmoke.us domain, I actually did so because I qualify according to Category 3, not Category 1. I qualify because my .US websites are hosted at a US hosting provider (NearlyFreeSpeech) and donations to my website are processed by a US company (Paypal). Also, advertisements are served by Google inc.

Most of my visitors are US residents because my websites are targeted at an American audience. (Detailed statistics about this can be obtained from http://www.bigsmoke.us/awstats.cgi) I’d like to note that my website is a valuable resource to many American web developers, database developers and system administrators. Because most of my visitors are American, it would be Americans which would be harmed most if I where to loose my dot-us domain.

As can be inferred from the Whois info, the nameservers for my domain are located at the same US hosting company as where my .us websites are hosted.

If you require any additional information, I’d be more than willing to send it to you. I wouldn’t have registered this domain if I hadn’t been convinced of the legality of such an action.

Thank you for your time,
Rowan

Today, I got a friendly reply from Andrea:

From: “.US Nexus” <nexus-compliance@neustar.us>
To: Rowan Rodrik van der Molen <rowan@bigsmoke.us>
Cc: “cctldhelp@godaddy.com” <cctldhelp@godaddy.com>
Date: Jul 20, 2007 6:31 AM
Subject: RE: {Registry#542-209} .US NEXUS COMPLIANCE BIGSMOKE.US

Rowan,

Your domain information in WHOIS shows you are a Category 1. That would indicate that you are a United States citizen. You will need to provide your current US drivers license to prove how you meet the .US Nexus guideline.

If you are doing legitimate business within the United States you will need to correct your domain information to reflect the .US WHOIS.

Below are two categories of which you may fall into.

C31: A foreign entity or organization that has a bona fide presence in the United States of America or any of its insular areas who regularly engages in lawful activities (e.g., sales of goods or services or other business, commercial or non-commercial, including not-for-profit relations in the United States).

C32: Entity has an office or other facility in the United States

If you claim C31, you will need to provide to us documentation in the form of a certificate of corporation or the ability to provide not only the sales of goods but to prove those sales are with United States residents/companies.

If you claim C32, you will need to provide documentation that proves you have and office or facility in the United States.

The information that you have provided in your e-mail is not sufficient enough to prove you meet the Nexus requirements.

Kind Regards,
Andrea
.US Customer Support
___________________________________________

NeuStar
.US America’s Internet Address

Email: support.us@neustar.us
Address: Loudoun Tech Center
46000 Center Oak Plaza
Sterling, VA 20166 USA
Web Site: www.neustar.us
___________________________________________

[The same interesting legalese as in the previous mail from .US Nexus …]

All well and good, but all I can extract from this communication is that I need to change the category at GoDaddy. I still don’t understand whether I’m eligible to have a dot-US domain (which I recently extended (with US dollars), by the way). Based on the usTLD Nexus Requirements I’d assume that I qualify for a dot-US domain under Category 3, “A foreign entity or organization that has a bona fide presence in the United States of America or any of its possessions or territories.” In full, the requirements for Category 3 are as follows:

Nexus Category 3

A foreign entity or organization that has a bona fide presence in the United States of America or any of its possessions or territories.

  • Applicant must state country of citizenship.
  • Applicant must also (1) regularly engage in lawful activities (sales of goods or services or other business, commercial or non-commercial including not-for-profit activities) in the United States; or (2) maintain an office or other property within the United States.

Category 3 Nexus Certification

Prospective Registrants will certify compliance with Category 3 Nexus based upon substantial lawful contacts with, or lawful activities in, the United States.

Factors that should be considered in determining whether an entity or organization has a bona fide presence in the United States shall include, without limitation, whether such prospective usTLD domain name Registrant:

  • Regularly performs lawful activities within the United States related to the purposes for which the entity or organization is constituted (e.g., selling goods or providing services to customers, conducting regular training activities, attending conferences), provided such activities are not conducted solely or primarily to permit it to register for a usTLD domain name and are lawful under the laws and regulations of the United States and satisfy policies for the usTLD, including policies approved and/or mandated by the DoC;
  • Maintains an office or other facility in the United States for a lawful business, noncommercial, educational or governmental purpose, and not solely or primarily to permit it to register for a usTLD domain name.

Apart from the fact that these days The Netherlands can be considered American territory, you’d think I neatly fit the requirements for C31, since I perform the following lawful activities in the United States:

  • I regularly pay my US hosting provider, NearlyFreeSpeech.Net, US dollars to host my US website.
  • I pay my US domain registrar (Wild West Hosting / Go Daddy) in US dollars for my domain.
  • These and other services are paid for using Paypal, which, last time I checked, was still a US company.
  • Advertisements on my regular website are served by Google, which, also, is a US company. This also means I get income from … a US company.
  • Almost all my visitors are American as I write for an English speaking audience.

I’m not sure if any of this is lawful. Perhaps, being active in America in any other way than singing the national anthem and waving a flag is illegal these days. But, I’d say that an English resource which is heavily linked to and visited by thousands (mostly Americans) should somehow be able to fit these requirements. After all, how are the interests of the American people served if a .US website is taken off-line because it’s run by a foreigner from overseas? Are my American visitors supposed to be happy if their links stop working and the top search results for some of their searches suddenly disappear?

I guess that’s not the point and I’m hoping that one of my visitors can help me figure out what I should send to Andrea to make her happy to let me keep the domain for which I’ve paid good USD.

Native PostgreSQL authentication in Rails with rails-psql-auth

A while ago, I wrote a PostgreSQL auth plugin for Rails. The plugin basically defers all authentication and authorization worries to the database layer where they are supposed to be taken care of anyway.

Using this plugin, the user is asked for his or her credentials using an HTTP Basic authentication challenge. (The code for this is adapted from Coda Hale’s Basic HTTP authentication plugin.) It’s possible to specify a guest_username in the database.yml which will be used as a fall-back if no credentials are supplied. After a successful login, or if a guest user is found, the plugin will make sure that all database operations run as that user. If any operation fails due to insufficient user rights, the user will be prompted for a username/password pair again.

Detailed and up-to-date documentation for the plugin can always be found at the plugin’s homepage. Go to the plugin’s project page for getting help or for reporting issues with the plugin.
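For readers unfamiliar with the HTTP Basic scheme mentioned above, here is a minimal sketch of what a client sends back after such a challenge: the username and password joined by a colon and base64-encoded. This is not taken from the plugin; the function name basic_auth_header and the sample credentials are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of rails-psql-auth): build the value of the
# Authorization header that a client sends in answer to an HTTP Basic challenge.
basic_auth_header() {
  local username=$1 password=$2
  # Basic credentials are "username:password", base64-encoded (encoded, not encrypted).
  # tr strips the line wrapping that some base64 implementations add to long output.
  printf 'Basic %s' "$(printf '%s:%s' "$username" "$password" | base64 | tr -d '\n')"
}

# Example with made-up credentials:
basic_auth_header "Aladdin" "open sesame"; echo
```

Because the credentials are only encoded, anyone who can read the request can recover them, so a setup like this presumably wants HTTPS in front of it.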

Efficient scanning and storing of documents

I don’t like keeping my paperwork in dead-tree format, but there are those who insist on sending you all kinds of things in this format. To make this data easier to access and back up, I scan it to a digital format. I used to do this manually, with the GIMP, but I decided it was time for some automation. Therefore I wrote a script which scans whatever you put under the lid of the scanner in lineart mode and stores it very efficiently in a DjVu (DjVuBitonal) document. And here it is:

#! /bin/bash
# Author: halfgaar
 
# Prevent attacker from placing unholy replacements of system commands in your
# working path.
PATH="/usr/bin:/bin:/usr/local/bin"
 
# User settings
RESOLUTION="400"
SCANNER_DEVICE="plustek"
 
OUTPUT_FILE_BASENAME=$1
OUTPUT_FILE="$OUTPUT_FILE_BASENAME.djvu"
TEMP_FILE="/tmp/halfgaars_scanned_image"
 
# Refuse to run without an output name, and never overwrite existing files.
[ ! -n "$OUTPUT_FILE_BASENAME" ] && echo "No filename given" && exit 1
if [ -e "$TEMP_FILE" ]; then
  echo "Temp file $TEMP_FILE already exists. We don't want to create a symlink vulnerability here..."
  exit 1
fi
if [ -e "$OUTPUT_FILE" ]; then
  echo "$OUTPUT_FILE already exists."
  exit 1
fi
 
# page dimensions are in mm
scanimage -d "$SCANNER_DEVICE" -x 210 -y 297 --mode lineart --resolution "$RESOLUTION" > "$TEMP_FILE"
cjb2 -dpi "$RESOLUTION" "$TEMP_FILE" "$OUTPUT_FILE"
 
rm "$TEMP_FILE"

Simple, but effective. I may extend it in the future to also be able to scan into a DjVuDocument file (a file containing both DjVuBitonal and DjVuPhoto segments), but for now, this serves.
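The script guards its fixed /tmp path by refusing to run when the file already exists. A common alternative, sketched here as an assumption rather than as part of halfgaar’s script, is to let mktemp create a uniquely named file and to clean it up with a trap. The helper name make_scan_tempfile and the file name template are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not in the original script: create a private, uniquely
# named temp file instead of relying on a fixed path under /tmp.
make_scan_tempfile() {
  # mktemp creates the file itself and replaces the XXXXXX part with random
  # characters, so a pre-placed file or symlink at a predictable name is not reused.
  mktemp "${TMPDIR:-/tmp}/scanned_image.XXXXXX"
}

TEMP_FILE=$(make_scan_tempfile) || exit 1
# Remove the temp file whenever the script exits, including on errors.
trap 'rm -f "$TEMP_FILE"' EXIT
```

With this in place, the scanimage and cjb2 lines could keep using "$TEMP_FILE" unchanged, and the final rm becomes redundant.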

Microsoft batch file meets bash shellscript

Luca Citi, who already shared a nice readline keyboard shortcut with me, wrote me again on May 14 to share another unrelated, but very interesting trick:

Hi Rowan,
as you are interested in tricks and curiosities, I send you a thing.
I wanted a script to be runnable from both windows and linux and I found out a way to do it. Generally you can have two different files, one for each OS, but I started with this goal in mind and then it became a challenge. After trying a bit, playing with the strangest tricks of the two batch languages (bat and bash), I ended up with this solution. Actually it is not so useful 🙂 but anyway…

Well Luca, regardless of the usefulness of your script, I happen to think that it’s pure genius, so I’m going to share it here:

off ; +v # > NUL
; GOTO { true; } # > NUL
 
GOTO WIN
# bash part, replace it to suit your needs
exit 0
 
:WIN
REM win part, replace it to suit your needs

Give the script a .bat extension for Windows and set the executable bit(s) for Unix.

Thanks, Luca, for sharing another nice trick with us.

Allowing dots in WordPress post slugs

I was once again annoyed by the fact that WordPress doesn’t allow dots in post slugs. Luckily, this time I hadn’t published the post with a botched URL yet. (I don’t like changing permalinks because they’re meant to be permanent; cool URLs don’t change.) A quick googling pointed me to a post in the WordPress support forum with a reference to the Periods in Titles WordPress plugin.

The plugin works great and allowed me to post http:///2007/05/30/jeroen-dekker.com with dots and without problems.

Jeroen Dekker (photography) on-line

Jeroen Dekker, a friend and photographer, has recently, on May the 5th, put his website on concert photography on-line. (Go check it out! He has some great pictures there.)

I was very flattered when I was asked by Jeroen to give some SEO advice in the test stage of his website. I was even happier when I saw how well he had implemented my suggestions. In his concert photography section, he now has links consisting of the event name, the band name and the number of the photo. An example URL: http://jeroen-dekker.com/concerts/noordschok-2007/prey-band/1/. His page titles follow the same structure. As is often the case with SEO, the best results are achieved by remembering that good URLs are URLs which are cool enough that you won’t want to change them in the future and that good titles are titles which look good anywhere, be it in a bookmark or a search result.

Jeroen Dekker concert photography Jeroen Dekker news

I also noticed that, following some evangelizing on semantics and CSS from me, he had greatly cleaned up the HTML markup. Some pages could still profit from some more pedantic markup though. An example from the news section (cleaned up for readability):

<p> The following bands played:<br>
 - <a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/eluveitie-band/">Eluveitie</a><br>
 - <a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/thy-majestie-band/">Thy  Majestie</a><br>
 - <a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/drottnar-band/">Drottnar</a><br>
 
 - <a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/whispering-gallery-band/">Whispering Gallery</a><br>
</p>

In my opinion, the above is a very awkward way to define what is really an unordered list:

<p>The following bands played:</p>
<ul>
  <li><a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/eluveitie-band/">Eluveitie</a></li>
  <li><a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/thy-majestie-band/">Thy  Majestie</a></li>
  <li><a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/drottnar-band/">Drottnar</a></li>
  <li><a href="http://jeroen-dekker.com/concerts/fear-dark-festival-hedon-12-mei-2007/whispering-gallery-band/">Whispering Gallery</a></li>
</ul>

Finally, a nice touch that I noticed on his site is that he doesn’t have explicit pagination. By this I mean that clicking on the page 2 link simply takes you to the first photo on that page, so that he needs only a URL for each photo and not a URL for each page or even photoset.

Another contributed Readline keyboard shortcut

Last Wednesday, Luca Citi gave me a very nice response to my table of Readline keyboard shortcuts, along with a great tip. Yesterday, Lance Levine gave me another extremely nice response and another great tip:

Just wanted to say appreciate the nice readline cheatsheet. There were a couple I never knew (the ctrl-alt-asterisk is gonna be a real time saver) and I never knew about ctrl-G or ctrl-J to end incremental searches either.

One that might be worth knowing for a lot of people if you ever make updates, would be the ctrl-x-x cmd. which takes you to the beginning of the line (and then back again if you hit it again). I enjoy working in screen, and the default ctrl-a escapes you from readline when you’re in a screen session so I never use it lest get confused.

Best Regards,
Lance Levine

Well, Lance, I’m an avid GNU screen user myself, so your tip is very useful to me! I’ve added it to the table to ease the suffering of our fellow GNU screen users. 🙂

Indeed I did, but I found it difficult to come up with a concise and clear description of the shortcut. So difficult, in fact, that I didn’t succeed at it:

Ctrl+x+x readline keyboard shortcut with ugly description

So, what does the Readline user manual have to say that may help me with a description?

exchange-point-and-mark (C-x C-x)
Swap the point with the mark. The current cursor position is set to the saved position, and the old cursor position is saved as the mark.

While typing, the mark normally is at the beginning of the line. Pressing Ctrl-x-x will move the cursor to the mark and set the mark to the old cursor position. If you now move the cursor and press Ctrl-x-x again, the mark won’t be at the beginning of the line but at the place where you moved the cursor to. This means that the Ctrl-x-x shortcut is more than just a way to move back and forth between the beginning and the end of a line.

Another goody worth mentioning is the Ctrl-@ shortcut which will simply set the mark at the current cursor position or at the position specified by a numeric argument.

Now, I just need to think of a way to integrate these two Readline command bindings into the table without the descriptions taking up as many lines as this blog post. 😕 Any bright ideas, anyone?

Bypassing smart completion in Bash

Luca Citi, a nice Italian Ubuntu user, just gave me an excellent tip in response to my list of Readline keyboard shortcuts. Modern Linux distributions such as Ubuntu and Gentoo can easily be configured for Bash to use smart completion. With smart completion enabled, instead of just looking among all the available files and directories without discrimination, TAB will be able to more accurately adjust its list of available completions depending on the program for which arguments are being sought.

An example of smart completion is that completions for the cd command will only include actual directories and no longer any regular files. Luca gave me another good example: completions for the kpdf command will only include files with the .pdf extension.
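Rules like these two can be approximated with Bash’s own complete and compgen builtins. The sketch below is an illustration under that assumption, not the configuration that Ubuntu or Gentoo actually ship (their completion packages use more elaborate functions); the helper names list_dirs and list_pdfs are made up.

```shell
#!/bin/bash
# Hypothetical helpers that show which candidates a rule would produce
# for the partial word given as $1.
list_dirs() {
  # -d: generate directory names only
  compgen -d -- "$1"
}

list_pdfs() {
  # -f generates file names; -X removes candidates matching the pattern, and
  # the leading "!" negates it, so everything NOT matching *.pdf is removed.
  compgen -f -X '!*.pdf' -- "$1"
}

# Registering equivalent rules for the two commands from the examples:
complete -d cd
complete -f -X '!*.pdf' kpdf
```

Note that the simple kpdf rule also filters out directories, which is one reason why distribution-provided rules tend to be more involved.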

Myself, I’ve been bitten by smart completion a few times because I’d want to complete a command argument towards a filename which wasn’t supported by the smart completion rules. Luca gave me his typical example of such a case: his smart completion configuration includes only entries from the fstab as valid mount points for the mount command. But, what if you want to do an ad-hoc mount? Will you just have to type out the full mount point without auto completion? That’s what I used to think before Luca told me about the Alt+/ shortcut. In Bash, this shortcut will act as the TAB-key would without smart completion enabled.

Of course, I’ve updated my list of Readline keyboard shortcuts to include Alt+/. Thanks Luca! 🙂

Web scraping in Ruby: why I had to use scrAPI instead of WWW::Mechanize and Hpricot

Thursday evening: so, I had written myself a nice little script using Aaron Patterson’s WWW::Mechanize and why’s Hpricot to extract some data from a popular web-based airport directory.

Hpricot logo

I was warmed up for Hpricot by the promise of XPath and CSS selector support (and a very cool logo, of course). As a long time XPath user, I started banging out some crispy XPath expressions until I realized that XPath support was only very partial. I kept on trying expressions that would work, even bowing down to expressions that, according to the Wiki, would work, but differently. Come on guys, either support a standard or just plainly ignore it, please! 😑 Because I couldn’t figure out how I’d have to integrate why’s fork of the XPath spec in my expressions, I decided to stick with why’s fork of the CSS selectors instead.

Then, it became time to execute my code. I had estimated that it would take about two hours to finish downloading and parsing the approximately 10,000 pages which contained the data in which I was interested. So, I executed my script, detached my screen session and went to bed, trusting that I would find a nice, handy CSV file in the morning.

Friday morning, I was disappointed to find that my script had been killed. I was left wondering what could have killed it. I decided to restart the script at the countries starting with the letter b (it had died somewhere halfway through the list of countries starting with a b). Soon the script was happily appending data again to the existing CSV file.

Disclaimer: why is a much more prolific Ruby coder than I’ll ever be, so please take my comments with a grain of salt. No, actually, rather take them with a few spoonfuls of salt.

Later, I talked about the spontaneous death of the script with Wiebe. Curious, he looked at the memory usage of my script and saw that it was happily munching away hundreds of megs of memory on our server. And memory usage was growing! With crucial server processes at risk of running out of memory and with me having to build a fence around the vegetable garden to protect it from a bunch of brawling chickens, Wiebe was friendly enough to drop in and take a look at my spaghetti code to see if he could fix the leak. He couldn’t, because the leak didn’t appear to be in my code. I wasn’t the first to be bugged by a leak in Hpricot.

That news didn’t make me very happy, because it implied I had to redo the script using different tools. I knew that WWW::Mechanize had been inspired by the Perl package by the same name, so I started by looking at that. After installing WWW::Mechanize, I explored CPAN’s WWW namespace a bit further and noticed that the Perl crowd also had two other good scrapers at their fingertips: WWW::Extractor and WWW::Scraper. Once again I was reminded that Perl, despite its funky syntax, is still the king of all scripting languages when it comes to the availability of quality modules. 🙁 After a few deep breaths, I set my rusty Perl skills into (slow) motion. Hell, this was supposed to be a quick script. Why was this taking so much time? (Yeah, yeah; cue all the jokes about developer incompetence. 😕 )

I was almost trampled by a herd of camels, each with a name more syntactically confusing than the last. Just before I was crushed, I came across a reference to a Ruby scraper with decent support for CSS3 selectors: scrAPI. Credits for this discovery go to the documenters of scRUBYt, a featureful scraper layered on top of WWW::Mechanize, who were friendly enough to help their users by including a link to the competition.

It took me some time to rewrite the script using scrAPI, partially because it was hard to find any documentation that was more comprehensive than a few blog posts and a cheat sheet and less of a hassle than reading the source. But, when Assaf answered my need by pointing me to the online API docs, I was happy.

Another reason why it was hard to migrate from WWW::Mechanize/Hpricot to scrAPI was that Hpricot starts element offsets for XPath predicates and CSS selectors at zero instead of one, where they should start. And of course, I had to rid myself of the weird crossbreed of CSS and XPath selectors.

I was surprised that the script using scrAPI ran about twice as fast as the Hpricot-based script. This was despite a cumulative sleep() time between requests of almost an hour, which I had added because the speed during testing made me worry about over-exerting their web server. Knowing that one of the popular features of Hpricot is its speed, this was very unexpected, although I have to admit that Hpricot did fill my memory very quickly.


© 2026 BigSmoke
