Smokes your problems, coughs fresh air.

Author: Rowan Rodrik (Page 9 of 27)

Rowan is mainly a writer. This blog here is a dumping ground for miscellaneous stuff that he just needs to get out of his head. He is way more passionate about the subjects he writes about on Sapiens Habitat: the connections between humans, each other, and to nature, including their human nature.

If you are dreaming of a holiday in the forests of Drenthe (the Netherlands), look no further than “De Schuilplaats”: a beautiful vacation home, around which Rowan maintains a magnificent ecological garden and a private heather field, brimming with biological diversity.

FlashMQ is a business that offers managed MQTT hosting and other services that Rowan co-founded with Jeroen and Wiebe.

Shrinking/compressing a MediaWiki database

As of late, I haven’t had a lot of time to chase after spammers, so – despite anti-spam captchas and everything – a couple of my wikis have become overgrown with spam. One after the other, I’ve been closing them to anonymous edits, even closing down user registration altogether, but for some a little too late.

The last couple of months my hosting expenses shot through the roof, because my Timber Investments Wiki database kept expanding to well over 14 GiB. So I kind of went into panic mode and I even made time for another one of my famous spam crackdowns—the first in many, many months.

The awfully inefficient bulk deletion of spam users

Most of this latest tsunami of spam was in the form of “fake” user pages filled with bullshit and links. The only process that I could think of to get rid of it was quite cumbersome. First, I made a special category for all legitimate users. From that I created a simple text file (“realusers.txt”) with one user page name per line.

Then, I used Special:AllPages to get a list of everything in the User namespace. After struggling through all the paginated horror, I finally found myself with another copy-pasted text file (“unfilteredusers.txt”) that I could filter:

cp unfilteredusers.txt todelete.txt
cat realusers.txt | while read -r u; do
  sed -i -e "/$u/d" todelete.txt
done

(I’d like to know how I could have done this with less code, by the way.)
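For what it's worth, one way to do this with less code is a single grep call. This is a sketch under assumptions, not what was actually run: the sample user names below are made up, and the match is on whole lines, which is stricter than the sed loop (that one also drops any line that merely contains a real user's name).

```shell
# Made-up sample data standing in for the two hand-made lists.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'User:BigSmoke' 'User:Halfgaar' > realusers.txt
printf '%s\n' 'User:BigSmoke' 'User:CheapPills' 'User:Halfgaar' 'User:BigSmoke2' > unfilteredusers.txt

# -f FILE  read the patterns (one per line) from realusers.txt
# -F       treat each pattern as a fixed string, not as a regex
# -x       only count a match if it covers the whole line
# -v       print the lines that do NOT match
grep -vxFf realusers.txt unfilteredusers.txt > todelete.txt

cat todelete.txt
```

Because of -F, characters like dots or slashes in page names need no escaping, which the sed version would trip over.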

This filtered list, I fed to the deleteBatch.php maintenance script:

php maintenance/deleteBatch.php -u BigSmoke -r spam todelete.txt

By itself, this would only increase the size of MW’s history, so, as a last step, I used deleteArchivedRevisions.php to delete the full revision history of all deleted pages.

This workflow sucked so bad that I missed thousands of pages (I had to copy-paste these listings by hand, as I mentioned above), and had to redo it. This time, the mw_text table shrank from 11.5 GiB to about 10 GiB. Not enough. Even the complete DB dump was still way over 5 GiB [not to mention the process size, which remained stuck at around 15 GiB, something which I wouldn’t be able to solve even with the configuration settings mentioned after this].

Enter $wgCompressRevisions and compressOld.php

The huge size of mw_text was at long last easily resolved by a MW setting that I had never heard about before: $wgCompressRevisions. Setting that, followed by an invocation of the compressOld.php maintenance script took the mw_text table size down all the way from >10 GiB to a measly few MiB:

php maintenance/storage/compressOld.php

SELECT table_schema 'DB name', sum(data_length + index_length) / 1024 / 1024 "DB size in MiB"
FROM information_schema.TABLES
WHERE table_schema LIKE 'hard%'
GROUP BY table_schema;

+----------+----------------+
| DB name  | DB size in MiB |
+----------+----------------+
| hardhout |    41.88052750 | 
| hardwood |   489.10618973 | 
+----------+----------------+

But it didn’t really, because of sweet, good, decent, old MySQL. 🙁 After all this action, the DB process was still huge (still ~15 GiB). This far exceeded the combined reported database sizes. Apparently, MySQL’s InnoDB engine is much like our economy. It only allows growth and if you want it to shrink, you have to stop it first, delete everything and then restart and reload.

Future plans? Controlled access only?

One day I may reopen some wikis to new users with a combination of ConfirmAccount and RevisionDelete and such, but combatting spam versus giving up on the whole wiki principle is a topic for some other day.

How to test payformystay.com

I haven’t got much experience when it comes to testing web applications. Instead (and more so out of apathy than belief), I’ve always adhered to the ad-hoc test approach. However, the usage of pure Postgres unit tests back when I worked on a complicated investment database with Halfgaar did teach me the advantages of test-driven development.

For payformystay, though, unit tests simply won’t cut it. The database design is quite straightforward, with not that many relationships, and the schema’s only complexities arise from it being remarkably denormalized and full of duplication. Besides, and contrary to my and Halfgaar’s PostgreSQL project for Sicirec, the business logic doesn’t live all neatly contained at the database level. And I’m not using a clean ORM wrapper either, which I could use as a unit test target. And what would be the point, since in typical MySQL/PHP fashion it would be much too easy to circumvent for a particular function?

What I want for this application is full functional test coverage so that I know that all parts of the website function correctly in different browser versions across operating systems. In other words: I want to know that the various parts are functioning correctly as implied by the fact that the whole is functioning correctly.

But how do you do automated tests from a browser?

At first, I thought I should probably cobble something together myself with jQuery, maybe even using a plugin such as QUnit with the composite addon.

But how was I going to run the tests for JavaScript independence then? Using curl/wget or one of these hip, headless browsers which seem to be bred for this purpose?

Choices, choices…

Selenium

Then, there’s Selenium which is a pretty comprehensive set of test tools, meant precisely for what I need. Sadly my wants weren’t easily aligned with my needs. Hence, it took me some time (months, actually) before I was sure that Selenium was right for me.

Selenium provides the WebDriver API (implemented in a number of programming languages) that lets you steer all popular browsers either through the standalone Selenium Server or Selenium Grid. The server executes and controls the browser. Since Selenium 2, it doesn’t even need a JavaScript injection in the browser to do this, which is very interesting for my tests, given my desire to make my AJAX-heavy toy also available to browsers with JavaScript disabled for whatever reason.

Selenium versus my pipe dream

Selenium IDE is a Firefox extension which lets you develop Selenium scripts by recording your interactions with the browser. It stores its script in “Selenese”. This held quite some appeal to me, because my initial testing fantasy revolved around doing it all “client-side”, in the sense that I wouldn’t have to leave my browser to do the testing. I wanted to be able to steer any browser on any machine that I happened to stumble upon at my test site and fire those tests.

Well, Selenese can be interpreted by some WebDriver API implementations to remotely steer the browser, but it can’t be executed from within the browser, except by loading it into the Selenium IDE, which is a Firefox-only extension. Also, driving the browser through JavaScript has been abandoned by Selenium with the move away from Selenium-RC to WebDriver (which they’re currently trying to push through the W3C standardization process).

With everyone moving away from my precious pipe-dream, I remained clinging to some home-grown jQuery fantasy. But, how was I going to test my JavaScript-free version? Questions.

I had to eventually replace my pipe dream with the question of which WebDriver implementation to use and which testing framework to use to wrap around it.

PHPUnit

I thought PHPUnit had some serious traction, but seeing that it had “unit” in its name, I thought it might not be suitable for functional testing. The documentation being unit-test-centric, in the sense of recommending that you name your test cases “[ClassYouWannaTest]Test”, didn’t help in clearing up the confusion.

Luckily, I came across an article about acceptance testing using Selenium/PHPUnit [acceptance test = functional test].

I’ve since settled on PHPUnit by Sebastian Bergmann with the Selenium extension also by Bergmann. His Selenium extension provides two base TestCase classes: PHPUnit_Extensions_SeleniumTestCase and PHPUnit_Extensions_Selenium2TestCase. I chose to use the latter. I hope I won’t be sorry for it, since it uses Selenium 2’s Selenium 1 backward compatible API. Otherwise, they’ll probably have me running for Facebook’s PHP-WebDriver in the end. (PHP-WebDriver also has the nice feature that it allows you to distribute a FF profile to Selenium Server/Grid.)

But what about my pipe dream?

If only I’d be able to visit my test site from any browser, click a button and watch all the test scripts run, the failures being filed into the issue tracker (with error log + screenshot) and a unicorn flying over the rainbow…

Anyway, it’s a pipe dream and the best way to deal with it is probably to put the pipe away, smoothen the sore and scratch the itch.

PEAR pain

As is customary for PEAR projects, PHPUnit and its Selenium extension have quite a number of dependencies, meaning that installing and maintaining them manually in my project repo would be quite a pain. I’ve used the pear command to install everything locally, but my hosting provider doesn’t have all these packages installed, so if I want to run tests from there (calling Selenium Server here), I’ll have to manage all that pear pain along with my project files.

Doesn’t PEAR offer some way to manage packages in any odd location? I’m not interested in what’s in /usr/share/php/. I want my stuff in ~/php-project-of-the-day/libs/

Process pain

So far, I’ve remotely hosted both the production and the development version of payformystay, which is especially nice if you want to share development features with others. Now, it’s difficult to decide what’s more annoying:

  1. Creating a full-fledged, locally hosted version of the website, so that I can execute the tests locally as well as host the testing version (Apache+PHP+MySQL) locally. A lot of misleading positive test results are assured, due to guaranteed differences between software versions and configurations.
  2. Installing all the PEAR packages remotely so that I can run the tests from my hosting provider’s shell. This implies having to punch a hole through the NAT wall at home or anywhere I happen to be testing at any moment. Bad idea. I don’t even have the passwords to all the routers that I pass during the year.
  3. Running the development version of the website remotely, but running the tests locally so that there are no holes to punch, except that I’ll have to tunnel to my host’s MySQL process because my tests need to set up, look up and check stuff in the database. At least, now I don’t have to install server software on my development machine and need only the php-cli stuff.

Why doesn’t he just…

You know the conversation. You had the conversation:
“Why doesn’t he just …”
“I don’t understand why he can’t simply …”
“If he’d only …”

Usually followed by: “I used to, but I …”
Or: “At least you have (not) …”

Well, I had this conversation, but at least I …
I am writing about it, so that at least you …

You know the truth:
No, he can’t just …
At least, he couldn’t …

Now, you might …
But if it’d be so easy to …
You wouldn’t be congratulating each other that …
You’re just slightly better than him …

How does it feel?
Safe?

If only you would …

Commenting fixed for blog.bigsmoke.us

To my great surprise, thanks to Tobias Sjösten, I found out that commenting was broken on blog.bigsmoke.us. I couldn’t pinpoint the exact problem, but it must have been introduced with some WordPress upgrade somewhere along the line. I never noticed it because it did work for logged-in users. (If I really must guess, I suspect a silent ReCaptcha version compatibility problem.)

Upgrading WordPress and wp-recaptcha to their latest versions (3.3.1 and 3.1.4 respectively) seems to have solved the problem.

Psychopathic Saturday

I’m trying to pump myself up to write a piece of text about psychopathy. All three other group members already wrote their part. We’re making a scientific poster titled “Is there a psychopath hidden in your brain?” But, do I even want to know? It’s all very close to home, with a mother who’s been accusing her ex-husband (my dad) of being a psychopath for, like, forever, and, simultaneously, this monkey in my brain, pointing its accusing little finger straight at me.

I am naked and feeling very vulnerable

There are many clever ways to tell you this. There are many ways to deceive. But in the end I feel that more often than not the deception merely serves to reinforce that image of a very vulnerable naked man.

Thus: “I am naked and feeling very vulnerable.”

MediaWiki ConfirmEdit/QuestyCaptcha extension

Since I moved my LDAP wiki over from DokuWiki to MediaWiki, I’ve been buried by a daily torrent of spam. Just like with my tropical timber investments wiki, the ReCaptcha extension (with pretty intrusive settings) doesn’t seem to do much to stop this shitstream.

How do the spammers do this? Do they primarily trick visitors of other websites into solving these captchas for them or do they employ spam-sweatshops in third-world countries? Fuck them! I’m trying something new.

I’ve upgraded to the ConfirmEdit extension. (ReCaptcha has also moved into this extension.) This allows me to try different Captcha types. The one I was most interested in is QuestyCaptcha, which allows me to define a set of questions which the user needs to answer. I’m now trying it out with the following question:

$wgCaptchaQuestions[] = array( 'question' => "LDAP stands for ...", 'answer' => "Lightweight Directory Access Protocol" );

I don’t think it’s a particularly good question, since it’s incredibly easy to Google. But, we’ll see, and in the meantime I’ll try to come up with one or two questions that are context-sensitive, yet easy enough to answer for anyone with some knowledge of LDAP. If you have an idea, please leave a comment.

Safari: don’t give gzipped content a .gz extension

Yesterday, while helping Caloe with the website for her company De Buitenkok, I came across the mother of all stupid bugs in Safari. Me having recently announced payformystay.com, I loaded it up in Apple’s hipster browser only to notice that the CSS wasn’t loaded. Oops!

Reloading didn’t help, but … going over to the development version, everything loaded just fine. Conclusion? My recent optimizations—concatenating + gzipping all javascript and css—somehow fucked up payformystay for Safari users. The 14 Safari visitors (16.28% of our small group of alpha users) I received since the sixth must have gotten a pretty bleak image of the technical abilities of payformystay.com’s Chief Technician (me). 😥

The old cat | gzip

So, what happened?

To reduce the number of HTTP requests per page for all the JavaScript/CSS stuff (especially when none of it is in the browser cache yet), I made a few changes to my build file to scrape the <head> of my layout template (layout.php), which I made to look something like this:

<?php if (DEV_MODE): ?>
  <link rel="stylesheet" type="text/css" href="/layout/jquery.ui.selectmenu.css" />                                   <!--MERGE ME-->
  <link rel="stylesheet" type="text/css" href="/layout/fancybox/jquery.fancybox-1.3.4.css" />                         <!--MERGE ME-->
  <link rel="stylesheet" type="text/css" href="/layout/style.css" />                                                  <!--MERGE ME-->
 
  <script src="/layout/jquery-1.4.4.min.js" type="text/javascript"></script>                                          <!--MERGE ME-->
  <script src="/layout/jquery.base64.js" type="text/javascript"></script>                                             <!--MERGE ME-->
  <script src="/layout/jquery-ui-1.8.10.custom.min.js" type="text/javascript"></script>                               <!--MERGE ME-->
  <script src="/layout/jquery.ui.selectmenu.js" type="text/javascript"></script>                                      <!--MERGE ME-->
  <script src="/layout/jquery.cookie.js" type="text/javascript"></script>                                             <!--MERGE ME-->
  <script src="/layout/fancybox/jquery.fancybox-1.3.4.js" type="text/javascript"></script>                            <!--MERGE ME-->
  <script src="/layout/jquery.ba-hashchange.min.js" type="text/javascript"></script>                                  <!--MERGE ME-->
  <script src="/layout/jquery.writeCapture-1.0.5-min.js" type="text/javascript"></script>                             <!--MERGE ME-->
<?php else: # if (!DEV_MODE) ?>
  <link href="/layout/motherofall.css.gz?2" rel="stylesheet" type="text/css" />
  <script src="/layout/3rdparty.js.gz?2" type="text/javascript"></script>
<?php endif ?>

It’s very simple: All the files with a “<!--MERGE ME-->” comment on the same line got concatenated and gzipped into motherofall.css.gz and 3rdparty.js.gz respectively, like so:

MERGE_JS_FILES := $(shell grep '<script.*<!--MERGE ME-->' layout/layout.php|sed -e 's/^.*<script src="\/\([^"]*\)".*/\1/')
MERGE_CSS_FILES := $(shell grep '<link.*<!--MERGE ME-->' layout/layout.php|sed -e 's/^.*<link .*href="\/\([^"]*\)".*/\1/')
 
all: layout/3rdparty.js.gz layout/motherofall.css.gz
 
layout/3rdparty.js.gz: layout/layout.php $(MERGE_JS_FILES)
        cat $(MERGE_JS_FILES) | gzip > $@
 
layout/motherofall.css.gz: layout/layout.php $(MERGE_CSS_FILES)
        cat $(MERGE_CSS_FILES) | gzip > $@

Of course, I simplified away the rest of my Makefile. You may notice that I could have used yui-compressor or something similar to minify the concatenated files before gzipping them, but yui-compressor chokes on some of the third-party stuff. I am using it for optimizing my own css/js (again, only in production).

Safari ignores the Content-Type for anything ending in .gz

As far as the HTTP spec is concerned, “file” extensions mean absolutely nothing. They’re trivial drivel. Whether a URL ends in .gz, .css, .gif or .png, what it all comes down to is what the Content-Type header tells the browser about the response being sent.

You may have noticed me being lazy in the layout template above when I referenced the merged files:

<link href="/layout/motherofall.css.gz?2" rel="stylesheet" type="text/css" />
  <script src="/layout/3rdparty.js.gz?2" type="text/javascript"></script>

I chose to directly reference the gzipped version of the css/js, even though I had a .htaccess file in place (within /layout/) which was perfectly capable of using the right Content-Encoding for each Accept-Encoding.

$ cat /layout/.htaccess

AddEncoding gzip .gz
 
RewriteEngine On
 
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [QSA,L]
 
<Files *.css.gz>
ForceType text/css
</Files>
 
<Files *.js.gz>
ForceType application/javascript
</Files>

You may notice that the .htaccess file contains some configuration to make sure that the .gz files are not served as something like application/gzip-compressed.
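The negotiation that those rewrite rules perform can be modelled in a few lines of shell. This is only a sketch to illustrate the logic: pick_file is a hypothetical helper, not part of Apache, the demo files are made up, and it mimics just the two RewriteCond lines (an Accept-Encoding header containing gzip, and a .gz sibling existing on disk).

```shell
# pick_file ACCEPT_ENCODING PATH
# Hypothetical helper: prints the file the two RewriteCond lines would select.
pick_file() {
  case "$1" in
    *gzip*)
      # The client accepts gzip: use the pre-compressed sibling if it exists.
      if [ -f "$2.gz" ]; then
        printf '%s\n' "$2.gz"
        return 0
      fi
      ;;
  esac
  # Otherwise fall back to the plain file.
  printf '%s\n' "$2"
}

# Made-up demo files in a scratch directory.
demo=$(mktemp -d)
printf 'body{}\n' > "$demo/style.css"
gzip -c "$demo/style.css" > "$demo/style.css.gz"
printf 'p{}\n' > "$demo/plain.css"

pick_file 'gzip, deflate' "$demo/style.css"   # selects the .gz sibling
pick_file 'identity'      "$demo/style.css"   # selects the plain file
pick_file 'gzip, deflate' "$demo/plain.css"   # no .gz sibling, so the plain file
```

Note that the decision depends only on the request header and on what is on disk, never on the extension in the URL that the browser asked for.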

Anyway, I went to see if there were any browsers left that do not yet Accept-Encoding: gzip and could find none. When, yesterday, I was faced with an unstyled version of my homepage, my first reaction was (after the one where I was like hitting reload 20 times, embarrassedly mumbling something about “those damn browser-caches!”): “Oh, then, apparently, Safari must be some exception to the rule that browsers have all been supporting gzip encoding for like forever!”

No, it isn’t so. Apparently Safari ignores the Content-Type header for any resource with a URL ending in .gz. Yes, that’s right. Safari understands Content-Encoding: gzip just fine. No problem. Just don’t call it .gz.

The new cat ; gzip

So, let’s remove the .gz suffix from these files and be done with it. The .htaccess was already capable of instructing all necessary negotiations to be able to properly serve the gzipped version only when it’s accepted (which is always, but I digress).

A few adjustments to my Makefile:

MERGE_JS_FILES := $(shell grep '<script.*<!--MERGE ME-->' layout/layout.php|sed -e 's/^.*<script src="\/\([^"]*\)".*/\1/')
MERGE_CSS_FILES := $(shell grep '<link.*<!--MERGE ME-->' layout/layout.php|sed -e 's/^.*<link .*href="\/\([^"]*\)".*/\1/')
 
all: layout/3rdparty.js.gz layout/motherofall.css.gz layout/pfms.min.js.gz
 
layout/3rdparty.js: layout/layout.php $(MERGE_JS_FILES)
	cat $(MERGE_JS_FILES) > $@
 
layout/motherofall.css: layout/layout.php $(MERGE_CSS_FILES)
	cat $(MERGE_CSS_FILES) > $@
 
%.gz: %
	gzip -c $^ > $@
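In plain shell, the two-step build amounts to something like the following. It's a sketch with made-up file names (a.js, b.js) in a scratch directory, not the real payformystay sources; the point is that gzip -c leaves the uncompressed bundle in place next to the .gz, so the rewrite rule has both to choose from.

```shell
# Made-up source files in a scratch directory.
build=$(mktemp -d)
printf 'var a = 1;\n' > "$build/a.js"
printf 'var b = 2;\n' > "$build/b.js"

# Step 1: concatenate into an uncompressed bundle (the layout/3rdparty.js rule).
cat "$build/a.js" "$build/b.js" > "$build/3rdparty.js"

# Step 2: compress into a sibling file (the %.gz pattern rule). -c writes to
# stdout, so the uncompressed bundle is kept.
gzip -c "$build/3rdparty.js" > "$build/3rdparty.js.gz"

# Sanity check: decompressing gives back the bundle byte for byte.
gzip -dc "$build/3rdparty.js.gz" | cmp -s - "$build/3rdparty.js" && echo "round-trip ok"
```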

And here’s the simple change to my layout.php template:

<link href="/layout/motherofall.css?2" rel="stylesheet" type="text/css" />
  <script src="/layout/3rdparty.js?2" type="text/javascript"></script>

That’s it. I welcome back all 14 Safari users looking for paid work abroad! Be it that you’re looking for international work in Africa, in America, in Asia or in Europe, please come visit and have a look at what we have on offer. 😉

Announcing payformystay.com

January the first, a very good day to announce a new project that I’ve been working on this past year. Which I did, on Facebook and Twitter. Now, five days later, it’s time to repeat the announcement to give it some much-needed link-juice. I know that normal people don’t follow this blog. (I don’t even follow this blog!) But it does have PageRank. And it does have 4000 monthly visitors. Time for some link-whoring!

PFMS search screen - top

PFMS search screen - bottom

payformystay.com is a website for adventurers who’re looking for paid work abroad, whether you want to work in Europe, work in Africa, work in Asia, work in Australia or just want to do some seasonal work anywhere but home (grape picking, strawberry harvest, whatever you fancy). Of course we have many types of work: office jobs, tourism jobs, healthcare jobs, childcare jobs, wildlife jobs, anything.

The cool thing about payformystay, though, is that we only sport paid jobs. So, no wrestling through page after page of crappy offers where some evil cunt swine tries to make you pay for your own work. That’s right! Job offers on payformystay.com must at the very least include full board (something like a bed or tent and 3 meals daily) or enough pay to cover these basic living expenses! Offers are audited and violators are fed to the spammers.

Go get yourself a piece of the action:

payformystay.com – where people get paid to go on adventure

Peace out. End of announcement.

Have fun! Be scared! Be tough! And be safe!

© 2024 BigSmoke