
Adding a clock in screen to keep your SSH sessions from being killed

The world is filled with stupid routers that kill any connection that has been idle for a while (sometimes a very short while). I keep losing my SSH sessions because of this. To fix it, I added a clock to my GNU screen status bar:

hardstatus alwayslastline "%= %H | %l | [%c:%s]"

For the record, my entire .screenrc:

multiuser on
caption always "%{= kB}%-Lw%{=s kB}%50>%n%f* %t %{-}%+Lw%<"
vbell off
startup_message off
term linux
# %H = hostname, %l = load average, %c:%s = clock with seconds
hardstatus alwayslastline "%= %H | %l | [%c:%s]"

Trying to reduce MySQL InnoDB disk usage after major reduction of data

So, two days ago, I tried to shrink my MediaWiki database and it almost worked, except that the MySQL process size wouldn’t shrink along with it.

Of course I tried all the obvious things: dropping the database, stopping and restarting the process, reloading the database, optimizing tables, altering tables. But to no avail, because there’s this bug. Well, technically it’s not a bug. Like most of MySQL, it has been “designed” to be counter-intuitive, non-standard, and riddled with special cases, exceptions, work-arounds and duct tape. So the “bug” is really just a feature request from dumb people much like myself, who want stupid things like a database that actually becomes smaller when you delete most of your data.

I should really move to a virtual private server environment, where I can just run a real database (PostgreSQL), but I’m still on NFSN, who (the sky-rocketing storage costs of late aside) have given me no reason to complain so far.

I thought I’d ask them to help me out.

Recently, due to inattention to spam, one of my wiki databases has grown to well over 10 GiB. After finally getting around to removing most of this spam and tweaking some settings to reduce the table size from over 11 GiB to a couple of MiB, I thought my Last Reported Size would go down accordingly.

But no such luck. Apparently it’s a MySQL issue, with the only solution being to drop the offending database (after a dump, of course), stop the MySQL process, remove the underlying data files, restart the process and then reload the database.

Instead, you could use the innodb_file_per_table option, which is enabled by default on MariaDB.

Without that option set, OPTIMIZE, ALTER TABLE and all that stuff will do nothing to reduce a table size. It’s one of those issues which the MySQL devs are really not interested in solving: http://bugs.mysql.com/bug.php?id=1341

I hope you can help me out with this, either by setting the innodb_file_per_table option or by removing all my database files. In the latter case, I’d hope you’d ask me to green-light this first, so that I can make some other data size reductions in various databases before I make a final backup.

But then I thought better of it, when I learned that—contrary to my expectations—the option really was enabled:

SHOW VARIABLES LIKE 'innodb_file_per_table';

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    | 
+-----------------------+-------+
1 row in set (0.00 sec)

So the option was enabled after all. I had to look elsewhere for an answer, which made me decide to do the following:

OPT="--host=myhost --user=myuser --password=mypassword --skip-column-names --batch hardwood";
mysql $OPT --execute="show tables" | grep -v mw_searchindex | tablename;
  mysql $OPT --execute="alter table $tablename ENGINE=InnoDB; optimize table $tablename"

Needless to say, this didn’t help, so I did end up writing a support request to my hosting provider:

Recently, due to inattention to spam, one of my wiki databases has grown to well over 11 GiB. After finally getting around to removing most of this spam and tweaking some settings to reduce the table size of the most offending table from over 11 GiB to a couple of MiB, I thought my Last Reported Size would go down accordingly.

Since you have the innodb_file_per_table option enabled (the default in MariaDB and confirmed by “SHOW VARIABLES LIKE 'innodb_file_per_table';”), I’d expect “ALTER TABLE mw_text ENGINE=InnoDB” to shrink the table (the command should move the table to its own file, even if it wasn’t already in its own file before the move to MariaDB). It didn’t. Last Reported Size is still approximately the same. Dropping and reloading the entire database didn’t do much to help either.

I suspect that the problem is that the old shared file still exists and that this file won’t be deleted even if there are no more tables in it, with the only solution then being to dump the database, drop it, remove all its files and then reload the dump.

Anyway, I’d like my database to actually shrink when I delete stuff, and the way I understand it, this should be possible thanks to innodb_file_per_table. If it’s the same old story as without innodb_file_per_table, that would just be awful, because then I’d need your intervention every time I try to reduce my database process size.

I hope that you can somehow reload the database and remove the bloated ibdata1 file.

Now, I’m just waiting patiently for their response while my funds wither…

Shrinking/compressing a MediaWiki database

As of late, I haven’t had a lot of time to chase after spammers, so – despite anti-spam captchas and everything – a couple of my wikis have been overgrown with spam. One after the other, I’ve been closing them down to anonymous edits, even closing down user registration altogether, but some a little too late.

The last couple of months my hosting expenses shot through the roof, because my Timber Investments Wiki database kept expanding to well over 14 GiB. So I kind of went into panic mode and I even made time for another one of my famous spam crackdowns—the first in many, many months.

The awfully inefficient bulk deletion of spam users

Most of this latest tsunami of spam was in the form of “fake” user pages filled with bullshit and links. The only process that I could think of to get rid of it was quite cumbersome. First, I made a special category for all legitimate users. From that, I created a simple text file (“realusers.txt”) with one user page name per line.

Then, I used Special:AllPages to get a list of everything in the User namespace. After struggling through all the paginated horror, I finally found myself with another copy-pasted text file (“unfilteredusers.txt”) that I could filter:

cp unfilteredusers.txt todelete.txt
cat realusers.txt | while read u; do
  sed -i -e "/$u/d" todelete.txt
done

(I’d like to know how I could have done this with less code, by the way.)
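
In hindsight, grep could probably have done the same filtering in one line. A sketch, with the same caveat as the sed version (each name is treated as a pattern rather than a literal string; adding -F would match names literally):

# Keep only the user pages that are not listed in realusers.txt.
grep -v -f realusers.txt unfilteredusers.txt > todelete.txt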

This filtered list, I fed to the deleteBatch.php maintenance script:

php maintenance/deleteBatch.php -u BigSmoke -r spam todelete.txt

By itself, this would only increase the size of MW’s history, so, as a last step, I used deleteArchivedRevisions.php to delete the full revision history of all deleted pages.
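
For reference, that last step was a one-liner too. If I remember the script correctly, it only reports what it would delete unless you pass --delete:

php maintenance/deleteArchivedRevisions.php --delete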

This work-flow sucked so bad that I missed thousands of pages (I had to copy-paste the listings by hand, as I mentioned above) and had to redo the whole thing. This time, the mw_text table size shrunk from 11.5 GiB to about 10 GiB. Not enough. Even the complete DB dump was still way over 5 GiB [not to mention the process size, which remained stuck at around 15 GiB, something which I wouldn’t be able to solve even with the configuration settings mentioned below].

Enter $wgCompressRevisions and compressOld.php

The huge size of mw_text was at long last easily resolved by a MW setting that I had never heard about before: $wgCompressRevisions. Setting that, followed by an invocation of the compressOld.php maintenance script took the mw_text table size down all the way from >10 GiB to a measly few MiB:

php maintenance/storage/compressOld.php
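
For completeness: enabling the setting is a single line in LocalSettings.php, sketched here as a shell append (the path is just a placeholder for wherever LocalSettings.php lives):

# The single quotes keep the shell from trying to expand $wgCompressRevisions.
echo '$wgCompressRevisions = true;' >> /path/to/wiki/LocalSettings.php

Afterwards, information_schema confirmed that the databases had indeed shrunk: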

SELECT table_schema 'DB name', sum(data_length + index_length) / 1024 / 1024 "DB size in MiB"
FROM information_schema.TABLES
WHERE table_schema LIKE 'hard%'
GROUP BY table_schema;

+----------+----------------+
| DB name  | DB size in MiB |
+----------+----------------+
| hardhout |    41.88052750 | 
| hardwood |   489.10618973 | 
+----------+----------------+

But it didn’t really help, because of sweet, good, decent, old MySQL. 🙁 After all this action, the DB process size was still huge (~15 GiB), far exceeding the combined reported database sizes. Apparently, MySQL’s InnoDB engine is much like our economy: it only allows growth, and if you want it to shrink, you have to stop it first, delete everything and then restart and reload.
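
For future reference, on a box where you control the MySQL daemon yourself (not my situation at NFSN), that stop-delete-restart-reload cycle boils down to something like the sketch below. The paths, the service name and the database names are assumptions based on my setup; verify the dump before deleting anything.

# 1. Dump everything (and check that the dump is complete!).
mysqldump --all-databases --routines > full-dump.sql
# 2. Drop the InnoDB databases, then stop the server.
mysql --execute="DROP DATABASE hardwood; DROP DATABASE hardhout;"
sudo service mysql stop
# 3. Remove the shared tablespace and the InnoDB log files.
sudo rm /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile*
# 4. Start the server again and reload the dump (an --all-databases dump includes CREATE DATABASE statements).
sudo service mysql start
mysql < full-dump.sql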

Future plans? Controlled access only?

One day I may reopen some wikis to new users with a combination of ConfirmAccount, RevisionDelete and such, but combating spam versus giving up on the whole wiki principle is a topic for another day.

Converting all tables in MySQL DB to InnoDB

#!/bin/bash
 
# Safety guard: the script refuses to run until you remove (or comment out) this line.
exit 1
 
dbname="eorder"
 
# First print the generated ALTER statements for review...
echo 'SHOW TABLES;'  | mysql $dbname  | awk '!/^Tables_in_/ {print "ALTER TABLE `"$0"` ENGINE = InnoDB;"}'  | column -t 
# ...then pipe the same statements back into mysql to actually execute them.
echo 'SHOW TABLES;'  | mysql $dbname  | awk '!/^Tables_in_/ {print "ALTER TABLE `"$0"` ENGINE = InnoDB;"}'  | column -t | mysql $dbname

My universal remote programming codes

Whenever this remote’s battery is loose for a while, it forgets its programming. So, here it is:

For computer, use the TV setting. Press 1 and 3 for a few seconds, then when the light turns on, enter 0677 as code.

For my amp, you have to set the aux mode for a second TV. To do that, press 1 and 6 for a while. Then press 9, 9, 2. Then press TV, then aux. Next step is to enter the TV code. Press aux, then hold 1 and 3. Enter the following code: 0064.

Apache mod_proxy configuration for The Pirate Bay

I found several Apache mod_proxy configs for setting up a proxy for The Pirate Bay, but none of them worked fully.

You need to enable/install the following modules (see the a2enmod sketch after this list):

  • mod_proxy
  • mod_rewrite
  • mod_headers
  • mod_proxy_http
  • mod_proxy_html (which provides ProxyHTMLEnable, ProxyHTMLURLMap and the proxy-html filter)
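
On Debian-style systems, enabling these comes down to a2enmod plus a reload. A sketch; note that mod_proxy_html may have to be installed as a separate package (libapache2-mod-proxy-html) before a2enmod knows about it:

# Enable the required modules and reload Apache (Debian/Ubuntu layout assumed).
sudo a2enmod proxy proxy_http proxy_html rewrite headers
sudo service apache2 reload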

<VirtualHost *:80>
        ServerName tpb.yourdomain.com
 
        # Plausible deniability, and respecting your fellow pirate's privacy.
        LogLevel emerg
        CustomLog /dev/null combined
        ErrorLog /dev/null
 
        <Proxy *>
          Order deny,allow
          Allow from all
        </Proxy>
 
        # Just to fix a few links...
        RewriteEngine On
        RewriteRule \/static\.thepiratebay\.se\/(.*)$ /static/$1 [R=302,L]
 
        ProxyRequests off
 
        # Cookies are important to be able to disable the annoying double-row mode.
        # The . before the domain is required, but I don't know why :)
        ProxyPassReverseCookieDomain .thepiratebay.se tpb.yourdomain.com
 
        ProxyPass / http://thepiratebay.se/
        ProxyPass /static/ http://static.thepiratebay.se/
        ProxyPass /torrents/ http://torrents.thepiratebay.se/
        ProxyHTMLURLMap http://thepiratebay.se /
        ProxyHTMLURLMap http://([a-z]*).thepiratebay.se /$1 R
 
        ProxyHTMLEnable On
 
        <Location /static/>
          ProxyPassReverse /
          SetOutputFilter proxy-html
          ProxyHTMLURLMap / /static/
          RequestHeader unset Accept-Encoding
        </Location>
 
        <Location /torrents/>
          ProxyPassReverse /
          SetOutputFilter proxy-html
          ProxyHTMLURLMap / /torrents/
          RequestHeader unset Accept-Encoding
        </Location>
 
</VirtualHost>

Listing MySQL table sizes

This query lists the sizes of all tables in MySQL:

SELECT 
  TABLE_SCHEMA, 
  TABLE_NAME, 
  CONCAT(ROUND(data_length / ( 1024 * 1024 ), 2), 'MB') DATA, 
  CONCAT(ROUND(data_free  / ( 1024 * 1024 ), 2), 'MB') FREE 
from information_schema.TABLES 
where TABLE_SCHEMA NOT IN ('information_schema','mysql', 'performance_schema');

This query lists the database sizes:

SELECT 
  TABLE_SCHEMA, 
  CONCAT(ROUND(sum(data_length) / ( 1024 * 1024 ), 2), 'MB') DATA 
from information_schema.TABLES 
where TABLE_SCHEMA NOT IN ('information_schema','mysql', 'performance_schema') 
group by TABLE_SCHEMA;

How to test payformystay.com

I haven’t got much experience when it comes to testing web applications. Instead (and more so out of apathy than belief), I’ve always adhered to the ad-hoc test approach. However, using pure PostgreSQL unit tests back when I worked on a complicated investment database with Halfgaar did teach me the advantages of test-driven development.

For payformystay, though, unit tests simply won’t cut it. The database design is quite straightforward, without that many relationships, and the schema’s only complexities arise from it being remarkably denormalized and full of duplication. Besides, contrary to Halfgaar’s and my PostgreSQL project for Sicirec, the business logic doesn’t live neatly contained at the database level. Nor am I using a clean ORM wrapper that could serve as a unit-test target; and what would be the point, since, in typical MySQL/PHP fashion, it would be much too easy to circumvent for any particular function.

What I want for this application is full functional test coverage so that I know that all parts of the website function correctly in different browser versions across operating systems. In other words: I want to know that the various parts are functioning correctly as implied by the fact that the whole is functioning correctly.

But how do you do automated tests from a browser?

At first, I thought I should probably cobble something together myself with jQuery, maybe even using a plugin such as QUnit with the composite addon.

But how was I going to run the tests for JavaScript independence then? Using curl/wget or one of these hip, headless browsers which seem to be bred for this purpose?

Choices, choices…

Selenium

Then, there’s Selenium which is a pretty comprehensive set of test tools, meant precisely for what I need. Sadly my wants weren’t easily aligned with my needs. Hence, it took me some time (months, actually) before I was sure that Selenium was right for me.

Selenium provides the WebDriver API (implemented in a number of programming languages), which lets you steer all popular browsers, either through the standalone Selenium Server or through Selenium Grid. The server executes and controls the browser. Since Selenium 2, it doesn’t even need a JavaScript injection in the browser to do this, which is very interesting given my desire to make my AJAX-heavy toy also work in browsers with JavaScript disabled for whatever reason.

Selenium versus my pipe dream

Selenium IDE is a Firefox extension which lets you develop Selenium scripts by recording your interactions with the browser. It stores its script in “Selenese”. This held quite some appeal to me, because my initial testing fantasy revolved around doing it all “client-side”, in the sense that I wouldn’t have to leave my browser to do the testing. I wanted to be able to steer any browser on any machine that I happened to stumble upon at my test site and fire those tests.

Well, Selenese can be interpreted by some WebDriver API implementations to remotely steer the browser, but it can’t be executed from within the browser, except by loading it into the Selenium IDE, which is a Firefox-only extension. Also, driving the browser through JavaScript has been abandoned by Selenium with the move away from Selenium-RC to WebDriver (which they’re currently trying to push through the W3C standardization process).

With everyone moving away from my precious pipe-dream, I remained clinging to some home-grown jQuery fantasy. But, how was I going to test my JavaScript-free version? Questions.

Eventually, I had to replace my pipe dream with the more practical questions of which WebDriver implementation to use and which testing framework to wrap around it.

PHPUnit

I thought PHPUnit had some serious traction, but seeing that it had “unit” in its name, I thought it might not be suitable for functional testing. The documentation being unit-test-centric, in the sense of recommending that you name your test cases “[ClassYouWannaTest]Test”, didn’t help in clearing up the confusion.

Luckily, I came across an article about acceptance testing using Selenium/PHPUnit [acceptance test = functional test].

I’ve since settled on PHPUnit by Sebastian Bergmann, with the Selenium extension also by Bergmann. His Selenium extension provides two base TestCase classes: PHPUnit_Extensions_SeleniumTestCase and PHPUnit_Extensions_Selenium2TestCase. I chose to use the latter. I hope I won’t be sorry for it, since it uses Selenium 2’s Selenium 1 backward-compatible API. Otherwise, they’ll probably have me running for Facebook’s PHP-WebDriver in the end. (PHP-WebDriver also has the nice feature that it allows you to distribute a Firefox profile to Selenium Server/Grid.)

But what about my pipe dream?

If only I’d be able to visit my test site from any browser, click a button and watch all the test scripts run, the failures being filed into the issue tracker (with error log + screenshot) and a unicorn flying over the rainbow…

Anyway, it’s a pipe dream and the best way to deal with it is probably to put the pipe away, smoothen the sore and scratch the itch.

PEAR pain

As customary for PEAR projects, PHPUnit and its Selenium extension have quite a number of dependencies, meaning that installing and maintaining them manually in my project repo would be quite a pain. I’ve used the pear command to install everything locally, but my hosting provider doesn’t have all these packages installed, so if I want to run tests from there (calling the Selenium Server here), I’ll have to manage all that PEAR pain along with my project files.

Doesn’t PEAR offer some way to manage packages in any odd location? I’m not interested in what’s in /usr/share/php/. I want my stuff in ~/php-project-of-the-day/libs/
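
Apparently it does, sort of: pear config-create is supposed to set up a self-contained PEAR tree in a directory of your choosing, which you then point the pear command at with -c. Something like the following sketch, with my paths and the PHPUnit channel filled in as assumptions:

# Create a project-local PEAR configuration and tree...
pear config-create ~/php-project-of-the-day/libs ~/php-project-of-the-day/.pearrc
# ...and install PHPUnit plus its dependencies into it.
pear -c ~/php-project-of-the-day/.pearrc channel-discover pear.phpunit.de
pear -c ~/php-project-of-the-day/.pearrc install --alldeps phpunit/PHPUnit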

Process pain

So far, I’ve remotely hosted both the production and the development version of payformystay, which is especially nice if you want to share development features with others. Now, it’s difficult to decide what’s more annoying:

  1. Creating a full-fledged, locally hosted version of the website (Apache+PHP+MySQL), so that I can execute the tests locally as well as host the testing version locally. Misleading positive test results are all but assured, thanks to the inevitable differences between software versions and configurations.
  2. Installing all the PEAR packages remotely so that I can run the tests from my hosting provider’s shell. This implies having to punch a hole through the NAT wall at home or anywhere else I happen to be testing at any given moment. Bad idea. I don’t even have the password to all the routers that I pass during the year.
  3. Running the development version of the website remotely, but running the tests locally, so that there are no holes to punch, except that I’ll have to tunnel to my host’s MySQL process (see the sketch below), because my tests need to set up, look up and check stuff in the database. At least this way I don’t have to install server software on my development machine and need only the php-cli stuff.
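
The tunnel in option 3 would be plain SSH port forwarding, something like the line below. The host names are placeholders, and localhost assumes MySQL listens on the shell host itself; otherwise, substitute the actual database host:

# Forward local port 3306 to MySQL as reachable from the hosting provider's shell.
ssh -N -L 3306:localhost:3306 myuser@ssh.example-host.com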

Generating an SSL CSR and key

To generate an SSL certificate signing request (CSR) together with a new key, you can do this:

openssl req -nodes -newkey rsa:2048 -keyout bla.key -out bla.csr

This syntax does not force you to supply a password, which is convenient.

If you generate a CSR for StartCom, you don’t have to fill in any fields; only the public key from the CSR is used. For other vendors, the common name is important: the domain name must be entered there.
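
To double-check which fields actually ended up in the request before submitting it, you can print the CSR in readable form:

openssl req -noout -text -in bla.csr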
