Smokes your problems, coughs fresh air.

Using wget to download all files on a page

Just a quick one-liner I used to download a bunch of MIDI files from an online listing of Chopin MIDIs:

$ wget http://www.piano-midi.de/chopin.htm -q -O - \
| grep -o 'href="[^"]*\.mid"' \
| sed -e 's/^href="\(.*\)"$/\1/' \
| xargs -I{} wget http://www.piano-midi.de/{}
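To check what the grep/sed stage actually pulls out of a page before pointing the final wget at a server, the link-extraction part can be run offline on a small HTML sample. The sketch below is a slightly different take on that step, not the exact command above. It assumes a grep with `-o` (GNU and BSD grep both have it), uses `-i` so upper-case `HREF` attributes match too, and uses `cut` instead of sed to strip the quotes. The function name `extract_mid_links` and the sample markup are made up for illustration.

```shell
# Hypothetical helper: print the target of every href="...mid" attribute
# found in the HTML read from stdin, one per line.
#   grep -o      prints only the matched text, one match per line, so two
#                links on the same HTML line both come out
#   grep -i      also matches HREF= and .MID in upper case
#   [^"]*        stops a match from running past the closing quote
#   cut -d'"' -f2  keeps the text between the first and second double quote
extract_mid_links() {
    grep -oi 'href="[^"]*\.mid"' | cut -d'"' -f2
}

# Made-up sample page: two links on one line, one link in upper case.
sample='<a href="etude.mid">Etude</a> <a href="index.htm">Home</a>
<A HREF="nocturne.mid">Nocturne</A>'

printf '%s\n' "$sample" | extract_mid_links
```

On the real listing, `wget http://www.piano-midi.de/chopin.htm -q -O - | extract_mid_links` should print the relative paths that then get handed to the `xargs` stage.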

Maybe not so useful to you, but it’s a good demonstration of applying the hacker’s mentality to one of those moments where, after clicking the fourth link or so, I find myself thinking: Wouldn’t spending a few moments on a one-liner be much more fun than clicking through and saving 44 more links?

Useful or not, I am now testing timidity’s piano sound with a nice rendition of Chopin in the background whereas, without this trick, I’d still be right-click-click-saving links instead of writing this post. It’s up to you to decide whether this is actually good or bad. 😛

5 Comments

  1. Lucas

    Hey, man, thanks! It worked for me 🙂

    I just had to change “href” to “HREF” because the site I was downloading the files from had the code in upper case.

  2. Rowan Rodrik

    Hey Lucas,

    Thanks for leaving your comment, man! It’s so cool to hear that this trick ended up actually being useful to someone. 😀

  3. Guru

    That seems a bit long-winded. Much simpler would be…
    wget -r -l1 -np -nc -A.mid http://www.website.com/stuff.html

    If you omit ‘-A.mid’, it just downloads everything it finds. -r means recursively download links; -l1 means only 1 level deep; -np means ignore links to parent directories; -nc means don’t download stuff that is already downloaded (useful if you want to resume later, or check for new files some other time).

  4. Rowan Rodrik

    Thanks for illuminating me, Great Guru! 😀 Comments like yours are one of the best reasons to keep blogging. wget really is a cool tool! 🙂

    Let me give you some extra link juice in exchange: Guru’s ROM Dumps (for archival purposes only)

    OK, true: I still need to install a DoFollow plugin for WordPress to unfreeze the link juice. I will, I will. Soon enough, I will. 😉

  5. Rowan Rodrik

    Note to myself: dude, why didn’t you tag this post? 😕 You could have just copy-pasted when you wrote that other post. But, on the other hand, maybe you’d have missed out on a very useful wget lesson.
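For reference, Guru’s command from comment 3 can also be written with wget’s long option names, which are easier to decipher months later. The sketch wraps it in a hypothetical `fetch_linked_files` function; the page URL and file suffix are placeholders passed in as arguments. Two things about recursive mode that the comment doesn’t mention: wget honours a site’s robots.txt during recursive retrieval, and it recreates the remote host/directory layout locally unless you also pass `--no-directories` (`-nd`).

```shell
# Hypothetical wrapper around: wget -r -l1 -np -nc -A.mid URL
#   $1  page whose links should be followed
#   $2  file-name suffix to keep, e.g. .mid
#   --recursive --level=1  follow the page's links, but only one level deep
#   --no-parent            never ascend above the page's own directory
#   --no-clobber           skip files that already exist locally (handy for resuming)
#   --accept               keep only files whose names end in the given suffix
fetch_linked_files() {
    wget --recursive --level=1 --no-parent --no-clobber \
        --accept="$2" "$1"
}

# Example call (needs network access, so it is left commented out here):
# fetch_linked_files http://www.website.com/stuff.html .mid
```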

© 2024 BigSmoke
