
Using wget to download all files on a page

Just a quick one-liner I used to download a bunch of MIDI files from an on-line listing of Chopin MIDIs:

$ wget http://www.piano-midi.de/chopin.htm -q -O - \
| grep 'href="[^"]*\.mid"' \
| sed -e 's/^.*href="\([^"]*\.mid\)".*$/\1/' \
| xargs -I{} wget http://www.piano-midi.de/{}

Maybe not so useful to you, but it’s a good demonstration of applying the hacker’s mentality to one of those moments where, after clicking the fourth link or so, I find myself thinking: Wouldn’t spending a few moments on a one-liner be much more fun than clicking through and saving 44 more links?

Useful or not, I am now testing timidity’s piano sound with a nice rendition of Chopin in the background whereas, without this trick, I’d still be right-click-click-saving links instead of writing this post. It’s up to you to decide whether this is actually good or bad. 😛


    5 Comments

    1.
      Comment by Lucas
      On November 22, 2008 at 22:34

      Hey, man, thanks! It worked for me 🙂

      I just had to change “href” to “HREF” because the site I was downloading the files from had the code in upper case.
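A case-insensitive match avoids having to edit the pattern for each site. The following is only a sketch: `extract_mid_links` is a hypothetical helper name, the sample HTML is made up, and `grep -o` is a GNU/BSD extension rather than plain POSIX.

```shell
# Hypothetical helper: print the target of every link to a .mid file
# found in the HTML read from standard input, ignoring letter case.
extract_mid_links() {
    # -i matches href/HREF and .mid/.MID alike; -o prints one match per line.
    grep -io 'href="[^"]*\.mid"' \
    | sed -e 's/^[^"]*"//' -e 's/"$//'   # strip the leading href=" and the trailing quote
}

# Example with inline (made-up) HTML instead of a live page:
printf '%s\n' '<A HREF="Chopin/Nocturne.MID">x</A> <a href="waltz.mid">y</a>' \
| extract_mid_links
```

Because `-o` emits each match on its own line, this also copes with several links on one line of HTML, which the `sed`-only version in the post does not.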

    2.
      Comment by Rowan Rodrik
      On November 22, 2008 at 23:39

      Hey Lucas,

      Thanks for leaving your comment, man! It’s so cool to hear that this trick ended up actually being useful to someone. 😀

    3.
      Comment by Guru
      On March 18, 2009 at 19:48

      That seems a bit long-winded. Much simpler would be…
      wget -r -l1 -np -nc -A.mid http://www.website.com/stuff.html

      If you omit ‘-A.mid’, it downloads everything linked from the page. -r means download links recursively. -l1 limits the recursion to 1 level deep. -np means never ascend to the parent directory. -nc means don’t download files that are already there (handy if you want to resume later, or check for new files some other time).
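For readability, the same command can be written with wget's long option names. This is a minimal sketch: the wrapper function `fetch_linked_files` and its argument order are hypothetical, and the URL in the usage line is the placeholder from the comment above.

```shell
# Hypothetical wrapper around the command above, using long option names.
# Usage: fetch_linked_files URL SUFFIX
#   --recursive     follow links found in the page        (-r)
#   --level=1       but only one level deep               (-l1)
#   --no-parent     never ascend to the parent directory  (-np)
#   --no-clobber    skip files that already exist locally (-nc)
#   --accept=SUFFIX keep only files ending in SUFFIX      (-A)
fetch_linked_files() {
    wget --recursive --level=1 --no-parent --no-clobber --accept="$2" "$1"
}

# Example call (placeholder URL, so it is left commented out):
# fetch_linked_files http://www.website.com/stuff.html .mid
```

As far as I know, wget still has to fetch the HTML page itself to find the links, and removes it afterwards when it does not match the accept list.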

    4.
      Comment by Rowan Rodrik
      On March 18, 2009 at 20:07

      Thanks for illuminating me, Great Guru! 😀 Comments like yours are one of the best reasons to keep blogging. wget really is a cool tool! 🙂

      Let me give you some extra link juice in exchange: Guru’s ROM Dumps (for archival purposes only)

      Ok, true: I still need to install a DoFollow plugin for WordPress to unfreeze the link juice. I will, I will. Soon enough, I will. 😉

    5.
      Comment by Rowan Rodrik
      On March 18, 2009 at 20:11

      Note to myself: dude, why didn’t you tag this post? 😕 You could have just copy-pasted from it when you wrote that other post. But, on the other hand, maybe you’d have missed out on a very useful wget lesson.