Showing posts with label regexp. Show all posts
Showing posts with label regexp. Show all posts

Saturday, 29 September 2012

Downloading Photobucket albums with Matlab and wget

Here's a script to take the pain out of downloading an album of photos from photobucket.

It uses matlab to search a particular url for photo links (actually it looks for the thumbnail links and modifies the string to give you the full res photo link), then uses a system command call to wget to download the photos All in 8 lines of code! 

Ok to start you need the full url for your album. The '?start=all' bit at the end is important because this will show you all the thumbnails on a single page. Then simply use regexp to find the instances of thumbnail urls. Then loop through each link, strip the unimportant bits out, remove the 'th_' from the string which indicates the thumbnail version of the photo you want, and call wget to download the photo. Easy-peasy!

Friday, 7 September 2012

Craigslist Part 1: Using matlab to search according to your criteria and retrieve posting urls

I have a certain fondness for using matlab for obtaining data from websites. It simply involves understanding the way in which urls are constructed and a healthy does of regexp. In this example, I've written a matlab script to search craiglist for rental properties in Flagstaff with the following criteria: 1) 2 bedrooms 2) has pictures 3) within the price range 500 to 1250 dollars a month 4) allows pets 5) are not part of a 'community' housing development. This script will get their urls, prices, and google maps links, and write the results to a csv file. First, get the url and find indices of the hyperlinks only, then get only the indices of urls which are properties. Then get only the indices of urls which have a price in the advert description; find the prices; strip the advert urls from the string; sort the prices lowest to highest; and get rid of urls which mention a community housing development. Finally, get the links to google maps in the adverts if there are some, and write the filtered urls, maps, and prices to csv file. Save and execute in matlab, or perhaps add this line to your crontab for a daily update!

/usr/local/MATLAB/R2011b/bin/matlab -nodesktop -nosplash -r "craigslist_search;quit;"