Friday 7 September 2012

Craigslist Part 1: Using matlab to search according to your criteria and retrieve posting urls

I have a certain fondness for using matlab for obtaining data from websites. It simply involves understanding the way in which urls are constructed and a healthy does of regexp. In this example, I've written a matlab script to search craiglist for rental properties in Flagstaff with the following criteria: 1) 2 bedrooms 2) has pictures 3) within the price range 500 to 1250 dollars a month 4) allows pets 5) are not part of a 'community' housing development. This script will get their urls, prices, and google maps links, and write the results to a csv file. First, get the url and find indices of the hyperlinks only, then get only the indices of urls which are properties. Then get only the indices of urls which have a price in the advert description; find the prices; strip the advert urls from the string; sort the prices lowest to highest; and get rid of urls which mention a community housing development. Finally, get the links to google maps in the adverts if there are some, and write the filtered urls, maps, and prices to csv file. Save and execute in matlab, or perhaps add this line to your crontab for a daily update!

/usr/local/MATLAB/R2011b/bin/matlab -nodesktop -nosplash -r "craigslist_search;quit;"

No comments: