Sunday, 9 September 2012

Craigslist Part 2: using matlab to plot your search results in Google Maps

In 'Craigslist Part 1' I demonstrated a way to use MATLAB to automatically search Craigslist for properties with certain attributes. In this post I show how to use MATLAB and Python to create a KML file for plotting the results in Google Earth. First, download the MATLAB googleearth toolbox and add it to your path. Then import the search results data and get the map links (location strings). Finally, loop through each map location, strip the address from the URL, and geocode it to obtain coordinates.
addpath(genpath('~/googleearth'))
dat=importdata('myFile.csv');
maps=dat.textdata(:,2);
prices=dat.data;
for i=1:size(maps,1)
address=char(maps(i,:));
address=address(regexp(address,'loc')+3:end);
% strip address of unnecessary stuff
tmp = strrep(address, '%', ' ');
tmp = strrep(tmp, '+', ' ');
tmp(isstrprop(tmp, 'digit'))=[];
space=isspace(tmp);
letters=find(isletter(tmp)==1);
letters = letters(floor(gradient(letters))==1);
space(letters)=1;
% trim single letters but preserve location of spaces to make address
% readable
clean_address=strtrim(tmp(space));
% geocode address to lat,long coordinate. Geocode generates and runs a
% python script to do the conversion
[g1,g2] = geocode(clean_address);
% if any empty or unconverted addresses, call them nans
if isempty(g1)
g1=NaN; g2=NaN;
end
lat(i)=g1; lon(i)=g2;
end
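The address-stripping step above can also be done by decoding the URL instead of deleting characters. Here is a minimal Python sketch of that alternative. The function name `address_from_map_link` and the link shape in the docstring are my assumptions, not something taken from the Craigslist results. Unlike the MATLAB loop, this version keeps digits such as street numbers.

```python
from urllib.parse import unquote_plus


def address_from_map_link(url, marker="loc"):
    """Return the decoded address that follows `marker` in a map link.

    Assumes links shaped like '...?q=loc%3A+123+Main+St+Springfield';
    that shape is a guess, not something the post specifies.
    """
    start = url.find(marker)
    if start == -1:
        # no marker found, so there is no address to extract
        return ""
    encoded = url[start + len(marker):]
    # unquote_plus turns '+' into spaces and '%XX' escapes into characters
    decoded = unquote_plus(encoded)
    # drop the ':' left over from 'loc:' and collapse repeated whitespace
    return " ".join(decoded.lstrip(": ").split())
```

Decoding keeps the address closer to what was typed into the listing, which may give the geocoder more to work with than the letters-only string.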
The function 'geocode' writes and executes a Python script to geocode the addresses (turn the address strings into latitude and longitude coordinates). The Python module it imports, pygeocoder, needs to be downloaded first.
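function [lat,lon] = geocode(address)
% GEOCODE  Convert an address string to latitude and longitude by writing
% and running a short Python script that calls pygeocoder.
fid=fopen('test.py','wt');
fprintf(fid,'%s\n','from pygeocoder import Geocoder');
fprintf(fid,'%s\n',['results = Geocoder.geocode(''',deblank(address),''')']);
% print(...) with parentheses runs under both Python 2 and Python 3
fprintf(fid,'%s\n','print(results[0].coordinates)');
fclose(fid);
% stat2 holds the script output, of the form '(lat, lon)'
[stat1,stat2]=system('python test.py');
stat2=deblank(stat2);
% drop the enclosing parentheses, then split at the comma
stat2=stat2(2:end-1);
lon=str2num(deblank(stat2(regexp(stat2,',')+1:end)));
lat=str2num(deblank(stat2(1:regexp(stat2,',')-1)));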
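The string slicing in 'geocode' assumes the script always prints a clean coordinate pair. As a hedged illustration of a more defensive way to read that output, here is a small Python sketch. `parse_coordinates` is a hypothetical helper of mine, not part of pygeocoder, and it assumes the text looks like '(45.5, -122.6)'.

```python
import ast
import math


def parse_coordinates(output):
    """Parse text such as '(45.5, -122.6)' into (lat, lon) floats.

    Returns (nan, nan) when the text is not a pair of numbers, mirroring
    how the calling loop treats failed conversions.
    """
    try:
        # literal_eval only accepts Python literals, so stray text is rejected
        value = ast.literal_eval(output.strip())
        lat, lon = value
        return float(lat), float(lon)
    except (ValueError, SyntaxError, TypeError):
        return math.nan, math.nan
```

Returning NaNs for anything unparseable means an error message from a failed lookup ends up in the same NaN bucket that the main loop already filters out.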
Once we have the coordinates, we need to get rid of NaNs and outliers (badly converted coordinates due to unreadable address strings). Here an outlier is any point more than two standard deviations from the mean latitude or longitude. Then use the googleearth toolbox to build the KML file. Finally, run Google Earth and open the KML file using a system command:
f=isnan(lat);
lat(f)=[];
lon(f)=[];
maps(f)=[];
prices(f)=[];
f=find(lat>(mean(lat)+2*std(lat)));
lat(f)=[]; lon(f)=[]; maps(f)=[]; prices(f)=[];
f=find(lat<(mean(lat)-2*std(lat)));
lat(f)=[]; lon(f)=[]; maps(f)=[]; prices(f)=[];
f=find(lon>(mean(lon)+2*std(lon)));
lat(f)=[]; lon(f)=[]; maps(f)=[]; prices(f)=[];
f=find(lon<(mean(lon)-2*std(lon)));
lat(f)=[]; lon(f)=[]; maps(f)=[]; prices(f)=[];
kmlStr = ge_scatter(lon(:),lat(:),...
'marker','*',...
'markerEdgeColor','FFFF00FF',...
'markerFaceColor','80FF00FF',...
'markerScale',1e-3);
ge_output('scatter.kml',[kmlStr])
system(['google-earth "',pwd,filesep,'scatter.kml" &'])
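The outlier step is easier to follow when all the arrays are filtered with one shared mask. Below is a rough Python sketch of the same two-standard-deviation idea. `drop_outliers` is a name I made up, and it is a single-pass variant: the MATLAB code above recomputes the mean and standard deviation after each of its four removals, so the two can keep slightly different points. It needs at least two points, since the sample standard deviation is undefined otherwise.

```python
from statistics import mean, stdev


def drop_outliers(lat, lon, *extras, n_sigma=2.0):
    """Keep only points whose lat and lon both lie within n_sigma sample
    standard deviations of their means; filter `extras` in step."""
    def within(values):
        centre, spread = mean(values), stdev(values)
        return [abs(v - centre) <= n_sigma * spread for v in values]

    # a point survives only if both of its coordinates pass the test
    keep = [a and b for a, b in zip(within(lat), within(lon))]
    columns = (lat, lon) + extras
    # apply the same mask to every column so rows stay aligned
    return [[v for v, k in zip(col, keep) if k] for col in columns]
```

A call such as `lat, lon, prices = drop_outliers(lat, lon, prices)` keeps the prices lined up with the coordinates they belong to.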
The above is a simple scatter plot, which shows only the location of the properties and no information about them. The next example is more complicated: the points are plotted with labels (the asking price) and text details (the Google Maps links) in pop-up boxes. First, each coordinate pair is packaged with its map link and name tag. Then the strings for each coordinate are concatenated and written to a KML file. Finally, run Google Earth and open the KML file using a system command:
iconStr = 'http://maps.google.com/mapfiles/kml/pal2/icon10.png';
kmlStr=cell(1,length(maps));
for i=1:length(maps)
kmlStr{i} = ge_point_new(lon(i),lat(i),0,...
'iconURL',iconStr,...
'iconColor','FF0080FF',...
'description',maps{i},...
'name',['$',num2str(prices(i))]);
end
% concatenate the per-point KML strings into one string
tmp=[kmlStr{:}];
ge_output('points.kml',...
ge_folder('points',tmp));
system(['google-earth "',pwd,filesep,'points.kml" &'])
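To show roughly what ends up inside a file like points.kml, here is a minimal Python sketch that builds labelled placemarks with the standard library. It is not what the googleearth toolbox does internally. `build_kml` and the tuple layout are my own assumptions, and icon styling is left out to keep it short.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(points):
    """Build a KML document string from (lon, lat, name, description) tuples."""
    # register the KML namespace as the default so tags are written unprefixed
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    folder = ET.SubElement(kml, f"{{{KML_NS}}}Folder")
    ET.SubElement(folder, f"{{{KML_NS}}}name").text = "points"
    for lon, lat, name, description in points:
        placemark = ET.SubElement(folder, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude]
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

The name becomes the label next to each point and the description fills the pop-up box, which matches the roles of the asking price and the map link above. ElementTree escapes characters such as '&' in the links, so the output stays valid XML.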
