Friday, 16 January 2009

How to read and search your blog page with imaginative use of 'regexp' in MATLAB

Just a quickie. There may be some application for this (?)

Start with your blog URL:

>> url='http://thezestyblogfarmer.blogspot.com/';

Read it in with urlread, then you can start searching through its contents:

>> text=urlread(url);

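For example, I can list all the Unix commands I have mentioned. It's easy because they all start with a dollar sign (the shell prompt), which has to be written as '\$' in the pattern because a bare '$' means "end of line" in a regular expression: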

>> text_linux=regexp(text,'>\$[^<\r\n]*<','match')'

which gives me the output:

'>$ sudo apt-get update<'
'>$ sudo apt-get upgrade<'
'>$ sudo apt-get clean<'
'>$ sudo apt-get autoclean<'
'>$ sudo apt-get -f install<'
'>$ sudo badblocks -v /dev/sda1<'
'>$ sudo aptitude install debsums<'
'>$ debsums <'
'>$ sudo apt-get install firestarter<'
'>$ sudo apt-get install youtube-dl<'
'>$ youtube-dl http://uk.youtube.com/watch?v=r9OjoPskf_c<'
'>$ ffmpeg -i r9OjoPskf_c.flv people_everyday.avi<'
[1x97 char]
'>$ sudo su | sudo apt-get install skype<'
'>$ sudo jhas -jh 837afm$^&qeiuhn>>KOUUIG4n we8f-&hcjku8hujbn ok?<'
'>$ sudo su | python setup.py install<'
[1x75 char]
'>$ sudo dpkg -i skysentials_1.0.1-1_all.deb<'
'>$ konsole<'

Or maybe the MATLAB commands, which all start with two > signs (the MATLAB prompt):

>> text_matlab=regexp(text,'>\>>[^<\r\n]*<','match')'

'>>> cf=0; %current frame<'
'>> for k=10:50 % identity matrix from [10,10] to [50,50]<'
'>> clf;<'
'>> plot(fft(eye(k))) %plot<'
'>> axis equal; axis off; axis([-1 1 -1 1]); % sort out axes<'
'>> pause(0.01) %take a break<'
'>> cf=cf+1; %update current frame<'
[1x87 char]
'>> end<'
[1x94 char]
'>>KOUUIG4n we8f-&hcjku8hujbn ok?<'
'>>> convert my_image.jpg -resize 200% my_new_image.jpg<'
'>>> direc=dir([pwd,filesep,'*.','jpeg']);<'
'>>> filenames={};<'
'>>> [filenames{1:length(direc),1}] = deal(direc.name);<'
'>>> filenames=sortrows(char(filenames{:}));<'
'>>> mkdir([pwd,filesep,'james_and_the_giant_peach'])<'
'>>> mkdir([pwd,filesep,'thumblina'])<'
'>>> for i=1:size(filenames,1)<'
'>>> system(['convert ',deblank(filenames(i,:))...<'
'>>> system(['convert ',deblank(filenames(i,:))...<'
'>>> end<'
'>>> url='http://www.mathworks.com/moler/ncm.tar.gz';<'
'>>> gunzip(url,'ncm')<'
'>>> untar('ncm/ncm.tar','ncm')<'
'>>> cd([pwd,filesep,'ncm'])<'
'>>> [U,G]=surfer('http://www.thedailydanielblog.blogspot.com',200);<'
'>>> fid=fopen('dansweb.txt','wt');<'
'>>> for i=1:size(char(U),1), fprintf(fid,'%s\n',char(U(i,:))); end<'
'>>> fclose(fid)<'
'>>> pagerank(U,G)<'

And finally, how to list the websites you refer to. I can't post that code in my blog because it involves HTML, which Blogger doesn't like, but it is the same command with '>\>>' replaced by '<', then the first letter of the alphabet, then ' href' (the opening of an HTML link tag).
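Here's a minimal sketch of that link-listing step. It runs on a small made-up HTML string instead of a live page so it works offline (the example.com and example.org links are hypothetical). The first regexp call is just the substitution described above; the second is an alternative I'm adding, which uses a capture group with the 'tokens' option to keep only the URL inside the quotes.

```matlab
% Hypothetical stand-in for the downloaded page (normally text=urlread(url);)
% so this runs offline; the example.com/example.org links are made up.
text = ['<p>See <a href="http://example.com/a">first</a> and ', ...
        '<a href=''http://example.org/b''>second</a>.</p>'];

% The post's command with '>\>>' swapped for '<a href': each match runs
% from '<a href' up to and including the next '<' (usually the link text).
text_links = regexp(text, '<a href[^<\r\n]*<', 'match')';

% Alternative: a capture group plus 'tokens' keeps only the URL that sits
% between the quotes (single or double) after href=.
tok = regexp(text, '<a\s+href\s*=\s*["'']([^"'']*)["'']', 'tokens');
links = cellfun(@(c) c{1}, tok, 'UniformOutput', false)';

disp(text_links)
disp(links)
```

The same trick tidies up the earlier searches: wrapping the command part in parentheses, as in regexp(text,'>(\$[^<\r\n]*)<','tokens'), returns the commands without the surrounding > and <.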


Enjoy!

1 comment:

Daniel Buscombe said...

Oh, I just found another useful package, 'fslint', which lets you search your filesystem for duplicate files, name clashes, empty directories and other things that might be clogging up space on your hard drive:

$ sudo apt-get install fslint