With a goal of trying to separate the two classes, the results were reasonably satisfactory with image histograms as a feature metric, and a little user input. However, it'd be nice to get a more automated/objective way to separate the images into two discrete clusters. Perhaps the feature detection method requires a little more scrutiny?
In this post I look at 5 methods using the functionality of the excellent Mahotas toolbox for python downloaded here:
1) Image moments (I compute the first 5)
2) Haralick texture descriptors which are based on the co-occurrence matrix of the image
3) Zernike Moments
4) Threshold adjacency statistics (TAS)
5) Parameter-free threshold adjacency statistics (PFTAS)
The code is identical to the last post except for the feature extraction loop. What I'm looking for is as many images of the river at either low distances (extreme left in the dendrogram) or very high distances (extreme right in the dendrograms). These clusters have been highlighted below. There are 18 images of the river bend.
In order of success (low to high):
1) Zernike (using a radius of 50 pixels and 8 degrees, but other combinations tried):
2) Haralick:
There is 1 cluster of 4 images at the start
3) Moments:
4) TAS:
5) PFTAS:
The cluster at the end contains 16 out of 18 images of the river bend at the end. Plus there are no user-defined parameters. My kind of algorithm. The winner by far!
So there you have it, progress so far! No methods tested so far is perfect. There may or may not be a 'part 3' depending on my progress, or lack of!