The R commands used to analyse this dataset are here.
I performed PCA on two versions of the 36 variables other than the class: first, on the original variables, and second, on the variables scaled by dividing each by its sample standard deviation.
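For reference, the two versions of PCA can be sketched as below. In R this corresponds to `prcomp(x)` versus `prcomp(x, scale.=TRUE)`. The Python is a minimal numpy sketch, not the commands actually used, and `X` is randomly generated stand-in data rather than the real 36 variables.

```python
import numpy as np

def pca(X, scale=False):
    """PCA via the SVD of the centred data matrix.

    If scale is True, each variable is first divided by its sample standard
    deviation. Returns the variances of the projections on each PC (the
    scree-plot heights) and the unit-length PC directions as columns.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return variances, Vt.T

# Hypothetical stand-in data: 200 observations of 36 variables with unequal spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 36)) * np.linspace(1.0, 5.0, 36)

var_unscaled, dirs_unscaled = pca(X)
var_scaled, dirs_scaled = pca(X, scale=True)
print(var_unscaled[:3] / var_unscaled.sum())  # proportions of variance, first 3 PCs
print(var_scaled.sum())                       # 36 up to rounding: each scaled variable has variance 1
```

With unscaled data the leading PCs are pulled toward the high-variance variables, while scaling gives every variable equal weight.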
The scree plots for scaled and unscaled PCA show that the first two PCs capture a large amount of the variance, though more PCs, out to at least eight, also seem to carry some information. It makes sense that two PCs would be sufficient if the three classes have means that differ by much more than the standard deviations of the variables. The data would then consist of three tight clusters of points, and so would lie almost entirely in the two-dimensional plane containing the centres of these clusters. Two PCs are enough to specify a position in this plane. Of course, the actual data has some scatter that isn't captured by these two PCs alone.
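The tight-cluster argument can be checked with a small simulation. The sketch below assumes three hypothetical classes whose means are far apart relative to a within-class standard deviation of 1 (synthetic data, not the real dataset), and computes the proportion of variance for each PC, i.e. the heights in a scree plot.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_per_class = 36, 100

# Synthetic class means, spread out far more than the within-class SD of 1
means = rng.normal(scale=20.0, size=(3, p))
X = np.vstack([m + rng.normal(size=(n_per_class, p)) for m in means])

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
prop = s**2 / np.sum(s**2)   # proportion of total variance captured by each PC
print(np.round(prop[:4], 4))
print(prop[:2].sum())        # close to 1: the three cluster centres span a plane
```

Reducing `scale` toward 1 makes the clusters overlap and should spread the variance over more PCs, which is closer to what the real scree plots show.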
I looked at plots of the 36 coefficients for each principal component. These show a pattern that arises because the 36 variables are measurements in 4 spectral bands for each of 9 adjacent pixels. We wouldn't expect values to change much from one pixel to an adjacent one, but we might expect some differences between spectral bands, which is what we see.
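To see this pattern, it helps to arrange each PC's 36 coefficients as a pixel-by-band grid. The sketch below assumes the columns are ordered pixel by pixel (A to I, left-to-right and top-to-bottom, so E is the centre), with the four band values for each pixel adjacent. That ordering, the helper names, and the example coefficient vector are all assumptions for illustration.

```python
import numpy as np

# 3x3 neighbourhood read left-to-right, top-to-bottom; E is the centre pixel
PIXELS = "ABCDEFGHI"
N_BANDS = 4

def coef_grid(coefs):
    """Arrange 36 PC coefficients as a 9 (pixel) x 4 (band) array.

    Assumes the columns are ordered pixel by pixel, with the four spectral
    band values for each pixel adjacent to one another.
    """
    coefs = np.asarray(coefs, dtype=float)
    if coefs.shape != (len(PIXELS) * N_BANDS,):
        raise ValueError("expected 36 coefficients")
    return coefs.reshape(len(PIXELS), N_BANDS)

def band_pattern(coefs):
    """Mean coefficient per band, and the spread (max - min) over the 9 pixels."""
    grid = coef_grid(coefs)
    return grid.mean(axis=0), grid.max(axis=0) - grid.min(axis=0)

# Hypothetical coefficient vector: a level per band plus small pixel-to-pixel noise
rng = np.random.default_rng(2)
band_levels = np.array([0.10, 0.25, 0.20, 0.05])
coefs = np.tile(band_levels, len(PIXELS)) + rng.normal(scale=0.01, size=36)
means, spreads = band_pattern(coefs)
print(np.round(means, 3), np.round(spreads, 3))
```

A small within-band spread next to clearly different band means is the "similar pixels, different bands" pattern described above.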
Since the variables are all measured in the same units, and the variance does not differ greatly between spectral bands, it seems simplest to just use the unscaled data. An alternative (which I didn't explore) would be to scale all 9 variables for one spectral band by the same factor, which would equalize variance between spectral bands without introducing spurious differences between pixels.
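The unexplored per-band scaling could look like the sketch below. It assumes column `j` belongs to band `j % 4`, and it uses the square root of the average variance of a band's nine columns as that band's common divisor, which is one plausible choice among several. The function name and the data are hypothetical.

```python
import numpy as np

def scale_by_band(X, n_bands=4):
    """Centre X, then divide all pixel variables of each band by one common SD.

    Assumes column j belongs to spectral band j % n_bands (bands adjacent
    within each pixel). The common SD used here is the square root of the
    average variance of that band's columns.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    band = np.arange(X.shape[1]) % n_bands
    out = np.empty_like(Xc)
    for b in range(n_bands):
        cols = band == b
        common_sd = np.sqrt(Xc[:, cols].var(axis=0, ddof=1).mean())
        out[:, cols] = Xc[:, cols] / common_sd
    return out

# Hypothetical data whose four bands have very different spreads
rng = np.random.default_rng(3)
band_sd = np.tile([2.0, 5.0, 10.0, 20.0], 9)
X = rng.normal(size=(500, 36)) * band_sd
Z = scale_by_band(X)
print(np.round(Z.var(axis=0, ddof=1).reshape(9, 4).mean(axis=0), 3))  # 1.0 for each band
```

Because every column within a band is divided by the same number, the relative variances of the nine pixels in that band are left unchanged.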
I also performed factor analysis with two common factors. The loadings and uniquenesses (for scaled data) are plotted here. I compared the linear combinations of the unscaled variables that produce the projections on the unscaled PC directions with the linear combinations that produce the factor scores. The result is seen in this plot. Groups of coefficients corresponding to the four spectral bands can be seen for both PCA and FA, but the groups are in fairly different positions (even allowing for a possible rotation). Also, the coefficients for the nine pixels within a group are more spread out for FA than for PCA. For three of the spectral bands, the coefficient for the centre pixel (E) is much larger than for the other pixels. This might seem surprising, since the nine pixels should be very similar. However, the centre pixel will, on average, be more highly correlated with the other pixels than any other pixel is, since it is fairly near all eight of them. It therefore makes sense that it is more closely related to the common factors than the other pixels are. One can also see larger coefficients for the next-most-central pixels (B, D, F, H) than for the pixels furthest from the centre (A, C, G, I).
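Two parts of this reasoning can be illustrated numerically. The first function assumes, purely for illustration, that the correlation between two pixels decays as exp(-distance) on the 3x3 grid, and computes each pixel's average correlation with the other eight. The second computes regression-method factor-score coefficients from the model-implied correlation matrix and rescales them to apply to the unscaled variables. The decay model, the helper names, and the example loadings are hypothetical, and software such as R's `factanal` may compute scores differently.

```python
import numpy as np

# Pixels of the 3x3 neighbourhood, labelled A..I left-to-right, top-to-bottom
PIXELS = "ABCDEFGHI"
coords = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)

def mean_corr_with_others(length_scale=1.0):
    """Average correlation of each pixel with the other eight, under an
    assumed correlation exp(-distance / length_scale) between pixels."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = np.exp(-d / length_scale)
    return (R.sum(axis=1) - 1.0) / 8.0   # subtract the diagonal term R[i, i] = 1

def regression_score_coefs(loadings, uniquenesses, sds):
    """Coefficients on centred, unscaled variables giving regression-method
    factor scores, for loadings fitted to standardized variables.

    Uses the model-implied correlation matrix L L^T + diag(uniquenesses).
    """
    L = np.asarray(loadings, dtype=float)
    R_model = L @ L.T + np.diag(uniquenesses)
    W = np.linalg.solve(R_model, L)                    # weights on standardized variables
    return W / np.asarray(sds, dtype=float)[:, None]   # convert to weights on raw variables

for name, v in zip(PIXELS, mean_corr_with_others()):
    print(name, round(float(v), 3))

# One hypothetical factor: weights come out proportional to loading / uniqueness
lam = np.array([[0.9], [0.7], [0.5]])
psi = 1.0 - lam[:, 0] ** 2
print(regression_score_coefs(lam, psi, sds=np.ones(3))[:, 0])
```

Under this toy model the centre pixel E has the highest average correlation, then B, D, F, H, then the corners. With one factor, the weight on a variable is proportional to its loading divided by its uniqueness, so a variable with loading 0.9 gets roughly three and a half times the weight of one with loading 0.7. This amplification is consistent with the centre pixel's coefficient standing out in the FA plot.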
Scatterplots of the projections of the observations on the first two (unscaled) PCs and of the factor scores, seen here, with class identified by colour, show that both PCA and FA can reduce dimensionality while keeping the information needed to classify quite well. The plots for PCA and FA are quite similar (apart from a sign flip), but differ in some details. Without more detailed analysis, it is unclear which will work better for classification.
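One simple way to put a number on "classify quite well" is to apply a nearest-class-mean rule to the two-dimensional scores. The sketch below does this for PC scores on synthetic data. It reports a training-set error rate, which is optimistic, so held-out data would be needed for a fair comparison of PCA and FA. It also checks that flipping the sign of a component leaves the result unchanged. The helper names and data are hypothetical.

```python
import numpy as np

def pc_scores(X, k=2):
    """Projections of the centred observations on the first k PC directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def nearest_centroid_error(Z, y):
    """Training-set error rate of assigning each row of Z to the nearest class mean."""
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    dist2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[dist2.argmin(axis=1)]
    return float(np.mean(pred != y))

# Synthetic stand-in data: three classes in 36 dimensions
rng = np.random.default_rng(4)
means = rng.normal(scale=3.0, size=(3, 36))
X = np.vstack([m + rng.normal(size=(150, 36)) for m in means])
y = np.repeat([0, 1, 2], 150)

Z = pc_scores(X)
err = nearest_centroid_error(Z, y)
err_flipped = nearest_centroid_error(Z * np.array([-1.0, 1.0]), y)  # sign flip of PC1
print(err, err_flipped)
```

The same `nearest_centroid_error` call could be applied to a matrix of factor scores, which would give a like-for-like comparison with the PC scores.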