Thursday, August 24, 2006

Some notes on CCA

I spent the last day or so playing around with a CCA (Canonical Correlation Analysis) demo I created. It was pretty simple: I generated a random cluster of data for the first view, and for the second view I randomly rotated the first view and added noise. The results were a little surprising, though. When I didn't change the second view (that is, the first and second views are identical), I intuitively expected CCA to find the main diagonals going through the clusters as the maximum correlation directions; however, this turned out to be false. While the main diagonals do lead to maximal correlation, other directions lead to the same correlation too, and CCA seems to pick up these non-diagonal directions more often than not. In hindsight this makes sense: with identical views, projecting both onto the same direction gives a correlation of 1 whatever the direction is. This leads to the conclusion that, in general, the optimization problem used to solve CCA does not have a unique global optimum.
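
To make the identical-views case concrete, here is a minimal sketch in Python with NumPy. It is not the code from my demo: the `cca` helper (whitening followed by an SVD), the cluster covariance, the sample size, and the small ridge term `reg` are all assumptions chosen for illustration.

```python
import numpy as np

def cca(X, Y, reg=1e-10):
    """Sketch of CCA via whitening + SVD.

    Returns (A, B, corrs): columns of A and B are paired directions,
    corrs are the canonical correlations in decreasing order.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; reg is a tiny ridge term for numerical stability.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance: Lx^{-1} Cxy Ly^{-T}
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, corrs, Vt = np.linalg.svd(M)
    # Map the singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return A, B, corrs

rng = np.random.default_rng(0)
# An elongated cluster whose long axis lies along the main diagonal.
X = rng.multivariate_normal([0, 0], [[3, 2], [2, 3]], size=500)

# Identical views: every canonical correlation is (numerically) 1.
A, B, corrs = cca(X, X)

# Any direction at all, used in both views, also gives correlation 1.
angles = np.linspace(0, np.pi, 7, endpoint=False)
dir_corrs = []
for t in angles:
    w = np.array([np.cos(t), np.sin(t)])
    dir_corrs.append(np.corrcoef(X @ w, X @ w)[0, 1])
```

Since all the canonical correlations are tied, the SVD is free to return any orthonormal basis of the whitened space, which is why the directions that come back need not line up with the cluster diagonal.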

When the two views were randomized, off-diagonal directions were again picked up as having maximal correlation (a value of 0.5). When the diagonals were tested, they usually came close to 0.5, though not exactly equal to it.

The problem that I have with these results, on a gut level, is that CCA picks out correlation directions that may not be the most interpretable directions.

Nevertheless, when you look at the two directions, they do correspond to each other. For example, the correlation direction in one view is the same direction in the other, but rotated by the appropriate amount. So in a sense the CCA correlation directions provide a "compass" between the two views; in other words, they provide a canonical representation of the clusters by giving us corresponding "axes" in the form of correlation directions.
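
The "compass" idea can be checked directly, without running CCA at all. The sketch below is an illustration under my own assumptions (cluster shape, noise level, random seed, and the hypothetical helper `paired_corr`): it pairs a direction `a` in the first view with the rotated direction `R @ a` in the second view and measures the correlation of the two projections.

```python
import numpy as np

rng = np.random.default_rng(1)
# First view: an elongated cluster (assumed shape, for illustration only).
X = rng.multivariate_normal([0, 0], [[3, 2], [2, 3]], size=500)

# Second view: rotate every point by a random angle theta.
theta = rng.uniform(0, 2 * np.pi)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y_clean = X @ R.T                                   # each row is y = R x
Y_noisy = Y_clean + rng.normal(scale=1.0, size=X.shape)

def paired_corr(X, Y, a, b):
    """Correlation between view 1 projected on a and view 2 projected on b."""
    return np.corrcoef(X @ a, Y @ b)[0, 1]

angles = np.linspace(0, np.pi, 8, endpoint=False)
clean, noisy = [], []
for t in angles:
    a = np.array([np.cos(t), np.sin(t)])
    b = R @ a                                       # same direction, rotated
    clean.append(paired_corr(X, Y_clean, a, b))
    noisy.append(paired_corr(X, Y_noisy, a, b))
```

Without noise, `Y_clean @ (R @ a)` equals `X @ a`, so every pairing in `clean` has correlation 1. With noise added, the values in `noisy` drop below 1 and start to depend on the direction: directions along which the cluster has more variance lose less correlation to the same amount of noise.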
Figure: Correlation directions found by CCA.
The second view is generated by rotating the first view through a random angle only. Green lines are the correlation directions. The direction of maximal correlation is that line closest to the cluster diagonal.
Figure: Correlation directions found by CCA.
The second view is generated by rotating the first view then adding random noise. Green lines are the correlation directions. The direction of maximal correlation is that line closest to the cluster diagonal.
