A quick and dirty intro to CCA.
CCA -
A good introduction to CCA can be found in the paper Eigenproblems in Pattern Recognition. In short CCA works as follows: CCA assumes that you have two views of the same underlying data. Think two data clouds that "live" in different feature spaces. The goal of CCA is to find two linear projections of these data sets such that the correlation of the two projections is maximized.
For example, previous applications of CCA onto real world problems have included finding correlations between two sets of text documents in two different languages. CCA should in theory pick up words that are strongly correlated to each other across documents, effectively learning word translations. In another application researchers apply CCA to find pixels that are strongly correlated to sound in a video.
Mathematically, CCA is posed as the following maximization problem:
find the directions x, y that maximize the correlation x'X'Yy
such that |Xx| = 1 and |Yy| = 1.
The apostrophe operator ' denotes matrix transpose. X and Y are the two data matrices, with one data point per row. The values Xx and Yy are the projections of the data onto the directions x and y. If you fix the projection lengths to one, then the correlation of the projections is given by (Xx)'(Yy) = x'X'Yy.
The solution to the optimization problem above reduces to a generalized eigenvalue problem. It should also be noted that the dual formulation of the problem is kernelizable.
Sparsity -
In general the solution directions that result from the CCA problem are linear combinations of all the dimensions of the respective feature space. This can lead to problems in interpreting the results of CCA. For example, in the language translation task above, the directions that maximize correlation would be a linear combination of words in the respective word spaces. What does it mean for one to have .5 of one word plus .3 or another word? The researchers in the Pixels That Sound example encounter the same problem. They find that the projection direction in pixel space that is correlated to sound is a linear combination of many (nearly all?) the pixels in the pixel space. This means that picking out specific pixels to correlate to sound is difficult because CCA tells you that all of the pixels are at least slightly correlated.
To alleviate this problem you can add a sparsity constraint to the optimization problem: you restrict the number of dimensions that the directions x and y live in. This means that instead of linear combinations of all the dimensions in your feature space, you get a small collection of highly correlated dimensions. In the language translation example you should get a small collection of relevant words. In the Pixels That Sound example this gives you a small collection of highly correlated pixels.
Of course adding the sparsity constraint makes the optimization problem more difficult to solve, (in the worst case you make the problem NP-hard.) So the practical solution is to relax the sparsity constraints and solve an approximate problem. This kind of relaxation is performed for sparse PCA (another similar statistical method) by my advisor, et al., in this paper.
Duality - (What I've been working on so far.)
Optimization problems such as the one above have duals, i.e., alternate formulations. Sometimes these formulations are easier to solve the original problem, lead to interesting insights, allow the problem to be kernelizable. Other times they are useless. I have spent the last week or so playing around with the sparse CCA problem in hopes of deriving a dual formulation. The original problem is posed as a Quadratically Constrained Quadratic Program (QCQP). I have tried to formulate a dual which is also a QCQP but have come across some problems. I was about to all it quits when I came across some references indicating that, in general, the dual of a QCQP is a semidefinite program problem (SDP) so I think I'm going to give that a crack for another few more hours.
0 Comments:
Post a Comment
<< Home