So I took a bit of a hiatus from the sparse CCA project last quarter to help out my lab mates, Doug and Luke, on another music research paper. Now that that's over, I'm back on my project and committed to getting serious work done: for one, my time and money here at UCSD are running out, and for two, there's a great opportunity to get a research paper out of this, as ICML's deadline is next month.

The bad news is that this leaves about a month to get this baby off the ground. With a little bit of luck (i.e. hard work) perhaps I can make the deadline.
Anyway, at the end of last quarter I ran some experiments comparing sCCA with other methods of inducing sparsity. I want to talk about some of the results here. Before I do that, I want to add a note about something I stumbled across as a solution to the regularization issue I encountered a few months ago.
The regularization issue arises because of the problem constraint x'X'Xx = 1 (and similarly for y). It turns out that solving the CCA problem involves inverting the matrix X'X, which is an estimate of the covariance matrix for our model. A problem surfaces if the eigenvalues of X'X are small, because any small error in the estimation of those values blows up when you invert the matrix.
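To make this concrete, here's a minimal NumPy sketch (the data and noise level are made up purely for illustration): two nearly collinear columns give X'X a tiny eigenvalue, and then a small perturbation of the matrix swings its inverse wildly.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up data: column 4 nearly duplicates column 3, so X'X gets a
    # tiny eigenvalue and a huge condition number.
    X = rng.standard_normal((100, 5))
    X[:, 4] = X[:, 3] + 1e-4 * rng.standard_normal(100)
    C = X.T @ X
    print("condition number:", np.linalg.cond(C))

    # A small "estimation error" in C produces a wildly different inverse.
    E = 1e-6 * rng.standard_normal((5, 5))
    print("shift in inverse:",
          np.linalg.norm(np.linalg.inv(C + E) - np.linalg.inv(C)))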
The solution to this is to add a small diagonal component to the matrix (X'X + a*I, for a small a), which effectively restricts the degrees of freedom of the solutions. Admittedly I'm not too clear on why this works; I need to get back to the math and figure it out.
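For what it's worth, one piece of the answer: X'X is symmetric, so adding a*I lifts every one of its eigenvalues by exactly a, which means the smallest eigenvalue is at least a and the inverse can no longer blow up. Continuing the sketch above (the value of a is an arbitrary hand-tuned choice, which is exactly the problem I complain about below):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    X[:, 4] = X[:, 3] + 1e-4 * rng.standard_normal(100)
    C = X.T @ X                        # same ill-conditioned matrix as above
    E = 1e-6 * rng.standard_normal((5, 5))

    # Adding a*I shifts every eigenvalue of the symmetric matrix C up by a,
    # so its condition number drops and the inverse becomes stable.
    a = 1e-2                           # hand-tuned; no principled recipe
    C_reg = C + a * np.eye(5)
    print("condition number:", np.linalg.cond(C_reg))
    print("shift in inverse:",
          np.linalg.norm(np.linalg.inv(C_reg + E) - np.linalg.inv(C_reg)))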
Of course, you can get rid of this regularization issue entirely by replacing the constraint above with x'x = 1. Doing this, the matrix X'X never shows up in the math and hence there's no regularization problem. It means we are maximizing the CCA objective, x'X'Yy, subject to the constraint that the vectors x and y are of unit length. In doing this we lose the interpretability of the score x'X'Yy, because it is no longer a correlation proper, i.e., a value between 0 and 1. However, in the case where we only care about the correlation directions and not the values, this is okay. Besides, you can always go back and find the scale factor that recovers the correlation using elementary algebra.
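Here's a minimal sketch of this variant, assuming X and Y are column-centered data matrices (the data below is random, just to fix shapes). Maximizing x'X'Yy over unit-length x and y is exactly the top singular pair of X'Y, so an off-the-shelf SVD does the whole job, and the last line shows the elementary-algebra rescaling back to an actual correlation:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))   # assumed column-centered
    Y = rng.standard_normal((200, 8))

    # The top singular pair of X'Y maximizes x'X'Yy subject to
    # x'x = y'y = 1; note that X'X never appears, so there is
    # nothing to regularize.
    U, s, Vt = np.linalg.svd(X.T @ Y)
    x, y = U[:, 0], Vt[0]
    score = s[0]                         # equals x'X'Yy; not a correlation

    # Rescale to recover the actual correlation between Xx and Yy.
    corr = score / np.sqrt((x @ X.T @ X @ x) * (y @ Y.T @ Y @ y))
    print(score, corr)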
That said, I think this is a much better way to solve CCA than the standard method with regularization as it is commonly presented. The old method imposes a regularization parameter that has to be hand tuned, and I have yet to come across a principled way of setting it. With this second method, you simply set the constraint x'x = 1 (and similarly for y) and forget all about regularization.
Now let's shift gears to some quantitative studies I performed at the end of last quarter. I compared sCCA to the unconstrained CCA problem and to two alternative methods of imposing sparsity. One method is to solve the CCA problem while constraining the 1-norm of the canonical directions; this is a very popular way of imposing sparsity. The other is to simply threshold the unconstrained CCA directions: in my experiments I keep the top n entries, where n is a randomized parameter.
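A sketch of that thresholding step, just to pin down what I mean (the helper name is mine): keep the n largest-magnitude entries of the unconstrained direction, zero everything else, and rescale to unit 2-norm so the scores stay comparable.

    import numpy as np

    def threshold_direction(x, n):
        """Zero all but the n largest-magnitude entries of x, then
        rescale back to unit 2-norm. (Hypothetical helper name.)"""
        out = np.zeros_like(x)
        keep = np.argsort(np.abs(x))[-n:]   # indices of the n biggest entries
        out[keep] = x[keep]
        return out / np.linalg.norm(out)

So thresholded CCA here is just this applied to the plain CCA direction, with n drawn at random in each trial.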
In all experiments we constrain the 2-norm of the canonical directions to be unit length, so the resulting scores are comparable. The figure above shows the covariance scores for each method on synthetic data. It's a little hard to read, so let me spell it out: bar 1 shows the score for 1-norm CCA, bar 2 is for sparse CCA, bar 3 is for thresholded CCA, and bar 4 is for plain-vanilla unconstrained CCA (these values are averages over about 50 trials; their standard deviations are all low). As expected, none of the methods does better than unconstrained CCA. It's encouraging that sCCA (bar 2) does as well as it does, especially compared to how poorly 1-norm CCA performs. However, a quick look at the actual sparsity of sCCA shows that it is not much sparser than the unconstrained CCA directions. See the figure below.
Again, it looks a little crappy, sorry. Reading the panels left to right, top to bottom: the first panel shows the values of one of the canonical directions for 1-norm CCA (very sparse, wouldn't you say?), panel 2 shows sCCA, panel 3 shows thresholded CCA where we keep the 10 largest entries, and panel 4 shows the unconstrained CCA direction.
The important thing to notice is that sparse CCA really isn't that much sparser than the unconstrained direction. This could mean that adding sparsity constraints is a bit of overkill; then again, the unconstrained solution is itself pretty sparse here. If I can do it quickly, I'd like to set up a test on synthetic data that isn't so sparse.
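To make "how sparse" checkable rather than an eyeball call, here's the kind of quick-and-dirty measure I have in mind (the function and the tolerance are made up, not a standard definition): the fraction of entries that are negligible relative to the largest one.

    import numpy as np

    def sparsity(x, tol=0.01):
        """Fraction of entries of x that are negligible: magnitude below
        tol times the largest-magnitude entry. (Made-up measure, just
        for eyeballing; tol is arbitrary.)"""
        mags = np.abs(np.asarray(x, dtype=float))
        return float(np.mean(mags < tol * mags.max()))

Comparing sparsity(x) for the sCCA direction against the plain CCA direction would quantify the impression from the figure.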
I need to go back and check out Gert's Sparse PCA paper again and see just how sparse their solutions are.
So to recap: I'm concerned that even though sparse CCA is scoring pretty high, it isn't doing much more than plain CCA would. If that's the case, then adding the sparsity constraints is unnecessary overhead. There are a few nuances to dig into, though, because my sparse CCA solution is based on solving a sparse PCA problem, which of course has worked well for Gert.