Wednesday, January 17, 2007

notes from meeting with doug

I had a talk with Doug last night about the direction in which to take this project (at least as far as the ICML deadline is concerned). As was the original motivation, I was planning on showing how you can use CCA to (try to) solve a vocabulary selection task, specifically on our music/text data set.

A key insight is that this puts the emphasis on the application, not the method. With ICML being a general machine learning conference, emphasizing this admittedly narrow application may not be the best way to impress upon the community the usefulness of CCA, much less sparse CCA. It is true that CCA is a far lesser-known cousin of PCA. While the name gets thrown around the hallways of machine learning departments, CCA does seem to be one of those tools that falls short of being standard instructional material. I'm not arguing that CCA should be taught in every machine learning intro course or anything like that; in some respects CCA is deservedly relegated to the second rank of statistical tools, because the conditions in which you would reach for it (i.e., when dealing with heterogeneous data) are rarer.

However, dealing with heterogeneous data is a very obvious ground for exploration, and anyone interested in that kind of data modeling should be aware of this tool. This is the point I will try to drive home in this paper: that, in a general machine learning framework, CCA is a very important tool for modeling heterogeneous data, and that, additionally, there is a slick way to impose sparsity on the solution. So in some respects part of this may be consciousness raising, depending on just how much previous research there has been concerning CCA.
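
For concreteness, here is a minimal sketch of the kind of two-view setup CCA handles, written with scikit-learn's plain CCA estimator on synthetic stand-in features. The feature names and dimensions are hypothetical placeholders, not our actual music/text data, and this is ordinary CCA, not the sparse variant discussed above.

    # Sketch: CCA on two heterogeneous "views" of the same items,
    # e.g. audio features and text (term) features for a set of songs.
    # All dimensions and data here are made-up placeholders.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    n_songs = 200
    X_audio = rng.standard_normal((n_songs, 30))  # view 1: audio features
    Y_text = rng.standard_normal((n_songs, 50))   # view 2: bag-of-words features

    # Find pairs of projection directions (one per view) whose
    # projections are maximally correlated across the paired samples.
    cca = CCA(n_components=2)
    U, V = cca.fit_transform(X_audio, Y_text)

    # Correlation achieved by each canonical variate pair
    for k in range(U.shape[1]):
        r = np.corrcoef(U[:, k], V[:, k])[0, 1]
        print(f"canonical correlation {k}: {r:.3f}")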

The next step is to scour the web for any good CCA data sets that may be around, and to perform a more comprehensive literature review.
