Research Abstract
(written in 30 minutes :))
To be rewritten.
Some of my own IT notes:
TAO/PETSc/Lapack Installation on Cygwin: [link] (updated on August 5, 2007)
Using Matlab C/C++ Math Library without Installing Matlab or its Compiler: [link] (updated on August 7, 2007)
Accessing remote computer via ssh without keying in password every time: [link]
TAO/PETSc Installation on SoC@NUS computing clusters: [link]
Practical examples for manipulating Vectors and Matrices in PETSc: [c file]
Some research notes I wrote for presentation at reading groups:
Max-margin Methods for Structured Outputs: [my notes]
Survey Propagation: [my notes] [original paper]
Loopy BP for Max b-matching: [my notes] [original paper]
Message Passing Formula in Gaussian MRF [notes]
Proof of Factorization of Tree-structured Distributions [notes]
A Very Gentle Note on the Construction of Dirichlet Process [notes]
The text below was written when I was in Singapore. OUT OF DATE ALREADY. Will be updated.
Note: Here is some background information on the area I study. For my OWN work, please see Publications.
Semi-Supervised Learning
I planned to write something on semi-supervised learning. However, after reading Zhu Xiaojin's PhD dissertation (he is now an Assistant Professor at the University of Wisconsin), I find it far better to simply read his thesis. He also maintains a web page that puts together the historical and cutting-edge research in semi-supervised learning. The work is up to date (as of May 2005), clear and comprehensive. Proudly, I got my Bachelor's degree from the same university as Prof. Zhu, Shanghai Jiao Tong University (though he graduated and left the university 3 years before I was admitted).

Learning on Structured Data
I also planned to write something on learning with structured data. However, after reading Ben Taskar's PhD dissertation, I find it far better to simply read his thesis. It contains his work that won the student paper award at NIPS 2003. The work is up to date (as of December 2004), clear and comprehensive.
Here is a list (certainly incomplete) of recent papers (up to early 2005) on machine learning with structured data.

Optimization tools and miscellaneous Linux programming skills
There is a huge number of optimization tools and software packages available online. The one I like to use is TAO, which is based on PETSc. This package uses MPI, so it is well suited to training maximum entropy models and CRFs. In fact, we used 30 processors to compute the objective function value and gradient (expectations) in parallel, by distributing the data examples uniformly across all available processors and then summing their contributions to obtain the full objective and gradient.
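The data-parallel scheme described above can be sketched as follows. This is not the TAO/PETSc code we ran; it is a minimal single-process Python illustration in which a loop over shards stands in for the MPI processes, and a binary logistic model stands in for the maximum entropy model. All function names here are hypothetical and chosen only for illustration.

```python
import math


def local_objective_and_gradient(weights, shard):
    """Negative log-likelihood and its gradient on one worker's shard.

    Each example is (features, label) with label in {0, 1}.
    The gradient is "expected feature minus observed feature".
    """
    loss = 0.0
    grad = [0.0] * len(weights)
    for features, label in shard:
        score = sum(w * x for w, x in zip(weights, features))
        prob = 1.0 / (1.0 + math.exp(-score))  # model probability of label 1
        loss -= label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
        for j, x in enumerate(features):
            grad[j] += (prob - label) * x
    return loss, grad


def partition(data, num_workers):
    """Distribute examples (almost) uniformly: worker r gets items r, r+P, ..."""
    return [data[rank::num_workers] for rank in range(num_workers)]


def parallel_objective_and_gradient(weights, data, num_workers):
    """Sum the per-worker contributions (the role a reduction plays under MPI)."""
    total_loss = 0.0
    total_grad = [0.0] * len(weights)
    # Assumption: this loop simulates the workers sequentially in one process.
    for shard in partition(data, num_workers):
        loss, grad = local_objective_and_gradient(weights, shard)
        total_loss += loss
        total_grad = [a + b for a, b in zip(total_grad, grad)]
    return total_loss, total_grad
```

Because the objective is a sum over examples, the result does not depend on how many workers the data is split across (up to floating-point rounding), which is what makes this kind of model easy to parallelize.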

