Samy Bengio
The machine learning community has lately devoted considerable
attention to the decomposition of large-scale classification
problems into a series of sub-problems and to the recombination
of the learned models into a global model. Two major motivations
underlie these approaches:
1. reducing the complexity of each single task, possibly by
increasing the number of tasks;
2. improving the global accuracy by combining several classifiers.
These motivations are particularly relevant to the research
themes covered by IDIAP (such as speech recognition and computer
vision tasks), since the databases we are typically dealing with are
of large size: the number of attributes can run to several
hundred; the number of data points is on the order of several
thousand; and the number of classes (in classification tasks)
is typically 10 or more (10 digits, 26 characters, 30-60
phonemes, etc.). To handle each of
these scaling problems, a series of subtasks is typically
defined where each subtask focuses either on a subset of the
attributes (feature selection); on a different sample
of the data (resampling, e.g. sub-sampling, bagging,
boosting, etc.); or on a different relabeling of the data
(decomposition of polychotomies into dichotomies).
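As a concrete illustration of the last strategy, the following
minimal sketch (our own, with hypothetical names, not part of the
project) relabels a K-class problem as K one-versus-rest dichotomies:

    import numpy as np

    def one_vs_rest_labels(y, n_classes):
        """Relabel a K-class problem as K binary (one-vs-rest) problems.

        Returns an array of shape (n_classes, len(y)) in which row k
        holds +1 for examples of class k and -1 for all others.
        """
        y = np.asarray(y)
        return np.where(y == np.arange(n_classes)[:, None], 1, -1)

    # Example: a 10-class digit problem reduced to 10 dichotomies.
    y = np.array([0, 3, 3, 7, 0])
    Y_bin = one_vs_rest_labels(y, 10)  # shape (10, 5)
    print(Y_bin[3])                    # [-1  1  1 -1 -1]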
When mixing several basic learners, the global accuracy can
improve beyond that of the best basic learner only if the
errors of the learners are not too positively correlated. This
is ensured either by changing the model used to learn each
sub-problem (e.g., by using models from different families or by
modifying the parameters of the model from one subtask to
another) or by varying the data set used to train each model
(e.g. by feature selection or resampling).
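To make the correlation argument concrete, here is a small numerical
sketch (an illustration of ours, not a result from the project): three
classifiers, each with a 10% error rate, are combined by majority
vote, once with independent errors and once with fully correlated
errors:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100_000, 0.10               # examples, per-classifier error rate

    # Independent errors: each classifier errs on its own random 10%.
    errs = rng.random((3, n)) < p
    vote_err_indep = (errs.sum(axis=0) >= 2).mean()

    # Fully correlated errors: all three err on the same examples.
    shared = rng.random(n) < p
    vote_err_corr = shared.mean()

    print(vote_err_indep)  # ~0.028 = 3*p**2*(1-p) + p**3
    print(vote_err_corr)   # ~0.10: voting gains nothing

With independent errors the vote fails only when at least two of the
three classifiers err (roughly 2.8%), whereas with perfectly
correlated errors the vote simply inherits the full 10%.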
A range of solutions have been proposed in the literature for
the combination of different models into a global system.
In the simplest case, this is done with a majority vote; in
other situations, this combination is taken as a new learning
problem having as inputs the outputs of the basic models
(stacking). Finally, in its most elaborate form, this
combination is dynamic (i.e. varies with each input) and its
parameters are determined simultaneously with the training phase
of each basic model. The latter form is the so-called
mixture of experts (ME) model, which was developed
in a rigorous probabilistic framework in the early nineties and
has been widely studied and extended since then. It was the main
object of study during the last three years of one of our previous
projects.
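Schematically, the ME computes an input-dependent weighted average of
the expert outputs. A minimal sketch (hypothetical names; the
linear-softmax gate shown is one common choice, not necessarily the
one used here):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mixture_of_experts(x, experts, gate_W, gate_b):
        """Dynamic combination: the gate's weights depend on the input x.

        experts        : list of callables, each mapping x to class posteriors
        gate_W, gate_b : parameters of a linear gating network
        """
        g = softmax(x @ gate_W + gate_b)             # one weight per expert
        outputs = np.stack([e(x) for e in experts])  # (n_experts, n_classes)
        return g @ outputs                           # input-dependent average

What distinguishes this from stacking is that the gate parameters and
the experts themselves are trained simultaneously.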
As stated before, the less correlated the experts
(basic models) are, the better the performance of the ME model.
An ME has an inherent bias towards uncorrelated experts because
its dynamic recombination partitions the input space into
different regions on which the experts specialize. To further
favor this property, a natural approach is to use feature selection
and dimensionality reduction so that each expert
relies on a different set of inputs.
The expert models of an ME used for classification problems can
be based, in principle, on any method that estimates a
posteriori class probabilities (e.g. neural networks).
Introduced in 1995, the support vector machine (SVM)
has been shown to be an extremely powerful learning tool for
2-class problems. Although an SVM does not output probabilities,
its strong performance makes it very appealing to design an ME
model with experts based on SVMs.
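One standard way to bridge this gap is to fit a sigmoid to the SVM's
decision values (Platt scaling) so that they can be read as posterior
probabilities. A minimal sketch, assuming scikit-learn purely for
illustration (the project does not name any particular tool or
calibration method):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # probability=True fits a sigmoid (Platt scaling) to the SVM's
    # decision values via internal cross-validation.
    svm = SVC(kernel='rbf', probability=True).fit(X, y)
    print(svm.predict_proba(X[:3]))  # each row sums to 1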
SVMs were originally designed for binary classification problems.
In our research group, we have acquired some know-how in the
decomposition of multiclass classification problems into 2-class
sub-problems. Beyond exploring different decomposition strategies,
however, we have so far investigated only static combination
techniques (i.e. combinations of the binary classifiers that do not
depend on the inputs). The design of K-class (or multiclass) classification systems
decomposed into binary classifiers and recombined dynamically
(i.e. MEs for classification with binary classifiers as experts)
constitutes a new field of study of great potential in the field
of pattern classification in general and in speech processing
and computer vision tasks in particular.
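To illustrate the contrast between the two regimes (purely an
illustrative sketch of ours, not the method the project sets out to
develop), a static vote picks the best of the K binary scores
regardless of the input, while a dynamic vote reweights those scores
with an input-dependent gate:

    import numpy as np

    def static_vote(scores):
        """Static recombination: pick the highest one-vs-rest score,
        regardless of where the input lies."""
        return int(np.argmax(scores))

    def dynamic_vote(x, scores, gate_W, gate_b):
        """Dynamic recombination: a gating network reweights the K
        binary scores as a function of the input x before voting."""
        g = np.exp(x @ gate_W + gate_b)
        g = g / g.sum()                    # input-dependent weights (softmax)
        return int(np.argmax(g * scores))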
This research project is thus composed of three main parts:
A. exploitation of feature selection in mixture of experts models;
B. elaboration of a mixture of experts based on support vector machines;
C. development of a mixture of binary classifiers for multiclass classification.
Keywords: learning, classification, mixture of experts, support
vector machine, feature selection, dimensionality reduction,
resampling, binary classifiers for multiclass classification.