3 Why would you need it?
3.1 Features
- Handles multi-valued attributes. This has already become widespread
among decision tree learners, but some implementations still only support
binary ones.
- Allows multiple target attributes. At the time of writing, there
does not seem to be any publically available implementation that provides
this feature.
- Supports structured data, both for input and target
attributes. This, too, is still a little-understood, hot research topic
in machine learning.
- Allows recursive data. This kind of data is even less common in the
machine learning community. AIFAD can learn models for recursive data,
but currently only induce non-recursive functions.
- Handles missing values by algebraic encodings into option types.
This is a much more comprehensible solution than the usual ones and has
competitive accuracy.
- Provides several heuristics of generating models, including a
generalized gain ratio criterion as used by the state-of-the-art decision
tree learner C4.5.
- Can read data specifications as understood by C4.5.
- Pretty efficient. Even though it was written to handle more complex
forms of data, it is only about 3-5 times slower than C4.5 on datasets
that the latter supports. The memory footprint is somewhat larger due to
the impossibility of performing certain operations in-place, but still
low enough to handle even our largest datasets on stock hardware.
Generally speaking, you have the same amount of expressiveness on both
sides of the data (input and output), which is a very neat solution. Usual
discrete data specifications as supported by more common decision tree
learners (e.g. C4.5) are a strict subcase of algebraic datatypes as
used in AIFAD. Furthermore, all kinds of algorithms that operate on
“normal” decision trees or data should be generalizable to the more
powerful representation.
3.2 Missing features
- Cannot (yet) handle numerical values. Should not be too difficult to
implement, but there is a very general solution involving abstract
datatypes, which can even go beyond mere handling of numerical values. It
may be more interesting to research into this direction than to just
implement the special case. If there is numerical data in C4.5-data,
AIFAD will ignore it during learning, thus only considering discrete data.
- Does not (yet) provide post-pruning. Can be easily added (e.g.
reduced error pruning).
- Cannot (yet) learn recursive functions. It would certainly be
extremely interesting to support this, but there is still a lot of
research ahead before this can be done somewhat efficiently. Learning
certain classes of recursive functions (e.g. primitive recursive ones)
might be a viable starting point.