The CONFMAN mediation database1, currently a detailed collection of 3676 mediation attempts in 309 conflicts, comprises 237 attributes with 1168 different attribute values in total. Improving on the baseline accuracy has turned out to be hard2, which may be a consequence of the data representation used, since it does not allow certain semantic properties to be expressed. Furthermore, the comprehensibility of learnt models may also benefit from more fine-grained representations.
This report will not go into the details of learning with the more expressive representation, which builds on algebraic datatypes. Instead, it focuses on how to design such data specifications and describes their advantages along the way. It suffices here to mention that learning with this strictly more general representation can be done as efficiently as with decision trees. In fact, the generalized algorithm performs identically to the traditional one when given a specification that can be handled in the traditional way.
We furthermore assume that the reader has a basic understanding of algebraic datatypes3, which are conceptually quite simple.
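
As a brief reminder of the idea, the following minimal sketch contrasts a flat attribute (a plain enumeration of values) with a structured one (a sum type whose constructors carry arguments). It is written in Python purely for illustration and is not the notation used in this report. All type and field names (`Outcome`, `State`, `Organisation`, `Individual`, `MediationAttempt`, `describe`) are hypothetical and are not taken from the CONFMAN schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


# Flat attribute-value view: an attribute is a plain enumeration,
# i.e. a sum type whose constructors take no arguments.
# (Hypothetical values, not the actual CONFMAN coding.)
class Outcome(Enum):
    UNSUCCESSFUL = "unsuccessful"
    CEASEFIRE = "ceasefire"
    FULL_SETTLEMENT = "full settlement"


# Structured view: the kind of mediator is a sum type whose
# constructors carry their own arguments (hypothetical fields).
@dataclass(frozen=True)
class State:
    name: str


@dataclass(frozen=True)
class Organisation:
    name: str
    regional: bool


@dataclass(frozen=True)
class Individual:
    # None stands for a private individual representing nobody.
    represents: Union[State, Organisation, None]


Mediator = Union[State, Organisation, Individual]


# A mediation attempt is a product type combining the components above.
@dataclass(frozen=True)
class MediationAttempt:
    mediator: Mediator
    outcome: Outcome


def describe(m: Mediator) -> str:
    """Case analysis over the constructors of the Mediator sum type."""
    if isinstance(m, State):
        return f"state {m.name}"
    if isinstance(m, Organisation):
        scope = "regional" if m.regional else "international"
        return f"{scope} organisation {m.name}"
    if isinstance(m, Individual):
        if m.represents is None:
            return "private individual"
        return f"individual acting for {describe(m.represents)}"
    raise TypeError(f"not a Mediator: {m!r}")


# Example value: an individual acting on behalf of a regional organisation.
attempt = MediationAttempt(
    mediator=Individual(Organisation("ExampleOrg", regional=True)),
    outcome=Outcome.CEASEFIRE,
)
print(describe(attempt.mediator))
```

A specification built only from enumerations such as `Outcome` corresponds to the traditional attribute-value setting, whereas a nested type such as `Individual` expresses a relationship that a flat attribute list can only encode indirectly.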