Modelling Large Datasets Using Algebraic Datatypes:
A Case Study of the CONFMAN Database

Markus Mottl
Austrian Research Institute for Artificial Intelligence
Schottengasse 3, A-1010 Vienna, Austria
markus@oefai.at

May 15, 2002

Abstract:
Large datasets often comprise hundreds of variables, each of which can take on many different values. Being able to specify such datasets clearly while still learning functional relations from them efficiently and accurately would make data mining techniques more viable in practice. In this report we describe a new modelling approach that essentially generalizes discrete decision tree learning to the induction of non-recursive functions over algebraic datatypes. Taking the CONFMAN mediation database as a guiding example, we demonstrate how this approach permits more natural data specifications that capture semantic aspects which are hard or even impossible to model in common attribute-value representations. We also explain how this can improve the accuracy and efficiency of learning.
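
To make the contrast with attribute-value representations concrete, the following minimal sketch may help. It is an illustration only: the type `Dispute`, its constructors `NoMediation` and `Mediated`, their fields and values, and the function `predict_outcome` are all hypothetical and are not taken from the CONFMAN schema or from the learning system described in this report. The sketch assumes a variable whose sub-variables are meaningful only for one of its values, and a hand-written non-recursive function over that type standing in for a learned one.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical record layout, NOT the real CONFMAN schema.
# A flat attribute-value table would need "mediator" and "rounds" columns
# in every row, padded with a dummy value when no mediation took place.

@dataclass(frozen=True)
class NoMediation:
    """Constructor without arguments: there is nothing else to record."""


@dataclass(frozen=True)
class Mediated:
    """Constructor whose arguments exist only in this case."""
    mediator: str  # assumed values: "state", "ngo"
    rounds: str    # assumed values: "few", "many"


# The sum type: a dispute is exactly one of the two constructors above.
Dispute = Union[NoMediation, Mediated]


def predict_outcome(d: Dispute) -> str:
    """A non-recursive function over Dispute, written by hand.

    Each isinstance test plays the role of a decision-tree split on a
    constructor; the nested tests inspect only the fields that exist
    for that constructor.
    """
    if isinstance(d, NoMediation):
        return "unresolved"
    if isinstance(d, Mediated):
        if d.mediator == "state" and d.rounds == "many":
            return "settled"
        return "partial"
    raise TypeError(f"not a Dispute: {d!r}")
```

Under these assumptions, `predict_outcome(Mediated("state", "many"))` evaluates to `"settled"`, and no row ever carries a meaningless mediator value for an unmediated dispute, which is the kind of semantic constraint a flat table cannot express directly.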