In this section we will give a short summary of design hints that one should apply when creating new data specifications.
The following steps may give a suitable start:
An annoying but important point in machine learning concerns the treatment
of missing values, i.e. value tags indicating that some value could
not be observed or is even meaningless in a certain context. A common
approach in traditional techniques is to explicitly encode
(“hardwire”) strategies for handling missing values in the learning
algorithm. Given more expressive representations, it seems necessary to
ask whether this is still required or even advisable: more declarative
ways of specifying (the handling of) missing values may be possible.
Let's consider the following definition:
t = A | B
If a variable of this type can have missing values, users of machine learning systems often extend the definition as follows:
t = A | B | Missing
This, however, may lead to unexpected results. Assume that
some dataset that uses t as the result variable contains ten As, nine
Bs and eleven missing values. Then an algorithm that predicts the most
frequent value would predict Missing. But since the opposite of a missing
value is an available value, we would actually have to predict that
there will indeed be an observation: this case is more frequent (19 of
30 instances), because it encompasses the observations of both As and Bs.
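The counting argument above can be checked with a minimal sketch. The dataset is the one from the text; the string labels and the helper `most_frequent` are illustrative assumptions, standing in for any learner that predicts the most frequent value.

```python
from collections import Counter

# Dataset from the text: ten As, nine Bs, eleven missing values,
# encoded with the flat definition t = A | B | Missing.
labels = ["A"] * 10 + ["B"] * 9 + ["Missing"] * 11


def most_frequent(values):
    """Return the most frequent value (a simple majority-vote predictor)."""
    return Counter(values).most_common(1)[0][0]


# Voting over the flat encoding treats Missing as just another class.
flat_prediction = most_frequent(labels)

# Regrouping the same data into "Observed" versus "Missing" before voting
# reflects that A and B are both instances of an available value.
availability = ["Missing" if v == "Missing" else "Observed" for v in labels]
grouped_prediction = most_frequent(availability)

print(flat_prediction)     # Missing  (11 beats 10 and 9)
print(grouped_prediction)  # Observed (19 beats 11)
```

The two predictions differ only because of how the alternatives are grouped, not because of the data.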
Therefore, it may be more suitable to choose the following definition:
t = Observed value | Missing
value = A | B
This tells the learning algorithm to treat observed values separately from missing ones. We can even specify structures that explain in more detail how to interpret missing values. For example:
t = Relevant observable_value | Irrelevant
observable_value = Observed value | Missing
value = A | B
This allows a more precise specification of why some value is actually missing: because it is irrelevant, because it could not be observed for some reason, etc. In all these cases the result may indeed be a consequence of the input data, which is why predicting missing values does make sense. For example, some value may have been unobservable because of other (observable) conditions that make collecting data very difficult.
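As a rough illustration of how a learner might exploit such a nested definition, the following sketch models the two-level variant (an observed value versus a missing one) with Python dataclasses and predicts by majority vote one level at a time. The class names mirror the constructors in the definitions; the function `predict` and the two-stage voting scheme are assumptions made for this example, not a prescribed algorithm.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observed:
    value: str  # "A" or "B", mirroring value = A | B


@dataclass(frozen=True)
class Missing:
    pass


def predict(data):
    """Two-stage majority vote: first the constructor, then its argument."""
    # Stage 1: decide between Observed and Missing by counting constructors.
    top = Counter(type(x) for x in data).most_common(1)[0][0]
    if top is Missing:
        return Missing()
    # Stage 2: among the observed instances only, pick the most frequent value.
    inner = Counter(x.value for x in data if isinstance(x, Observed))
    return Observed(inner.most_common(1)[0][0])


# Same dataset as before: ten As, nine Bs, eleven missing values.
data = [Observed("A")] * 10 + [Observed("B")] * 9 + [Missing()] * 11
prediction = predict(data)
print(prediction)  # Observed(value='A')
```

A further level that distinguishes relevant from irrelevant values could be handled the same way, by adding one more constructor layer and one more voting stage.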