Missing Values

Using "Missing" as a category

For categorical variables, missing values can be replaced with an explicit "Missing" label. This makes sense when missing values are numerous, or when the fact that a value is missing is itself informative. Otherwise, the label only adds another rare category to the data set.
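A minimal Python sketch of this idea, assuming missing entries are represented as None. The label "Missing" and the helper names `fill_missing_category` and `missing_share` are hypothetical choices for illustration. The share of missing entries is one way to judge whether the new category is frequent enough to be worth keeping.

```python
from typing import List, Optional

MISSING_LABEL = "Missing"  # hypothetical label; any name not already used as a category works


def missing_share(values: List[Optional[str]]) -> float:
    """Fraction of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def fill_missing_category(values: List[Optional[str]], label: str = MISSING_LABEL) -> List[str]:
    """Replace each None with an explicit category label; other values are kept as-is."""
    return [label if v is None else v for v in values]


colors = ["red", None, "blue", "red", None]
print(missing_share(colors))          # 0.4
print(fill_missing_category(colors))  # ['red', 'Missing', 'blue', 'red', 'Missing']
```

Here 40% of the entries are missing, so "Missing" becomes as frequent as "red" rather than a rare extra category.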

Using random values for missing data

Random sample imputation fills each missing value with a value selected at random from the values observed for that variable in the data set.
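A minimal Python sketch of random sample imputation for a single variable, assuming None marks a missing entry; `random_sample_impute` is a hypothetical helper name. It samples from the list of observed values including duplicates, so frequent values are drawn more often and the imputed values follow the observed distribution. Sampling from the set of distinct values instead would give every value equal weight.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def random_sample_impute(values: List[Optional[T]], seed: int = 0) -> List[T]:
    """Replace each None with a value drawn at random from the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values to sample from")
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return [rng.choice(observed) if v is None else v for v in values]


ages = [23, None, 35, 41, None, 35]
print(random_sample_impute(ages, seed=42))
```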

This approach feels a bit odd to me. It can be useful at prediction time, though: if an observation to be scored does not supply values for every variable, the missing ones can be filled using only the values that were supplied.

To draw these random values, we seed the random number generator with the supplied values. The same supplied values then always yield the same imputed values, which provides some consistency: scoring the same observation twice produces the same result.
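A minimal Python sketch of this seeding, assuming each observation is a dict with None marking missing entries, and that `pools` holds the observed values of each variable (for example, collected from the training data). The helpers `seed_from_supplied` and `impute_observation` are hypothetical names. The seed is derived with SHA-256 over a key-sorted JSON encoding rather than Python's built-in `hash()`, because `hash()` of strings is randomized per process and would give different seeds across runs.

```python
import hashlib
import json
import random
from typing import Any, Dict, List, Optional


def seed_from_supplied(supplied: Dict[str, Any]) -> int:
    """Derive a stable integer seed from the supplied (non-missing) values."""
    # sort_keys makes the encoding independent of dict insertion order
    payload = json.dumps(supplied, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def impute_observation(
    observation: Dict[str, Optional[Any]],
    pools: Dict[str, List[Any]],  # assumed: observed values per variable
) -> Dict[str, Any]:
    """Fill None entries by sampling from pools, seeded only by the supplied values."""
    supplied = {k: v for k, v in observation.items() if v is not None}
    rng = random.Random(seed_from_supplied(supplied))
    filled = dict(observation)
    for name in sorted(observation):  # fixed order keeps the sequence of draws reproducible
        if observation[name] is None:
            filled[name] = rng.choice(pools[name])
    return filled


pools = {"size": ["S", "M", "L"], "color": ["red", "blue"]}
obs = {"color": "red", "size": None, "age": 42}
first = impute_observation(obs, pools)
second = impute_observation(dict(obs), pools)
print(first == second)  # True
```

Because the generator is seeded only from the supplied values, two observations with identical supplied values always receive identical imputed values, while observations with different supplied values will usually receive different draws.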