Using “missing” as a category

You can use “missing” as its own label in a categorical variable when missing values are frequent or when the fact that a value is missing is itself informative. Otherwise, it may simply add another rare category to the variable.
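As a minimal sketch of this idea, assuming pandas is available, the snippet below checks how often a value is missing and then fills the gaps with a "Missing" label. The `color` column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: a categorical column with gaps
df = pd.DataFrame({"color": ["red", None, "blue", None, "red", None]})

# Share of missing entries; a high share suggests "Missing"
# will not end up as a rare label
missing_share = df["color"].isna().mean()

# Treat missingness as its own category
df["color_filled"] = df["color"].fillna("Missing")

# Frequency of each label, including the new one
counts = df["color_filled"].value_counts()
print(missing_share)
print(counts)
```

If `missing_share` turns out to be very small, the new label behaves like any other rare category, and it may be worth handling it the same way you handle those.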

Using random values for missing data

Random sample imputation fills missing entries by drawing random values from the observed (non-missing) values of the same variable.
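A rough sketch of random sample imputation, assuming pandas and NumPy, might look like the following. The helper name `random_sample_impute` and the `ages` data are hypothetical, and the fixed `seed` is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd

def random_sample_impute(series: pd.Series, seed: int = 0) -> pd.Series:
    """Fill missing entries with random draws from the observed values."""
    observed = series.dropna().to_numpy()
    missing_mask = series.isna()
    rng = np.random.default_rng(seed)
    # Draw one observed value (with replacement) per missing entry
    draws = rng.choice(observed, size=int(missing_mask.sum()), replace=True)
    filled = series.copy()
    filled.loc[missing_mask] = draws
    return filled

# Hypothetical example data
ages = pd.Series([22.0, np.nan, 35.0, 41.0, np.nan, 29.0])
imputed = random_sample_impute(ages, seed=42)
print(imputed)
```

Because the draws come from the observed values themselves, the imputed column never contains a value that was not already in the data.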

This approach feels a bit unusual to me. However, when you need a prediction and some values in your sample are missing, it lets you fill them in using only the data you have been given.

When values are imputed at random, we need to seed the random number generator from the values supplied with each observation. The same inputs then always produce the same imputed values, which gives a degree of consistency.
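One way to sketch this per-observation seeding, assuming pandas and NumPy, is shown below. The function `impute_row_seeded`, the column names, and the choice to build the seed by summing the row's other values are all assumptions made for illustration. The seed columns are assumed to be numeric and non-missing, and the index is assumed to be unique.

```python
import numpy as np
import pandas as pd

def impute_row_seeded(df: pd.DataFrame, target: str, seed_cols: list) -> pd.Series:
    """Impute `target` with a random observed value, seeded by each row's inputs."""
    observed = df[target].dropna().to_numpy()
    filled = df[target].copy()
    for idx in df.index[df[target].isna()]:
        # Derive the seed from the row's supplied values
        # (abs() because the generator needs a non-negative seed)
        seed = abs(int(df.loc[idx, seed_cols].sum()))
        rng = np.random.default_rng(seed)
        filled.loc[idx] = rng.choice(observed)
    return filled

# Hypothetical data: rows 1 and 3 have identical inputs and a missing age
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "height": [170, 165, 180, 165],
    "weight": [60, 55, 80, 55],
})

first = impute_row_seeded(df, "age", ["height", "weight"])
second = impute_row_seeded(df, "age", ["height", "weight"])
print(first)
```

Running the function twice gives the same result, and the two rows with identical `height` and `weight` receive the same imputed `age`. Summing the inputs is a simple choice that can give different rows the same seed, so a hash of the row's values may be a better fit in practice.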