As a rule of thumb, if fewer than 5% of the observations are missing, the incomplete observations can simply be deleted without significant ramifications (3).
- What percentage of missing data is acceptable?
- How much missing data is acceptable for single imputation?
- How do you deal with 50% missing data?
What percentage of missing data is acceptable?
The overall percentage of missing data matters. Generally, if fewer than 5% of values are missing, it is acceptable to ignore them (REF).
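A minimal sketch of how the overall missing percentage might be computed, assuming the data is a list of rows and that missing cells are encoded as `None` or `NaN`. The function name `missing_percentage` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def missing_percentage(rows):
    """Return the percentage of cells that are missing (None or NaN)."""
    total = 0
    missing = 0
    for row in rows:
        for value in row:
            total += 1
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing += 1
    # Guard against an empty table to avoid dividing by zero.
    return 100.0 * missing / total if total else 0.0


# Hypothetical toy data: 3 rows x 4 columns with one missing cell.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, None, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
pct = missing_percentage(data)
```

Here one of twelve cells is missing, so the result is above the 5% rule of thumb and simply ignoring the gap may not be appropriate.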
How much missing data is acceptable for single imputation?
Scheffer (2002) suggests that complete cases can be used if no more than 6% of the data is missing, single imputation if no more than 10% is missing, and more complex procedures such as multiple imputation if between 10% and 25% is missing.
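The thresholds quoted above can be written as a small lookup. This is only a sketch: the function name `suggest_strategy` is hypothetical, treating each boundary as inclusive is an assumption, and the fallback for more than 25% is an assumption too, since the quoted guideline does not cover that range.

```python
def suggest_strategy(pct_missing):
    """Map a missing-data percentage to the approach quoted from Scheffer (2002)."""
    if not 0 <= pct_missing <= 100:
        raise ValueError("pct_missing must be between 0 and 100")
    if pct_missing <= 6:
        return "complete cases"
    if pct_missing <= 10:
        return "single imputation"
    if pct_missing <= 25:
        return "multiple imputation or another more complex procedure"
    # Assumption: the quoted guideline gives no advice beyond 25%.
    return "not covered by the quoted guideline"


for pct in (3, 8, 20, 50):
    print(f"{pct}% missing: {suggest_strategy(pct)}")
```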
How do you deal with 50% missing data?
One option is to fit predictive models that impute the missing values. Fit the imputation model inside a cross-validation scheme, using only the training portion of each fold, so that information from held-out rows does not leak into the imputed values. This can be very effective and can improve the final model. In addition, the number of missing values in each row can be used as a new engineered feature.
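The approach described above might look like the following sketch, assuming scikit-learn is available. The synthetic data, the choice of `IterativeImputer` as the predictive imputer, and `Ridge` as the final model are all assumptions made for illustration; any imputer and estimator could be substituted. Placing the imputer inside a pipeline means `cross_val_score` refits it on the training part of each fold, which is what avoids leakage.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical synthetic data: 200 rows, 5 features, linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Remove roughly half of the cells to mimic 50% missing data.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.5] = np.nan

# Engineered feature: number of missing values in each row.
# It is computed row by row, so it cannot leak across folds.
n_missing = np.isnan(X_missing).sum(axis=1).reshape(-1, 1)
X_aug = np.hstack([X_missing, n_missing])

# The imputer is a pipeline step, so it is refit on each training fold only.
model = make_pipeline(IterativeImputer(random_state=0), Ridge())
scores = cross_val_score(model, X_aug, y, cv=5)
print("Mean cross-validated R^2:", scores.mean())
```

Imputing the whole dataset once before splitting would let held-out rows influence the imputed training values, which is the leakage the pipeline is meant to prevent.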