Columns to ignore

In machine learning, you should generally not remove columns unless you are confident that doing so will not affect the result. Even so, you may need to ignore certain columns (features) in a dataset for several reasons:

Irrelevant columns: Some columns may contain information that is not relevant to the problem being solved and can potentially harm the model's performance. For example, in a model that predicts the likelihood of a person being approved for a loan, the person's favorite music genre might be an irrelevant feature.

Correlated columns: Columns that are highly correlated with each other cause multicollinearity, a condition where the independent variables are so strongly related that it becomes difficult to determine the effect of each feature on the target variable.

Insignificant columns: These features have some relationship with the target variable, but it is too weak to be considered important. They might have a low correlation with the target variable on their own, or contribute little when considered in combination with other features. For instance, in a model that predicts the likelihood of a person being diagnosed with a certain disease, the patient's hair color might be an insignificant feature.

Features the test set doesn't have: If your training and test sets come from different sources, the test set may lack a feature that the training set has. If that feature cannot be derived from other features in the test set, you need to remove the column from the training set, because a machine learning model expects the same features in both the training and test sets.
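The last case above can be sketched in a few lines of Python. This is a minimal illustration that assumes the data is loaded into pandas DataFrames; the helper name align_to_test_columns, the column names, and the toy values are hypothetical and not part of any product described here.

```python
import pandas as pd


def align_to_test_columns(train: pd.DataFrame, test: pd.DataFrame, target: str):
    """Keep only the training features that also exist in the test set."""
    # Features present in both sets, in the training set's column order
    shared = [c for c in train.columns if c in test.columns and c != target]
    # Training-only features that the model would never see at prediction time
    ignored = [c for c in train.columns if c not in test.columns and c != target]
    return train[shared + [target]], test[shared], ignored


# Hypothetical toy data: the test set has no "branch_visits" column.
train = pd.DataFrame({
    "income": [40, 85, 60],
    "age": [25, 47, 33],
    "branch_visits": [3, 0, 1],
    "approved": [0, 1, 1],
})
test = pd.DataFrame({"income": [52, 75], "age": [29, 51]})

train_aligned, test_aligned, ignored = align_to_test_columns(
    train, test, target="approved"
)
print("Ignored columns:", ignored)
```

Returning the list of ignored columns makes it easy to check whether any of them could instead be rebuilt from the features the test set does have before they are dropped for good.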

It's important to evaluate the columns in a dataset carefully and to use techniques such as feature selection and feature engineering to identify and remove the columns that do not contribute to the model's performance.
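As one possible feature selection step, highly correlated columns can be flagged with a pairwise correlation matrix. The sketch below assumes numeric features in a pandas DataFrame; the function name, the 0.9 threshold, and the sample columns are illustrative assumptions rather than recommended settings.

```python
import numpy as np
import pandas as pd


def highly_correlated_columns(features: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Return columns whose absolute Pearson correlation with an earlier column exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair of columns is compared once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Hypothetical toy data: income_eur is almost a rescaled copy of income_usd.
features = pd.DataFrame({
    "income_usd": [40.0, 85.0, 60.0, 72.0, 51.0],
    "income_eur": [37.0, 78.5, 55.0, 66.0, 47.0],
    "age": [25, 47, 33, 29, 58],
})

to_drop = highly_correlated_columns(features)
reduced = features.drop(columns=to_drop)
print("Columns to ignore:", to_drop)
```

This keeps the first column of each correlated pair and flags the later one. Which of the two to ignore is a judgment call, so the flagged list is best treated as a set of candidates to review rather than an automatic decision.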

Updated on: 02/02/2023
