
What are Feature Extraction and Feature Selection?

    It is nowadays quite common to work with datasets of hundreds (or even thousands) of features. If the number of features becomes comparable to (or even bigger than!) the number of observations stored in a dataset, this will most likely lead to a Machine Learning model that suffers from overfitting.

    Feature Extraction

    Feature Extraction is the process of translating raw data into the inputs that a particular Machine Learning algorithm requires.

    The model is the motor, but it needs fuel to work.

    Features must represent the information in the data in a format that best fits the needs of the algorithm that is going to be used to solve the problem.

    While some inherent features can be obtained directly from the raw data, we usually need to derive, from these inherent features, new features that are actually relevant for tackling the underlying problem. Having said that, I will share one more quote-worthy line,

    A poor model fed with meaningful features will surely perform better than an amazing algorithm fed with low-quality features - "garbage in, garbage out".

    Feature extraction fulfills the following requirement: it builds valuable information from raw data (the features) by reformatting, combining, and transforming primary features into new ones, until it yields a new set of data that can be consumed by Machine Learning models to achieve their goals.
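
    As an illustration, here is a minimal sketch of that idea in Python, using a hypothetical raw dataset of transactions: primary fields are reformatted and combined into new, model-ready features.

    ```python
    # A minimal sketch with a hypothetical raw dataset of two transactions:
    # primary fields are reformatted and combined into model-ready features.
    import pandas as pd

    raw = pd.DataFrame({
        "timestamp": pd.to_datetime(["2021-09-06 08:15", "2021-09-07 22:40"]),
        "price": [120.0, 80.0],
        "quantity": [2, 5],
    })

    features = pd.DataFrame({
        "hour": raw["timestamp"].dt.hour,                  # reformat a field
        "is_weekend": raw["timestamp"].dt.dayofweek >= 5,  # derive a new flag
        "total_amount": raw["price"] * raw["quantity"],    # combine two fields
    })
    print(features)
    ```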


    Feature Selection

    Feature Selection, for its part, is a clearer task: given a set of potential features, select some of them and discard the rest. Feature selection is applied either to remove redundancy and/or irrelevancy present in the features, or simply to limit their number and thereby prevent overfitting.
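
    As a minimal sketch of what selection looks like in practice (the Iris dataset and k=2 are illustrative choices, not prescriptions), scikit-learn's SelectKBest keeps the k highest-scoring features and discards the rest:

    ```python
    # A minimal sketch: keep the k features with the highest ANOVA F-score.
    # The Iris dataset and k=2 are illustrative choices.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)
    print(X.shape)  # (150, 4): four candidate features

    selector = SelectKBest(score_func=f_classif, k=2)
    X_selected = selector.fit_transform(X, y)

    print(X_selected.shape)        # (150, 2): two features kept, two discarded
    print(selector.get_support())  # boolean mask marking the kept columns
    ```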

    Note that if the features are equally relevant, we could perform PCA (Principal Component Analysis) to reduce the dimensionality and, if redundancy exists, eliminate it. Here, however, we would be doing feature extraction, since we would be transforming the primary features and not just selecting a subset of them.
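
    A minimal sketch of that difference, again with scikit-learn and illustrative settings: PCA builds two brand-new components out of the four primary Iris features rather than picking two of the originals.

    ```python
    # A minimal sketch: PCA builds two new components out of all four primary
    # features, so this is extraction (a transformation), not selection.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)

    pca = PCA(n_components=2)
    X_new = pca.fit_transform(X)  # every component mixes all original features

    print(X_new.shape)                    # (150, 2)
    print(pca.explained_variance_ratio_)  # variance captured by each component
    ```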


    But what is the need for all this?

    First of all, we have to take into account what kind of algorithm we are going to feed with the produced features. Abstraction ability and sensitivity to irrelevant or redundant features vary a lot depending on the specific Machine Learning technique.

    In general, a minimum of feature extraction is always needed. The only case in which we wouldn't need any feature extraction is when our algorithm can perform it by itself, as deep neural networks do: they can learn a low-dimensional representation of high-dimensional data. In spite of this, it must be pointed out that success is always easier to achieve with good features.
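
    As a rough sketch of that idea (assuming TensorFlow/Keras is installed; the layer sizes and toy data are purely illustrative assumptions), an autoencoder's bottleneck layer learns a low-dimensional representation of its input on its own:

    ```python
    # A rough sketch of automatic feature extraction with an autoencoder.
    # Layer sizes and the toy data are illustrative assumptions.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    X = np.random.rand(1000, 64).astype("float32")  # toy high-dimensional data

    inputs = keras.Input(shape=(64,))
    encoded = layers.Dense(8, activation="relu")(inputs)       # 8-D bottleneck
    decoded = layers.Dense(64, activation="sigmoid")(encoded)  # reconstruction

    autoencoder = keras.Model(inputs, decoded)
    encoder = keras.Model(inputs, encoded)  # maps raw data to learned features

    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

    low_dim = encoder.predict(X)  # shape (1000, 8): the learned representation
    print(low_dim.shape)
    ```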

    We should apply feature selection when there is a suspicion of redundancy or irrelevancy, since these hurt model accuracy or, at best, simply add noise. Sometimes, even when the features are relevant and non-redundant, feature selection may be performed just to reduce their number, in order to favor interpretability and computational feasibility, or to avoid the curse of dimensionality, i.e., too many features describing too few samples.
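
    One minimal way to act on suspected redundancy (the 0.95 correlation threshold and the toy data below are hypothetical choices) is to drop one column from each pair of highly correlated features:

    ```python
    # A minimal sketch: drop one column from each highly correlated pair.
    # The 0.95 threshold and the toy data are hypothetical choices.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    df = pd.DataFrame({
        "a": a,
        "a_copy": a + rng.normal(scale=0.01, size=200),  # near-duplicate of "a"
        "b": rng.normal(size=200),
    })

    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]

    print(to_drop)                      # ['a_copy']
    reduced = df.drop(columns=to_drop)  # redundant feature removed
    ```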

    And Feature Engineering? Well, it is sometimes used as a synonym for feature extraction, although, contrary to extraction, there seems to be a fairly universal consensus that engineering involves not only creative constructions but pre-processing tasks and naive transformations as well.

    And are these concepts related to data mining? Yes, of course, but… stop!!

    Perhaps it is too soon to try to label every task involved in the Machine Learning field; it is good enough just to know what makes sense as an input to help our model succeed, until an automatic feature extraction tool comes up with an alternative.

    While we wait (maybe for less time than we think), don't take features for granted; most of the time, problems are in the data, not in the algorithm. Good recipes need good ingredients, so take care of your features. These concepts may sound difficult, but they are not that hard to understand.
