Feature Engineering
What is Feature Engineering?
Feature engineering refers to the process of using domain knowledge to select and transform the most relevant variables from raw data when creating a predictive model using machine learning or statistical modeling. The goal of feature engineering and selection is to improve the performance of machine learning (ML) algorithms.
The feature engineering pipeline is the set of preprocessing steps that transform raw data into features that machine learning algorithms, such as predictive models, can use. A predictive model consists of an outcome variable and predictor variables, and it is during feature engineering that the most useful predictor variables are created and selected for the model. Automated feature engineering has been available in some machine learning software since 2016. Feature engineering in ML consists of four main steps: Feature Creation, Transformations, Feature Extraction, and Feature Selection.
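The four steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference pipeline: the house-listing records, the field names, and the helpers `build_features`, `select_features`, and `pearson` are all hypothetical. Extraction is shown here as pulling a numeric signal out of a raw date string (dimensionality-reduction methods are another common form), and selection is shown as a simple correlation filter, which is only one of many possible selection strategies.

```python
import math
from datetime import date

# Hypothetical raw records: house listings, with sale price as the outcome variable.
RAW = [
    {"listed": "2023-01-14", "area_sqft": 850, "rooms": 2, "price": 210_000},
    {"listed": "2023-03-02", "area_sqft": 1200, "rooms": 3, "price": 305_000},
    {"listed": "2023-06-21", "area_sqft": 1500, "rooms": 3, "price": 390_000},
    {"listed": "2023-07-09", "area_sqft": 2100, "rooms": 4, "price": 520_000},
    {"listed": "2023-11-30", "area_sqft": 2600, "rooms": 5, "price": 640_000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)


def build_features(row):
    """Turn one raw record into candidate predictor variables."""
    return {
        # 1. Feature creation: derive a new variable from existing ones.
        "area_per_room": row["area_sqft"] / row["rooms"],
        # 2. Transformation: compress a wide-ranging scale with a log.
        "log_area": math.log(row["area_sqft"]),
        # 3. Feature extraction: pull a numeric signal out of a raw date string.
        "listed_month": date.fromisoformat(row["listed"]).month,
        # A raw variable passed through unchanged.
        "rooms": row["rooms"],
    }


def select_features(rows, outcome, k):
    """4. Feature selection: keep the k features most correlated with the outcome."""
    names = list(rows[0])
    scores = {
        name: abs(pearson([r[name] for r in rows], outcome)) for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


features = [build_features(row) for row in RAW]
prices = [row["price"] for row in RAW]
selected = select_features(features, prices, k=2)
print(selected)
```

With only five rows the correlation scores are not meaningful; the point is the shape of the pipeline, in which candidate predictors are built first and then narrowed down against the outcome variable.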
Related Resources
A Primer on Feature Engineering
Feature engineering is the process of selecting, interpreting, and transforming structured or unstructured raw data into attributes (features) that can be used to build effective machine learning models which more accurately represent the problem at hand. In this context, a “feature” refers to any quantifiable unique input that may be used in a predictive model, […]
Hierarchical Features and their Importance in Feature Engineering
Feature engineering is both a central task in machine learning engineering and arguably the most complex one. Data scientists who build models that need to be deployed at large scales, across functional, technical, geographic, demographic and other categories have to reason about how they choose the features for the models. Despite the divergent […]
Feature Stores: The CEO’s Guide
As industries across the globe attempt to adapt to the big data architecture, expensive and ineffective feature engineering practices mean that businesses are very likely to “hit a wall” when it comes to organizing their machine learning operations (MLOps). A lot of time is consumed in data ingestion, and lackluster machine outputs indicate that stakeholders […]