[Tech Dive] A Practical Guide to AI Project Development, Part 3: Feature Engineering
This is the third article in a six-part series on how to approach AI development projects.
In this article, we will discuss feature engineering: the holy grail of AI, and arguably the most critical step in determining the quality of AI outcomes. Irrespective of the algorithm used, feature engineering drives model performance, governs machine learning's ability to generate meaningful insights, and ultimately helps solve business problems.
Data is like the crude oil of machine learning, meaning that it has to be refined into features – predictor variables – in order to be useful for training a model.
What is Feature Engineering?
Feature engineering involves the application of business knowledge, mathematics, and statistics to transform data into a format that can be directly consumed by machine learning models.
For example, predicting which customers are likely to churn in a given quarter means identifying the customers with the highest probability of no longer doing business with the company. How do you make such a prediction? By looking at the underlying causes: we analyse customer behaviour and then form hypotheses.
For example, customer A contacted customer support five times in the last month, suggesting customer A has complaints and is likely to churn. In another scenario, customer A's product usage might have dropped by 40% over the previous three months, again suggesting a high probability of churning. Examining historical behaviour, extracting candidate patterns, and then testing those hypotheses is the essence of feature engineering.
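These behavioural hypotheses translate directly into features. As a minimal sketch (the column names and numbers below are invented purely for illustration), each hypothesis becomes one engineered column in a per-customer feature table:

```python
import pandas as pd

# Hypothetical raw event logs; the schemas are assumptions for illustration.
support_calls = pd.DataFrame({
    "customer_id": ["A", "A", "B"],
    "month": ["2024-05", "2024-05", "2024-05"],
})
usage = pd.DataFrame({
    "customer_id": ["A", "B"],
    "usage_3m_ago": [100.0, 80.0],
    "usage_now": [60.0, 82.0],
})

# Feature 1: number of support contacts in the last month.
contacts = (support_calls.groupby("customer_id")
            .size().rename("support_contacts_last_month"))

# Feature 2: relative drop in product usage over the last three months.
usage["usage_drop_pct"] = (
    (usage["usage_3m_ago"] - usage["usage_now"]) / usage["usage_3m_ago"]
)

# Join into one feature table with one row per customer,
# ready to be fed to a churn model.
features = usage.set_index("customer_id").join(contacts).fillna(0)
print(features)
```

Customer A ends up with five-times-the-support-contacts and a 40% usage drop encoded as plain numeric columns, which is exactly the format a churn classifier can consume.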
Next, to give you a more advanced understanding of feature engineering, we will look at how it is done in one of the most popular machine learning domains – Computer Vision.
Features in Computer Vision
In computer vision, a feature is a measurable piece of data in your image that is distinctive of a specific object. It may be a distinct colour, or a specific shape such as a line, an edge, or an image segment.
A good feature is used to distinguish objects from one another. For example, suppose I give you a feature like a wheel and ask you to guess whether the object is a motorcycle or a dog. What would your guess be? A motorcycle. Correct! In this case, the wheel is a strong feature that clearly distinguishes between motorcycles and dogs.
Now suppose I give you the same feature (a wheel) and ask you to guess whether the object is a bicycle or a motorcycle. Here, a wheel alone isn't strong enough to distinguish the two objects, so we need to look for more features, such as a mirror, a license plate, or perhaps a pedal – features which collectively describe the object.
In AI projects, we want to transform the raw data (an image) into a feature vector so that our learning algorithm can learn the characteristics of the object.
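As a concrete illustration, here is one simple way to turn a raw image into a feature vector: a per-channel colour histogram, one of the classic hand-crafted features. The image here is random toy data standing in for a real photo:

```python
import numpy as np

# Toy 4x4 RGB "image"; in practice this would be loaded from a file.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

def color_histogram(img, bins=8):
    """Bin each colour channel's pixel values and concatenate the histograms."""
    feats = []
    for channel in range(img.shape[2]):
        hist, _ = np.histogram(img[:, :, channel], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalise so features are comparable
    return np.concatenate(feats)

vector = color_histogram(image)
print(vector.shape)  # (24,) - 3 channels x 8 bins
```

The 4x4x3 grid of raw pixels becomes a fixed-length 24-dimensional vector, which any standard classifier can take as input.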
Extracting features (hand-crafted vs. automatic extraction)
Traditional ML uses hand-crafted features
In traditional machine learning problems, we spend a good amount of time on manual feature selection and engineering.
In this process, we rely on our domain knowledge (or partner with domain experts) to create features that make machine learning algorithms work better. Some common handcrafted features are colour, texture, shape, and structure. These features can be obtained through traditional image processing techniques such as the following:
- Haar Cascades
- Histogram of Oriented Gradients (HOG)
- Scale-Invariant Feature Transform (SIFT)
- Speeded-Up Robust Features (SURF)
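To make the idea concrete, here is a much-simplified sketch of the core computation behind HOG: a histogram of gradient orientations, weighted by gradient magnitude. Real HOG implementations (e.g., in scikit-image) add cells, block normalisation, and interpolation; this only illustrates the principle:

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))   # pixel-wise gradients
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as HOG commonly uses.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0, 180), weights=magnitude)
    return hist / (hist.sum() + 1e-12)          # normalise the descriptor

# A patch whose intensity increases downward -> all gradients point vertically.
patch = np.tile(np.arange(8), (8, 1)).T
hist = orientation_histogram(patch)
print(hist.argmax())  # -> 4, the 80-100 degree (vertical-gradient) bin
```

A vertical intensity ramp puts all the gradient weight into the 90-degree bin, so the descriptor compactly says "this patch contains a horizontal edge structure" – exactly the kind of cue these hand-crafted techniques encode.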
Deep learning automatically extracts features
In deep learning, we don’t need to extract features from the images manually. The model itself automatically extracts features and learns their importance to the output by adjusting the weights of its connections.
We feed the raw image to the network, and as it passes through the network layers, patterns within the image are identified and turned into features.
Neural networks can be thought of as feature extractors + classifiers, which are end-to-end trainable; traditional ML models, on the other hand, use hand-crafted features.
Above is an example of the features deep learning models (Convolutional Neural Networks) identify when classifying different types of images. It’s worth noting that these features aren’t generic, low-level features such as edges or corners; instead, they are tailored to each class. That is the power of training a model to extract features automatically.
Because learned features are extracted automatically to solve a specific task, deep learning models are extremely effective at image classification. In fact, models that learn their own features outperform models that classify manually extracted features by a large margin. This is one of the reasons why deep learning is so popular.
On the other hand, deep learning models provide no control over which features the network extracts from the data. In many cases, these features are only useful for the task they were trained on and have no real-world interpretation.
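The "feature extractor + classifier" view above can be sketched in a few lines of NumPy: a convolution filter turns the raw image into a feature map, and a linear layer maps the flattened features to class scores. In a real CNN, both the filter and the classifier weights are learned end-to-end by backpropagation; here they are fixed (and the two classes are hypothetical) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))           # raw input "image"

# Feature extractor: one fixed 3x3 convolution filter (a vertical-edge detector).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

def conv2d(img, k):
    """Valid-mode 2D convolution (no padding, stride 1)."""
    h, w = img.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

feature_map = np.maximum(conv2d(image, kernel), 0)  # ReLU non-linearity

# Classifier: a linear layer over the flattened feature map.
features = feature_map.ravel()                      # 6x6 map -> 36 features
W = rng.random((2, features.size))                  # 2 hypothetical classes
scores = W @ features
print(scores.shape)  # (2,) - one score per class
```

Training would adjust both `kernel` and `W` jointly from the loss, which is exactly what makes the extracted features task-specific rather than generic.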
The real goal is to engineer features that will help models to learn. The only way to get good features is through experimentation.
Due to the huge diversity of potential features, feature engineering is often called an art.
Yinghua is a Machine Learning (AI) engineer with a special interest in Computer Vision and Natural Language Processing (NLP). He holds an M.Sc. in Data Science and a B.Sc. in Applied Mathematics.