Anomaly detection refers to the process of analysing data sets to detect unusual patterns and outliers that do not conform to expectations.
It takes on even more importance in a world where enterprises depend heavily on an intricate web of distributed systems. With thousands of potentially important data items to monitor every second, it is nearly impossible to pinpoint the cause of an error and quickly resolve it without a robust anomaly detection process.
What is an anomaly?
Before we proceed, let’s define what constitutes an ‘anomaly’. Broadly, anomalies can be categorized as follows:
- Contextual anomalies: A data point is unusual only in a particular context, such as the time of year. These types of anomalies are usually found in time-series data sets.
Example: Spending $300 on clothes in a single day might be normal during the holiday season but odd during the rest of the year.
- Point anomalies: Here, a single item of data stands out as an anomaly because it is too different from the rest.
Example: Credit card fraud may be detected by spotting an unusually large amount spent in a single transaction.
- Collective anomalies: Sometimes, a group of related data instances is anomalous when taken together, even if each instance looks normal on its own.
Example: Someone unexpectedly copies multiple files from a web server, which could signal a potential hack attempt.
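As a rough sketch of how a point anomaly might be flagged, the snippet below scores each transaction amount by its distance from the median, scaled by the median absolute deviation (MAD). The function name, the sample amounts and the 3.5 cut-off are illustrative assumptions, not part of any particular product.

```python
from statistics import median

def point_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # No spread at all, so nothing can be scored.
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily card transactions; the last one is far larger than the rest.
amounts = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 980.0]
flagged = point_anomalies(amounts)
print(flagged)
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation, which can hide the very point you are trying to find.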
Why is Anomaly Detection important?
Anomaly detection offers a significant competitive advantage to businesses that embrace its potential. Some of the most significant benefits are:
- Radical cybersecurity upgrades
Hack attempts and malicious entities have never been more of a threat to businesses and online data. IBM reported that in 2022, organizations took an average of 277 days to identify and contain a breach.
Imagine the amount of damage that a breach left unresolved for that long could cause.
However, with a robust anomaly detection solution, security breaches can be detected in real-time and can be acted upon immediately.
- Automation of KPI analysis
When done manually, KPI analysis is a monumental task that involves poring over a ton of digital data coming in from different dashboards. But with an anomaly detection system, AI algorithms scan, collate and analyse metrics across all your platforms 24/7.
As unusual patterns and outliers become apparent in real-time, it is much easier to pivot your strategies.
- Revealing hidden opportunities
Anomaly detection can unearth potential niches and opportunities that wouldn’t be apparent otherwise. For example, anomaly detection could point out untapped SEO keywords that have great potential for driving traffic and conversions to your website. It could save you months of manual tracking, analysing Google Analytics reports and trial and error.
- Maximizing the potential of your talent and resources
With modern anomaly detection capabilities, cutting-edge business analytics and insights are now available to much smaller teams. With an enterprise-grade decision-making platform like Enrich, organizations can easily build decision-making products that solve various anomaly detection use cases without depending on DevOps, software engineers, or a big-tech sized Data Science/ML Engineering team.
Methods of Anomaly Detection
There are a number of widely used methods for anomaly detection. These are some of the more common ones:
- Supervised learning-based anomaly detection: These algorithms are trained on labeled datasets, learning to classify data or predict outcomes from examples whose correct answers are already known. There are two major types of problems that a supervised anomaly detection algorithm attempts to solve.
  - Classification: Here, the algorithm uses the categories it has previously been trained to recognize to sort input data into specific categories. One of the most common applications of classification is separating spam from important emails as they reach your inbox.
  - Regression: Here, the algorithm uses training data to understand how dependent and independent variables are related. This is useful for predicting outcomes based on different data points, such as revenue projections for a business.
Some of the most commonly used supervised machine learning algorithms for anomaly detection are Support Vector Machines, the K-Nearest Neighbors classifier and supervised neural networks.
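To make the supervised approach concrete, here is a minimal K-Nearest Neighbors classifier written in plain Python. The labelled transactions, the feature choice (amount and transactions in the last hour) and the function name are made up for illustration. A real system would typically scale the features and use a tested library implementation.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (amount, transactions in the last hour) -> label.
train = [
    ((20.0, 1), "normal"),
    ((35.0, 2), "normal"),
    ((50.0, 1), "normal"),
    ((41.0, 3), "normal"),
    ((900.0, 9), "fraud"),
    ((1200.0, 12), "fraud"),
    ((1000.0, 10), "fraud"),
]

print(knn_predict(train, (45.0, 2)))
print(knn_predict(train, (1100.0, 11)))
```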
- Clustering-based anomaly detection: Clustering is one of the most widely used techniques in the field of unsupervised learning. Clustering-based anomaly detection operates on the assumption that similar data points belong to the same cluster, as determined by their distance from the local centroid.
One of the most popular clustering algorithms is ‘K-Means’. This algorithm groups the data into ‘k’ clusters of similar items. Data that falls outside these clusters may be classified as an anomaly.
For example, in the automotive sector, pricing anomalies often occur between components of different products that have similar physical characteristics, which raises the total production cost. Since detecting these price discrepancies involves examining thousands of pieces of data, it is often neglected, leading to inconsistent product engineering. However, a K-means-based anomaly detection algorithm has been shown to be effective at automatically grouping parts with similar physical characteristics, thus enabling efficient outlier detection and reducing costs in the long run. (Source: Applied Sciences Journal)
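The clustering idea can be sketched in a few lines of plain Python. This is not the method from the cited study. It is a bare-bones K-Means (Lloyd's algorithm) with hand-picked starting centroids, followed by a fixed distance cut-off, and the part data, starting points and cut-off are all assumptions chosen for illustration.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids

def far_from_centroids(points, centroids, max_distance):
    """Return points whose nearest centroid is more than `max_distance` away."""
    return [p for p in points if min(dist(p, c) for c in centroids) > max_distance]

# Hypothetical parts as (weight in kg, unit price). One light part is priced oddly high.
parts = [
    (1.0, 10.0), (1.1, 10.5), (0.9, 9.5), (1.0, 10.2),
    (5.0, 40.0), (5.2, 41.0), (4.8, 39.0), (5.1, 40.5),
    (1.0, 25.0),
]
centroids = kmeans(parts, centroids=[parts[0], parts[4]])
outliers = far_from_centroids(parts, centroids, max_distance=5.0)
print(outliers)
```

In practice the cut-off is usually derived from the data, for example from a percentile of the distances, and the features are scaled so that no single attribute dominates the distance.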
- Density-based anomaly detection: This type of anomaly detection is related to clustering and is based on the assumption that normal data points lie close together in dense neighbourhoods, while abnormalities lie further away. The nearest set of data points is scored using Euclidean distance or a similar metric, depending on whether the data is categorical or numerical.
In December 2019, researchers published a paper in the International Journal of Medicine showcasing the efficacy of density-based anomaly detection methods for improving machine learning models for stroke outcome prediction. Specifically, the DBSCAN algorithm was found to be the most effective at identifying abnormalities when used on a large dataset obtained from a prospective stroke registry in Taiwan. The tool developed as a result of this detection method can be iterated upon over time to provide increasingly accurate data quality in stroke outcome measures. (Source: Science Direct)
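The density assumption can be illustrated with a simplified take on DBSCAN's notion of noise. The sketch below does not build clusters. It only finds the points that are neither "core" points (at least `min_pts` neighbours within `eps`, counting the point itself) nor within `eps` of a core point. The sample points and parameter values are assumptions for illustration.

```python
from math import dist

def dbscan_noise(points, eps, min_pts):
    """Return the points DBSCAN would label as noise (neither core nor border)."""
    def neighbours(p):
        return [q for q in points if dist(p, q) <= eps]

    # Core points sit in a dense neighbourhood.
    core = [p for p in points if len(neighbours(p)) >= min_pts]
    # Noise points are not core and are not within eps of any core point.
    return [
        p for p in points
        if p not in core and not any(dist(p, c) <= eps for c in core)
    ]

points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (1.0, 0.8),  # dense group
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),              # dense group
    (3.0, 9.0),                                                  # isolated point
]
noise = dbscan_noise(points, eps=0.6, min_pts=3)
print(noise)
```

This brute-force version compares every pair of points, so it is only suitable for small data sets. Library implementations use spatial indexes to scale further.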
- Time series anomaly detection: Businesses use a lot of time-series data sets, such as web activity (daily traffic, bounce rates, number of users), marketing (CPC, customer acquisition costs, retention rates) and sales (average order value).
To detect anomalies within this data, a baseline or range of normal behaviour for a given KPI must be specified. It is also important to take into consideration seasonal events and trends, such as an increase in web traffic or the volume of customer inquiries during specific times of the year. A time series model is built using historical trend data.
The model specifies an expected value for the time series at the next (or some future) time instant, and large deviations from this expectation can be flagged as anomalies.
- Hierarchical anomaly detection: This is not a different type of anomaly detection method; rather, it enables the detection of anomalies across varying scales or groupings of the original dataset.
Hierarchical anomaly detection uses the standard approaches to flagging anomalies but treats sub-groups of the data as independent dimensions. For instance, consider a dataset containing sales volume and visitor volume for an e-commerce website. While it is useful to detect anomalies in the individual number-of-sales and number-of-visitors attributes, we can also treat the combination of both attributes as 2-dimensional points in space and look for outliers.
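The time-series approach described above, a baseline built from history with seasonality taken into account, can be sketched as follows. The model here is deliberately simple: the expected value for each day of the week is the historical mean for that weekday, and a point is flagged when it deviates by more than `k` standard deviations. The visit counts, the weekly period and `k=3.0` are illustrative assumptions.

```python
from statistics import mean, stdev

def seasonal_baseline(history, period=7):
    """Mean and standard deviation for each position in the seasonal cycle."""
    baseline = []
    for slot in range(period):
        values = history[slot::period]  # every observation for this weekday
        baseline.append((mean(values), stdev(values)))
    return baseline

def is_anomaly(value, slot, baseline, k=3.0):
    """Flag `value` if it is more than k standard deviations from the slot's mean."""
    expected, spread = baseline[slot]
    return abs(value - expected) > k * spread

# Four weeks of hypothetical daily visits, Monday first; weekends are busier.
history = [
    100, 102, 98, 101, 99, 150, 160,
    103, 99, 101, 100, 102, 155, 158,
    98, 101, 100, 99, 101, 148, 162,
    101, 100, 99, 102, 100, 152, 159,
]
baseline = seasonal_baseline(history)

print(is_anomaly(150, slot=0, baseline=baseline))  # 150 visits on a Monday
print(is_anomaly(150, slot=5, baseline=baseline))  # 150 visits on a Saturday
```

The same value is judged differently depending on the day, which is also what makes an anomaly contextual.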
What are the major challenges in Anomaly Detection?
When it works, anomaly detection showcases the power of machine learning in a manner that feels like magic. However, getting the magic to work as intended can be a remarkably difficult process when you look under the hood.
Some of the most common challenges that need to be overcome during anomaly detection are:
Data Quality
One of the first questions you may ask while considering an anomaly detection model is “Which algorithm should I choose?” Naturally, your answer will depend upon the nature of the problem you are trying to solve.
But even more crucial than selecting the right algorithm is the quality of your input data.
Data quality is the single biggest factor that will determine how successful your anomaly detection model can be. Your input data sets may have several problems – incomplete entries, inconsistent formats, duplicates, different benchmarks for measurement, human error – that must be ironed out meticulously to give the ML model the best chance of succeeding.
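As a small illustration of the kind of clean-up involved, the sketch below drops incomplete entries, removes duplicate records and normalises inconsistent formats. The record layout (`id`, `user`, `amount`) and the function name are hypothetical. Real pipelines will have their own schema and rules.

```python
def clean_records(records):
    """Drop incomplete rows, skip duplicate ids and normalise formats."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Skip incomplete entries.
        if rec.get("amount") in (None, "") or not rec.get("user"):
            continue
        # Skip duplicates of a record we have already kept.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        # Normalise amounts such as "$1,200.50" or 75 to plain floats.
        amount = float(str(rec["amount"]).replace("$", "").replace(",", ""))
        cleaned.append(
            {"id": rec["id"], "user": rec["user"].strip().lower(), "amount": amount}
        )
    return cleaned

raw = [
    {"id": 1, "user": "Alice ", "amount": "$1,200.50"},
    {"id": 1, "user": "Alice ", "amount": "$1,200.50"},  # duplicate
    {"id": 2, "user": "bob", "amount": 75},
    {"id": 3, "user": "carol", "amount": None},          # incomplete
]
cleaned = clean_records(raw)
print(cleaned)
```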
Size of training data samples
Having a large enough training data set is extremely important for several reasons. If you don’t have enough training data, the algorithm won’t have enough historical context to accurately build a model of what “normal” data looks like.
One of the easiest ways to understand the problems that may be caused by an insufficient training set is to consider the example of a supermarket. As part of normal operations, customer traffic spikes at certain times of the day, certain days of the week and during certain seasons. Without enough of a historical data set to understand this seasonality, it can be difficult to understand why sales go up or down at different periods.
False alarms
A dynamic anomaly detection system learns from the past to identify expected patterns of behaviour and predicts anomalous events. But what if your model consistently throws up the wrong alerts at the wrong time?
It is crucial to achieve a balance in the sensitivity of your model, because leaning too much in either direction can make you lose the trust of your customers.
One of the things you may want to look at if you are getting a lot of false alerts is how strict your limits are around the baseline. If the limits are too narrow, the model may falsely detect normal variance as an anomaly. Additionally, you should increase the sample size used to inform the algorithm. More historical data will allow the model to account for expected outliers and improve its overall accuracy.
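The effect of limits that are too narrow can be seen with a toy example. The snippet counts how many readings fall outside a band of `k` standard deviations around the mean for two values of `k`. The readings and the values of `k` are made up for illustration.

```python
from statistics import mean, stdev

def count_alerts(values, k):
    """Number of points farther than k standard deviations from the mean."""
    centre, spread = mean(values), stdev(values)
    return sum(1 for v in values if abs(v - centre) > k * spread)

# Hypothetical metric readings; only the last one is a genuine spike.
readings = [100, 104, 97, 101, 95, 103, 99, 106, 98, 102, 94, 101, 140]
alerts_by_k = {k: count_alerts(readings, k) for k in (0.5, 3.0)}
print(alerts_by_k)
```

With the narrow band, ordinary variation is reported alongside the real spike. With the wider band, only the spike remains. Widening the band too far carries the opposite risk of missing genuine anomalies.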
Imbalanced distributions
One of the most common ways to build an anomaly detection model is with a supervised algorithm which requires labelled data to understand what is good and what isn’t.
However, labelled data usually suffers from a problem called distribution imbalance. In many domains (e.g. credit card fraud), the volume of normal samples swamps the volume of anomalous samples. As a result, the model may not have enough examples to properly learn what a ‘bad’ state looks like.
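One practical consequence of imbalance is that plain accuracy becomes misleading. In the sketch below, a "model" that labels every transaction as normal scores very well on accuracy while catching no fraud at all. The 990-to-10 split is an assumed example, not a measured figure.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually caught."""
    caught = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in caught) / len(caught)

# 990 normal transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10
# A useless model that always predicts "normal".
always_normal = [0] * 1000

print(accuracy(y_true, always_normal))
print(recall(y_true, always_normal))
```

This is why imbalanced problems are usually evaluated with recall, precision or similar metrics rather than accuracy alone.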
Black swan events
Anomaly detection methods work by learning what is “normal” and then flagging data that deviates from that norm. When rare black swan events like the COVID-19 pandemic occur, anomaly detection models are thrown off because the behavior of the underlying data generation processes changes overnight. For instance, flight cancellations were through the roof in the first few days after lockdowns were announced, and online food ordering jumped by orders of magnitude. Any anomaly detection systems put in place by companies like Booking.com and Uber would likely have failed, and new models would have had to be trained.
Anomaly Detection Use Cases
Let’s look at a few use cases of anomaly detection, and the business changes that can occur as a result.
Finance (TerraPay)
TerraPay, a cross-border B2B payment infrastructure solution provider, developed the TerraPay Intelligence Platform (TIP) powered by Scribble Data’s Enrich.
Enrich was able to dramatically reduce time-to-resolution of AML (Anti-Money Laundering) cases by empowering TerraPay analysts with high-trust feature sets flagging suspicious transactions and users on a daily basis. An AML expert rule system is a crucial component for a company like TerraPay, which collects and aggregates a massive amount of transaction data for customers, accounts and other stakeholders.
Ecommerce (AOV, Tracking conversion rate changes)
Detecting anomalies in real-time can make or break an ecommerce business. Two of the most significant metrics that an anomaly detection system will help you optimize for are:
Tracking conversion rate changes: Let’s say you’re in the middle of a rebranding campaign and are currently running multiple offers across your digital platforms. When your sales suddenly start dipping, it may be difficult to analyse what’s wrong from just a Google Analytics report.
Here, an anomaly detection solution can help you identify metrics and connections that it would take too long to make manually, such as
- The changes to your pricing page are making a lot of customers bounce off it
- Your site isn’t loading fast enough on mobile devices
- Your products naturally sell less during specific seasons
- Your tracking code is improperly installed on multiple affiliate pages
- The new CTA isn’t converting as well as the previous one
- Your site has multiple SEO errors that are affecting your traffic
Changes in average order value (AOV): Here, an anomaly detection solution can automatically point out reasons for a fall in AOV, such as
- Technical issues with the layout of the landing page
- Your current promotion isn’t converting well
Cybersecurity
Due to the massive risks posed by malicious entities and data breaches, many experts argue that modern-day cybersecurity standards are impossible to achieve without a robust, algorithmic anomaly detection system.
There are several different types of threats that an anomaly detection process can help businesses identify, such as
- Abnormal data deletions: These happen when data is deleted unexpectedly during storage or transit from any part of the network.
- Unusual logins: Anomaly detection systems can instantly alert you when an unexpected entity logs on to a server, host, database, or cloud service. Multiple failed login attempts can also point to a malicious attack.
- Data exfiltration: With an anomaly detection system, a build-up of red flags such as unusual logins, data hoarding and data uploads can be examined together to recognize unauthorized transfers of data from inside your organization to an external entity.
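As one concrete example from the list above, repeated failed logins can be caught with a simple sliding-window rule. The sketch below assumes events arrive sorted by time as `(timestamp_in_seconds, user, success)` tuples. The event format, the limit of three failures and the 60-second window are illustrative assumptions.

```python
from collections import defaultdict, deque

def failed_login_alerts(events, max_failures=3, window_seconds=60):
    """Return (timestamp, user) each time a user exceeds max_failures within the window."""
    recent = defaultdict(deque)  # user -> timestamps of recent failed attempts
    alerts = []
    for timestamp, user, success in events:
        if success:
            continue
        attempts = recent[user]
        attempts.append(timestamp)
        # Forget failures that have fallen out of the time window.
        while attempts and timestamp - attempts[0] > window_seconds:
            attempts.popleft()
        if len(attempts) > max_failures:
            alerts.append((timestamp, user))
    return alerts

events = [
    (0, "alice", True),
    (5, "mallory", False),
    (10, "mallory", False),
    (12, "bob", False),
    (15, "mallory", False),
    (20, "mallory", False),
    (200, "bob", False),
]
alerts = failed_login_alerts(events)
print(alerts)
```

A fixed rule like this is only a starting point. A learned baseline per user or per host can adapt to accounts that legitimately behave differently.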
Anomaly detection is one of the foremost applications of machine learning, and we’ve touched upon several of the possibilities that the technology enables for businesses that use it. As modern businesses take in an ever-increasing amount of data, the importance of an anomaly detection algorithm grows.
If you would like to experience the true potential of anomaly detection as a tool for identifying business opportunities, detecting frauds, and achieving zero-trust cybersecurity, we’d like you to consider Enrich. Enrich is a modular feature store built as a collection of apps atop a platform, making it easy to integrate into your existing data stack to solve specific problems.
For more in-depth articles about anomaly detection, feature engineering and the transformative power of data, stay tuned to the Scribble Data blog.