
Catching Cheaters in Video Games with a Low False-Positive Rate
Here at Anybrain, we are currently focused on perfecting a cheat detection system based on player behavior analysis, without affecting the experience of legitimate players.

Simply put: players click their mice and tap their keyboards, and we analyze the way they do so. Do cheaters behave differently? Yes!
Before analyzing the data, we have to decide how to structure it. Some relevant questions are:
- Are the phenomena in human-computer interaction correlated in time?
- Can we simply create aggregated features to summarize the interactions?
These questions are important because their answers affect the structure of the data we train our models with. For example, if there are temporal dependencies in the interactions, we would want to use a model capable of capturing them.
We ought to have a generic data pipeline that supports different approaches and a process to find the best models for any game on any platform, given a set of goals such as a low false-positive rate.
This post is not meant to present the best data structure to use, nor the best data collection method. Today, we share some thoughts on:
- An example of a data structure we might use: Multivariate Time Series;
- The tools we're using to develop deep learning models and find the best architectures;
- The methods for validating results, making sure they hold up for new players while prioritizing a low false-positive rate;
- How we plan to deploy these models.
The Data — Multivariate Time Series
There are several ways we can represent the interaction between the player and the gaming platform. One of them is multivariate time series: fixed-length sequences where each element is an array of measurements. For the sake of this post, it is not necessary to know what each feature represents, so we won't dive into that matter.
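For concreteness, here is a minimal sketch of what one such window could look like as a NumPy array. The window length of 128 time steps is an illustrative choice, and the 10 features simply match the example used later in the deployment section; neither is a description of our production format.

```python
import numpy as np

SEQ_LEN = 128      # fixed number of time steps per window (illustrative)
N_FEATURES = 10    # measurements per time step, e.g. mouse deltas, key timings (illustrative)

# One sample: a fixed-length multivariate time series of shape (SEQ_LEN, N_FEATURES)
window = np.zeros((SEQ_LEN, N_FEATURES), dtype=np.float32)

# A training batch stacks many windows: shape (batch_size, SEQ_LEN, N_FEATURES)
batch = np.stack([window] * 32)
print(batch.shape)  # (32, 128, 10)
```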

Depending on the game and the platform being used, the number of features in these time series can change, which affects the optimal model architecture. So how can we set up a process that supports such drastic changes in model architecture?
Deep Learning with TensorFlow and Keras
In Deep Learning, backpropagation allows using the same objective function in neural networks with completely different architectures. Frameworks such as TensorFlow and APIs like Keras let us effortlessly implement these architectures (involving LSTM or convolutional layers, for example) with varying levels of complexity, but we still need a way of finding the best architecture for each model.
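As a concrete illustration, a model builder parameterized by the number of features could look like the minimal sketch below; the choice of an LSTM layer and the default sizes are illustrative, not necessarily what we run in production.

```python
from tensorflow import keras

def build_model(n_features: int, seq_len: int,
                units: int = 64, learning_rate: float = 1e-3) -> keras.Model:
    """Build a simple recurrent classifier for one gameplay window."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_features)),     # multivariate time series window
        keras.layers.LSTM(units),                     # captures temporal dependencies
        keras.layers.Dense(1, activation="sigmoid"),  # P(cheating)
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate),
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")],
    )
    return model
```

The question then becomes how to choose values such as units and learning_rate, and how to choose between architectural variants in the first place.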
We could do this by performing a traditional grid search, or by implementing our own evolutionary algorithm, but we came across Optuna, which provides an excellent hyper-parameter search through a simple API.
In the example below, we use Optuna locally and in-memory, but the library can also work in a distributed fashion, managing and keeping track of the search trials in an SQLite database.
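A minimal sketch of such a study, reusing the build_model helper from the sketch above; the hyperparameter ranges, the number of trials, and the X_train/X_val arrays are assumptions for the example:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Search space (illustrative ranges)
    units = trial.suggest_int("units", 16, 256)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)

    # X_train, y_train, X_val, y_val are assumed to be window arrays and labels prepared as above
    model = build_model(n_features=N_FEATURES, seq_len=SEQ_LEN,
                        units=units, learning_rate=learning_rate)
    model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=0)

    _, auc = model.evaluate(X_val, y_val, verbose=0)
    return auc  # Optuna maximizes this, see direction="maximize" below

# In-memory study; pass storage="sqlite:///trials.db" to share trials between workers
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```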
Player-centered Cross-Validation
In many tutorials and even scientific papers addressing cheat detection in video games, we see a rudimentary validation process that doesn’t account for the need for the model to work with unseen players.

If we test our models on players whose data is in the training set, how do we know that we can catch new cheaters? This is especially important when we consider that many cheaters might be new to the game.
To address this, we adopted a simple strategy: we leave one player (or a set of players) out of the training data, as sketched below.
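As an illustration, player-held-out splits can be implemented with scikit-learn's GroupKFold, using player IDs as the groups; the X, y, and player_ids arrays are assumed to be prepared as in the sketches above.

```python
from sklearn.model_selection import GroupKFold

# X: windows of shape (n_samples, SEQ_LEN, N_FEATURES)
# y: labels (1 = cheating, 0 = legitimate); player_ids: one player ID per window
cv = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y, groups=player_ids)):
    # No player ever appears in both the training and the validation split
    model = build_model(n_features=N_FEATURES, seq_len=SEQ_LEN)
    model.fit(X[train_idx], y[train_idx], epochs=5, batch_size=64, verbose=0)
    _, auc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    print(f"fold {fold}: AUC on held-out players = {auc:.3f}")
```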

We’ve been seeing exciting results. Our models make a clear distinction between cheating and normal behavior.
An important step before deploying a model is choosing a threshold, in other words, defining the point above which a prediction is considered a cheating occurrence. If our predictions range between 0 and 1, our first thought might be that the best boundary is 0.5, but that is rarely the case. In fact, there are various criteria for choosing the best threshold. We prioritize a low false-positive rate, to minimize false accusations against legitimate players. This approach can, to a certain extent, compromise the model's sensitivity. However, given that our models have been making a good distinction between legitimate and fraudulent behavior, we can afford to do this and still detect cheaters.
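As an illustration of such a criterion, the sketch below picks the most sensitive operating point whose false-positive rate on a validation set stays under a given budget; the 0.1% budget is purely an example, not our production setting.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, y_scores, max_fpr=0.001):
    """Return the threshold that maximizes sensitivity subject to FPR <= max_fpr."""
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    within_budget = fpr <= max_fpr          # candidate operating points
    best = np.argmax(tpr[within_budget])    # most sensitive point within the budget
    return thresholds[within_budget][best]

# Validation scores are assumed to come from the player-held-out folds above
threshold = pick_threshold(y_val, model.predict(X_val).ravel())
flags = model.predict(X_new).ravel() >= threshold   # X_new: hypothetical unseen windows
```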
There is still room for improvement, especially in creating simpler and more efficient methods for collecting correctly labeled data.
Deploying Models in Java with SpringBoot
After we have trained our models, we need to be able to serve them in communities with thousands or even millions of players. Since our backend is built with SpringBoot, we use the TensorFlow Java API.
Below is an example of how we can load a TensorFlow model and make a prediction in Java with the data structure seen above (a multivariate time series with 10 features).
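The minimal sketch below uses the legacy org.tensorflow Java artifact; the model directory and the input/output tensor names (which can be inspected with the saved_model_cli tool) are placeholders rather than our actual configuration.

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class CheatDetector {

    private final SavedModelBundle model;

    public CheatDetector(String modelDir) {
        // Load a Keras model exported in the SavedModel format
        this.model = SavedModelBundle.load(modelDir, "serve");
    }

    /** window has shape [1][sequenceLength][10]: one multivariate time series. */
    public float predict(float[][][] window) {
        try (Tensor<Float> input = Tensor.create(window, Float.class);
             Tensor<?> output = model.session().runner()
                     .feed("serving_default_input_1", input)  // placeholder input op name
                     .fetch("StatefulPartitionedCall")         // placeholder output op name
                     .run()
                     .get(0)) {
            float[][] scores = new float[1][1];
            output.copyTo(scores);
            return scores[0][0];  // scores above the chosen threshold are flagged as cheating
        }
    }
}
```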
Loading up and using a TensorFlow model in Java is certainly more verbose and not as straightforward as in Python, but it gets the job done.
From a performance standpoint, response times have ranged from 20–100 ms when tested locally. Although these response times already satisfy our current requirements, we can still greatly reduce the per-prediction cost with batch predictions.
All we can say for now is that this is a promising solution, as we still need to perform additional stress tests to investigate its limitations.
Wrapping it Up
One of our main goals is to create generic pipelines to explore as many approaches to behavior analysis as possible. We hope to create an increasingly autonomous system, making fewer and fewer assumptions regarding gameplay.
In this post, I’ve tried to showcase a particular research direction that we’ve been taking here at Anybrain, and some practical examples of how we’re tackling our challenges.
We would be glad to hear any thoughts or suggestions, as we believe that sharing knowledge and ideas is a major driving force of innovation.