How Blitz.GG uses Machine Learning to analyze TFT Compositions

Blitz Press · May 19, 2023

Machine Learning Powered Comps for TFT (preview)

At Blitz, we believe in coaching and helping our players become better at the games they love. Understanding the meta is an important part of Riot Games’ auto battler, Teamfight Tactics (TFT): a strategic, round-based game where you draft a team that battles against other players. Even though every TFT game is unique, understanding the meta is an essential step in learning the game.

The problem with current tools

Many websites have attempted to propose compositions for players to learn from. However, we found a number of problems and issues in what’s out there, motivating us to do better.

For instance, most websites do not capture the meta very well: compositions are too close to each other, and some are missing entirely, making it difficult to get an accurate view of the whole meta at a glance. UI is also key: since compositions exist to help players make decisions quickly, raw statistics are not always easy to leverage or filter for specific use cases.

Finally, competitors are usually either too beginner-friendly and too simple, or too detailed with many advanced stats, so neither is a go-to solution for most players.

We have gathered feedback from our community and are proud to release our own solution that tackles all of these problems and provides the best tool for TFT players.

Our solution, the Blitz Way

We are honored to be releasing our new Machine Learning powered analyzed compositions. We have been working hard to provide best-in-class compositions that help you get better at the game.

TFT Comp Breakdown

With our newly analyzed comps, our goals are to:

  • Help beginners understand the synergies and how to place their items / augments based on their teams. Understanding what works well in most cases provides a strong foundation for learning the basics
  • Empower advanced players by providing statistics and data that inform better in-game decision making

In the rest of this article, we give some insights into how we come up with our compositions. We hope you enjoy it! https://blitz.gg/tft/comps/stats

Context

In Teamfight Tactics, when we talk about compositions, we usually mean the late-game composition that you are building towards. An end game board usually contains 8 units (9 in the late game). There are around 60 champions per set, meaning you could theoretically build an astronomical number of different compositions (on the order of 60⁸ naive combinations). However, since there are synergies between champions, the set of compositions actually observed is far smaller. Players themselves impact the meta, influencing each other as they interact (directly or not) through streaming, social media, VODs, chatting, etc., which pushes the meta in a certain direction. In many games, the observed meta is actually restricted to a small space of set strategies that a player could choose from. It is still quite common in TFT to see a composition go viral after a streamer discovers it in front of their viewers.

Nevertheless, knowing about a top composition isn’t usually enough: mastering it takes time and effort. Different compositions imply different playstyles, difficulties, economies, strategies, timings, etc. Some are high risk and situational, while others are relatively safe to play regularly. Even professional players usually play around a subset of compositions that they are familiar with.

TFT sets are short (only a couple of months), and each one introduces a completely new roster of units and synergies (and sometimes mechanics) that forces players to reset their knowledge and explore synergies from scratch, which is, let’s face it, the most exciting part of TFT. While experience helps players make better decisions, they can’t accumulate composition knowledge across sets, which makes it possible for anyone who starts playing TFT to compete with players who have been around for years. This is not the case in games like League of Legends, where the knowledge gap between new players and ten-year veterans is huge and hard to close.

The other consequence of TFT sets being short is that you do not have time to master every single composition unless you play dozens of games per day. Even if you had the time, pro players usually prefer mastering a limited set of comps over imperfect knowledge of a larger one.

For these reasons, we provide our players with a set of compositions they can learn from, giving them some direction in their games. Additionally, we provide a variety of stats and recommendations beyond the champions you should aim for. This way you can, at a glance, get a good understanding of how to play a specific composition: the recommended augments, alternative champions you can go for, what to rush at a carousel, how to spend your money, and so on.

On to the fun stuff…

Let’s dive into the technical part of how we refresh our compositions daily. We collect millions of games every day from our players (across many games). For each game, we retrieve most of the decisions the players made, i.e., which augments they chose, which champions they owned at any given round (on board or on the bench), which items were slammed (when and on whom), etc.

We store the data in AWS S3 and process it in Databricks: from the ETL process to model training and inference, up to serving data to our app and website. We also leverage Airflow to run the sequence of jobs that goes through all the steps explained below.
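
To give a concrete picture of what that orchestration can look like, here is a minimal Airflow DAG sketch chaining the daily steps described in this article. The task names and callables are hypothetical placeholders, not our actual pipeline code.

```python
# A minimal sketch (hypothetical task names, not our production DAG) of an
# Airflow DAG chaining the daily steps described in this article.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl():        ...  # pull raw matches from S3, prepare them in Databricks
def train_model():    ...  # warm-start clustering training (Step 2)
def run_inference():  ...  # label every composition with its cluster
def compute_stats():  ...  # augment / item / strategy recommendations (Step 3)
def publish():        ...  # push the results to the app and website


with DAG(
    dag_id="tft_comps_daily",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    tasks = [
        PythonOperator(task_id=name, python_callable=fn)
        for name, fn in [
            ("etl", run_etl),
            ("train", train_model),
            ("infer", run_inference),
            ("stats", compute_stats),
            ("publish", publish),
        ]
    ]
    # The jobs are strictly sequential, as described above.
    for upstream, downstream in zip(tasks, tasks[1:]):
        upstream >> downstream
```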

Step 1: Clustering

Every single game is unique, meaning every single composition you finish the game with is also unique. Even if you are the type of player who loves to force the same comp over and over again, you could never play two identical games. However, you would likely have played those games in a similar manner: around the same key units, with a similar strategy for spending your gold, looking for similar items and augments, even if the outcomes differed. The composition you were aiming for, the one you had in your head, is the key composition we are trying to retrieve. It is the composition that best represents what you were trying to achieve. You might not have reached it, or, quite the opposite, you might even have gone beyond it by doing better.

Examples of similar comps

The goal is thus, given millions of examples of played compositions, to determine the key compositions that players play around.

Identifying those compositions allows us to:

  • Understand the meta by simplifying it — having a list of key compositions is faster than playing a hundred games to discover them all or watching content online
  • Study the characteristics of each composition in order to help our players understand the key features of each composition

This problem can be framed as a clustering problem. Imagine a two-dimensional set of points where each point represents a composition you have played: two points are close if the two compositions are similar, far apart if they are very different. We want to find the points in that space that best summarize the key compositions one could play.

While heading for a specific comp implies a unique play style from the beginning of the game, we identify compositions only by looking at the final state of a player’s board (which is what you see in a match history). That does not mean we only leverage post-match data: we will explain later how we use round data to come up with specific recommendations. One could try to integrate round-by-round data into the process of identifying compositions, but in our experience the added value is not worth the increased effort and computation.

We identify compositions through clustering methods. By representing each end game composition as a vector, we can compute distances between compositions and thus run clustering algorithms.

Each end game composition is defined by:

  • A list of units (theoretically 0 to 10+ units), each with:
  — an optional list of items
  — a star level (from 1 to 3)
  • A set of augments
  • A set of metadata related to the game, such as:
  — final placement in the game (from 1 to 8)
  — date of the game
  — rank of the player at the time of the game (for instance, Platinum 3)
  — server played on (North America, Korea, West Europe, etc.)

Representing a composition can potentially leverage all these features, as long as they have some weight in the clustering.

While augments are correlated with the played composition, the full set of augments is large and the chosen set small (3 out of several hundred), so the pick rate of each individual augment is relatively small. Augments are thus hard to include in a representation vector. It is also not recommended to gamble a game on being dependent on a specific augment. We will see later, however, that heading for a specific composition does correlate with players’ augment choices, which can be turned into useful recommendations. Metadata isn’t really correlated with the composition itself, but it is really useful information for later steps.

Items are also correlated with a specific composition. A given champion could be a carry unit in composition A and thus usually hold items, while in composition B it could simply be used for its synergies and hold no items. Furthermore, even if a unit holds items in two compositions, it is not guaranteed that the chosen items would be the same, making items useful information for identifying key comps. However, the variance of item distribution is pretty high: just like augments, it is pretty hard to obtain the exact set of desired items, as the choice of components during the game is limited and one must adapt to what they get. And while some items are really important “win conditions”, it is also not recommended to keep components sleeping on the bench for the whole game. For these reasons, we did not include items in our features.

We limited our features to simply include the champions, which, even though it seems like an overly simple feature set, is good enough to embrace the different variations a composition can have. One could come up with a much larger set of features distinguishing every potential variation of a composition, but we found that, for segmentation purposes, it was not really helpful. Limiting the variance of our features made it easier for clustering algorithms to converge, while still leaving us the freedom to use more data to further analyze the clusters down the line as post-processing.
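
As an illustration, here is a minimal sketch of this representation: each board becomes a binary vector over the set’s champion pool, so that the squared Euclidean distance between two boards counts the champions they don’t share. The champion pool and boards below are illustrative, not a real set.

```python
# A minimal sketch of the champion-only representation (illustrative pool).
import numpy as np

CHAMPION_POOL = sorted(["Aatrox", "AurelionSol", "BelVeth", "Ekko",
                        "Leona", "Mordekaiser", "Warwick"])  # ~60 in a real set
INDEX = {champ: i for i, champ in enumerate(CHAMPION_POOL)}


def encode(comp: list[str]) -> np.ndarray:
    """One-hot encode a final board as a binary champion vector."""
    vec = np.zeros(len(CHAMPION_POOL), dtype=np.float32)
    for champ in comp:
        vec[INDEX[champ]] = 1.0
    return vec


a = encode(["Aatrox", "AurelionSol", "BelVeth"])
b = encode(["Aatrox", "AurelionSol", "Ekko"])
# Squared Euclidean distance on binary vectors = number of differing champions.
print(int(np.sum((a - b) ** 2)))  # -> 2
```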

Step 2: Compositions Algorithm

We use the K-means algorithm to identify our clusters, which is the technique that got us the best results. Since the distance between comps is a matter of how many units differ, our points are distributed in spherical shapes where each center represents the core of a given composition, which works well with K-means. K-means is also well suited for clusters of similar sizes, which in our case favors compositions equally distributed around core compositions, exactly what we aim for.
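
Continuing the sketch above, fitting scikit-learn’s K-means on those binary vectors is straightforward; the boards and k below are toy values, whereas the real job runs on millions of boards.

```python
# Continuing the previous sketch: cluster the binary comp vectors with
# K-means. Toy data and toy k; the production job uses millions of boards.
from sklearn.cluster import KMeans

comps = [
    ["Aatrox", "AurelionSol", "BelVeth"],
    ["Aatrox", "AurelionSol", "Ekko"],
    ["Warwick", "Mordekaiser", "Leona"],
    ["Warwick", "Mordekaiser", "Ekko"],
]
X = np.stack([encode(c) for c in comps])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster id for each played composition
print(kmeans.cluster_centers_)  # per-champion frequency within each cluster
```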

One disadvantage of K-means in our case is that we need to set the number of clusters in advance, even though that number can itself evolve from day to day. The number of top compositions usually increases as people discover new compositions that become popular. We overcome this issue by introducing a maximum number of clusters as well as filtering out clusters with insufficient support.

Another downside we faced was that the generated centroids aren’t realistic compositions, in the sense that they only strongly encode a small number of champions (2 to 3). Because of the high variance of some compositions, or a too small number of clusters, a centroid can sit in the middle of several compositions without actually representing any playable comp. For instance, we observed centroids including only two champions, such as Warwick + Mordekaiser or Leona + Ekko: two pairs of champions that go well together and could be the center of several compositions, but do not represent compositions as is. To overcome that, we translate centroids to the closest, most represented composition around the centroid, so that clusters are always centered on a playable composition. While this choice could be judged arbitrary, we also re-train our model later with new randomly generated centers in order to fully cover the space if necessary.
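
A sketch of that translation step, continuing the running example: each raw centroid is replaced by the most played real board among its closest samples. The 5% neighborhood rule below is an illustrative assumption, not our production rule.

```python
# Snap each raw K-means centroid to the most represented real composition
# near it, so every cluster is centered on a playable board. The 5%
# neighborhood is an illustrative choice, not the production rule.
from collections import Counter


def snap_centroid(centroid: np.ndarray, X: np.ndarray) -> np.ndarray:
    dists = np.sum((X - centroid) ** 2, axis=1)
    neighborhood = X[dists <= np.quantile(dists, 0.05)]
    counts = Counter(map(tuple, neighborhood.astype(int)))
    return np.array(counts.most_common(1)[0][0], dtype=np.float32)


realistic_centroids = np.stack(
    [snap_centroid(c, X) for c in kmeans.cluster_centers_]
)
```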

Now that we have managed to find realistic centroids, as well as a method to attach any given composition to a cluster, we define an additional condition for a composition to belong to what we call the “inner cluster”: being at most 4 champions away from the realistic centroid. Indeed, since the space can be sparse, some edge compositions could be wrongly attributed to a cluster simply because it happens to be the closest one, even though they are actually far from its core composition.
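
In code, and still on the same toy example, the inner-cluster rule could look like this (with “champions away” counted as differing champion slots, which is our reading of the rule):

```python
# Keep a board in a cluster's "inner cluster" only if it differs from the
# realistic centroid by at most 4 champions; farther boards are edge cases.
def in_inner_cluster(comp_vec: np.ndarray, centroid: np.ndarray,
                     max_diff: int = 4) -> bool:
    # On binary vectors, squared Euclidean distance counts differing slots.
    return int(np.sum((comp_vec - centroid) ** 2)) <= max_diff


inner_mask = np.array(
    [in_inner_cluster(x, realistic_centroids[label])
     for x, label in zip(X, kmeans.labels_)]
)
```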

Step 3: Stats and Recommendations

Pick Rate vs. Win Rate Dilemma

When performing recommendations, we usually make the assumption that, across games, the more often a given “decision” is made, the better it is. Here a “decision” could be an item to craft on a unit, an augment to choose, a unit to place on the board, etc.

While this is true in most cases, there are cases where it does not apply. For example, when studying the pick rates and average placements (or win rates) of the most popular compositions, the correlation isn’t always obvious. Some compositions are quite situational (requiring a specific emblem or augment) and potentially hard to play, leading to a composition with a high win rate but a low pick rate.

On the other side, if we were to base recommendations purely on win rate, some results would be odd, as there are always near-impossible-to-reach compositions that lead to a guaranteed top 1 but only happen once every thousand games: for example, 3-starring a 5-cost unit, or having 3 items on every one of your units.

For these reasons, when it comes to recommendations, one should find a balance between pick rate and strength, to make sure recommendations are both powerful and reachable in a reasonable share of games. There are several ways to perform such balancing; the simplest is to merge a popularity metric with an efficiency metric into a single score that can then be used to rank options.
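
As a toy illustration of that idea, the sketch below blends a popularity term with a strength term derived from average placement. The weights and transforms are arbitrary choices for the example, not our production formula.

```python
# Blend popularity (pick rate) and efficiency (average placement) into one
# ranking score. Weights and transforms are illustrative, not Blitz's formula.
import math


def option_score(pick_rate: float, avg_placement: float,
                 popularity_weight: float = 0.4) -> float:
    strength = (8.0 - avg_placement) / 7.0  # map avg placement 1..8 to 1..0
    popularity = math.log1p(100 * pick_rate) / math.log1p(100)  # dampen outliers
    return popularity_weight * popularity + (1.0 - popularity_weight) * strength


options = {  # option: (pick rate, average placement)
    "situational comp": (0.01, 3.2),
    "popular comp": (0.20, 4.1),
}
ranked = sorted(options, key=lambda o: option_score(*options[o]), reverse=True)
```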

We won’t go into too much detail on how we perform these recommendations. It comes down to weighing different metrics to find meaningful recommendations, as well as using conditional probabilities. Conditional probabilities allow us to reduce bias, since some options are simply more popular than others regardless of the composition. For example, an augment such as Portable Forge has a strong win rate and pick rate across most compositions, which does not necessarily mean it should be the recommended go-to augment for every single comp. The same goes for 5-cost units, whose win rates are inflated because they are only accessible late in the game, which prevents players who die early from ever obtaining them.

Below is an example of the metrics we compute and provide to our users for a composition’s augments.

Metrics Blitz computes and provides to users

Consider the above example, augment recommendations for the composition Threat Aurelion Sol. In this composition, there are three main carry champions: Aurelion Sol, Bel’Veth, and Aatrox. This composition is about taking advantage of strong Threat units, especially Aurelion Sol, which is responsible for a large share of the composition’s damage.

We analyze augments through three lenses: their average placement, their pick rate, and their pick rate improvement. Pick rate improvement is the difference between the pick rate of an augment given that composition and the average pick rate of that augment across all compositions. We can see that the pick rates of generally strong augments such as Portable Forge (12.4%), Thrill of the Hunt II (8.3%), and Jeweled Lotus (7.9%) are all really high, but their pick rate improvements are relatively modest, meaning these augments aren’t played more with this composition than with any other. However, an augment like Threat Level Maximum (22.6% pick rate) with a +21.1% pick rate improvement (meaning its average pick rate across all compositions is only 1.5%) is nearly exclusively played with this composition. By contrast, Portable Forge’s pick rate improvement is only +2.6%, meaning this augment is not really specific to this comp.
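
For clarity, here is how pick rate improvement can be computed from counts. Whether rates are normalized by times offered or by games played is our assumption for the example; the counts themselves are made up to match the numbers above.

```python
# Pick rate improvement = P(pick | this comp) - P(pick | any comp).
# Counts are illustrative; normalizing by "times offered" is our assumption.
def pick_rate_improvement(picks_in_comp: int, offers_in_comp: int,
                          picks_overall: int, offers_overall: int) -> float:
    return picks_in_comp / offers_in_comp - picks_overall / offers_overall


# Threat Level Maximum: 22.6% with this comp vs 1.5% overall -> +21.1%.
print(pick_rate_improvement(226, 1000, 150, 10000))  # ~0.211
```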

These three metrics all carry meaning and can help players fully understand the playstyles around different compositions and augments. For instance, an augment with a strong pick rate improvement is a particularly good choice in that specific composition.

Strategy recommendations

Leveling and rolling are two key aspects of TFT, which is ultimately about spending your money wisely. While some rules apply to every composition you play, there are also key differences between them. Heading towards an end-of-game composition by focusing only on units and items usually isn’t enough to reach the higher ranks.

Since we collect round-by-round data, we can extract key insights along these two strategy axes. It comes down to analyzing hundreds of millions of rounds to understand how money is spent each round, based on other factors of the game. We then map this to an overall strategy that players are familiar with, such as Slowrolling, Fast8, Standard, etc.
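
To make the idea concrete, here is a deliberately simplified heuristic mapping a game’s leveling and rolling pattern to one of those named strategies. The features and thresholds are illustrative, not our actual classifier.

```python
# Toy strategy labeling from round-by-round features. Thresholds are
# illustrative heuristics, not Blitz's actual classifier.
def label_strategy(level_at_stage_4_1: int,
                   rolls_before_stage_4: int) -> str:
    if level_at_stage_4_1 >= 8:
        return "Fast8"        # aggressive leveling to 8, then roll for 4-costs
    if rolls_before_stage_4 >= 15:
        return "Slowrolling"  # stay lower level, roll excess gold for 3-stars
    return "Standard"         # econ to 50 gold and level at the usual tempo


print(label_strategy(level_at_stage_4_1=8, rolls_before_stage_4=2))  # Fast8
```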

Production — ML Engineering

Models Lifecycle

Having a model with good performance isn’t the end of the journey. While training a machine learning model can be easy on a single fixed dataset, continuously training a model on evolving data can be challenging.

In our case, we want our recommended compositions to evolve over time as the meta evolves. We also want our recommended compositions to be consistent over time, so that we do not display a completely different set of compositions every day, which would be confusing for our players.

A TFT set is composed of several patches, and every new patch sometimes brings drastic changes that can significantly shift the meta and thus the compositions. The meta can also change within a single patch, since it usually takes players a couple of days to discover new compositions, or to find out that a composition that worked well in the previous patch isn’t actually that good in the current one.

When designing a model lifecycle, there are several paths one could take, each with different pros and cons:

  • Train a single model once per patch (after a day or two of data), making compositions 100% consistent over time but not adapting at all to meta changes during the patch.
  • Train a new model from scratch every single day, which adapts to the meta but can output a completely different set of comps from one day to the next.
  • Re-train a single model every single day on the latest data, which keeps it consistent with previously trained models while leaving room for adjustment. The disadvantage is the risk of being stuck in a local optimum for the whole patch.

We chose the latter approach, illustrated below. On the first day of a patch, we train a model from scratch on day 1 data and use it to infer on that single day of data. On the second day, we re-train our model (starting from the previous day’s model) using data from day 2. Since the meta evolves over time, we expect our centroids to move slightly to better fit the newly encountered data points. Because the goal is to maximize the number of samples we have for each cluster, we then re-infer on day 1’s data plus day 2’s data.

Since centroids move, a given sample from day 1 could be labeled as composition A on day 1 but composition B on day 2.

We repeat the same process every day, improving our confidence in the meta over time.
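
In scikit-learn terms, the warm start can be expressed by passing the previous day’s centroids as the initialization, as in the sketch below. The helpers and data loading are omitted; the pattern, not the exact code, is the point.

```python
# Daily warm-start retraining: day N's model starts from day N-1's centroids.
import numpy as np
from sklearn.cluster import KMeans


def daily_retrain(day_vectors: np.ndarray,
                  prev_centroids: np.ndarray | None, k: int) -> KMeans:
    if prev_centroids is None:  # first day of the patch: train from scratch
        return KMeans(n_clusters=k, n_init=10).fit(day_vectors)
    # Warm start keeps clusters consistent while letting centroids drift.
    return KMeans(n_clusters=k, init=prev_centroids, n_init=1).fit(day_vectors)


# After training, re-infer on every day seen so far in the patch, so each
# cluster keeps accumulating samples (days_so_far: list of daily arrays):
# labels = model.predict(np.concatenate(days_so_far))
```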

K-evolving K-means

To better adapt to the meta, we must consider that the number of top compositions evolves over time. The diversity of play styles can change, meaning the ideal k for our K-means shouldn’t be fixed either. If you were retraining a model from scratch every day, you could determine the best k at initialization (elbow method or silhouette analysis) and train a new model with a different k each day.

On the other hand, K-means doesn’t guarantee any distance between centroids. If your k is too high, you might end up with two clusters of highly represented compositions that are really close to each other, which can be confusing for the end user. To reduce that phenomenon, we perform centroid merging, which reduces the number of clusters.

From a practical point of view, we use a target k, which is the maximum number of clusters our model can find, and which is automatically adjusted to the meta: we add randomly created clusters to the pre-trained model up to our maximum number of clusters, and prune centroids that end up too close to existing ones after training. This way we are constantly evolving and adapting. Typical K-means implementations do not easily allow re-training a model with a different k than the original one, so we had to implement it internally.
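
Our internal implementation is not public, but the idea can be sketched as follows: pad yesterday’s centroids with random candidates up to the target k, retrain, then prune centroids that converged too close together. The pruning threshold below is illustrative.

```python
# K-evolving K-means sketch: grow to a target k with random candidate
# centroids, retrain, then prune near-duplicate centroids. The pruning
# threshold is illustrative.
import numpy as np
from sklearn.cluster import KMeans


def evolve_k(X: np.ndarray, prev_centroids: np.ndarray,
             max_k: int, min_sq_dist: float = 4.0) -> np.ndarray:
    n_new = max_k - len(prev_centroids)
    random_candidates = X[np.random.choice(len(X), n_new, replace=False)]
    init = np.vstack([prev_centroids, random_candidates])
    model = KMeans(n_clusters=max_k, init=init, n_init=1).fit(X)

    kept: list[np.ndarray] = []  # keep a centroid only if far from kept ones
    for c in model.cluster_centers_:
        if all(np.sum((c - kc) ** 2) >= min_sq_dist for kc in kept):
            kept.append(c)
    return np.stack(kept)
```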

Model Monitoring

Drifting

To track our different trainings, both for development and for production monitoring purposes, we have been using MLflow Tracking within Databricks extensively (https://mlflow.org/docs/latest/tracking.html / https://docs.databricks.com/mlflow/tracking.html).

It allows us to simply track any experiment we run. Through tagging, we are able to differentiate prod runs from development / R&D runs.

MLflow Tracking also lets you attach any artifact you’d like to a run. In our case we log a number of data frames and variables, including clusters, main compositions, etc., which are useful for analyzing runs a posteriori.
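
A representative run might be logged like this. The metric names, tags, and artifacts are our illustrative choices, while the MLflow calls themselves (`start_run`, `set_tag`, `log_param`, `log_metric`, `log_artifact`) are standard Tracking APIs.

```python
# Logging a daily production run with MLflow Tracking. `model`, `centroids`
# and `clusters_df` are assumed to come from the training pipeline.
import mlflow

with mlflow.start_run(run_name="tft-comps-daily"):
    mlflow.set_tag("env", "prod")          # distinguish prod from R&D runs
    mlflow.log_param("max_k", 60)
    mlflow.log_metric("inertia", float(model.inertia_))
    mlflow.log_metric("n_clusters_kept", len(centroids))
    clusters_df.to_csv("clusters.csv", index=False)
    mlflow.log_artifact("clusters.csv")    # inspect the run a posteriori
```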

Training a model over evolving data exposes us to a common risk: drift. Since we re-train from a pre-trained model every day, we might get stuck in a local optimum, and we might encounter scenarios where the underlying meta has evolved so much that our model can no longer correctly assign newly observed compositions.

To track that phenomenon, we built simple charts that track model drift over time. We query MLflow experiments to get dated production runs, for which we compare distances between centroids over time, as well as the average error, in order to measure how accuracy evolves over time.
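
The query side of those charts can be sketched with `mlflow.search_runs`. The experiment name and tag are illustrative, and the drift metric below assumes warm-started centroids keep their indices from one day to the next.

```python
# Pull dated prod runs from MLflow and measure day-over-day centroid drift.
# Assumes warm starts preserve centroid ordering across days.
import mlflow
import numpy as np

runs = mlflow.search_runs(
    experiment_names=["tft-comps"],            # illustrative experiment name
    filter_string="tags.env = 'prod'",
    order_by=["attributes.start_time ASC"],
)


def centroid_drift(prev: np.ndarray, curr: np.ndarray) -> float:
    """Mean distance between each centroid and its position the day before."""
    return float(np.mean(np.linalg.norm(curr - prev, axis=1)))
```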

As you can see from the charts, some centroids do encounter spikes as they are drastically moved to a different area of the space when the meta shifts, but most centroids are pretty consistent. The average distance to the previous day’s centroids and the average norm are both stable over time.

Using Blitz Analyzed AI Comps

Thanks to joint work between the Blitz Data and App teams, our analyzed comps are available through an intuitive UI that we hope you will like. We are already working on new features to give you even more insight into how to play the compositions that suit you best, so you can grind those LPs!

We strive to always set the bar and be the standard for others to follow. We leverage more data than our competitors, making our statistics more accurate than what you can find elsewhere.

We provide global and relative statistics for different aspects of the game, making our product a better go-to than other websites for improving at the game, whether you are just getting started or facing stats dilemmas.

We like to think that our UI is also the most intuitive and complete on the market, allowing you to quickly glance at everything you need during a game.

We provide unique advice regarding gold spending strategies by leveraging Big Data and our huge player base.

If you have any feedback feel free to reach out through our Discord (or the Send Feedback button in the App) — https://discord.gg/theblitzapp

Credits

Shout out to the whole Blitz Team that has been working really hard on this project: Alan Qiu (PM), Marcel Dold (SWE), Ted Li (AI/Data mentoring), Edward Guan (TFT expert, AI Engineer), and Rafael Cartenet (TFT enthusiast, AI and Data Engineer).

Author: Rafael Cartenet (https://medium.com/@rafael.cartenet)

Follow us on social media:

Twitter | Instagram | TikTok | Facebook


Redefining competitive gaming with a revolutionary desktop app for League of Legends, TFT, Apex, and Valorant. Download the Blitz App free: http://blitz.gg