Structure in football: putting formations into context

This article was published as a chapter in the FC Barcelona Innovation Hub Football Analytics Guide, 2020. The Guide is freely available for download by filling out the online form here. Links to other work cited in this chapter are given at the end.

Formations are the foundations of tactics in football. They provide structure to a team that helps the players to position themselves and broadly defines their specific roles in attack and defence. They are the means through which managers attempt to maintain control of high-value territory while denying their opponents the same.

The debate around formation tactics is as old as the game itself and is a central theme in the history of how the game is played. Innovations in formations have mirrored the evolving balance between defensive solidity and attacking flair, discipline and freedom of expression, and outcomes and entertainment. From the early 2-3-5 ‘pyramid’ in the late 1800s to the more balanced W-M formation of Herbert Chapman in the 1930s, the 4-2-4 of Brazil in the 1950s, the 5-3-2 of Helenio Herrera’s Inter Milan in the 1960s, the zonal 4-4-2 of Arrigo Sacchi’s AC Milan and the 4-2-3-1 of recent decades, formations have a rich story of action and reaction (I would strongly recommend reading Jonathan Wilson's book "Inverting the Pyramid" if you want to know more about the history of football tactics).

In the modern game, formations are not rigid and unchanging: they are dynamic, adapting to the specific circumstances on the field, changes in personnel and each team’s immediate objective. Managers frequently refer to the necessity of adopting different formations for different phases of the game and their importance to game management. Comprehensive analysis of an opponent must consider not only how a team is structured while defending their own goal, but also how they press higher up the field. In offense, it must consider their different configurations when playing the ball out from defence, progressing it up the field or attempting to break down a set defence in the final third.

The arrival of player tracking data (which measures the positions of every player on the pitch many times per second) has presented the opportunity to study team formations – and the transitions between formations – at an unprecedented level of detail. The data can reveal how a team has been instructed to organise themselves in different situations and against a variety of opponents. Detecting significant changes in formation during matches provides insights into how a coach reacts to certain game situations. With knowledge of the strategic framework within which players have been instructed to play, we can attempt to understand their decision-making on an individual level, separating what they have been told to do from what they do instinctively.

This chapter reviews some of the insights on formation tactics that data has revealed. In the next section I discuss the approaches that have been used to study formations in tracking data and highlight their findings. In the following section I present a case study that demonstrates how formations evolve between different game phases. In the final section, I review recent research into how teams might use knowledge of their opponent’s formations to exploit weaknesses.

Enter the Data

Measuring team formations with tracking data is a balancing act. At any particular instant a player may be a significant distance away from his or her formation position: covering for a teammate, chasing down an opponent, or making opportunistic runs into space. Formations must therefore be measured over a sufficiently long period of time that these deviations average out and we gain a more accurate understanding of each player’s position.

However, player roles also vary with game phase: a full-back may be aligned with the centre-backs while defending in their own third of the pitch, but level with the forwards in the attacking third. The distinct phases of a game must be identified, and formations measured in each phase separately. Furthermore, managers sometimes make wholesale tactical changes during a match: a complete change of formation (often accompanied by substitutions) to alter the flow of the match or close it out. We must detect these tactical changes and measure formations before and after them separately. So while tracking data should be aggregated over a period of time to average away the temporary departures of each player from their formation position, it must also be aggregated carefully, so that data from different game phases (or entirely different systems) are not mixed together, blurring the tactical picture.

A player’s role within a formation is typically defined by their position relative to their teammates, rather than their absolute position on the field, particularly when defending. At any given instant, the area encompassed by the outfield players collectively is a relatively small fraction of the total area of the pitch: players move coherently as a group to maintain their spatial configuration. For example, Figure 1 indicates the positions of the defending team at four instants during the first half of a match. It is clear that, while the team occupies different areas of the pitch at each instant, the players largely retain their relative positioning, maintaining a 4-3-3 formation (four defenders, three central midfielders and three forwards).

Figure 1: The positions of the outfield players of the defending team at four different instants during a professional football match (shooting from bottom to top). The blue arrow indicates the average position of the team relative to the centre of the pitch.

Bialkowski et al. (2014) published one of the first quantitative analyses of team formations in football using tracking data (see also: Lucey et al. 2013, Bialkowski et al. 2016). They describe a role-identification methodology for measuring formations, iteratively refining estimates of the average spatial positions (and deviations from those positions) of 10 unique outfield roles throughout a match. At any given moment, player positions were measured relative to the average position of the team (as opposed to using their absolute positions on the pitch), accounting for co-ordinated team motions in the formation observations.
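The centroid-relative measurement described above can be illustrated with a minimal sketch. It assumes each frame is a mapping from a player identifier to an (x, y) position in metres, and it treats each player as a fixed role; the iterative role reassignment of Bialkowski et al. is not reproduced here. The names `Frame`, `relative_positions` and `mean_formation` are hypothetical.

```python
Frame = dict[str, tuple[float, float]]  # player id -> (x, y) in metres (assumed layout)


def centroid(frame: Frame) -> tuple[float, float]:
    """Mean position of the outfield players in one frame."""
    n = len(frame)
    cx = sum(x for x, _ in frame.values()) / n
    cy = sum(y for _, y in frame.values()) / n
    return cx, cy


def relative_positions(frame: Frame) -> Frame:
    """Express each player's position relative to the team centroid."""
    cx, cy = centroid(frame)
    return {pid: (x - cx, y - cy) for pid, (x, y) in frame.items()}


def mean_formation(frames: list[Frame]) -> Frame:
    """Average the centroid-relative positions over many frames.

    Simplification: every player keeps the same role throughout, so
    positional exchanges between teammates are not handled.
    """
    rel = [relative_positions(f) for f in frames]
    n = len(rel)
    return {
        pid: (sum(r[pid][0] for r in rel) / n, sum(r[pid][1] for r in rel) / n)
        for pid in rel[0]
    }
```

Because the centroid is subtracted in every frame, a team that keeps its shape while moving up the pitch produces the same formation observation at each instant.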

Bialkowski et al. (2014) applied their methodology to Prozone tracking data for a season of a 20-team professional league. A single formation observation was measured for each team in each half of every match; game phase information was not used and so the individual formation observations were a mixture of attacking and defensive configurations. Applying a clustering algorithm to their full set of formation observations, Bialkowski et al. (2014) identified six unique formation types; 4-4-2, 3-4-3, 4-4-1-1 and 4-1-4-1 are all visible in their results. To investigate how individual players exchanged roles throughout a game, they repeated their analysis, measuring formations in five-minute windows (rather than each half separately) throughout one of the matches in their sample. Their results showed how the midfielders – most notably the left and right wingers and the two central midfielders – exchanged positions throughout the match.

In a follow-up paper, Bialkowski et al. (2016) extended their analysis to measuring formations in and out of possession separately for each team, finding in-possession formations to have a similar, although slightly more expansive, structure to those measured when teams were out-of-possession. However, when they measured team formations in 5-minute periods and searched for distinct formation types within those observations, they found significant variations. Most notably, changes in formation coincided with the team’s proximity to their opponent’s goal, with the formations evolving to more aggressive configurations as the team advanced up the pitch. This was the first quantitative demonstration of the importance of game phase information to analysing team formations.

Shaw and Glickman (2019) presented a data-driven technique for measuring and classifying team formations as a function of game state, analysing the offensive and defensive configurations of each team separately and dynamically detecting major tactical changes during the course of a match. Using a season of tracking data from an elite professional league, they introduced a geometric approach to measuring formations, calculating the vectors between each pair of teammates at successive instants during a match and averaging these over a specified period of time to gain a clear measure of team formations in defence and attack. Defensive and offensive formations were measured separately by aggregating together consecutive periods of possession of the ball for each team into two-minute windows of in-play data, excluding periods of possession that lasted for less than five seconds (under the assumption that they are too short for either team to fully establish an offensive or defensive stance).
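A rough sketch of the two ingredients described above, the five-second possession filter and the time-averaged vectors between pairs of teammates, might look as follows. The data layout is an assumption: possessions are dictionaries with hypothetical `start` and `end` times in seconds, and frames map player identifiers to (x, y) positions in metres. It is an illustration of the idea, not the authors' implementation.

```python
from itertools import combinations

Frame = dict[str, tuple[float, float]]  # player id -> (x, y) in metres (assumed layout)


def long_possessions(possessions: list[dict], min_seconds: float = 5.0) -> list[dict]:
    """Drop possessions that lasted for less than min_seconds."""
    return [p for p in possessions if p["end"] - p["start"] >= min_seconds]


def mean_pairwise_vectors(
    frames: list[Frame],
) -> dict[tuple[str, str], tuple[float, float]]:
    """Average the vector from player a to player b over all frames.

    Pairs are ordered alphabetically by player id, so each pair of
    teammates appears exactly once.
    """
    n = len(frames)
    result = {}
    for a, b in combinations(sorted(frames[0]), 2):
        dx = sum(f[b][0] - f[a][0] for f in frames) / n
        dy = sum(f[b][1] - f[a][1] for f in frames) / n
        result[(a, b)] = (dx, dy)
    return result
```

Working with vectors between teammates removes the dependence on where the team is on the pitch without needing to compute a centroid first.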

Figure 2 plots the full set of formation observations for one team during a single match (12 separate observations in possession and 9 observations out of possession). It is clear that, when out of possession (left plot) the team played with a 4-1-4-1 formation, with a single defensive central midfielder and a lone striker. In possession (right plot), the outside midfielders advanced to form a front three and the full backs moved level with the defensive midfielder. The right central midfielder played slightly deeper than the left central midfielder, introducing a small asymmetry to the team when attacking. While the relative positions of the defensive players in the team are clearly well constrained, the formation positions of the offensive players – particularly the striker – are much more broadly distributed, both in and out of possession, indicating greater freedom in their roles. Overall, the consistency of the observations indicates that the manager did not make a significant formation change during the match.

Figure 2: The full set of formation observations for one team throughout an entire match. The left plot indicates 9 defensive formation observations, the right plot indicates 12 offensive formation observations.

Shaw and Glickman (2019) applied their methodology to identify a set of twenty unique formations that teams adopted over the course of a season. These unique formation types were used as templates to classify the formations used by teams during matches, study transitions between defensive and offensive configurations and detect major changes in formations during a match (see also Beernaerts et al. 2018, Müller-Budack et al. 2019).
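To illustrate how a formation observation could be classified against a set of templates, the sketch below assigns an observation to the template with the smallest matching cost, where the cost is the minimum total distance over all one-to-one pairings of observed positions with template positions. This is an assumed, simplified scheme rather than the measure used in the paper, and the brute-force search over permutations is only practical for toy examples; for ten outfield players a proper assignment solver (for example `scipy.optimize.linear_sum_assignment`) would be needed. The function names are hypothetical.

```python
from itertools import permutations
from math import hypot, inf

Positions = list[tuple[float, float]]  # centroid-relative (x, y) in metres


def matching_cost(observed: Positions, template: Positions) -> float:
    """Smallest total distance over all pairings of observed and template positions."""
    if len(observed) != len(template):
        raise ValueError("observation and template must have the same size")
    best = inf
    # Brute force: try every way of assigning observed players to template slots.
    for perm in permutations(range(len(template))):
        cost = sum(
            hypot(ox - template[j][0], oy - template[j][1])
            for (ox, oy), j in zip(observed, perm)
        )
        best = min(best, cost)
    return best


def classify(observed: Positions, templates: dict[str, Positions]) -> str:
    """Return the name of the template that matches the observation best."""
    return min(templates, key=lambda name: matching_cost(observed, templates[name]))
```

For example, with two three-player toy templates named `flat` and `triangle`, an observation close to a triangle is classified as `triangle` regardless of the order in which its players are listed.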

Figure 3 provides examples of defensive and offensive formations that were frequently paired together by the teams in their dataset. The left-hand side of the diagram shows two defensive formations identified in the data, while the right-hand side shows three offensive formations. The links between them indicate the formations that were regularly combined as possession was gained and lost.

Figure 3: Two examples of the typical pairings between defensive and offensive formations in a sample of 180 matches. All formations are orientated to shoot from left to right.

The example highlighted in blue indicates that teams that defended using the formation Def1 typically transitioned to the formation Off2 when in possession of the ball. The relationship between the two formations is clear: the outside defenders, or wingbacks, advanced when the teams gained possession and the two outside midfielders tucked in behind the two forwards.

The second example, highlighted in red, demonstrates that teams defending using Def2 would transition into either Off1 or Off3 when they gained possession. In the former, the outside forwards pushed wide and the full-backs advanced into midfield, whereas in the latter the front three remained narrow with the full-backs advancing further up the field to provide width to the team. These examples show that some defensive configurations seem to give more flexibility in terms of attacking options than others.

Game Phases

The results discussed above have reinforced the notion that a team’s formation will vary based on the game phase. The concept of game phases derives from the idea that any moment of a match can be categorised based on the immediate intentions of each team. While there is no universally accepted definition of game phases, an example is shown in Table 1.

Table 1: Example of game phase categorisation, based on discussions with analysts at the German Football Federation (the DFB).

In this example, matches are broken down into four phases – Offence, Defence, Transitions and Set Pieces – each of which consists of a number of phase types. The Offensive phases categorize periods of possession into ball retention (ensuring possession of the ball is maintained), progressing the ball towards the opponent’s goal, and active chance creation. Defensive phases effectively indicate how far up the pitch the team is attempting to defend. The Transition phases deal with the intentions of a team in the moments that follow an exchange of possession. Set pieces label all dead ball situations (and could be further refined into ‘first ball’, ‘second ball’, and so on).

Automated detection of game phase is a challenging technical problem and many current categorisations rely on human analysts’ interpretation rather than an algorithm. As game phases are loosely related to where a team is located on the pitch, a simple method is to base the definitions of game phases on the distance of the team from their own goal. In Figures 4 & 5, I show formations measured in different game phases for a major team in a cup semi-final.
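The simple distance-based method can be sketched as a function that assigns a phase label from the position of the team centroid along the length of the pitch. The coordinate convention (distance in metres from the team's own goal line) and the 105-metre pitch length are assumptions made for illustration, as is the function name.

```python
def possession_phase(centroid_x: float, pitch_length: float = 105.0) -> str:
    """Label a moment of possession by the third of the pitch the team occupies.

    centroid_x is the distance of the team centroid from the team's own
    goal line, in metres (an assumed convention).
    """
    if not 0.0 <= centroid_x <= pitch_length:
        raise ValueError("centroid lies outside the pitch")
    third = pitch_length / 3
    if centroid_x < third:
        return "defensive third"
    if centroid_x < 2 * third:
        return "middle third"
    return "final third"
```

Frames can then be grouped by label and a separate formation observation measured for each group.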

Figure 4 shows formations measured in three simple phases of possession based on the vertical position of the team’s centroid: defensive third, middle third, and final third. While playing the ball out from their own third of the pitch, the two central defenders (#4 and #5) were typically separated by a distance of 30 meters, with the midfielders (#6 and #8) dropping deep and the full backs pushed further up the field. The structure is similar in midfield progression (middle plot), with the gap between the two central defenders narrowing to 20 meters. In the final third, the formation changes significantly, resembling a 2-4-4: the trio of attacking midfielders (#11, #10 and #7) advance level with the single striker (#9) to form a single attacking line, with the central midfielders and full-backs forming a second line behind them.

Figure 4: Team formations in 3 basic phases of possession (first half only).

Figure 5: Team formations in 2 basic phases of defence (first half only).

Figure 5 shows the formations in two defensive phases: low-block (defined as periods when the team were defending in their own half) and high-press (defined as periods when the opponents were attempting to play the ball out of their defensive third). The low block is clearly a compact 4-2-3-1, with the trio of attacking midfielders remaining in advance of the two central midfielders and the back four positioned about ten meters outside their own D. In the high press, the data indicates that their formation resembled a 4-2-4 (or an aggressive 4-4-2), with the front four forming a single line and the two central midfielders sitting in front of the back four. Their opponents played with a 4-3-3 while in possession in their own third, with a single midfielder playing deep and two midfielders more advanced on either side of him. Players 6 and 8 may therefore have remained deep to help protect the defence against the front three should their opponents break through the first line.

Across the five game phases depicted in Figures 4 and 5, the players were configured in four related but distinct formations: 2-4-3-1, 2-4-4, 4-2-3-1 and 4-2-4. The clear structure to each measurement emphasizes how formations form the backbone of team tactics, while the changes from one phase to another demonstrate how the team adapted to different situations.

The results shown in Figures 4 and 5 were generated using tracking data from only the first half of the match. Once the entire game has been broken down into the constituent phases, the tracking data for each phase can be aggregated to create a single formation observation for each team in that phase. However, before the data is aggregated, it is necessary to check whether there was a major formation change at some stage of the game.

Figure 6: The impact of the half-time change in formation on the team’s low block.

In this case study, the team made a significant tactical change at half time. Figure 6 demonstrates how their formation in the low block changed from the 4-2-3-1 to a midfield diamond. Players 10 and 11 exchanged positions and, after 15 minutes of the second half, player 6 was substituted for player 55.

How did this formation change affect the flow of the game? Their opponents had been the superior team in the first half, but the second half was more balanced, both in terms of possession and chance creation. It is worth noting that, over the course of the season, the coach frequently changed his team’s formation during matches. Formation detection algorithms can be used to flag interesting tactical changes made by a manager over several seasons that can then be studied to anticipate how he or she may react in the future.
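One simple way to flag a candidate formation change is to compare successive formation observations and report the points at which they differ by more than some threshold. The sketch below is an assumption-laden illustration, not the detection method of any of the cited papers: observations are mappings from player identifier to centroid-relative (x, y) position in metres, only players present in both observations are compared (to tolerate substitutions), and the 5-metre threshold is arbitrary.

```python
from math import hypot

Formation = dict[str, tuple[float, float]]  # player id -> centroid-relative (x, y)


def formation_distance(a: Formation, b: Formation) -> float:
    """Mean distance between the same player's position in two observations."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("observations have no players in common")
    total = sum(hypot(a[p][0] - b[p][0], a[p][1] - b[p][1]) for p in shared)
    return total / len(shared)


def flag_changes(observations: list[Formation], threshold: float = 5.0) -> list[int]:
    """Indices i where observation i differs from observation i - 1 by more than threshold."""
    return [
        i
        for i in range(1, len(observations))
        if formation_distance(observations[i - 1], observations[i]) > threshold
    ]
```

Two players exchanging positions, as players 10 and 11 did in the case study, would appear as a large jump between consecutive observations, whereas small fluctuations from window to window would not be flagged.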

Formation Disruption

One of the immediate benefits of measuring formations is that the concept of tactical discipline can then be quantified. This is particularly pertinent to defensive organisation: when does the defensive shape of an opponent become disrupted? Football is a territorial game and formations are a strategic tool for ensuring that high-value territory is well-guarded. To create goalscoring opportunities, the attacking team must attempt to occupy space near their opponent’s goal long enough to produce a clear shot at goal. One way to achieve this is to create disorder in the defensive system by manipulating defenders away from the spaces they defend.

In their 2019 OptaPro forum talk, Mladen Sormaz and Dan Nichol explored the relationship between formation disruption and space creation, establishing a link between off-the-ball runs, defensive disorganisation and shooting opportunities (see also Memmert et al. 2017). They introduced a new metric, formation damage, that quantifies the degree to which the positions of the defending players have deviated from a reference formation (which was inferred from the data). The space created by off-the-ball runs – defined as periods of sustained acceleration in the opponent’s half – was quantified by measuring the maximum area controlled by a player over the duration of their run using Voronoi Tessellation.
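The two quantities can be illustrated with a short sketch. The precise definition of formation damage is not given here, so the version below is an assumed stand-in: the mean distance between each defender's centroid-relative position and their position in the reference formation. The area a player controls is approximated by counting grid cells that are closer to that player than to anyone else, which is a discretised Voronoi tessellation; the 105 by 68 metre pitch and 1-metre cells are also assumptions.

```python
from math import hypot

Point = tuple[float, float]


def formation_damage(current: dict[str, Point], reference: dict[str, Point]) -> float:
    """Mean deviation (metres) of defenders from a centroid-relative reference formation.

    Assumed definition for illustration only.
    """
    n = len(current)
    cx = sum(x for x, _ in current.values()) / n
    cy = sum(y for _, y in current.values()) / n
    return sum(
        hypot(x - cx - reference[p][0], y - cy - reference[p][1])
        for p, (x, y) in current.items()
    ) / n


def controlled_area(
    player: Point,
    others: list[Point],
    pitch: tuple[float, float] = (105.0, 68.0),
    cell: float = 1.0,
) -> float:
    """Approximate area of the player's Voronoi cell, in square metres."""
    nx, ny = int(pitch[0] / cell), int(pitch[1] / cell)
    area = 0.0
    for i in range(nx):
        for j in range(ny):
            gx, gy = (i + 0.5) * cell, (j + 0.5) * cell  # centre of the grid cell
            d = hypot(gx - player[0], gy - player[1])
            # The cell counts only if this player is strictly the nearest one.
            if all(d < hypot(gx - ox, gy - oy) for ox, oy in others):
                area += cell * cell
    return area
```

The space captured by a run would then be the maximum of `controlled_area` over the frames in which the run takes place.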

Figure 7: Space captured by off-the-ball runs made by attacking players in the opposing half (x-axis), plotted against the degree of formation damage that occurred during these runs (y-axis). Small white dots represent individual runs made by players in the data; larger red dots highlight runs made in an attacking play that produced a shot on goal. The colour scale indicates the results of a logistic regression of shot probability on space creation and formation damage (brighter colours indicate a higher shot probability). Taken from Sormaz and Nichol (2019), with permission from the authors.

Figure 7 shows that runs that both capture space and help to disrupt an opponent’s formation are more likely to lead to chance creation. The probability that an attacking play will produce a shot – as indicated by the colour scale – increases towards the upper-right hand corner of the plot, corresponding to off-the-ball runs that simultaneously capture large areas in the opponent’s half and damage their formation by forcing defenders out of position. The method introduced by Sormaz and Nichol provides a metric for assessing run quality: runs that achieve high scores in both metrics are related to an increase in attacking threat. A natural next step would be to demonstrate a clear causal link between player movement and formation damage – identifying the types of runs that drag players out of position – and to consider the disruptive effects of counterpressing.

At the 2020 MIT Sloan Sports Analytics conference, Sergio Llana, Pau Madrero and Javier Fernandez demonstrated how data can be used to identify strategic weaknesses in the positioning of defending players. In their method, they allocate zones to individual players in the defending team: each player is assumed to be responsible for covering his or her own unique zone. They then find instances in which the attacking team passed the ball into one of these zones whilst bypassing the defending player responsible for guarding it (who is assumed to be ‘out of position’). Llana et al. (2020) demonstrate how these situations cause other players in the defending team to move out of position in turn, propagating formation disruption throughout the team and creating space for the opponents. Video examples of these situations can be found here.

Using the Expected Possession Value framework presented by Fernandez et al. (2019), Llana et al. (2020) quantified the cost of a given player being caught out of position by calculating the probability that a pass into their zone would increase the chances of the possession resulting in a goal. In a case study of the UEFA Champions League group stage match between Tottenham Hotspur and FC Barcelona in the 2018-19 season, they showed how their methods revealed the high-value nature of Barcelona passes played beyond Tottenham’s full-backs Trippier and Davies and into the zones they were expected to cover.

Into the future

The discussion of formations is as old as the game itself, but data-driven formation analysis is still a relatively young field. With the increasing availability of tracking data, new methods for measuring formations in various match contexts are now being developed. Studies of formation discipline and the disruption of defensive blocks are providing clear demonstrations of how these methods can be used for opposition analysis and match preparation.

How might analytics influence the thinking of the next generation of coaches? Analysis of tracking data has demonstrated that there is much greater variety in the formations utilized by teams – even within a single game – than is discussed in standard pre- and post-match reports. In future, data will help to reveal answers to important questions: what are the typical strengths and weaknesses of different formations (especially when pitted against one another)? How do formations affect playing style, and do they enhance or suppress a particular player’s abilities? Formations also influence the dynamic elements of a player’s role: the opponents they mark, the space they create or defend, the co-ordinated runs made by players; data can also be used to explore these aspects of team strategy.

For those of us who love to dissect and study the beautiful game, the different perspectives that data analysis can provide will hopefully lead to new insights that, when combined with the knowledge and experience of leading coaches, will help to drive the next steps of tactical innovation in football.

References

Wilson, J. (2009). Inverting the Pyramid. Nation Books.

Beernaerts, J., De Baets, B., Lenoir, M., De Mey, K., Van de Weghe, N., (2018). Analysing Team Formations in Football with the Static Qualitative Trajectory Calculus, In Proceedings of the 6th International Congress on Sport Sciences Research and Technology Support - Vol 1: icSPORTS, 15-22 (link)

Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., and Matthews, I., (2014). Large-scale analysis of soccer matches using spatiotemporal tracking data. In: 2014 IEEE International Conference on Data Mining (ICDM), 14–17 Dec 2014 (link)

Bialkowski, A., Lucey, P., Carr, P., Matthews, I., Sridharan, S., and Fookes, C., (2016). Discovering team structures in soccer from spatiotemporal data. In: IEEE Transactions on Knowledge and Data Engineering, 28(10), pp. 2596-2605 (link)

Fernandez, J. and Bornn, L., (2018), Wide Open Spaces: A statistical technique for measuring space creation in professional soccer, In Sloan Sports Analytics Conference (link)

Fernandez, J., Bornn, L., and Cervone, D., (2019). Decomposing the Immeasurable Sport: A deep learning expected possession value framework for soccer, In Sloan Sports Analytics Conference (link)

Llana, S., Madrero, P., and Fernández, J., (2020). The right place at the right time: Advanced off-ball metrics for exploiting an opponent’s spatial weaknesses in soccer, In: Sloan Sports Analytics Conference (link)

Lucey, P., Bialkowski, A., Carr, P., Morgan, S., Matthews, I., and Sheikh, Y., (2013). Representing and Discovering Adversarial Team Behaviors using Player Roles, in CVPR, 2013 (link)

Memmert, D., Lemmink, K. A. P. M., Sampaio, J., (2017). Current Approaches to Tactical Performance Analyses in Soccer Using Position Data. Sports Medicine, 47, 1. DOI 10.1007/s40279-016-0562-5 (link)

Müller-Budack, E., Theiner, J., Rein, R., & Ewerth, R. (2019). "Does 4-4-2 exist?"-- An Analytics Approach to Understand and Classify Football Team Formations in Single Match Situations. In Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports (pp. 25-33) (link)

Narizuka, T., Yamazaki, Y. (2019). Clustering algorithm for formations in football games. Sci Rep 9, 13172 (2019) (link)

Shaw, L., and Glickman, M., (2019). Dynamic analysis of team strategy in professional football. In: FC Barcelona Innovation Hub Sports Analytics Summit, pp. 1–13 (link)

Sormaz, M., and Nichol, D., (2019). Quantifying the impact of off-the-ball movement in football. In OptaPro Analytics Forum 2019 (link)

Comments

Etelligentcoaching, 20 January 2021 at 22:59
Thanks for the interesting post. With my academy team we no longer talk or use ‘formations’ like 4-4-2 and instead decide on the shape and structure we want in and out of possession and how we control fluid and static transitions to move between these structures. However, I’m still trying to find an analytical approach to understanding this not only in relation to distance between our players but also in relation to the distance and structure of the opposition. Is it possible to use tracking data to see how the positioning of opponents influences our own structure in certain phases of the game?
A-League, 1 February 2021 at 04:33
Thank you for a very educational read. I've written about football for a number of years for publications big and small, but always with an editorial or news based focus. It isn't often that I come across work as exceptionally informative and well thought out as yours.

I'll be bookmarking your blog and visiting it every so often. Again, thank you for brightening my day with your work!

EightyFivePoints