Extracting Match Stats From Halo Infinite Film Files

September 15, 2024 in Data by Den Delimarsky, 24 minutes

Diving into Halo Infinite’s lesser-known post-match film data.

Introduction

One of the conversations in my blog comments led to a discussion about film files in Halo Infinite. In case you are not familiar with them, no worries - it’s a pretty obscure component of the match data that I haven’t covered in depth yet on my blog or here, on the OpenSpartan blog.

The idea behind film files is simple - they aren’t your traditional video but rather a combination of game engine metadata that is captured during your gameplay. When you complete a match, a “film” (a recording of all match metadata) is captured and you end up with a whole bunch of binary content that is available through a dedicated API endpoint.

Before we go down this rabbit hole, I want to give a massive shout-out to Andy Curtis for doing quite a bit of work digging through film file structure 🙌

Finding the film files

Before we get to the film content, let’s figure out how we find them. To get started, first try to get your own matches from the Halo Infinite API. This will allow you to get the match IDs that we can later use to query for film data. You can send a request to this endpoint to get the most recent matches:

https://halostats.svc.halowaypoint.com/hi/players/xuid({{XUID}})/matches?count=25

In the example above, {{XUID}} is the numeric identifier of your player ID. I talked about the process of converting a gamertag into a XUID in a separate blog post.

Note

You will need to make sure that you authenticate for the API call above to succeed (and all other API calls in this blog post). You can learn more about this in Halo Infinite Web API Authentication.

The match data you will get will be by default in JSON format, like this:

{
    "Start": 0,
    "Count": 25,
    "ResultCount": 25,
    "Results": [
        {
            "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
            "MatchInfo": {
                "StartTime": "2024-09-16T02:35:15.505Z",
                "EndTime": "2024-09-16T02:42:08.144Z",
                "Duration": "PT6M31.0705518S",
                "LifecycleMode": 3,
                "GameVariantCategory": 9,
                "LevelId": "1216247c-bf6d-4740-8270-e800a114f231",
                "MapVariant": {
                    "AssetKind": 2,
                    "AssetId": "37a9b5f0-6be7-4a46-8010-1fe6f7ea5611",
                    "VersionId": "e1cbf812-4f4e-44fc-9ef8-dd9ab5c4e4cf"
                },
                "UgcGameVariant": {
                    "AssetKind": 6,
                    "AssetId": "0e198591-ac15-4f99-8ff2-dd390decad66",
                    "VersionId": "168e6c3a-fdf3-4edd-af79-c0ffe5475026"
                },
                "ClearanceId": "bb31018c-8ca3-4673-b870-5193cfdf18f5",
                "Playlist": {
                    "AssetKind": 3,
                    "AssetId": "1b1691dc-d8b9-4b1f-825d-cb1c065184c1",
                    "VersionId": "38ecf0d8-82ca-4831-b186-eda51653f2ba"
                },
                "PlaylistExperience": 2,
                "PlaylistMapModePair": {
                    "AssetKind": 7,
                    "AssetId": "6b7c20a9-5eed-476f-9716-6d20e2f37f1a",
                    "VersionId": "56c4ba81-a659-4168-bc02-8f4135e693f9"
                },
                "SeasonId": "Csr/Seasons/CsrSeason8-1.json",
                "PlayableDuration": "PT6M31.063S",
                "TeamsEnabled": true,
                "TeamScoringEnabled": true,
                "GameplayInteraction": 1
            },
            "LastTeamId": 1,
            "Outcome": 2,
            "Rank": 1,
            "PresentAtEndOfMatch": true
        },
        [...MORE MATCH DATA...]
      ]
}

This is all useful metadata, but we are looking specifically for the match ID captured in the MatchId property. In my case, the match I am looking for is 4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4, which is a recent Husky Raid game I’ve been a part of.
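
As a minimal sketch (not part of the original tooling), pulling the match IDs out of a response like the one above could look like this in Python. The function name `extract_match_ids` is hypothetical; the field names come from the payload shown earlier.

```python
import json


def extract_match_ids(payload: str) -> list[str]:
    """Return the MatchId of every entry in a match history response."""
    document = json.loads(payload)
    # "Results" holds one object per match; entries without a MatchId are skipped.
    return [match["MatchId"] for match in document.get("Results", []) if "MatchId" in match]


# Trimmed-down version of the response shown above.
sample = json.dumps({
    "Start": 0,
    "Count": 25,
    "ResultCount": 1,
    "Results": [{"MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4", "Outcome": 2}],
})

print(extract_match_ids(sample))
```

Each returned ID can then be plugged into the film endpoint described next.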

With the match ID in hand, we can now request the film chunks (every film has several “chunks” that are just binary data) by constructing the URL for another API endpoint, like this:

https://discovery-infiniteugc.svc.halowaypoint.com
  /hi
  /films
  /matches
  /4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4
  /spectate

If the call succeeds, the metadata you will get will look like this:

{
    "FilmStatusBond": 1,
    "CustomData": {
        "FilmLength": 403190,
        "Chunks": [
            {
                "Index": 0,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 11,
                "ChunkSize": 465309,
                "FileRelativePath": "/filmChunk0",
                "ChunkType": 1
            },
            {
                "Index": 1,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 19972,
                "ChunkSize": 47858,
                "FileRelativePath": "/filmChunk1",
                "ChunkType": 2
            },
            {
                "Index": 2,
                "ChunkStartTimeOffsetMilliseconds": 19973,
                "DurationMilliseconds": 20003,
                "ChunkSize": 122480,
                "FileRelativePath": "/filmChunk2",
                "ChunkType": 2
            },
            [...MORE CHUNKS...]
        ],
        "HasGameEnded": true,
        "ManifestRefreshSeconds": 30,
        "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
        "FilmMajorVersion": 37
    },
    "BlobStoragePathPrefix": "https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/",
    "AssetId": "1c7442bd-1f8d-4593-b7d0-1c95618c6876"
}

The way the Halo Infinite API handles films is by splitting them up into separate chunks that contain different classes of in-game metadata during different parts of the game. You will see those chunks yourself when you are in theater mode - the timeline is clearly split into them (see the black markers):

Film fragments in Theater mode in Halo Infinite.

Film chunks are player-independent - they are recorded for the match itself and contain metadata about all players in them. To get the content of each chunk we will construct the URL based on the BlobStoragePathPrefix property and the FileRelativePath for each chunk:

https://blobs-infiniteugc.svc.halowaypoint.com
  /ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk0

Note

While this is not explicitly called out, the first GUID is the film asset ID and the second is the film asset version, similar to how game asset metadata is associated in the game CMS. If you have film IDs, you can get those directly without worrying about getting match IDs first.
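
To make the URL construction concrete, here is a small Python sketch that joins `BlobStoragePathPrefix` with every `FileRelativePath` from the manifest. The helper name `build_chunk_urls` is an assumption for illustration, and the sample manifest is a shortened copy of the response above.

```python
def build_chunk_urls(manifest: dict) -> list[str]:
    """Combine BlobStoragePathPrefix with each chunk's FileRelativePath."""
    prefix = manifest["BlobStoragePathPrefix"].rstrip("/")
    chunks = sorted(manifest["CustomData"]["Chunks"], key=lambda chunk: chunk["Index"])
    # Normalize slashes so the prefix and the relative path join with exactly one "/".
    return [f"{prefix}/{chunk['FileRelativePath'].lstrip('/')}" for chunk in chunks]


# Shortened copy of the manifest shown above (only two chunks kept).
manifest = {
    "CustomData": {
        "Chunks": [
            {"Index": 1, "FileRelativePath": "/filmChunk1", "ChunkType": 2},
            {"Index": 0, "FileRelativePath": "/filmChunk0", "ChunkType": 1},
        ]
    },
    "BlobStoragePathPrefix": "https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/",
}

for url in build_chunk_urls(manifest):
    print(url)
```

Sorting by `Index` is a convenience so that the chunks come out in timeline order even if the manifest lists them differently.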

With the URLs ready, we can now download every single chunk for a match and analyze them. If you are on Linux (or using Windows Subsystem for Linux) you can use this Bash script to quickly download all film chunks for a match (make sure to replace your token and clearance, and to have curl, jq, and python3 installed):

#!/bin/bash

# Check if match ID is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <MATCH_ID>"
    exit 1
fi

MATCH_ID=$1

# Headers for the API request
AUTH_HEADER="x-343-authorization-spartan: v4=YOUR_TOKEN"
CLEARANCE_HEADER="343-clearance: CURRENT_CLEARANCE"
LANGUAGE_HEADER="Accept-Language: en-us"
ACCEPT_HEADER="accept: application/json"

echo "Fetching chunk information for match: ${MATCH_ID}..."
RESPONSE=$(curl --silent --location --request GET "https://discovery-infiniteugc.svc.halowaypoint.com/hi/films/matches/${MATCH_ID}/spectate" \
    --header "${AUTH_HEADER}" \
    --header "${CLEARANCE_HEADER}" \
    --header "${LANGUAGE_HEADER}" \
    --header "${ACCEPT_HEADER}" \
    -w "%{http_code}" -o response.json)

HTTP_STATUS="${RESPONSE}"
echo "${HTTP_STATUS}"

# Check for successful response
if [[ "$HTTP_STATUS" != "200" ]]; then
    echo "Error fetching data: HTTP status $HTTP_STATUS"
    exit 1
fi

# Extract the base URL and film chunk paths
BASE_URL=$(jq -r '.BlobStoragePathPrefix' response.json)
CHUNK_PATHS=$(jq -r '.CustomData.Chunks[].FileRelativePath' response.json | sed 's|^/||')  # Remove leading slashes

# Clean up response file
rm response.json

# Loop through each chunk and download it
for CHUNK_PATH in $CHUNK_PATHS; do
    # Construct the full URL
    FULL_URL="${BASE_URL}${CHUNK_PATH}"
    COMPRESSED_FILE="compressed${CHUNK_PATH##*/}"
    DECOMPRESSED_FILE="DECOMPRESSED_${CHUNK_PATH##*/}"

    # Download the compressed chunk
    echo "Downloading chunk from ${FULL_URL}..."
    curl --location --request GET "${FULL_URL}" \
        --header "${AUTH_HEADER}" \
        --header "${CLEARANCE_HEADER}" \
        --header "${LANGUAGE_HEADER}" \
        --header "${ACCEPT_HEADER}" \
        --output "${COMPRESSED_FILE}"

    # Decompress the chunk
    echo "Decompressing ${COMPRESSED_FILE}..."
    python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < "${COMPRESSED_FILE}" > "${DECOMPRESSED_FILE}.bin"

    # Clean up compressed file
    rm "${COMPRESSED_FILE}"
    echo "Decompressed chunk saved as ${DECOMPRESSED_FILE}.bin."
done

echo "All chunks downloaded and decompressed!"

You can make the script executable with chmod +x yourscript.sh and then run it by passing the match GUID as the first argument:

./yourscript.sh 1C5F57D3-1418-4BDE-A970-F8FAB6DFE110

This script helpfully decompresses the chunks as well, but we’ll get to that a bit later in this post.

As you look at the metadata for each chunk you will notice that individual chunks have a type. From what I can infer, they break down like this:

| Chunk type | Description |
|---|---|
| 1 | Game bootstrap metadata |
| 2 | In-game event captures |
| 3 | Game summary metadata |

We’ll be using every single one of them in our explorations.
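
Since different chunk types get parsed differently later on, it can help to split the manifest by `ChunkType` up front. The following Python sketch does that; the labels are the inferred descriptions from the table above (not official names) and `group_chunks_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# Inferred descriptions from the table above, not official names.
CHUNK_TYPE_LABELS = {
    1: "Game bootstrap metadata",
    2: "In-game event captures",
    3: "Game summary metadata",
}


def group_chunks_by_type(chunks: list[dict]) -> dict[int, list[str]]:
    """Map each ChunkType to the relative paths of the chunks of that type."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["ChunkType"]].append(chunk["FileRelativePath"])
    return dict(grouped)


chunks = [
    {"Index": 0, "FileRelativePath": "/filmChunk0", "ChunkType": 1},
    {"Index": 1, "FileRelativePath": "/filmChunk1", "ChunkType": 2},
    {"Index": 2, "FileRelativePath": "/filmChunk2", "ChunkType": 2},
]

for chunk_type, paths in group_chunks_by_type(chunks).items():
    print(CHUNK_TYPE_LABELS.get(chunk_type, "Unknown"), paths)
```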

Dissecting chunk metadata

Looking at existing chunks, we see that the ones that have the type of 1 or 2 have very sparse event data, at least on the surface. However, they contain valuable information that we will need. To explore the content, let’s download a random chunk for an existing match:

https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk3

Opening it in a hex editor produces this result:

Binary content for a Halo Infinite film chunk.

Not exactly “human-readable”, and that’s because we’re missing a core step here - decompression. The clue is in the first two bytes of the chunk file, 78 5E, which form the header of a zlib stream written with a fast compression level. You can read more about it in the official RFC (RFC 1950). Looks like we’re dealing with compressed data, and therefore need to make sure that we “extract” it before attempting to read the data.
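
For the curious, the two header bytes can be decoded by hand. This Python sketch follows the CMF/FLG layout from RFC 1950; the function name `describe_zlib_header` is made up for this example.

```python
def describe_zlib_header(first_two: bytes) -> dict:
    """Decode the CMF and FLG bytes that open a zlib stream (RFC 1950)."""
    cmf, flg = first_two[0], first_two[1]
    return {
        "method": cmf & 0x0F,                 # 8 means deflate
        "window_bits": (cmf >> 4) + 8,        # 15 means a 32 KiB window
        "preset_dict": bool(flg & 0x20),      # FDICT flag
        "level": ["fastest", "fast", "default", "maximum"][flg >> 6],
        "valid_check": (cmf * 256 + flg) % 31 == 0,  # FCHECK rule
    }


print(describe_zlib_header(bytes([0x78, 0x5E])))
```

For 78 5E this reports the deflate method, a 32 KiB window, no preset dictionary, and the "fast" level, with a valid check value.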

Let’s do this a bit differently then - we’re going to download the binary file with cURL and then decompress it with Python. Assuming that you are not already using the script I shared earlier to download every chunk, our first step is this:

curl --location --request GET 'https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/filmChunk3' --header 'x-343-authorization-spartan: v4=YOUR_AUTH_HEADER' --header '343-clearance: YOUR_CLEARANCE' --header 'Accept-Language: en-us' --header 'accept: application/json' --output chunk-compressed.bin

And then, we can run a bit of inline Python magic to decompress the content we just downloaded into its own file - decompressed_output.bin:

python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < chunk-compressed.bin > decompressed_output.bin
Uncompressed binary content for a Halo Infinite film chunk.

This looks a bit more promising because we actually see repeating patterns. It’s even more promising if we look up events inside the chunk by the XUID for a given player that existed in a match. Because I am using a hex editor, I can easily look up the UInt64 value (all XUIDs are unsigned 64-bit integers), leading me to this:

7:B1E0h  00 00 00 00 00 00 00 00 00 00 00 00 5A 00 65 00  ............Z.e. 
7:B1F0h  42 00 6F 00 6E 00 64 00 00 00 00 00 00 00 00 00  B.o.n.d......... 
7:B200h  00 00 00 00 00 00 00 00 00 00 00 00 00 E5 DE DE  .............åÞÞ 
7:B210h  03 00 00 09 00 2D C0 00 00 00 04 58 00 00 00 00  .....-À....X.... 
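
The same lookup can be done outside a hex editor. Below is a Python sketch that searches for a XUID as an 8-byte value, assuming little-endian byte order (which is what the dump above suggests, but treat it as an assumption). It only finds byte-aligned occurrences, and `find_xuid` is a hypothetical helper name.

```python
import struct


def find_xuid(data: bytes, xuid: int) -> list[int]:
    """Return the byte offsets where the XUID appears as a little-endian UInt64."""
    needle = struct.pack("<Q", xuid)  # assumption: little-endian storage
    positions, start = [], 0
    while (index := data.find(needle, start)) != -1:
        positions.append(index)
        start = index + 1
    return positions


# The last two rows of the hex dump above.
sample = bytes.fromhex(
    "00000000000000000000000000E5DEDE"
    "03000009002DC0000000045800000000"
)

print(find_xuid(sample, 0x0009000003DEDEE5))
```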

Because Halo Infinite is generally known to use quite a bit of Bond-encoded data, I wanted to pass the content of the file through my tool - bond-reader. Doing that was fruitless, though, as it turned out that the data is not Bond-formatted (at least not that I could tell from some short-term digging). I guess we’ll have to stick with proper inference of binary data based on vanilla binary pattern analysis.

Another wrench thrown into our plans, also detected by Andy Curtis, is the fact that data is not necessarily byte-aligned in the film chunks. That is - if you use a hex editor to spot all existing patterns you might find some, but there is quite a bit of data “hiding” in plain sight because it just isn’t properly positioned for a hex editor to render it.

Decoding unaligned data

Because we can’t count on just our hex editor to find the data, we can write some custom code to find the things we want that are not aligned with our expectations 😎

To do that, here is a complete C# application that does just that - if you give it a byte pattern to search for (disregard the actual example pattern - it’s just a demo), it will try to find it regardless of how the data is actually aligned in the file:

using System;
using System.Collections.Generic;
using System.IO;

namespace ComponentSearchByteAlign
{
    internal class Program
    {
        public static void Main(string[] args)
        {
            byte[] data = File.ReadAllBytes(@"PATH_TO_YOUR_DECOMPRESSED_BIN_FILE");

            // This can be a XUID or a gamertag to easily spot the data sequences
            byte[] pattern = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

            List<int> matchPositions = FindPattern(data, pattern);

            if (matchPositions.Count > 0)
            {
                Console.WriteLine($"Pattern found at bit positions ({matchPositions.Count} total):");
                foreach (int position in matchPositions)
                {
                    Console.WriteLine(position);
                }
            }
            else
            {
                Console.WriteLine("Pattern not found.");
            }
        }

        public static List<int> FindPattern(byte[] data, byte[] pattern)
        {
            List<int> matchPositions = [];
            int dataBitLength = data.Length * 8;
            int patternBitLength = pattern.Length * 8;

            for (int bitPos = 0; bitPos <= dataBitLength - patternBitLength; bitPos++)
            {
                if (IsBitMatch(data, pattern, bitPos))
                {
                    matchPositions.Add(bitPos);
                }
            }
            return matchPositions;
        }

        public static bool IsBitMatch(byte[] data, byte[] pattern, int bitOffset)
        {
            // Calculates the number of whole bytes to skip.
            // We divide bitOffset by 8 because there are 8 bits per byte.
            int byteOffset = bitOffset / 8;

            // Calculates how far into the byte (number of bits) we need to start.
            // It's the remainder when bitOffset is divided by 8, giving the bit position within the byte.
            int bitShift = bitOffset % 8;

            // On the above, a good example to visualize the behavior:
            // If bitOffset = 10, byteOffset = 1 (skip 1 full byte) and bitShift = 2 (start at the 3rd bit in the second byte - we skip 2).

            // We now iterate through every byte in the pattern that is given to
            // us when the function is called.
            for (int i = 0; i < pattern.Length; i++)
            {
                // Get the data byte that aligns with the current
                // pattern byte and shift its bits to the left by the
                // bit shift value calculated earlier.
                byte dataByte = (byte)(data[byteOffset + i] << bitShift);

                // If bitShift > 0, include bits from the next byte. This is
                // important for scenarios where, for example, we're shifting
                // by 3 bits, meaning that part of the data will come from the
                // next byte.
                if (byteOffset + i + 1 < data.Length && bitShift > 0)
                {
                    // Shifts the next byte to the right by the delta between 8
                    // and the calculated bit shift value, aligning it with the
                    // remaining part of the data byte.
                    // Note: bitwise OR (|=) is used to combine the shifted parts
                    // so that we can perform a full byte comparison.
                    dataByte |= (byte)(data[byteOffset + i + 1] >> (8 - bitShift));
                }

                // Compare dataByte with the current byte in the pattern
                if (dataByte != pattern[i])
                {
                    // Not matching at position. No point in
                    // continuing.
                    return false;
                }
            }

            // All bits match
            return true;
        }
    }
}

Running this code will enable us to quickly detect the positions of data sequences that contain relevant information. For example, one of the observations about the film file is that we can spot XUID references by looking at the 0x2D 0xC0 pattern. If we use this pattern and run the tool across a set of film chunks we’ll see quite a few results:

Result of running the segment detection application.

How bit shifting works

Before we go any further, though, let me explain a bit the “magic” of bit shifting that you might’ve noticed in the program above. Let’s say we have a data array like this:

| Byte Index | Hex Value | Binary |
|---|---|---|
| 0 | 0xAB | 10101011 |
| 1 | 0xCD | 11001101 |
| 2 | 0xEF | 11101111 |
| 3 | 0x12 | 00010010 |

The pattern we want to look for is this:

| Byte Index | Hex Value | Binary |
|---|---|---|
| 0 | 0xCD | 11001101 |
| 1 | 0xEF | 11101111 |

Let’s pick a random bit offset - 10. That means that we skip the first 10 bits of the data array and start comparing at the 11th. If we look at the IsBitMatch function, it takes the bit offset as an argument.

That means that if we pass 10 as the value, we get a byteOffset of 1, meaning that we skip an entire byte (just one) when looking for the data.

Now, keep in mind that when calculating byteOffset it was not a “clean” division - we have a remainder, that is helpfully captured by bitShift, and that remainder is equal to 2, which means that with the byte at index 1 (remember, we skipped the one at 0), we start with the third bit (skip first two, as bitShift tells us).

That can be visualized in a table like this:

| Byte Index | Hex Value | Binary | Comment |
|---|---|---|---|
| 0 | 0xAB | 10101011 | We’re skipping this entirely. |
| 1 | 0xCD | 11001101 | We start comparing from the third bit. |
| 2 | 0xEF | 11101111 | We’ll use the data from this byte to make sure we can build a full byte. |
| 3 | 0x12 | 00010010 | Used in comparison later. |

Now, I mentioned that we start our parsing with the byte at index 1 at the third bit. Look at the binary representation for that byte:

11001101

We skip the first two bits, and shift the bits left, padding the “missing” bits with zeroes at the end:

00110100

Now, instead of using the zeroes, we can steal the two leading bits from the next byte in our sequence (at index 2 - that is, 0xEF). We shift it right by six bits to get the top 2 bits (because that’s all we need to complete the byte), so that:

11101111

Becomes:

00000011

So now from the shifted bytes we have these two values:

00110100
00000011

Combining them gives us:

00110111

This binary value does not match the first byte of our pattern (11001101), so the search will move on to the next offset, and so on.
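
The walkthrough above can be reproduced in a few lines of Python. This is only a sketch mirroring the logic of IsBitMatch for a single byte; `byte_at_bit_offset` is a name made up for the example.

```python
def byte_at_bit_offset(data: bytes, bit_offset: int) -> int:
    """Rebuild the 8-bit value that starts at an arbitrary bit offset."""
    byte_offset, bit_shift = divmod(bit_offset, 8)
    # Shift the current byte left and keep only the low 8 bits.
    value = (data[byte_offset] << bit_shift) & 0xFF
    # Borrow the missing low bits from the top of the next byte, if any.
    if bit_shift and byte_offset + 1 < len(data):
        value |= data[byte_offset + 1] >> (8 - bit_shift)
    return value


data = bytes([0xAB, 0xCD, 0xEF, 0x12])
print(f"{byte_at_bit_offset(data, 10):08b}")  # 00110111
print(f"{byte_at_bit_offset(data, 8):08b}")   # 11001101
```

At offset 10 we get the mismatching 00110111 from the walkthrough, while offset 8 (a clean byte boundary) yields 0xCD, the first byte of the pattern.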

Digging through the chunks

So now that we have an idea of how to look for data we can start looking at individual “envelopes” that contain player details. As I mentioned above, there are many chunks that are usually provided for a given film; however, the ones that capture specific events, like deaths, kills, or medal awards, are all aggregated in the last film chunk file, with the ChunkType of 3.

Within the very last chunk (of type 3) the events are usually structured like this:

| Header | Gamertag (Unicode) | Padding | Type | Timestamp | Padding | Medal Marker | Padding | Metadata (Medal Type) |
|---|---|---|---|---|---|---|---|---|
| 12 bytes | 32 bytes | 15 bytes | 1 byte | 4 bytes | 3 bytes | 1 byte | 3 bytes | 1 byte |

Note

Be careful with assuming that a gamertag is unique within a match. There were cases where the same match had a gamertag like MyGamertag and another like MsMyGamertag - you can’t search just for MyGamertag as that will produce some unexpected results. You need to check that the gamertag is preceded by 12 bytes of “header” (an arbitrary name given that I don’t know what they represent, but they are consistent for individual gamertags) and that the bytes before the header are 0x00 (I limit the check to 3 zero bytes). That way you can ensure that you are extracting a properly offset event.

Note

Some matches may not have a chunk of type 3 - that’s very likely a bug in the API. Without this chunk there is no timeline you can parse as easily. Additionally, it’s entirely possible that the chunk of type 3 doesn’t contain gamertag-associated data. Additional investigation is needed to understand that behavior.

If you are using a tool like 010 Editor and extract the binary data on a per-file basis (i.e., find the bit positions for the gamertag start and then extract the bytes into its own file from there), you can use the following extremely basic binary template to highlight the sequences for easier parsing:

struct HEADER
{
    char bytes[12];
};

struct GAMERTAG
{
    char bytes[32];
};

struct TYPE
{
    char bytes[1];
};

struct TIMESTAMP
{
    char bytes[4];
};

struct BUFF_PADDING
{
    char bytes[15];
};

struct PADDING
{
    char bytes[3];
};

struct MEDAL_MARKER
{
    char bytes[1];
};

local int offset = 0;

HEADER header <bgcolor=0x659157>;
offset += sizeof(HEADER);
FSeek(offset);

GAMERTAG gt <bgcolor=cGreen>;
offset += sizeof(GAMERTAG);
FSeek(offset);

BUFF_PADDING bp <bgcolor=cBlue>;
offset += sizeof(BUFF_PADDING);
FSeek(offset);

TYPE type <bgcolor=cYellow>;
offset += sizeof(TYPE);
FSeek(offset);

TIMESTAMP ts <bgcolor=cRed>;
offset += sizeof(TIMESTAMP);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mm <bgcolor=0xF7AF9D>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mtype <bgcolor=0xFFC0CB>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

The structure above is consistent across matches - I’ve extracted thousands of my own games and ran into minimal issues (with the exception of a few stray gamertags).
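
If you prefer code over a binary template, the same 72-byte layout can be expressed with Python's struct module. Treat this as a sketch under a few assumptions: the record has already been extracted and byte-aligned, the gamertag is UTF-16 little-endian and zero-padded, and the timestamp is big-endian. The names `EVENT_LAYOUT`, `FilmEvent`, and `parse_event` are hypothetical.

```python
import struct
from dataclasses import dataclass

# 12s header, 32s gamertag, 15 pad, type, timestamp, 3 pad, marker, 3 pad, medal type.
# 12 + 32 + 15 + 1 + 4 + 3 + 1 + 3 + 1 = 72 bytes. ">" assumes a big-endian timestamp.
EVENT_LAYOUT = struct.Struct(">12s32s15xBI3xB3xB")


@dataclass
class FilmEvent:
    gamertag: str
    event_type: int
    timestamp_ms: int
    medal_marker: int
    medal_type: int


def parse_event(record: bytes) -> FilmEvent:
    """Parse one byte-aligned event envelope (raises struct.error if too short)."""
    _header, raw_tag, event_type, timestamp_ms, marker, medal = EVENT_LAYOUT.unpack(
        record[:EVENT_LAYOUT.size]
    )
    # Assumption: UTF-16LE text, padded with zero bytes up to 32 bytes.
    gamertag = raw_tag.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return FilmEvent(gamertag, event_type, timestamp_ms, marker, medal)


# Synthetic record built for illustration only (not real film data).
record = (
    bytes([0x01] * 12)
    + "ZeBond".encode("utf-16-le").ljust(32, b"\x00")
    + bytes(15)
    + bytes([50])
    + (12345).to_bytes(4, "big")
    + bytes(3)
    + bytes([1])
    + bytes(3)
    + bytes([0])
)

print(parse_event(record))
```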

Extracting timeline metadata

Out of all the fields above, the most interesting to me is the metadata one. The metadata field (i.e., the medal type) captures numeric values that represent medals. These values are different from the ones in the medal mapping, and there is no clear mapping between them and a human-readable JSON representation, so we need to infer them by looking at medal volume here and correlating it with medals earned per match or through a player’s career. Andy Curtis did the heavy lifting on this for some medals in his SPNKr project (a few are pending additional research).

The following medals are currently known:

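| Medal ID | Medal |
|---|---|
| 0 | Double Kill |
| 1 | Triple Kill |
| 2 | Overkill |
| 3 | Killtacular |
| 4 | Killtrocity |
| 5 | Killamanjaro |
| 6 | Killtastrophe |
| 7 | Killpocalypse |
| 8 | Killionaire |
| 9 | Killing Spree |
| 10 | Killing Frenzy |
| 11 | Running Riot |
| 12 | Rampage |
| 13 | Perfection |
| 26 | Killjoy |
| 27 | Nightmare |
| 28 | Boogeyman |
| 29 | Grim Reaper |
| 30 | Demon |
| 31 | Flawless Victory |
| 32 | Steaktacular |
| 36 | Stopped Short |
| 37 | Flag Joust |
| 38 | Goal Line Stand |
| 39 | Necromancer |
| 43 | Ace |
| 44 | Extermination |
| 45 | Sole Survivor |
| 46 | Untainted |
| 47 | Blight |
| 48 | Disease |
| 49 | Plague |
| 51 | Pestilence |
| 53 | Culling |
| 54 | Cleansing |
| 55 | Purge |
| 56 | Purification |
| 57 | Divine Intervention |
| 58 | Zombie Slayer |
| 59 | Undead Hunter |
| 60 | Hell’s Janitor |
| 61 | The Sickness |
| 62 | Spotter |
| 63 | Treasure Hunter |
| 64 | Saboteur |
| 65 | Wingman |
| 66 | Wheelman |
| 67 | Gunner |
| 68 | Driver |
| 69 | Pilot |
| 70 | Tanker |
| 71 | Rifleman |
| 72 | Bomber |
| 73 | Grenadier |
| 74 | Boxer |
| 75 | Warrior |
| 76 | Gunslinger |
| 77 | Scattergunner |
| 78 | Sharpshooter |
| 79 | Marksman |
| 80 | Heavy |
| 81 | Bodyguard |
| 82 | Back Smack |
| 83 | Nuclear Football |
| 84 | Boom Block |
| 85 | Bulltrue |
| 86 | Cluster Luck |
| 87 | Dogfight |
| 88 | Harpoon |
| 89 | Mind the Gap |
| 90 | Ninja |
| 91 | Odin’s Raven |
| 92 | Pancake |
| 93 | Quigley |
| 94 | Remote Detonation |
| 95 | Return to Sender |
| 96 | Rideshare |
| 97 | Skyjack |
| 98 | Stick |
| 99 | Tag & Bag |
| 100 | Whiplash |
| 101 | Kong |
| 102 | Autopilot Engaged |
| 103 | Sneak King |
| 104 | Windshield Wiper |
| 105 | Reversal |
| 106 | Hail Mary |
| 107 | Nade Shot |
| 108 | Snipe |
| 109 | Perfect |
| 110 | Bank Shot |
| 111 | Fire & Forget |
| 112 | Ballista |
| 113 | Pull |
| 114 | No Scope |
| 115 | Achilles Spine |
| 116 | Grand Slam |
| 117 | Guardian Angel |
| 118 | Interlinked |
| 119 | Death Race |
| 120 | Chain Reaction |
| 121 | 360 |
| 122 | Combat Evolved |
| 123 | Deadly Catch |
| 124 | Driveby |
| 125 | Fastball |
| 126 | Flyin’ High |
| 127 | From the Grave |
| 128 | From the Void |
| 129 | Grapple-jack |
| 130 | Hold This |
| 131 | Last Shot |
| 132 | Lawnmower |
| 133 | Mount Up |
| 134 | Off the Rack |
| 135 | Quick Draw |
| 137 | Pineapple Express |
| 138 | Ramming Speed |
| 139 | Reclaimer |
| 140 | Shot Caller |
| 141 | Yard Sale |
| 142 | Special Delivery |
| 146 | Fumble |
| 148 | Straight Balling |
| 151 | Always Rotating |
| 152 | Hill Guardian |
| 153 | Clock Stop |
| 154 | Secure Line |
| 156 | Splatter |
| 162 | All That Juice |
| 163 | Great Journey |
| 165 | Breacher |
| 166 | Mounted & Loaded |
| 167 | Monopoly |
| 168 | Counter-snipe |
| 174 | Driving Spree |
| 175 | Death Cabbie |
| 176 | Immortal Chauffeur |
| 177 | Blind Fire |
| 178 | Hang Up |
| 179 | Call Blocked |
| 180 | Clear Reception |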

The event type, also captured in the envelope, can be one of the following:

| Type (Decimal) | Description |
|---|---|
| 10 | Mode-specific events (e.g., captured the flag, killed the carrier, stole the flag) |
| 20 | Death |
| 50 | Kill |

Note

Any other type identifier (such as 51, 100, or 250) that you may see here, when associated with a medal, is representative of the medal sorting weight. It maps 1:1 to the information that you can get from the medal metadata endpoint.

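Timestamp data is represented in milliseconds from the start of the match. The four bytes are reversed before conversion, which implies big-endian storage when the code runs on a little-endian machine. You can obtain a readable value with a C# snippet like this: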

Array.Reverse(timestampBytes);
var timestamp = BitConverter.ToUInt32(timestampBytes, 0);
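
The equivalent in Python is a one-liner, shown here together with a small formatter for match time. This is a sketch that assumes the four bytes are big-endian, as the C# reversal implies; both function names are made up for the example.

```python
def timestamp_from_bytes(raw: bytes) -> int:
    """Interpret 4 timestamp bytes as big-endian milliseconds (assumed order)."""
    return int.from_bytes(raw, byteorder="big")


def format_match_time(milliseconds: int) -> str:
    """Render milliseconds since match start as MM:SS.mmm."""
    minutes, remainder = divmod(milliseconds, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


raw = bytes([0x00, 0x01, 0x5F, 0x90])  # 0x00015F90 = 90000 ms
print(format_match_time(timestamp_from_bytes(raw)))  # 01:30.000
```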

One thing that I haven’t yet figured out is how assists are tracked within the event batch. It’s likely captured as a XUID reference further in the event envelope that I didn’t get to. This will be a topic for another blog post in the future as we dig more through the film file format.

Finding the gamertags

Notice that to extract all events from the last chunk one specific thing is still needed - we need to start with knowing the gamertags for which the events should be extracted. And because gamertags are technically arbitrary text, we need to find an index somewhere. To do that, we can look inside all other chunks (other than ones of type 3). That’s right, for us to get the list of gamertags that were involved in a given game we need to download and parse all existing film chunks other than the very last one that has ChunkType set to 3.

The last chunk contains information on all players in the game but doesn’t seem to contain a very clear XUID and Gamertag combination that will allow us to extract them cleanly. Luckily, inside all other chunks (where ChunkType is either 1 or 2), the gamertags and XUIDs can be found by looking at the pattern: 0x2D 0xC0. From that pattern, we can deduce the following structure:

| Gamertag (Unicode) | Padding | XUID | Marker 1 | Marker 2 |
|---|---|---|---|---|
| Dynamic length (32 bytes max) | 21 bytes | 8 bytes | 0x2D | 0xC0 |

Note

Keep in mind that gamertags are stored as Unicode (UTF-16) text. This means that the padding can be deceiving if you are looking at the binary file - you might think that there are 22 0x00 bytes between the gamertag and the XUID, when in fact the zero byte closest to the gamertag is just the trailing byte of its last character. Make sure to be careful when parsing the values.

We can scan all film chunks for this pattern by identifying the markers, getting the XUID, checking that the preceding 21 bytes are 0x00 (padding zero bytes), and then grabbing the 32 bytes of gamertag data before them, which can be parsed as a Unicode string. There are more safeguards we can put in place for this logic, but ultimately it’s good enough to extract the basic data.

Once the data is extracted into, say, a dictionary, we can use that as a starting point to look up gamertags in the final (summary) chunk.
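
To illustrate the scan end to end, here is a compact Python sketch that builds such a dictionary. It is not the tool's actual implementation. Instead of comparing bit by bit, it re-renders the buffer at each of the eight possible bit shifts and uses a regular byte search on every view. It assumes a little-endian XUID and takes whatever text sits at the end of the 32-byte window as the gamertag; `find_players` and the constants are hypothetical names.

```python
MARKER = bytes([0x2D, 0xC0])
XUID_SIZE, PADDING_SIZE, GAMERTAG_SIZE = 8, 21, 32
RECORD_PREFIX = XUID_SIZE + PADDING_SIZE + GAMERTAG_SIZE  # bytes before the marker


def find_players(data: bytes) -> dict[int, str]:
    """Map XUID -> gamertag for every record found at any bit alignment."""
    players: dict[int, str] = {}
    as_int = int.from_bytes(data, "big")
    mask = (1 << (len(data) * 8)) - 1
    for shift in range(8):
        # View of the buffer with the first `shift` bits dropped.
        view = ((as_int << shift) & mask).to_bytes(len(data), "big")
        start = 0
        while (index := view.find(MARKER, start)) != -1:
            start = index + 1
            if index < RECORD_PREFIX:
                continue  # not enough room for a full record
            xuid_start = index - XUID_SIZE
            padding_start = xuid_start - PADDING_SIZE
            xuid = int.from_bytes(view[xuid_start:index], "little")  # assumed order
            if xuid == 0 or any(view[padding_start:xuid_start]):
                continue
            window = view[padding_start - GAMERTAG_SIZE:padding_start]
            text = window.decode("utf-16-le", errors="replace").rstrip("\x00")
            gamertag = text.split("\x00")[-1]
            if gamertag:
                players[xuid] = gamertag
    return players


# Synthetic record modeled on the hex dump shown earlier.
record = (
    bytes(20)
    + "ZeBond".encode("utf-16-le")
    + bytes(PADDING_SIZE)
    + (0x0009000003DEDEE5).to_bytes(8, "little")
    + MARKER
    + bytes(4)
)
print(find_players(record))
```

Building eight shifted copies trades memory for simplicity; for chunks that are a few hundred kilobytes in size that is usually an acceptable cost.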

Note

As I mentioned earlier, depending on the matches that you are getting, some of them might not have a chunk with ChunkType equal to 3. Others can return HTTP 404 (blob does not exist) errors when attempting to download a chunk. The former may be a bug. The latter is likely caused by the folks at 343 occasionally cleaning up the storage from older matches.

In C#, the extraction logic can be formalized as such:

public static byte[] ExtractBitsFromPosition(byte[] data, int startBitPosition, int bitLength)
{
    // Validate input parameters (long arithmetic avoids overflow on large inputs)
    if (startBitPosition < 0 || bitLength <= 0 || (long)startBitPosition + bitLength > (long)data.Length * 8)
    {
        throw new ArgumentOutOfRangeException(nameof(startBitPosition), "Bit positions are out of range or invalid.");
    }

    // Calculate the byte offset and bit shift for the start position
    int startByteOffset = startBitPosition / 8;
    int startBitShift = startBitPosition % 8;

    // Number of output bytes needed to hold bitLength bits
    int byteCount = (bitLength + 7) / 8;
    byte[] extractedData = new byte[byteCount];

    if (startBitShift == 0)
    {
        // The start is byte-aligned, so we can copy the bytes directly
        Array.Copy(data, startByteOffset, extractedData, 0, byteCount);
    }
    else
    {
        // Otherwise, go byte by byte, shift, and take the missing bits from the next byte
        for (int i = 0; i < byteCount; i++)
        {
            int current = data[startByteOffset + i] << startBitShift;
            int nextIndex = startByteOffset + i + 1;
            int next = nextIndex < data.Length ? data[nextIndex] >> (8 - startBitShift) : 0;
            extractedData[i] = (byte)(current | next);
        }
    }

    // Zero out any bits past bitLength in the last byte
    int trailingBits = bitLength % 8;
    if (trailingBits != 0)
    {
        extractedData[byteCount - 1] &= (byte)(0xFF << (8 - trailingBits));
    }

    return extractedData;
}

Recall that the data may or may not be byte-aligned, so we need to operate on individual bits. In turn, once we find the marker pattern in film chunks (as we try to spot the gamertag and XUID combos), we can extract the data with a function like this, where pattern is set to 0x2D 0xC0 (the ConvertBytesToInt64, AreAllBytesZero, and ConvertBytesToText helpers are not shown here):

public static void ProcessData(byte[] data, byte[] pattern)
{
    List<int> patternPositions = FindPattern(data, pattern);

    foreach (int patternPosition in patternPositions)
    {
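        int xuidStartPosition = patternPosition - 8 * 8;

        // Skip matches too close to the start of the data to hold a full record
        // (32-byte gamertag + 21-byte padding + 8-byte XUID).
        if (xuidStartPosition - (21 + 32) * 8 < 0)
        {
            continue;
        }
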
        byte[] xuid = ExtractBitsFromPosition(data, xuidStartPosition, 8*8);
        var convertedXuid = ConvertBytesToInt64(xuid);

        if (convertedXuid != 0)
        {
            int prePatternPosition = xuidStartPosition - 21 * 8;
            var bytePrefixValidated = AreAllBytesZero(data, prePatternPosition, 21 * 8);

            if (bytePrefixValidated)
            {
                Console.WriteLine($"XUID: {convertedXuid}");
                byte[] undefinedData = ExtractBitsFromPosition(data, prePatternPosition - 32 * 8, 32 * 8);
                Console.WriteLine($"Undefined Data (until 0x00 0x00): {ConvertBytesToText(undefinedData)}");
            }
        }
    }

    Console.ReadLine();
}

To simplify how I extract the data, I built a tool called OpenSpartan/film-event-extractor which will let you log in with your Xbox Live ID and aggregate all match data within a local SQLite database. The entire parsing logic is very much in flux (feel free to follow the discussion on this), but once it stabilizes I can see integrating this better in OpenSpartan Workshop.

For my own account, having played more than seven thousand matches, the entire aggregation took around 48 hours. I haven’t yet optimized (or parallelized) the code, so part of this can be attributed to me building a slower-than-needed tool, but it works for now and I can start analyzing the data.

The data that is available through the API is mostly good as-is, but an expanded dataset that accounts for film-based details enables me to see two things more clearly:

  • Mapping between gamertags and XUIDs at the time of the match (gamertags are mutable as users can change them, XUIDs are immutable). This way I don’t need to worry about doing out-of-band conversion to get an understanding of who I played against, since the match details API only returns XUIDs.
  • Times when specific events occur in-game. I can see how quickly I earn the first medal in the game, or how quickly I get to the first kill or death.

What’s next

There are a few improvements that I want to make both to the open-source tool that I built and to my understanding of the film files. I alluded to assists earlier - that’s a data point that I definitely want to cover. Additionally, film files may contain the data required for us to build heatmaps of map movement. For that, we need to try to better replicate behaviors in the game - that is, understand how binary data changes with movement, weapon switches, use of grenades, and so on. Something tells me it will be a much more protracted project than I initially anticipated 🤔