60 videos. 210,000 frames. About 0.5% of them matter.

That’s the reality of building a goal detector from highlight footage. The class imbalance is brutal — roughly half a percent of frames show a goal animation. The other 99.5% are exactly what you don’t want to trigger your lights.

This is how I got the frames out.

Why not save every frame

A 10-minute highlight reel at 30fps produces 18,000 frames. Across 60 videos that’s over a million frames if you save everything. At even modest file sizes that’s hundreds of gigabytes of data, most of it identical or near-identical.
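The arithmetic above is worth sanity-checking. Here's a quick back-of-envelope sketch — note the 200 KB per-frame figure is an assumption I'm making for a 720p JPEG, not a measured number:

```python
# Back-of-envelope storage cost of saving every frame.
# kb_per_frame is an ASSUMED average JPEG size at 1280x720.
fps = 30
minutes = 10
videos = 60
kb_per_frame = 200

frames = fps * 60 * minutes * videos
gigabytes = frames * kb_per_frame / 1_000_000

print(f"{frames:,} frames ≈ {gigabytes:.0f} GB")  # 1,080,000 frames ≈ 216 GB
```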

Sampling every 10th frame cuts the raw count by an order of magnitude — in my case, down to a manageable 210,000 frames — without losing meaningful coverage. Goal animations last ~5 seconds, which at 30fps with a 10-frame interval still gives you around 15 frames of the animation per goal. That’s enough.


If you’re working with faster-moving events you might need a tighter interval. For a relatively slow-changing broadcast graphic, every 10th frame is fine.
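A tiny helper makes it easy to check whether a given interval still covers an event — `frames_per_event` is a hypothetical name I'm using here, not part of the extraction script:

```python
# How many sampled frames land inside an event of a given length?
def frames_per_event(event_seconds: float, fps: int, interval: int) -> int:
    """Sampled frames that fall inside one event lasting event_seconds."""
    return int(event_seconds * fps) // interval

# A ~5 s goal animation at 30 fps, keeping every 10th frame:
print(frames_per_event(5, 30, 10))   # 15

# A 1-second event at the same settings leaves only 3 frames —
# faster-moving events need a tighter interval:
print(frames_per_event(1, 30, 10))   # 3
```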

Why crop instead of saving the full frame

The full frame at 1280×720 contains a lot of information the classifier doesn’t need. The ice, the players, the crowd — none of that tells you whether a goal was just scored. The goal animation appears in the top-left corner of the Sportsnet broadcast, overlaid on the score bug area.

So instead of saving the full frame, I crop a 400×100 pixel region from the top-left corner — coordinates x=10, y=10, w=400, h=100. That covers the score area plus a bit of surrounding background.

This does a few useful things. It dramatically reduces file size. It focuses the classifier on the region that actually matters. And it means the model never has to learn to ignore the ice surface, the crowd, or anything else that varies wildly between frames.
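To make the size reduction concrete, here's a quick sketch using a zero-filled NumPy array as a stand-in for a real decoded frame — OpenCV hands you frames as arrays shaped (height, width, channels), so the slice below is exactly the crop the script performs:

```python
import numpy as np

# Stand-in for a decoded 720p frame: (height, width, channels)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

# The crop used in the script: x=10, y=10, w=400, h=100
x, y, w, h = 10, 10, 400, 100
cropped = frame[y:y+h, x:x+w]

print(cropped.shape)  # (100, 400, 3)

# The crop keeps under 5% of the original pixels:
kept = cropped.shape[0] * cropped.shape[1] / (frame.shape[0] * frame.shape[1])
print(f"{kept:.1%}")  # 4.3%
```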

These coordinates are specific to the Sportsnet broadcast. If you’re working with a different broadcaster or resolution you’ll need to adjust them. The easiest way is to extract a single frame, open it in an image viewer that shows pixel coordinates, and click around the score bug area until you have the right bounds.

The script

import cv2
import os

# Root directory holding video folders
ROOT_DIR = "Highlights/Sportsnet"

# Frame interval (save every Nth frame)
FRAME_INTERVAL = 10

# ROI coordinates — Sportsnet 720p score bug area
x, y, w, h = 10, 10, 400, 100

def extract_frames_from_video(video_path, output_base_dir):
    """Extract frames from a single video."""
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    output_dir = os.path.join(output_base_dir, video_name)
    os.makedirs(output_dir, exist_ok=True)

    print(f"\n=== Processing video: {video_name} ===")

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"❌ Error: Could not open {video_path}")
        return

    frame_count = 0
    saved_count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % FRAME_INTERVAL == 0:
            cropped = frame[y:y+h, x:x+w]

            filename = os.path.join(
                output_dir,
                f"{video_name}_frame_{saved_count:06d}.jpg"
            )

            cv2.imwrite(filename, cropped)
            saved_count += 1

        frame_count += 1

    cap.release()
    print(f"✅ {saved_count} frames saved for {video_name}")


def main():
    print("=== Batch Frame Extractor ===")

    OUTPUT_ROOT = "Highlights/Sportsnet/Frames"
    os.makedirs(OUTPUT_ROOT, exist_ok=True)

    videos = [
        f for f in os.listdir(ROOT_DIR)
        if f.lower().endswith(".mp4")
    ]

    if not videos:
        print("❌ No videos found.")
        return

    for video in videos:
        video_path = os.path.join(ROOT_DIR, video)
        extract_frames_from_video(video_path, OUTPUT_ROOT)

    print("\n🎉 All videos processed!")


if __name__ == "__main__":
    main()

A few things worth noting in the structure:

Each video gets its own subfolder inside Frames/ named after the video file. That means if something goes wrong mid-run you can see exactly where it stopped, and you can re-run a single video without touching the rest.

The saved_count value is zero-padded to six digits (frame_{saved_count:06d}), which keeps the filenames sortable. Useful when you’re manually reviewing frames later.
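The padding matters because filename sorting is lexicographic, not numeric:

```python
# Lexicographic sort on unpadded numbers misorders frames;
# zero-padding fixes it.
unpadded = ["frame_2.jpg", "frame_10.jpg", "frame_1.jpg"]
padded = ["frame_000002.jpg", "frame_000010.jpg", "frame_000001.jpg"]

print(sorted(unpadded))
# ['frame_1.jpg', 'frame_10.jpg', 'frame_2.jpg']  <- 10 before 2

print(sorted(padded))
# ['frame_000001.jpg', 'frame_000002.jpg', 'frame_000010.jpg']
```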

The script processes whatever .mp4 files it finds in ROOT_DIR — so the organized folder structure from the download script feeds directly into this one.
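One consequence of the per-video subfolders: checking progress after a run is a one-liner away. A minimal sketch — `count_frames` is a hypothetical helper, not part of the script above:

```python
import os

def count_frames(frames_root: str) -> dict:
    """Count saved .jpg crops per video subfolder under frames_root."""
    counts = {}
    for name in sorted(os.listdir(frames_root)):
        subdir = os.path.join(frames_root, name)
        if os.path.isdir(subdir):
            counts[name] = sum(
                1 for f in os.listdir(subdir) if f.endswith(".jpg")
            )
    return counts

# e.g. count_frames("Highlights/Sportsnet/Frames")
```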

What I ended up with

60 Sportsnet highlight reels produced just over 210,000 cropped frames. Of those, roughly 209,000 are negative — no goal, just scoreboard, gameplay, or replay. The remaining thousand or so are positive — the goal animation visible in the crop.

That’s approximately 0.5% positive. One in two hundred frames shows what I’m trying to detect.

That imbalance is going to be a real problem when it comes to training. A model that predicts “no goal” for every single frame would be right 99.5% of the time — and completely useless. Handling that is the next problem.
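The numbers from this post make the trap easy to demonstrate — accuracy looks great while the model finds nothing:

```python
# The "always predict no goal" baseline: high accuracy, zero recall.
total = 210_000
positives = round(total * 0.005)   # ~0.5% of frames show a goal
negatives = total - positives

accuracy = negatives / total       # every negative right, every positive missed
recall = 0 / positives             # it never flags a single goal

print(f"accuracy: {accuracy:.1%}")  # 99.5%
print(f"recall:   {recall:.0%}")    # 0%
```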


Next: sorting and labelling 210,000 frames — and how I dealt with the class imbalance problem.