Computer Vision · Python · 2026

Barbell Speed Tracking

A Computer Vision Investigation

Velocity-based training uses barbell speed as a real-time proxy for fatigue and effort. Commercial apps exist, but are paywalled or limited in scope. This project is an investigation into building a free, open alternative — starting from raw video and arriving at per-rep velocity estimates using pose detection, signal smoothing, and automated phase detection. I wanted something fast, free, and mine. So I built it.

The pipeline — click a stage to jump
Results✓ Deadlift — working⚠ Squat — constrained⚠ Bench press — constrained
View on GitHub

Simplifying the Problem: Start With the Deadlift

The three powerlifting movements present different challenges for computer vision. In a squat, the lifter's wrists can be occluded by the barbell plates or rack posts. In a bench press, the lifter is horizontal and their head may be out of frame. But in a deadlift, the athlete is fully upright and visible throughout the lift. The barbell path is nearly vertical. The wrists stay in view.

Starting with the deadlift was a deliberate choice to simplify the problem. A clean baseline on the easiest movement is more useful than a fragile solution stretched across all three. Once the deadlift pipeline was working, generalizing would reveal exactly which assumptions broke down for the other lifts. This turned out to be exactly right.

YOLO Pose inference on a deadlift — wrist keypoints overlaid on the lifter
01 / 05Detection

Finding the Right Detector

YOLO Pose wrist keypoint tracking overlay on a 5-rep deadlift set

Rather than tracking the barbell directly, the approach shifted to tracking the lifter's wrists. YOLO Pose is a variant of YOLO trained to detect human keypoints — joints including wrists, elbows, shoulders, hips, knees, and ankles — from a single frame. It produces a keypoint confidence score per joint per frame, so the pipeline degrades gracefully under occlusion: it flags rather than guesses.

YOLO Pose tracks 17 body keypoints per frame. Wrists grip the bar and move with it near-exactly — a stable proxy without needing to detect the barbell itself. Testing across model sizes showed that stability improved significantly at larger sizes; since this runs on offline video, inference speed is no constraint.

What I tried first
YOLO (direct) ✗
The bounding box jumped frame-to-frame and misfired onto equipment. Heavy post-processing still left it unreliable. Tracking the barbell directly as an object proved unstable.
SAM2 ✗
Point it at the barbell once, let it track through the set — an appealing idea. In practice, on real gym footage, tracking would drift, lose the object, or latch onto the wrong region mid-lift. Variable lighting and cluttered backgrounds were enough to break it.
YOLO Pose → wrists ✓
Pose models are trained on millions of humans in every orientation — vastly more robust than object-tracking on a barbell-shaped object in a gym environment. Wrists grip the bar and move with it almost exactly.
02 / 05Smoothing

Taming the Signal

Raw vs. smoothed wrist position signal with concentric phases highlighted

Raw keypoint positions jitter frame-to-frame — noise that gets amplified when differentiated into velocity. Velocity is the finite difference of position: how far the wrist moved between frames, divided by the time between them. Any noise in position becomes a spike in velocity.

The Savitzky-Golay filter works by fitting a polynomial to a sliding window of data points, then using the fitted value at the center as the smoothed output. Think of it as a local curve-fitter rather than a blunt averager. The key advantage over a simple moving average is that it preserves peak shape. A moving average flattens peaks by averaging them with their neighbors — systematically underestimating peak bar speed, exactly the metric VBT depends on. Savitzky-Golay avoids this.

03 / 05Phase Detection

Isolating the Concentric Phase

A deadlift session video contains more than just the upward pulls — there are setup movements, pauses at lockout, the descent, and idle time between sets. For VBT, only the concentric phase matters: the upward portion of each rep where the bar is actively lifted.

PELT was applied not to the raw position signal but to the velocity signal — the first difference of smoothed position. Applied to position, a sustained upward movement looks like many small constant-mean pieces, so PELT over-segments. On velocity, a clean concentric phase is a single near-zero-mean segment, making it one coherent piece to find.

Three post-processing steps follow the raw PELT output: merge adjacent candidates above velocity threshold, extend phase boundaries while velocity supports it, and reject candidates below 50% of median candidate velocity.

Velocity signal with PELT-detected concentric phases as shaded regions

Each shaded region is one rep's concentric phase — automatically detected, no manual labelling.

Why not scipy peak-trough detection?

The obvious approach was scipy's peak-trough detection — find local maxima and minima in the position signal, infer lifting phases from the intervals between them. It works in principle, but proved sensitive to noise in the signal and required careful parameter tuning tied to signal properties that shift between sessions. PELT on the velocity signal is more robust: it finds statistically distinct constant-mean pieces rather than relying on absolute extrema.

How PELT changepoint detection works

PELT (Pruned Exact Linear Time) segments a signal into statistically distinct constant-mean pieces. A penalty term controls how many segments are allowed — lower penalty allows more segments; higher forces fewer, coarser divisions.

Applied to position, a single upward pull looks like many small constant-mean pieces — PELT over-segments. Applied to velocity, a clean concentric phase is a single near-zero-mean segment (bar moving at roughly steady speed) — exactly one piece for PELT to find.

04 / 05Output

Extracting Per-Rep Velocity

Per-rep velocity (px/sec) across a working set — bar chart showing velocity trend across reps

The pipeline produces velocity in pixels per second — how fast the wrist, and by proxy the barbell, is moving in the frame. This is not directly comparable across different lifters, camera distances, or angles. But for a single lifter using a consistent setup, it is a perfectly valid relative metric: Rep A was faster than Rep B, Set 3 was slower than Set 1. That is the signal VBT cares about.

For now, rep time in seconds also serves as a sufficient proxy. If the lift setup is held constant, rep time correlates directly with velocity — with the caveat that variations with different ranges of motion (sumo vs. conventional deadlift) are not comparable at equal bar speeds.

The m/s calibration problem

Converting px/sec to meters per second requires knowing how many pixels correspond to a real-world meter in the frame — the camera's scale factor. This varies with camera distance, focal length, and lens distortion.

The clean solution is to film with a calibration reference: a checkerboard pattern of known dimensions placed in the frame. Standard practice in camera calibration — it works, but adds friction to every recording session.

Without calibration, a rougher proxy is the lifter's known range of motion. If the lifter knows their pull is 60 cm and the bar travels 480 pixels, the scale factor follows. But this requires measuring or estimating range of motion per session, and doesn't account for lens distortion. This remains an open problem.

05 / 05Results

Results

Deadlift

Per-rep velocity for deadlift set — rep 2 is a false positive from the lifter stepping away from the bar

Rep 2 is detected but corresponds to the lifter stepping away from the bar, not a working set rep — a false positive from the phase detection step.

The deadlift is the right baseline: the lifter is fully upright and visible throughout the lift, the barbell path is nearly vertical, and the wrists stay in view from setup to lockout. There are no occlusion events and no ambiguity about which phase is the concentric pull.

The pipeline is reliable on this movement. YOLO Pose tracks the wrist keypoints stably across frames, Savitzky-Golay smoothing preserves the velocity peak without flattening it, and PELT cleanly segments each rep's concentric phase with minimal post-processing intervention.

Deadlift pipeline output — annotated wrist keypoint overlay

Squat

Concentric phase detection during occlusion failure

The squat pipeline worked, with one important constraint: the lifter's wrists must remain in frame throughout the lift. At the bottom of a squat, the wrists can be occluded by the barbell plates or rack posts depending on camera angle. When that happens, the YOLO Pose model loses confidence on the wrist keypoints, and tracking falters for those frames.

The practical fix is a camera placement that keeps the wrists visible throughout — typically slightly elevated and to the side. This is a reasonable constraint for a controlled training setup.

Squat — wrist occlusion failure

Bench press

Per-rep velocity for bench press — side-on camera failure

The bench press exposed a deeper problem. When a lifter is lying down on the bench, the standard side-on camera angle means the lifter's head is usually not in frame. This matters more than it initially seems.

YOLO Pose is a top-down detector: it first finds a bounding box around a person, then runs keypoint estimation within that box. The person detector is trained predominantly on upright or near-upright humans. A lifter lying horizontal, seen from the side, with no visible head, is substantially out-of-distribution for the detector. The failure happens before keypoint extraction even runs — the person bounding box is not found reliably, so no keypoints are extracted.

The fix was a high camera angle: positioning the camera overhead (or at a steep diagonal) so that the lifter's head and wrists are both visible throughout the lift. This brings the body orientation back into distribution for the person detector.

Bench press — detection failure on lying-down athlete