Category: Content & Media. No API key required.

mediapipe-pose-detection

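MediaPipe pose detection expertise. Use when debugging landmark tracking, adjusting confidence thresholds, fixing pose detection issues, working with pose.py and video_io.py, or validating pose detection through manual observation.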

Author: jakexiaohub (GitHub)

MediaPipe Pose Detection

Key Landmarks for Jump Analysis

Lower Body (Primary for Jumps)

| Landmark | Left Index | Right Index | Use Case |
|----------|------------|-------------|----------|
| Hip | 23 | 24 | Center of mass, jump height |
| Knee | 25 | 26 | Triple extension, landing |
| Ankle | 27 | 28 | Ground contact detection |
| Heel | 29 | 30 | Takeoff/landing timing |
| Toe | 31 | 32 | Forefoot contact |

Upper Body (Secondary)

| Landmark | Left Index | Right Index | Use Case |
|----------|------------|-------------|----------|
| Shoulder | 11 | 12 | Arm swing tracking |
| Elbow | 13 | 14 | Arm action |
| Wrist | 15 | 16 | Arm swing timing |

Reference Points

| Landmark | Index | Use Case |
|----------|-------|----------|
| Nose | 0 | Head position |
| Left Eye | 2 | Face orientation |
| Right Eye | 5 | Face orientation |
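The indices in the tables above can be collected into named constants so that analysis code does not depend on bare numbers. This is a minimal sketch: `JumpLandmark` and `hip_midpoint` are illustrative names, not part of MediaPipe or kinemotion, and `points` is assumed to be a sequence of (x, y) pairs indexed by landmark number.

```python
from enum import IntEnum


class JumpLandmark(IntEnum):
    """Hypothetical names for the MediaPipe Pose indices listed above."""
    NOSE = 0
    LEFT_EYE = 2
    RIGHT_EYE = 5
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12
    LEFT_HIP = 23
    RIGHT_HIP = 24
    LEFT_KNEE = 25
    RIGHT_KNEE = 26
    LEFT_ANKLE = 27
    RIGHT_ANKLE = 28
    LEFT_HEEL = 29
    RIGHT_HEEL = 30
    LEFT_TOE = 31
    RIGHT_TOE = 32


def hip_midpoint(points):
    """Average the two hip landmarks as a rough center-of-mass proxy."""
    lx, ly = points[JumpLandmark.LEFT_HIP]
    rx, ry = points[JumpLandmark.RIGHT_HIP]
    return ((lx + rx) / 2, (ly + ry) / 2)
```

Because `IntEnum` members behave like integers, they can index lists and arrays directly.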

Confidence Thresholds

Default Settings

min_detection_confidence = 0.5  # Initial pose detection
min_tracking_confidence = 0.5   # Frame-to-frame tracking

Quality Presets (auto_tuning.py)

| Preset | Detection | Tracking | Use Case |
|--------|-----------|----------|----------|
| fast | 0.3 | 0.3 | Quick processing, tolerates errors |
| balanced | 0.5 | 0.5 | Default, good accuracy |
| accurate | 0.7 | 0.7 | Best accuracy, slower |

Tuning Guidelines

  • Increase thresholds when: Jittery landmarks, false detections
  • Decrease thresholds when: Missing landmarks, tracking loss
  • Typical adjustment: ±0.1 increments
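
The guidelines above amount to nudging both thresholds by 0.1 in the direction the symptom calls for. A minimal sketch, assuming hypothetical symptom labels (`"jitter"`, `"false_detection"`, `"missing"`, `"tracking_loss"`); `adjust_thresholds` is illustrative and is not a function from auto_tuning.py.

```python
INCREASE = {"jitter", "false_detection"}
DECREASE = {"missing", "tracking_loss"}


def adjust_thresholds(detection, tracking, symptom, step=0.1):
    """Return (detection, tracking) moved one step for the given symptom."""
    if symptom in INCREASE:
        delta = step
    elif symptom in DECREASE:
        delta = -step
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")

    def clamp(value):
        # Confidence thresholds are probabilities, so keep them in [0, 1].
        return round(min(1.0, max(0.0, value + delta)), 2)

    return clamp(detection), clamp(tracking)
```

Starting from the balanced preset, `adjust_thresholds(0.5, 0.5, "jitter")` moves both values to 0.6.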

Common Issues and Solutions

Landmark Jitter

Symptoms: Landmarks jump erratically between frames

Solutions:

  1. Apply Butterworth low-pass filter (cutoff 6-10 Hz)
  2. Increase tracking confidence
  3. Use One-Euro filter for real-time applications
# Butterworth filter (filtering.py)
from kinemotion.core.filtering import butterworth_filter
smoothed = butterworth_filter(landmarks, cutoff=8.0, fps=30)

# One-Euro filter (smoothing.py)
from kinemotion.core.smoothing import one_euro_filter
smoothed = one_euro_filter(landmarks, min_cutoff=1.0, beta=0.007)
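
For readers without smoothing.py at hand, the following is a minimal sketch of the standard One-Euro filter applied to a single coordinate stream at a fixed frame rate. It is not the kinemotion implementation: the function name `one_euro` and the `d_cutoff` default are assumptions, while `min_cutoff` and `beta` mirror the call above.

```python
import math


def _alpha(cutoff, dt):
    """Smoothing factor of a first-order low-pass filter."""
    r = 2 * math.pi * cutoff * dt
    return r / (r + 1)


def one_euro(values, fps=30.0, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
    """Smooth a 1-D sequence of coordinates sampled at a fixed frame rate."""
    if not values:
        return []
    dt = 1.0 / fps
    x_prev, dx_prev = values[0], 0.0
    out = [x_prev]
    for x in values[1:]:
        # Low-pass the derivative, then let speed raise the cutoff frequency.
        dx = (x - x_prev) / dt
        a_d = _alpha(d_cutoff, dt)
        dx_hat = a_d * dx + (1 - a_d) * dx_prev
        cutoff = min_cutoff + beta * abs(dx_hat)
        # Low-pass the signal itself with the speed-dependent cutoff.
        a = _alpha(cutoff, dt)
        x_hat = a * x + (1 - a) * x_prev
        out.append(x_hat)
        x_prev, dx_prev = x_hat, dx_hat
    return out
```

A lower `min_cutoff` removes more jitter when the landmark is nearly still, and a higher `beta` reduces lag during fast movement such as the takeoff phase.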

Left/Right Confusion

Symptoms: MediaPipe swaps left and right landmarks mid-video

Cause: Occlusion at 90° lateral camera angle

Solutions:

  1. Use 45° oblique camera angle (recommended)
  2. Post-process to detect and correct swaps
  3. Use single-leg tracking when possible
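
Solution 2 can be sketched as a nearest-neighbor check against the previous frame. This assumes each track is a list of (x, y) points for one left/right landmark pair (for example the two ankles), that the first frame is labeled correctly, and that the landmarks move less between frames than the distance separating them. `fix_swaps` is a hypothetical helper, not part of kinemotion.

```python
import math


def fix_swaps(left, right):
    """Undo left/right label swaps in two parallel tracks of (x, y) points.

    A frame is treated as swapped when exchanging its labels brings both
    points closer, in total, to the previous (already corrected) frame.
    """
    fixed_left, fixed_right = [left[0]], [right[0]]
    for l, r in zip(left[1:], right[1:]):
        prev_l, prev_r = fixed_left[-1], fixed_right[-1]
        keep = math.dist(l, prev_l) + math.dist(r, prev_r)
        swap = math.dist(r, prev_l) + math.dist(l, prev_r)
        if swap < keep:
            l, r = r, l
        fixed_left.append(l)
        fixed_right.append(r)
    return fixed_left, fixed_right
```

The heuristic cannot resolve frames where the two landmarks overlap completely, which is why the oblique camera angle remains the recommended fix.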

Tracking Loss

Symptoms: Landmarks disappear for several frames

Causes:

  • Athlete moves out of frame
  • Fast motion blur
  • Occlusion by equipment/clothing

Solutions:

  1. Ensure full athlete visibility throughout video
  2. Use higher frame rate (60+ fps)
  3. Interpolate missing frames (up to 3-5 frames)
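# Linear interpolation for short gaps (NaN runs of at most max_gap frames)
import numpy as np

def interpolate_gaps(landmarks, max_gap=5):
    """Fill NaN runs of up to max_gap frames per column; longer runs stay NaN."""
    filled = np.array(landmarks, dtype=float)  # copy, so the input is not modified
    frames = np.arange(len(filled))
    for i in range(filled.shape[1]):
        mask = np.isnan(filled[:, i])
        if not mask.any() or mask.all():
            continue  # nothing to fill, or no valid samples to interpolate from
        interp = np.interp(frames, frames[~mask], filled[~mask, i])
        # A +1 edge marks the start of a NaN run, a -1 edge marks its end
        edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
        for start, end in zip(np.where(edges == 1)[0], np.where(edges == -1)[0]):
            if end - start <= max_gap:
                filled[start:end, i] = interp[start:end]
    return filled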

Low Confidence Scores

Symptoms: Visibility scores consistently below threshold

Causes:

  • Poor lighting (backlighting, shadows)
  • Low contrast clothing vs background
  • Partial occlusion

Solutions:

  1. Improve lighting (front-lit, even)
  2. Ensure clothing contrasts with background
  3. Remove obstructions from camera view

Video Processing (video_io.py)

Rotation Handling

Mobile videos often have rotation metadata that must be handled:

# video_io.py handles this automatically
# Reads the video's rotation metadata and applies the correction
from kinemotion.core.video_io import read_video_frames

frames, fps, dimensions = read_video_frames("mobile_video.mp4")
# Frames are correctly oriented regardless of source

Manual Rotation (if needed)

# FFmpeg rotation options
ffmpeg -i input.mp4 -vf "transpose=1" output.mp4  # 90° clockwise
ffmpeg -i input.mp4 -vf "transpose=2" output.mp4  # 90° counter-clockwise
ffmpeg -i input.mp4 -vf "hflip" output.mp4        # Horizontal flip

Frame Dimensions

Always read the actual dimensions from the first decoded frame, not from the container metadata:

# Correct approach: measure the first decoded frame
import cv2

cap = cv2.VideoCapture(video_path)
ret, frame = cap.read()
if not ret:
    raise ValueError(f"Could not read first frame from {video_path}")
height, width = frame.shape[:2]

# Incorrect (may be wrong for rotated videos)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

Coordinate Systems

MediaPipe Output

  • Normalized coordinates: (0.0, 0.0) to (1.0, 1.0)
  • Origin: Top-left corner
  • X: Left to right
  • Y: Top to bottom
  • Z: Depth relative to the hip midpoint (smaller, more negative values are closer to the camera)

Conversion to Pixels

def normalized_to_pixel(landmark, width, height):
    x = int(landmark.x * width)
    y = int(landmark.y * height)
    return x, y

Visibility Score

Each landmark has a visibility score (0.0-1.0):

  • > 0.5: Likely visible and accurate
  • < 0.5: May be occluded or estimated
  • = 0.0: Not detected
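
These bands can be turned into a small classifier for per-frame quality checks. A minimal sketch: `visibility_label` and `low_visibility_ratio` are hypothetical helpers, and treating a score of exactly 0.5 as visible is an assumption, since the list above does not say which side that value falls on.

```python
def visibility_label(score, threshold=0.5):
    """Map a MediaPipe visibility score to a coarse label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("visibility must be within [0.0, 1.0]")
    if score == 0.0:
        return "not_detected"
    if score < threshold:
        return "occluded_or_estimated"
    # Assumption: a score exactly at the threshold counts as visible.
    return "visible"


def low_visibility_ratio(scores, threshold=0.5):
    """Fraction of frames whose visibility falls below the threshold."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(s < threshold for s in scores) / len(scores)
```

A high ratio for one landmark across a clip points to the lighting, contrast, or occlusion causes listed under Low Confidence Scores.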

Debug Overlay (debug_overlay.py)

Skeleton Drawing

# Key connections for jump visualization
POSE_CONNECTIONS = [
    (23, 25), (25, 27), (27, 29), (27, 31),  # Left leg
    (24, 26), (26, 28), (28, 30), (28, 32),  # Right leg
    (23, 24),                                  # Hips
    (11, 23), (12, 24),                       # Torso
]

Color Coding

| Element | Color (BGR) | Meaning |
|---------|-------------|---------|
| Skeleton | (0, 255, 0) | Green - normal tracking |
| Low confidence | (0, 165, 255) | Orange - visibility < 0.5 |
| Key angles | (255, 0, 0) | Blue - measured angles |
| Phase markers | (0, 0, 255) | Red - takeoff/landing |
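
The first two rows of the color table can be expressed as a lookup on the visibility of a segment's endpoints. This is a sketch only: `segment_color` is an illustrative helper and not the debug_overlay.py API. The returned BGR tuple is in the form OpenCV drawing calls such as `cv2.line` expect.

```python
GREEN = (0, 255, 0)     # normal tracking
ORANGE = (0, 165, 255)  # low confidence


def segment_color(vis_start, vis_end, threshold=0.5):
    """Pick the BGR color for a skeleton segment from its endpoint visibilities."""
    if min(vis_start, vis_end) < threshold:
        return ORANGE
    return GREEN
```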

Performance Optimization

Reducing Latency

  1. Use model_complexity=0 for fastest inference
  2. Process every Nth frame for batch analysis
  3. Use GPU acceleration if available
import mediapipe as mp

pose = mp.solutions.pose.Pose(
    model_complexity=0,      # 0=Lite, 1=Full, 2=Heavy
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
    static_image_mode=False  # False for video (uses tracking)
)

Memory Management

  • Release pose estimator after processing: pose.close()
  • Process videos in chunks for large files
  • Use generators for frame iteration
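
Frame skipping and chunked processing can both be written as lazy iterators over any frame source. A minimal sketch using only the standard library; `every_nth` and `chunked` are hypothetical names, and `frames` stands for any iterable of frames, such as a generator that wraps a video reader.

```python
from itertools import islice


def every_nth(frames, n):
    """Return a lazy iterator over frames 0, n, 2n, ... of any iterable."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return islice(frames, 0, None, n)


def chunked(frames, size):
    """Yield lists of up to `size` frames so only one chunk is held in memory."""
    iterator = iter(frames)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Neither helper materializes the whole video, so peak memory is bounded by the chunk size rather than the clip length.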

Integration with kinemotion

File Locations

  • Pose estimation: src/kinemotion/core/pose.py
  • Video I/O: src/kinemotion/core/video_io.py
  • Filtering: src/kinemotion/core/filtering.py
  • Smoothing: src/kinemotion/core/smoothing.py
  • Auto-tuning: src/kinemotion/core/auto_tuning.py

Typical Pipeline

Video → read_video_frames() → pose.process() → filter/smooth → analyze

Manual Observation for Validation

During development, use manual frame-by-frame observation to establish ground truth and validate pose detection accuracy.

When to Use Manual Observation

  1. Algorithm development: Validating new phase detection methods
  2. Parameter tuning: Comparing detected vs actual frames
  3. Debugging: Investigating pose detection failures
  4. Ground truth collection: Building validation datasets

Ground Truth Data Collection Protocol

Step 1: Generate Debug Video

uv run kinemotion cmj-analyze video.mp4 --output debug.mp4

Step 2: Manual Frame-by-Frame Analysis

Open the debug video in a tool that supports frame stepping (QuickTime, VLC with frame advance, or a video editor).

Step 3: Record Observations

For each key phase, record the frame number where the event occurs:

=== MANUAL OBSERVATION: PHASE DETECTION ===

Video: ________________________
FPS: _____ Total Frames: _____

PHASE DETECTION (frame numbers)
| Phase | Detected | Manual | Error | Notes |
|-------|----------|--------|-------|-------|
| Standing End | ___ | ___ | ___ | |
| Lowest Point | ___ | ___ | ___ | |
| Takeoff | ___ | ___ | ___ | |
| Peak Height | ___ | ___ | ___ | |
| Landing | ___ | ___ | ___ | |

LANDMARK QUALITY (per phase)
| Phase | Hip Visible | Knee Visible | Ankle Visible | Notes |
|-------|-------------|--------------|---------------|-------|
| Standing | Y/N | Y/N | Y/N | |
| Countermovement | Y/N | Y/N | Y/N | |
| Flight | Y/N | Y/N | Y/N | |
| Landing | Y/N | Y/N | Y/N | |

Phase Detection Criteria

Standing End: Last frame before downward hip movement begins

  • Look for: Hip starts descending, knees begin flexing

Lowest Point: Frame where hip reaches minimum height

  • Look for: Deepest squat position, hip at its lowest point in the frame (largest image Y value, since Y increases downward)

Takeoff: First frame where both feet leave ground

  • Look for: Toe/heel landmarks separate from ground plane
  • Note: May be 1-2 frames after visible liftoff due to detection lag

Peak Height: Frame where hip reaches maximum height

  • Look for: Hip at its highest point in the frame during flight (smallest image Y value, since Y increases downward)

Landing: First frame where foot contacts ground

  • Look for: Heel or toe landmark touches ground plane
  • Note: Algorithm may detect 1-2 frames late (velocity-based)
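
Because MediaPipe's image Y axis grows downward, the lowest hip position corresponds to the largest Y value and peak height to the smallest. The sketch below applies that to a per-frame list of hip Y values, assuming the takeoff and landing frames are already known. `find_lowest_point` and `find_peak` are hypothetical helpers for cross-checking manual observations, not kinemotion's detection algorithm.

```python
def find_lowest_point(hip_y, takeoff_frame):
    """Frame of the deepest squat: largest image Y before takeoff."""
    window = hip_y[:takeoff_frame]
    return max(range(len(window)), key=window.__getitem__)


def find_peak(hip_y, takeoff_frame, landing_frame):
    """Frame of peak height: smallest image Y between takeoff and landing."""
    window = hip_y[takeoff_frame:landing_frame + 1]
    offset = min(range(len(window)), key=window.__getitem__)
    return takeoff_frame + offset
```

Both functions raise `ValueError` on an empty window, which signals that the supplied takeoff or landing frame is out of range.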

Landmark Quality Assessment

For each landmark, observe:

| Quality | Criteria |
|---------|----------|
| Good | Landmark stable, positioned correctly on body part |
| Jittery | Landmark oscillates ±5-10 pixels between frames |
| Offset | Landmark consistently displaced from actual position |
| Lost | Landmark missing or wildly incorrect |
| Swapped | Left/right landmarks switched |

Recording Observations Format

When validating, provide structured data:

## Ground Truth: [video_name]

**Video Info:**
- Frames: 215
- FPS: 60
- Duration: 3.58s
- Camera: 45° oblique

**Phase Detection Comparison:**

| Phase | Detected | Manual | Error (frames) | Error (ms) |
|-------|----------|--------|----------------|------------|
| Standing End | 64 | 64 | 0 | 0 |
| Lowest Point | 91 | 88 | +3 (late) | +50 |
| Takeoff | 104 | 104 | 0 | 0 |
| Landing | 144 | 142 | +2 (late) | +33 |

**Error Analysis:**
- Mean absolute error: 1.25 frames (21ms)
- Bias detected: Landing consistently late
- Accuracy: 2/4 perfect, 4/4 within ±3 frames

**Landmark Issues Observed:**
- Frame 87-92: Hip jitter during lowest point
- Frame 140-145: Ankle tracking unstable at landing

Acceptable Error Thresholds

At 60fps (16.67ms per frame):

| Error Level | Frames | Time | Interpretation |
|-------------|--------|------|----------------|
| Perfect | 0 | 0ms | Exact match |
| Excellent | ±1 | ±17ms | Within human observation variance |
| Good | ±2 | ±33ms | Acceptable for most metrics |
| Acceptable | ±3 | ±50ms | May affect precise timing metrics |
| Investigate | >3 | >50ms | Algorithm may need adjustment |
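
The frame-to-millisecond conversion and the error levels can be computed mechanically from a detected/manual comparison. A minimal sketch, assuming both inputs are dictionaries that map phase names to frame numbers; `frame_error_report` and `FRAME_LEVELS` are illustrative names.

```python
FRAME_LEVELS = [(0, "Perfect"), (1, "Excellent"), (2, "Good"), (3, "Acceptable")]


def frame_error_report(detected, manual, fps=60.0):
    """Compare detected vs. manual frame numbers for each phase."""
    ms_per_frame = 1000.0 / fps
    rows = {}
    for phase, det in detected.items():
        error = det - manual[phase]  # positive means the algorithm is late
        level = next(
            (name for limit, name in FRAME_LEVELS if abs(error) <= limit),
            "Investigate",
        )
        rows[phase] = (error, round(error * ms_per_frame), level)
    mae = sum(abs(e) for e, _, _ in rows.values()) / len(rows)
    return rows, mae
```

With the example observation recorded earlier (detected 64, 91, 104, 144 against manual 64, 88, 104, 142 at 60 fps), this yields a mean absolute error of 1.25 frames, with Lowest Point rated Acceptable (+3 frames, +50 ms) and Landing rated Good (+2 frames, +33 ms).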

Bias Detection

Look for systematic patterns across multiple videos:

| Pattern | Meaning | Action |
|---------|---------|--------|
| Consistent +N frames | Algorithm detects late | Adjust threshold earlier |
| Consistent -N frames | Algorithm detects early | Adjust threshold later |
| Variable ±N frames | Normal variance | No action needed |
| Increasing error | Tracking degrades | Check landmark quality |
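
The first three patterns can be separated with a simple sign check on the signed errors (detected minus manual) collected for one phase across several videos. This is a sketch under stated assumptions: `classify_bias` is a hypothetical helper, the `min_bias` cut-off of one frame is an arbitrary illustration, and the "Increasing error" pattern is not covered because it needs per-frame data over time.

```python
from statistics import mean


def classify_bias(errors, min_bias=1.0):
    """Classify signed frame errors (detected - manual) from several videos."""
    if not errors:
        raise ValueError("need at least one error value")
    avg = mean(errors)
    if all(e > 0 for e in errors) and avg >= min_bias:
        return "late"
    if all(e < 0 for e in errors) and avg <= -min_bias:
        return "early"
    return "no_consistent_bias"
```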

Integration with basic-memory

Store ground truth observations:

# Save validation results
write_note(
    title="CMJ Phase Detection Validation - [video_name]",
    content="[structured observation data]",
    folder="biomechanics"
)

# Search previous validations
search_notes("phase detection ground truth")

# Build context for analysis
build_context("memory://biomechanics/*")

Example: CMJ Validation Study Reference

See basic-memory for complete validation study:

  • biomechanics/cmj-phase-detection-validation-45deg-oblique-view-ground-truth
  • biomechanics/cmj-landing-detection-bias-root-cause-analysis
  • biomechanics/cmj-landing-detection-impact-vs-contact-method-comparison

Key findings from validation:

  • Standing End: 100% accuracy (0 frame error)
  • Takeoff: ~0.7 frame mean error (excellent)
  • Lowest Point: ~2.3 frame mean error (variable)
  • Landing: +1-2 frame consistent bias (investigate)