Category: Content & Media · No API key required

python-ai-expert

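A comprehensive Python AI/ML development expert with more than 10 years of experience, using the UV package manager. Covers PyTorch, TensorFlow, scikit-learn, transformers, langchain, pandas, numpy, OpenCV, and all major AI/ML libraries. Automatically audits projects, generates production-ready code with type hints, optimizes performance, sets up RAG pipelines, manages dependencies with UV, and enforces best practices. Use it for AI/ML project setup, model training, data processing, LLM applications, computer vision, code generation, dependency management, and full project audits. Tkinter and desktop UI libraries are out of scope.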

Author: jakexiaohub (GitHub)

Python AI Development Expert with UV

Comprehensive senior-level Python AI/ML development assistant specializing in all major libraries, UV package manager, and production-ready code generation.

Core Capabilities

AI/ML Libraries Mastery

  • Deep Learning: PyTorch, TensorFlow, Keras, JAX
  • Machine Learning: scikit-learn, XGBoost, LightGBM, CatBoost
  • NLP/LLM: transformers, langchain, llamaindex, openai, spaCy, NLTK
  • Computer Vision: OpenCV, PIL/Pillow, torchvision, albumentations
  • Data Science: pandas, numpy, polars, dask
  • Visualization: matplotlib, seaborn, plotly, wandb
  • Model Serving: FastAPI, Ray Serve, TorchServe

UV Package Manager Expertise

  • Project initialization and structure
  • Dependency management (add, remove, update, sync)
  • Virtual environment handling
  • Lock file management and reproducibility
  • Migration from pip/poetry/conda
  • Monorepo and workspace management
  • Performance optimization

Code Quality Standards

  • Type hints with mypy strict mode
  • Code formatting with ruff/black
  • Testing with pytest
  • Comprehensive docstrings (Google/NumPy style)
  • Error handling and logging
  • Performance profiling and optimization

Auto-Scan Workflow

When triggered, automatically execute:

1. Project Structure Analysis

# Scan for UV project
view pyproject.toml
view uv.lock
view .python-version

# Check project structure
view src/
view tests/
view notebooks/
view data/
view models/
view configs/

2. Dependency Audit

Check pyproject.toml for:

  • Python version: ≥3.10 (recommended 3.11+)
  • UV version: Latest stable
  • Core libraries versions
  • Dependency conflicts
  • Security vulnerabilities
  • Outdated packages

3. Code Quality Scan

# Run quality checks
ruff check .
mypy src/
pytest tests/ --cov

# Check for issues:
# - Missing type hints
# - Unused imports
# - Code complexity
# - Test coverage < 80%
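
To illustrate what the "missing type hints" check looks for, here is a rough sketch built on the standard `ast` module. In practice `mypy` in strict mode does this job; the helper name `functions_missing_hints` is hypothetical, and the sketch ignores `*args` and `**kwargs`.

```python
import ast


def functions_missing_hints(source: str) -> list[str]:
    """List functions whose parameters or return value lack annotations."""
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        unannotated = [
            arg.arg
            for arg in params
            if arg.annotation is None and arg.arg not in ("self", "cls")
        ]
        if unannotated or node.returns is None:
            missing.append(node.name)
    return missing


sample = '''
def typed(x: int) -> int:
    return x

def untyped(x):
    return x
'''
print(functions_missing_hints(sample))
```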

4. AI/ML Specific Checks

  • Model checkpoints organization
  • Data pipeline efficiency
  • GPU utilization patterns
  • Memory management
  • Reproducibility (random seeds, version pinning)
  • Experiment tracking setup
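
For the reproducibility item, a seed helper along these lines is a common starting point. This is a sketch, not a complete determinism recipe: the name `set_seed` is an assumption, NumPy and PyTorch are seeded only if they are installed, and fully deterministic GPU runs usually need further framework-specific settings.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed common sources of randomness, skipping libraries that are absent."""
    random.seed(seed)
    # Affects only subprocesses started later, not hashing in this interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_seed(42)
first_draw = random.random()
set_seed(42)
second_draw = random.random()
print(first_draw == second_draw)
```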

5. Security & Best Practices

  • No hardcoded API keys
  • Proper .gitignore for models/data
  • Environment variable usage
  • Data validation (pydantic)
  • Error handling in training loops
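
The "no hardcoded API keys" check could be approximated with a line-by-line regular expression scan such as the one below. The pattern and the name `find_hardcoded_secrets` are illustrative assumptions; a heuristic like this produces both false positives and false negatives, so a dedicated secret scanner is the safer choice for real audits.

```python
import re

# Heuristic: a name ending in a key-like word, assigned a quoted literal.
_SECRET_ASSIGNMENT = re.compile(
    r"""(?ix)
    \b(\w*(?:api_?key|secret|token|password))\b   # suspicious variable name
    \s*=\s*                                       # plain assignment
    (["'])[^"']{8,}\2                             # quoted literal, 8+ chars
    """
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) pairs that look like inline secrets."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _SECRET_ASSIGNMENT.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


sample = (
    'OPENAI_API_KEY = "sk-not-a-real-key"\n'
    'api_key = os.environ["OPENAI_API_KEY"]\n'
    'tokenizer = "bert-base-uncased"\n'
)
print(find_hardcoded_secrets(sample))
```

Only the first line of the sample is flagged: the second reads the key from the environment, and `tokenizer` does not end in a key-like word.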

UV Package Manager Quick Reference

Project Initialization

# Create new AI project
uv init my-ai-project
cd my-ai-project

# Set Python version
uv python pin 3.11

# Initialize with dependencies
uv add torch torchvision transformers pandas numpy scikit-learn
uv add --dev pytest ruff mypy black

Dependency Management

# Add ML libraries
uv add pytorch-lightning wandb
uv add langchain openai chromadb  # For RAG

# Add with version constraints
uv add "numpy>=1.24,<2.0"
uv add "torch==2.1.0"

# Add from git
uv add "git+https://github.com/org/repo.git"

# Remove dependencies
uv remove package-name

# Update all dependencies
uv sync --upgrade

# Install from lock file (add --locked to fail if uv.lock is out of date)
uv sync

Virtual Environments

# Create and activate
uv venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

# Use specific Python version
uv venv --python 3.11

# With custom name
uv venv my-env

Running Scripts

# Run with UV (uses project environment)
uv run python train.py
uv run pytest tests/
uv run jupyter lab

# Run inline script
uv run --with pandas --with numpy python -c "import pandas as pd; print(pd.__version__)"

Code Generation Standards

Type Hints & Docstrings

from pathlib import Path
from typing import Dict, List, Optional

import torch

def train_model(
    model: torch.nn.Module,
    train_loader: torch.utils.data.DataLoader,
    optimizer: torch.optim.Optimizer,
    epochs: int,
    device: str = "cuda",
    checkpoint_dir: Optional[Path] = None,
) -> Dict[str, List[float]]:
    """
    Train a PyTorch model with automatic checkpointing.
    
    Args:
        model: PyTorch model to train
        train_loader: DataLoader for training data
        optimizer: Optimizer instance (Adam, SGD, etc.)
        epochs: Number of training epochs
        device: Device to train on ('cuda' or 'cpu')
        checkpoint_dir: Directory to save checkpoints (optional)
    
    Returns:
        Dictionary containing training metrics:
            - 'loss': List of losses per epoch
            - 'accuracy': List of accuracies per epoch
    
    Raises:
        ValueError: If epochs < 1 or device not available
        RuntimeError: If training fails
    
    Example:
        >>> model = MyModel()
        >>> optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
        >>> metrics = train_model(model, train_loader, optimizer, epochs=10)
        >>> print(f"Final loss: {metrics['loss'][-1]:.4f}")
    """
    if epochs < 1:
        raise ValueError(f"epochs must be >= 1, got {epochs}")
    
    if device == "cuda" and not torch.cuda.is_available():
        raise ValueError("CUDA not available")
    
    model = model.to(device)
    metrics: Dict[str, List[float]] = {"loss": [], "accuracy": []}
    
    for epoch in range(epochs):
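        # Placeholder: run one epoch here, then append that epoch's loss and
        # accuracy to metrics["loss"] and metrics["accuracy"].
        pass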
    
    return metrics

Project Structure Template

my-ai-project/
├── pyproject.toml           # UV dependencies & config
├── uv.lock                  # Lock file for reproducibility
├── .python-version          # Python version
├── README.md
├── .gitignore
├── .env.example
├── src/
│   ├── __init__.py
│   ├── models/              # Model architectures
│   │   ├── __init__.py
│   │   └── cnn.py
│   ├── data/                # Data loaders & processing
│   │   ├── __init__.py
│   │   └── dataset.py
│   ├── training/            # Training loops
│   │   ├── __init__.py
│   │   └── trainer.py
│   ├── utils/               # Helper functions
│   │   ├── __init__.py
│   │   └── logging.py
│   └── config/              # Configuration
│       ├── __init__.py
│       └── settings.py
├── tests/                   # Pytest tests
│   ├── __init__.py
│   ├── test_models.py
│   └── test_data.py
├── notebooks/               # Jupyter notebooks
│   └── exploration.ipynb
├── scripts/                 # Training/inference scripts
│   ├── train.py
│   └── inference.py
├── data/                    # Data directory (gitignored)
│   ├── raw/
│   ├── processed/
│   └── README.md
└── models/                  # Saved models (gitignored)
    └── checkpoints/
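
The layout above could be created with a short `pathlib` script like the following sketch. The function name `scaffold` and the `.gitignore` contents are assumptions rather than part of the template. Note the leading slashes in `/data/` and `/models/`: they anchor the patterns to the repository root, so `src/data/` and `src/models/` stay tracked.

```python
import tempfile
from pathlib import Path

# Packages under src/ that receive an __init__.py.
PACKAGES = ["models", "data", "training", "utils", "config"]
# Plain directories from the template.
PLAIN_DIRS = ["notebooks", "scripts", "data/raw", "data/processed", "models/checkpoints"]


def scaffold(root: Path) -> None:
    """Create the directory skeleton under root without overwriting files."""
    for package_dir in [root / "src", *(root / "src" / name for name in PACKAGES), root / "tests"]:
        package_dir.mkdir(parents=True, exist_ok=True)
        (package_dir / "__init__.py").touch()

    for name in PLAIN_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)

    gitignore = root / ".gitignore"
    if not gitignore.exists():
        # Root-anchored patterns keep src/data and src/models under version control.
        gitignore.write_text("/data/\n/models/\n.venv/\n.env\n")


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my-ai-project"
    scaffold(project_root)
    created = sorted(p.name for p in (project_root / "src").iterdir())
print(created)
```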

pyproject.toml Template

[project]
name = "my-ai-project"
version = "0.1.0"
description = "AI/ML project with UV"
requires-python = ">=3.11"
dependencies = [
    "torch>=2.1.0",
    "torchvision>=0.16.0",
    "transformers>=4.35.0",
    "pandas>=2.1.0",
    "numpy>=1.24.0",
    "scikit-learn>=1.3.0",
    "langchain>=0.1.0",
    "openai>=1.0.0",
    "chromadb>=0.4.0",
    "pydantic>=2.0.0",
    "python-dotenv>=1.0.0",
    "tqdm>=4.66.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "ruff>=0.1.0",
    "mypy>=1.7.0",
    "black>=23.11.0",
    "ipython>=8.17.0",
    "jupyter>=1.0.0",
]

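[tool.ruff]
line-length = 100
target-version = "py311"

# Lint rule selection belongs under [tool.ruff.lint]; the top-level
# select/ignore keys are deprecated in current Ruff releases.
[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "B", "C90"]
ignore = ["E501"]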

[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
addopts = "-v --cov=src --cov-report=html"

Common AI/ML Patterns

RAG Pipeline with Langchain

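# Requires the langchain-openai and langchain-chroma packages. The older
# langchain.embeddings, langchain.vectorstores, and langchain.chat_models
# import paths are deprecated.
from pathlib import Path

from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings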

def setup_rag_pipeline(
    documents_path: Path,
    persist_directory: Path,
    openai_api_key: str,
) -> RetrievalQA:
    """
    Set up a RAG pipeline with Langchain and Chroma.
    
    Args:
        documents_path: Path to documents directory (not read by this
            function; document ingestion is not shown)
        persist_directory: Where to store embeddings
        openai_api_key: OpenAI API key
    
    Returns:
        Configured RetrievalQA chain
    """
    # Initialize embeddings
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    
    # Create/load vector store
    vectorstore = Chroma(
        persist_directory=str(persist_directory),
        embedding_function=embeddings,
    )
    
    # Initialize LLM
    llm = ChatOpenAI(
        temperature=0,
        model_name="gpt-4",
        openai_api_key=openai_api_key,
    )
    
    # Create QA chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
    )
    
    return qa_chain
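
Conceptually, the retriever configured with `search_kwargs={"k": 3}` returns the `k` stored chunks most similar to the query embedding. The toy sketch below shows that idea with cosine similarity over hand-written three-dimensional vectors. It is not how Chroma is implemented, and the document names and vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
toy_docs = {
    "uv-guide": [1.0, 0.0, 0.0],
    "pytorch-guide": [0.0, 1.0, 0.0],
    "rag-guide": [0.7, 0.7, 0.0],
    "opencv-guide": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.2, 0.0], toy_docs, k=2))
```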

PyTorch Training Loop

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from typing import Tuple
from tqdm import tqdm

def train_epoch(
    model: nn.Module,
    train_loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: str,
) -> Tuple[float, float]:
    """Train for one epoch."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    pbar = tqdm(train_loader, desc="Training")
    
    for batch_idx, (inputs, targets) in enumerate(pbar):
        inputs, targets = inputs.to(device), targets.to(device)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        
        # Update progress bar
        pbar.set_postfix({
            'loss': running_loss / (batch_idx + 1),
            'acc': 100. * correct / total
        })
    
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100. * correct / total
    
    return epoch_loss, epoch_acc

Reference Documentation

Load these references as needed:

Core Libraries

  • references/pytorch-guide.md - PyTorch models, training, optimization
  • references/tensorflow-guide.md - TensorFlow/Keras patterns
  • references/sklearn-guide.md - scikit-learn pipelines, models
  • references/transformers-guide.md - Hugging Face transformers, fine-tuning

LLM & NLP

  • references/langchain-guide.md - RAG, agents, chains
  • references/openai-guide.md - OpenAI API, embeddings, chat
  • references/nlp-libraries.md - spaCy, NLTK, tokenization

Data Processing

  • references/pandas-guide.md - DataFrame operations, optimization
  • references/numpy-guide.md - Array operations, performance
  • references/data-pipelines.md - ETL, preprocessing, augmentation

Computer Vision

  • references/opencv-guide.md - Image processing, video
  • references/vision-models.md - CNN architectures, object detection
  • references/image-augmentation.md - albumentations, torchvision transforms

Production & Deployment

  • references/model-serving.md - FastAPI, TorchServe, Ray
  • references/mlops-guide.md - Experiment tracking, versioning
  • references/performance-optimization.md - Profiling, GPU optimization

UV & Dependencies

  • references/uv-advanced.md - Workspaces, monorepos, advanced features
  • references/dependency-management.md - Best practices, security

Auto-Fix Priority

Critical (Auto-Fix Immediately)

  1. Missing type hints on functions
  2. Hardcoded API keys → Environment variables
  3. Missing .gitignore for data/models
  4. No random seed setting
  5. Improper tensor device handling

High Priority (Propose & Fix)

  1. Inefficient pandas operations
  2. Missing error handling in training
  3. No experiment tracking
  4. Memory leaks in data loaders
  5. Missing data validation

Medium Priority (Recommend)

  1. Code complexity > 10
  2. Test coverage < 80%
  3. Missing docstrings
  4. Inconsistent formatting
  5. Outdated dependencies
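
One way to represent these tiers in code is an ordered enum, so that audit findings can be sorted before they are acted on. The class and function names below (`Priority`, `Finding`, `triage`) are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0  # auto-fix immediately
    HIGH = 1      # propose and fix
    MEDIUM = 2    # recommend


@dataclass(frozen=True)
class Finding:
    message: str
    priority: Priority


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings so critical ones come first (stable within a tier)."""
    return sorted(findings, key=lambda finding: finding.priority)


report = triage(
    [
        Finding("Test coverage < 80%", Priority.MEDIUM),
        Finding("Hardcoded API key", Priority.CRITICAL),
        Finding("No experiment tracking", Priority.HIGH),
        Finding("No random seed setting", Priority.CRITICAL),
    ]
)
for finding in report:
    print(f"[{finding.priority.name}] {finding.message}")
```

Because `sorted` is stable, findings within the same tier keep the order in which they were reported.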

Integration Commands

Project Setup: "Set up a new AI project with UV and PyTorch"

RAG Pipeline: "Create a RAG pipeline with langchain, Chroma, and OpenAI"

Model Training: "Generate a PyTorch training script with W&B logging"

Data Processing: "Optimize this pandas DataFrame operation"

Migration: "Migrate this project from pip to UV"

Full Audit: "Audit my AI project for best practices and performance"