**CVAT-YOLO Workflow Documentation**
*AI-Powered PDF Book Processing Pipeline*

# Welcome to the CVAT-YOLO Workflow System

Transform thousands of PDF pages into AI training data in minutes! This system combines cloud storage (S3), computer vision annotation (CVAT), and AI object detection (YOLO) into one smooth workflow.

**What you'll be able to do:**

- Process entire PDF books automatically
- Use AI to detect objects (LEGO parts, components, etc.)
- Review and correct AI predictions efficiently
- Build training datasets for better AI models

!!! TIP
    The playground environment is pre-configured and ready to use. Just set your credentials and start processing!

# Quick Start: Your First Complete Workflow Loop

Ready to process your first batch of images? Let's go from raw PDFs to AI-validated training data in under 30 minutes!

## Prerequisites (One-Time Setup)

```bash
# Set your credentials as environment variables (playground defaults shown)
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export CVAT_TOFU_PASSWORD="your_password"

# Optional: save to a .env file for permanent setup
cp .env.example .env
# Edit .env with your credentials
```

That's it! The system picks up these credentials automatically.

## Step-by-Step Guided Tour

### Step 1: Launch Control Center

```bash
python main_menu.py
```

You'll see a menu with 6 options. The quick start uses Options 1, 3, 4, and 5, in that order.

### Step 2: Create Tasks from Your Images

Select **Option 1** - the script automatically finds your images in S3 and creates CVAT tasks.

```
Creating task: 10295-1_1of2 (42 images)
✅ Task created successfully
```

### Step 3: Run AI Detection (GPU Server)

```bash
ssh tofu@yolo.beantip.ca
python main_menu.py  # Select Option 3
```

The AI automatically finds your new tasks and runs object detection:

```
Processing task: 10295-1_1of2
Running YOLO on 42 images...
✅ Uploaded results to S3
```

### Step 4: Import AI Results

Back on your local machine, select **Option 4**.
The AI predictions are automatically imported into CVAT:

```
Processing: 10295-1_1of2.zip
✅ Annotations imported to CVAT
```

### Step 5: Review & Fix in CVAT

Open `https://cvat.lasey.beantip.ca` in your browser and review the AI predictions. Fix any errors, then mark the task as "validation completed".

### Step 6: Export Training Data

Select **Option 5** to export your validated annotations:

```
Exporting validation completed tasks...
✅ Training data synced to S3
```

**That's it!** You've completed your first workflow cycle. The validated data is now ready to train a better AI model.

# Understanding the Workflow Components

## System Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│   PDF Books     │────▶│   S3 Storage    │────▶│      CVAT       │
│   (Source)      │     │   (Images)      │     │  (Annotation)   │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 ▼                       ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │                 │     │                 │
                        │   YOLO Model    │────▶│  Human Review   │
                        │  (Inference)    │     │  (Validation)   │
                        │                 │     │                 │
                        └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │                 │
                                                │  Training Data  │
                                                │  (YOLO Format)  │
                                                │                 │
                                                └─────────────────┘
```

## Core Components Explained

### CVAT (Computer Vision Annotation Tool)

- Web-based annotation platform
- Manages tasks, jobs, and annotations
- Provides a UI for human review and correction
- Integrates with cloud storage (S3)

### S3 Storage Structure

```
lasey-media-drop/              # Default bucket
├── CVAT/
│   ├── images/                # Source images from PDFs
│   │   ├── 10295-1_1of2_0001.jpg
│   │   ├── 10295-1_1of2_0002.jpg
│   │   └── ...
│   └── yolo_annotations/      # AI-generated annotations
│       ├── 10295-1_1of2.zip
│       └── ...
└── yolo/                      # Training datasets
    ├── train/
    │   ├── images/
    │   └── labels/
    ├── val/
    │   ├── images/
    │   └── labels/
    └── test/
        ├── images/
        └── labels/
```

### YOLO Model

- Object detection AI model
- Trained on LEGO parts or other objects
- Default model: `legojuly2025.pt`
- Generates bounding box predictions

### File Naming Convention

```
{bookID}_{partInfo}_{pageNumber}.jpg

Example: 10295-1_1of2_0001.jpg
         │       │    │
         │       │    └─ Page number (4 digits)
         │       └────── Part info (1of2, 2of2, etc.)
         └────────────── Book/Set ID
```

Fields are separated by underscores.

# Frequently Asked Questions (FAQ)

## General Questions

**Q: What is this system for?**
A: This system processes PDF books (typically LEGO instruction manuals), extracts images, uses AI to detect objects (parts), and manages the annotation and validation workflow for training better AI models.

**Q: Do I need programming knowledge?**
A: No! Just basic command-line skills. The main menu guides you through everything.

**Q: What credentials do I need?**
A: Three environment variables:

- `AWS_ACCESS_KEY_ID` - your S3 access key
- `AWS_SECRET_ACCESS_KEY` - your S3 secret key
- `CVAT_TOFU_PASSWORD` - your CVAT password (the username is `tofu` for the playground)

## Setup and Configuration

**Q: How do I set credentials?**
A: Best practice is to set environment variables:

```bash
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export CVAT_TOFU_PASSWORD="your_password"
```

For a permanent setup, add them to a `.env` file:

```bash
cp .env.example .env
nano .env  # Add your credentials
```

**Q: Can I use a different CVAT server or S3 endpoint?**
A: Yes! Just set additional environment variables:

```bash
export CVAT_URL=https://your-cvat-server.com
export S3_ENDPOINT=your-s3-endpoint.com
export S3_BUCKET=your-bucket-name
```

**Q: What if I don't have GPU access?**
A: The inference step (Option 3) requires a GPU. You can:

1. Use the provided GPU server (yolo.beantip.ca)
2. Skip inference and manually annotate in CVAT
3. Set up your own GPU environment

## Workflow Questions

**Q: What order should I run the scripts in?**
A: Follow this sequence:

1. Create tasks (Option 1)
2. Run inference (Option 3) - on the GPU server
3. Import AI results (Option 4)
4. Manual review in the CVAT web UI
5. Export for training (Option 5)

**Q: Can I skip the AI inference step?**
A: Yes, you can manually annotate everything in CVAT, but AI inference saves significant time by providing initial predictions.

**Q: How do I know which tasks need processing?**
A: Use Option 2 (Task Manager) to list all tasks and their current status. Tasks in the "new" state need processing.

**Q: What happens if inference fails?**
A: Check that:

- The GPU server has S3 mounted at `/mnt/s3.lasey.beantip.ca`
- The YOLO model file exists (`legojuly2025.pt`)
- The images are properly formatted JPEGs
- The task is in the correct state ("new" with "annotation" stage)

## Data Management

**Q: How are images organized in S3?**
A: Images go in `CVAT/images/` with the naming pattern `{bookID}_{part}_{page}.jpg`.

**Q: What's the difference between the public and private S3 endpoints?**
A:

- Public endpoint (`s3.lasey.beantip.ca`): for general access
- Private endpoint (`vmbr1.s3.lasey.beantip.ca`): used by CVAT for cloud storage

**Q: How do I check data consistency?**
A: Use Option 6 (Consistency Checker) to find:

- Images without labels
- Labels without images
- Training data issues

**Q: Can I delete tasks and start over?**
A: Yes, use Option 2 (Task Manager) to delete tasks. Be careful - this is permanent!

## Training and Export

**Q: What format does YOLO training need?**
A: YOLO requires:

- Images in an `images/` directory
- Labels in a `labels/` directory (same filenames, `.txt` extension)
- Each label file contains lines of: `class_id x_center y_center width height` (normalized 0-1)

**Q: How is data split for training?**
A: The default split is:

- 70% training
- 20% validation
- 10% test

This is handled automatically by the export script.
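
The split described above can be made deterministic by keying on the last digit of the page number (the same digit rule the export script's data-split logic documents later in this guide). The sketch below is illustrative, not the actual export script; `split_for` and `is_valid_yolo_line` are hypothetical helper names:

```python
# Sketch (assumed helpers, not the real export script): a deterministic
# 70/20/10 split keyed on the last digit of the filename stem, plus a
# check of the YOLO label-line format described in the FAQ above.

def split_for(filename: str) -> str:
    """Assign a file to train/val/test by the last digit of its stem."""
    last = filename.rsplit(".", 1)[0][-1]  # e.g. "..._0007.jpg" -> "7"
    if last in "0123456":
        return "train"  # 7 of 10 digits -> ~70%
    if last in "78":
        return "val"    # 2 of 10 digits -> ~20%
    return "test"       # digit 9        -> ~10%

def is_valid_yolo_line(line: str, num_classes: int = 1) -> bool:
    """Check `class_id x_center y_center width height`, coords in [0, 1]."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)

print(split_for("10295-1_1of2_0007.jpg"))        # -> val
print(is_valid_yolo_line("0 0.5 0.5 0.25 0.1"))  # -> True
```

A digit-based split like this keeps every re-export assigning the same pages to the same partitions, which matters when you retrain repeatedly on a growing dataset.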
**Q: Where does training data go?**
A: It is exported to the S3 bucket under the `yolo/train/`, `yolo/val/`, and `yolo/test/` directories.

# Advanced Usage

## Environment Variables

The system is configured entirely through environment variables for smooth automation.

### Primary Credentials (Required)

```bash
# These three are essential for playground operation
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export CVAT_TOFU_PASSWORD="your_password"
```

### Setting Credentials Permanently

**Method 1: .env File (Best for Development)**

```bash
cp .env.example .env
nano .env  # Add your credentials
```

**Method 2: Shell Profile (Best for Production)**

```bash
# Add to ~/.bashrc or ~/.zshrc
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export CVAT_TOFU_PASSWORD="your_password"
```

**Method 3: System Environment (Best for CI/CD)**

```bash
# Add to /etc/environment or systemd service files
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
CVAT_TOFU_PASSWORD=your_password
```

## Complete Environment Variable Reference

| Variable | Default | Description |
|----------|---------|-------------|
| **CVAT_URL** | `https://cvat.lasey.beantip.ca` | CVAT server URL |
| **CVAT_USERNAME** | `tofu` | CVAT username |
| **CVAT_TOFU_PASSWORD** | (none) | CVAT password (prompted if not set) |
| **S3_ENDPOINT** | `s3.lasey.beantip.ca` | Public S3 endpoint |
| **S3_PRIVATE_ENDPOINT** | `vmbr1.s3.lasey.beantip.ca` | Private S3 endpoint for CVAT |
| **S3_BUCKET** | `lasey-media-drop` | S3 bucket name |
| **AWS_ACCESS_KEY_ID** | (none) | AWS access key |
| **AWS_SECRET_ACCESS_KEY** | (none) | AWS secret key |

## Understanding the Guided Tour Steps

### What Really Happens in Each Step

**Step 2: Create Tasks from S3 Images**

- Scans the S3 bucket for images in the `CVAT/images/` directory
- Groups images by book ID pattern (e.g., all `10295-1_1of2_*.jpg`)
- Creates a CVAT task for each group with the "partlist" label
- Sets the job state to "new" and the stage to "annotation" for the inference pipeline

**Step 3: Run AI Inference**

- Queries CVAT for tasks with jobs in the "new" state AND "annotation" stage
- Mounts S3 directly at `/mnt/s3.lasey.beantip.ca` (no download needed!)
- Runs the YOLO model (`legojuly2025.pt`) on the images
- Generates YOLO-format annotations (`.txt` files)
- Packages them as a ZIP and uploads it to S3 cloud storage

**Step 4: Import AI Results**

- Scans S3 for YOLO annotation ZIP files
- Matches ZIPs to CVAT tasks by name pattern
- Imports annotations directly from the cloud storage URL
- Updates each task with AI predictions ready for review

**Step 5: Manual Review**

- A human validates the AI predictions in the CVAT web interface
- Fixes incorrect bounding boxes, adds missed objects
- Changes the task status to "validation completed" when done

**Step 6: Export for Training**

- Finds all "validation completed" tasks
- Exports them as YOLO-format labels (no images needed)
- Splits them into train/val/test (70/20/10 ratio)
- Syncs to the S3 `yolo/` directory structure

## Script Deep Dive

### poomer-cvat-create-tasks-from-s3.py

**Purpose**: Scans S3 and creates CVAT tasks for image sets

**Key Features**:

- Automatic image grouping by book ID pattern
- Skips existing tasks to avoid duplicates
- Configures the "partlist" label automatically
- Sets the initial job state to "new" for the inference pipeline

**Advanced Options**:

```python
# Customize in the script:
TARGET_STATE = "new"     # Initial job state
LABEL_NAME = "partlist"  # Object class name
```

### poomer-run-inference-on-s3-jpegs-and-write-out-zips.py

**Purpose**: Runs YOLO inference on the GPU server

**Key Features**:

- Auto-detects tasks ready for inference
- Direct S3 mount access (no download needed)
- Batch processing for efficiency
- Confidence threshold: 0.25
- IoU threshold: 0.45

**Requirements**:

- GPU server with CUDA support
- S3 mounted at `/mnt/s3.lasey.beantip.ca`
- YOLO model file available

**Customization**:

```python
# Modify inference parameters:
model.predict(
    source=image_pattern,
    imgsz=640,        # Image size
    conf=0.25,        # Confidence threshold
    iou=0.45,         # IoU threshold
    device='cuda:0'   # GPU device
)
```

### poomer-validation-completed-to-yolo-txt.py

**Purpose**: Exports validated annotations for training

**Key Features**:

- Filters tasks by validation status
- Smart train/val/test splitting
- Handles missing images gracefully
- Bulk S3 sync for efficiency
- Detailed analysis reports

**Data Split Logic**:

```python
# Default split ratios:
TRAIN_RATIO = 0.7
VAL_RATIO = 0.2
TEST_RATIO = 0.1

# Custom split by filename pattern:
# Files ending in 0,1,2,3,4,5,6 → train
# Files ending in 7,8           → val
# Files ending in 9             → test
```

### poomer-cvat-task-manager.py

**Purpose**: Advanced task management interface

**Operations**:

- List tasks with filtering
- Bulk delete operations
- Status updates
- Job state modifications
- Export task data

**Filter Examples**:

```
# List only completed tasks
Filter: status=completed

# List tasks matching a pattern
Filter: name=10295*

# List large tasks
Filter: size>100
```

### poomer-yolo-consistency-checker.py

**Purpose**: Validates dataset quality

**Checks Performed**:

1. **Image-label pairing**: Every image should have a label file
2. **Label orphans**: Label files without images
3. **Format validation**: Correct YOLO annotation format
4. **Class ID validation**: Valid class indices
5. **Coordinate validation**: Bounding boxes within the 0-1 range

**Output Example**:

```
Checking dataset consistency...
✅ train/images: 1,234 files
✅ train/labels: 1,234 files
⚠️ val/images: 342 files
❌ val/labels: 338 files (4 missing)

Missing labels for:
- image_0234.jpg
- image_0567.jpg
- image_0891.jpg
- image_1023.jpg
```

## Automation and Scripting

### Batch Processing Multiple Books

Create a script to process multiple books. Note that the script filenames contain hyphens, which are invalid in Python `import` statements, so the functions are loaded via `importlib`:

```bash
#!/bin/bash
# batch_process.sh

# List of book IDs to process
BOOKS="10295-1 10296-1 10297-1"

for BOOK in $BOOKS; do
    echo "Processing $BOOK..."

    # Create tasks (hyphenated module name, so use importlib)
    python -c "
import importlib
mod = importlib.import_module('poomer-cvat-create-tasks-from-s3')
mod.create_tasks_for_pattern('${BOOK}*')
"

    # Run inference (on the GPU server)
    ssh tofu@yolo.beantip.ca "
        cd /path/to/project
        python -c '
import importlib
mod = importlib.import_module(\"poomer-run-inference-on-s3-jpegs-and-write-out-zips\")
mod.process_task(\"${BOOK}\")
'
    "

    # Upload results
    python poomer-post-inference-yolozip-batch-uploader-to-cvat.py

    echo "✅ Completed $BOOK"
done
```

### Monitoring Task Progress

Create a monitoring script:

```python
#!/usr/bin/env python3
# monitor_progress.py

from common import get_cvat_client

def get_task_statistics():
    client = get_cvat_client()
    stats = {'new': 0, 'in_progress': 0, 'completed': 0, 'validation': 0}

    page = 1
    while True:
        tasks_page, _ = client.api_client.tasks_api.list(page=page)
        if not tasks_page.results:
            break
        for task in tasks_page.results:
            if task.status in stats:
                stats[task.status] += 1
        if not tasks_page.next:
            break
        page += 1

    return stats

if __name__ == "__main__":
    stats = get_task_statistics()
    print("Task Statistics:")
    for status, count in stats.items():
        print(f"  {status}: {count}")
```

## Performance Optimization

### S3 Access Optimization

1. **Use an S3 mount for large operations**:

```bash
# Mount S3 using goofys or s3fs
goofys lasey-media-drop /mnt/s3
```

2. **Batch operations**:

```python
# Instead of individual uploads
for file in files:
    upload_to_s3(file)

# Use batch sync
subprocess.run(['s3cmd', 'sync', '--delete-removed',
                'local_dir/', 's3://bucket/dir/'])
```

3. **Parallel processing**:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(process_task, tasks)
```

### CVAT API Optimization

1. **Pagination handling**:

```python
# Request larger pages
page_size = 100  # Default is often 10
tasks_page, _ = client.api_client.tasks_api.list(
    page=1,
    page_size=page_size
)
```

2. **Caching client connections**:

```python
# Reuse the client instance
_client = None

def get_cached_client():
    global _client
    if _client is None:
        _client = get_cvat_client()
    return _client
```

### YOLO Inference Optimization

1. **Batch size tuning**:

```python
# Process multiple images at once
results = model.predict(
    source=images,
    batch=16,   # Adjust based on GPU memory
    imgsz=640
)
```

2. **Model optimization**:

```python
# Use TensorRT for faster inference
model.export(format='engine')
model = YOLO('model.engine')
```

3. **Multi-GPU processing**:

```python
# Distribute across GPUs
device = 'cuda:0,1'  # Use multiple GPUs
model.predict(source=images, device=device)
```

## Troubleshooting

### Common Issues and Solutions

**Problem: "CVAT password incorrect"**

- Solution: Reset the password in `.env` or unset the environment variable
- Check: The username matches (default: `tofu`)

**Problem: "S3 access denied"**

- Solution: Verify the AWS credentials are correct
- Check: Bucket permissions and endpoint URL

**Problem: "No GPU available for inference"**

- Solution: Run on the GPU server or use CPU mode (slower)
- Check: CUDA installation and driver versions

**Problem: "Task not found in CVAT"**

- Solution: Verify the task was created successfully
- Check: The task name matches the expected pattern

**Problem: "Images not found during inference"**

- Solution: Ensure S3 is properly mounted
- Check: Image paths and naming convention

### Debug Mode

Enable verbose output for troubleshooting:

```python
# In any script, add:
import logging
logging.basicConfig(level=logging.DEBUG)
```

```bash
# Or set an environment variable:
export DEBUG=1
```

### Log Analysis

Check logs for detailed error information:

```bash
# CVAT logs
docker logs cvat_server

# S3 sync logs
s3cmd --debug sync ...

# Python script logs
python script.py 2>&1 | tee debug.log
```

# Best Practices

## Data Management

1. **Regular Backups**: Always back up validated annotations before bulk operations
2. **Consistent Naming**: Stick to the established naming convention
3. **Version Control**: Tag YOLO models with the training date and dataset version
4. **Quality Checks**: Run the consistency checker before training

## Workflow Efficiency

1. **Batch Processing**: Group similar books/tasks together
2. **Parallel Review**: Multiple annotators can work on different tasks simultaneously
3. **Progressive Training**: Train incrementally as more data becomes available
4. **Smart Sampling**: Focus manual review on low-confidence predictions

## Security

1. **Credential Management**: Never commit credentials to version control
2. **Access Control**: Use read-only credentials where possible
3. **Audit Trail**: Keep logs of all operations
4. **Data Privacy**: Ensure compliance with data handling policies

## Collaboration

1. **Task Assignment**: Use CVAT's assignee feature for team coordination
2. **Annotation Guidelines**: Document labeling standards for consistency
3. **Review Process**: Implement peer review for critical datasets
4. **Communication**: Use task comments in CVAT for context

# Conclusion

You've now learned the complete CVAT-YOLO workflow system! From this foundation, you can:

- Process hundreds of PDF books efficiently
- Train increasingly accurate YOLO models
- Build custom automation for your specific needs
- Scale the system for production use

Remember: start small with one book, master the workflow, then scale up. The system is designed to grow with your expertise.

For additional support:

- Check the script help: `python script_name.py --help`
- Review error messages carefully - they often contain solutions
- Keep this documentation handy as a reference

Happy annotating!

---

*Documentation Version 1.0*
*Last Updated: 2025*