Category: Development & Engineering
No API key required

data-cleaning

Data cleaning, preprocessing, and quality assurance techniques

Author: jakexiaohub

Data Cleaning Skill

Overview

Master data cleaning and preprocessing techniques essential for reliable analytics.

Topics Covered

  • Missing value handling (imputation, deletion)
  • Outlier detection and treatment
  • Data type conversion and validation
  • Duplicate identification and removal
  • String cleaning and normalization
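As a rough illustration of how several of these topics fit together, the sketch below normalizes strings, converts types, removes duplicates, and imputes missing values using only the Python standard library. The field names (`name`, `age`), the sample rows, and the helper functions are hypothetical and not part of this skill; median imputation is one possible choice among many.

```python
import statistics
import unicodedata


def normalize_text(value):
    """Apply Unicode NFKC, collapse whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", value)
    return " ".join(text.split()).lower()


def to_float(value):
    """Convert a raw string to float; return None if blank or invalid."""
    try:
        return float(value.strip())
    except (AttributeError, ValueError):
        return None


def clean_rows(rows):
    """Hypothetical pipeline: normalize, convert, deduplicate, impute."""
    # 1. String normalization and type conversion.
    parsed = [
        {"name": normalize_text(r["name"]), "age": to_float(r["age"])}
        for r in rows
    ]
    # 2. Drop exact duplicates while keeping first-seen order.
    seen = set()
    unique = []
    for row in parsed:
        key = (row["name"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # 3. Impute missing ages with the median of the observed ages.
    observed = [r["age"] for r in unique if r["age"] is not None]
    fill = statistics.median(observed) if observed else None
    return [
        {**r, "age": fill if r["age"] is None else r["age"]}
        for r in unique
    ]


# Illustrative sample data (made up for this sketch).
raw = [
    {"name": "  Alice ", "age": "30"},
    {"name": "alice", "age": "30"},
    {"name": "Bob", "age": "n/a"},
    {"name": "Carol", "age": "40"},
]
cleaned = clean_rows(raw)
```

Normalizing before deduplicating matters here: the two "Alice" rows only become identical once whitespace and case are cleaned. Whether to impute or delete rows with missing values depends on the dataset and the analysis.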

Learning Outcomes

  • Clean messy datasets
  • Handle missing data appropriately
  • Detect and treat outliers
  • Ensure data quality
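Detecting and treating outliers can be sketched with the interquartile range (IQR) rule, one common approach among several. The example below assumes the conventional 1.5 multiplier and treats outliers by clipping them to the fences; the function names and sample readings are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the IQR rule with multiplier k."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


def clip_outliers(values, k=1.5):
    """Treat outliers by clipping them to the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Illustrative sample data (made up for this sketch).
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
clipped = clip_outliers(readings)
```

Clipping keeps the row count unchanged, which is convenient when other columns in the same record are still valid. Removing the rows or investigating the source of the value may be more appropriate when an outlier signals a data-entry error.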

Error Handling

| Error Type | Cause | Recovery |
|------------|-------|----------|
| Memory error | Dataset too large | Use chunking or sampling |
| Type conversion failed | Invalid data format | Apply preprocessing first |
| Encoding issues | Wrong character encoding | Detect and specify encoding |
| Validation failure | Data doesn't meet schema | Review and adjust validation rules |
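A minimal sketch of three of these recoveries, using only the Python standard library: trying candidate encodings in order, processing rows in fixed-size chunks, and setting aside rows that fail type conversion. The candidate encodings, the chunk size, the column names, and the in-memory sample payload are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice


def decode_bytes(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order; return (text, encoding used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows so only one chunk is held at a time."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


# Illustrative payload (made up): cp1252 bytes that are not valid UTF-8.
payload = "city,temp\nZürich,21\nMálaga,bad\nOslo,17\n".encode("cp1252")
text, used = decode_bytes(payload)

reader = csv.DictReader(io.StringIO(text))
total, rejected = 0.0, []
for chunk in iter_chunks(reader, size=2):
    for row in chunk:
        try:
            total += float(row["temp"])
        except ValueError:
            # Keep failed rows for later preprocessing instead of aborting.
            rejected.append(row)
```

The sample is decoded in memory only to keep the sketch short. For a file that is genuinely too large, one option is to test the candidate encodings on a leading sample of bytes and then open the file with the chosen `encoding=` so rows are streamed rather than loaded at once.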

Related Skills

  • programming (for automation)
  • foundations (for data quality concepts)
  • databases-sql (for SQL-based cleaning)