TailorBench

An open benchmark for evaluating AI understanding of garment construction.

License: MIT


The Problem

Current fashion AI benchmarks test aesthetics—whether an image "looks like" a suit.

None test construction—whether AI understands how garments are actually built.

This matters because:

  • AI generates physically impossible garments (lapels that can't roll, impossible button stances)
  • Fashion recommendations lack fit understanding
  • No standard exists for evaluating construction knowledge
  • Models trained on Pinterest images can't distinguish working surgeon cuffs from decorative buttons

What We Test

| Category | Examples | Why It Matters |
| --- | --- | --- |
| Identification | Lapel types, vent styles, shoulder construction | Basic recognition |
| Reasoning | Why single vent suits seated clients | Understanding mechanics |
| Error Detection | Spotting impossible construction in AI images | Practical application |

Difficulty Levels

  • Basic — Anyone who wears suits occasionally should know this
  • Intermediate — Enthusiasts, salespeople, MTM customers
  • Expert — Tailors, pattern makers, bespoke clients

A model that scores 90% on basic but 10% on expert doesn't understand construction. It is pattern-matching on surface features.
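The gap between difficulty levels can be made concrete with a per-level accuracy breakdown. The following is a minimal sketch, not the benchmark's own tooling: the function name `accuracy_by_difficulty`, the `(difficulty, is_correct)` pair layout, and the sample results are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_difficulty(results):
    """Return accuracy per difficulty level.

    `results` is an iterable of (difficulty, is_correct) pairs.
    This layout is hypothetical, not a TailorBench format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for difficulty, is_correct in results:
        total[difficulty] += 1
        correct[difficulty] += int(is_correct)
    return {level: correct[level] / total[level] for level in total}


# Made-up results mirroring the example above:
# 9 of 10 basic questions right, 1 of 10 expert questions right.
results = (
    [("basic", True)] * 9
    + [("basic", False)]
    + [("expert", True)]
    + [("expert", False)] * 9
)

scores = accuracy_by_difficulty(results)
gap = scores["basic"] - scores["expert"]
print(scores, round(gap, 2))
```

A large basic-to-expert gap like this one is the signal described above: strong surface recognition with little construction knowledge behind it.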

Taxonomy Overview

Jackets/Blazers

  • Lapels: Type, width, gorge height, roll
  • Shoulders: Structured, natural, roped, Neapolitan
  • Construction: Full canvas, half canvas, fused
  • Details: Vents, button stance, pockets, surgeon cuffs

Trousers

  • Waist: Waistband style, closures, side adjusters
  • Pleats: Flat front, single, double, direction
  • Leg: Break, cuffs, taper

Shirts

  • Collar: Point, spread, cutaway, button-down
  • Cuffs: Barrel, French, convertible
  • Body: Placket, yoke, back pleats

See benchmark/taxonomy/ for the complete element breakdown.

Benchmark Structure

tailorbench/
├── benchmark/
│   ├── taxonomy/           # What we test
│   │   ├── construction_elements.md
│   │   ├── difficulty_levels.md
│   │   └── question_formats.md
│   ├── dataset/            # Images and questions
│   │   ├── images/         # Jacket, trouser, shirt images
│   │   └── questions/      # JSON question files
│   ├── evaluation/         # How we score
│   │   └── metrics.md
│   └── baselines/          # Model results (coming)
└── research/
    ├── gap_analysis.md     # What's missing in fashion AI
    └── sources.md          # Reference materials
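Given the layout above, question files could be read with a few lines of Python. This is a sketch under stated assumptions: the README does not specify the JSON schema, so the code assumes each file under `benchmark/dataset/questions/` holds a JSON array of question objects, and `load_questions` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_questions(questions_dir):
    """Collect question objects from every *.json file in questions_dir.

    Assumption: each file contains a JSON array of question objects.
    The real TailorBench schema may differ.
    """
    questions = []
    # Sort the paths so the load order is deterministic across platforms.
    for path in sorted(Path(questions_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            questions.extend(json.load(handle))
    return questions
```

From the repository root, `load_questions("benchmark/dataset/questions")` would then return one flat list covering all question files.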

Evaluation Approach

Scoring

  • Overall Accuracy: Simple correct/total
  • Weighted Accuracy: Expert questions count 3x, Intermediate 2x, Basic 1x
  • Category Breakdown: Separate scores for identification, reasoning, error detection
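The three scores can be sketched as follows. The 3x/2x/1x weights come from the list above; everything else is an assumption for illustration, including the result record fields (`difficulty`, `category`, `correct`) and the choice to normalize weighted accuracy by the total available weight. The authoritative definition is the one in benchmark/evaluation/metrics.md.

```python
# Weights from the Scoring list: Expert 3x, Intermediate 2x, Basic 1x.
DIFFICULTY_WEIGHTS = {"basic": 1, "intermediate": 2, "expert": 3}


def overall_accuracy(results):
    """Simple correct/total."""
    return sum(r["correct"] for r in results) / len(results)


def weighted_accuracy(results):
    """Weight earned on correct answers divided by total weight available.

    Normalizing by total weight is an assumption, not confirmed by the README.
    """
    earned = sum(DIFFICULTY_WEIGHTS[r["difficulty"]] for r in results if r["correct"])
    possible = sum(DIFFICULTY_WEIGHTS[r["difficulty"]] for r in results)
    return earned / possible


def category_breakdown(results):
    """Unweighted accuracy per question category."""
    totals = {}
    for r in results:
        correct, seen = totals.get(r["category"], (0, 0))
        totals[r["category"]] = (correct + int(r["correct"]), seen + 1)
    return {cat: correct / seen for cat, (correct, seen) in totals.items()}


# Hypothetical results; the field names are illustrative only.
results = [
    {"difficulty": "basic", "category": "identification", "correct": True},
    {"difficulty": "basic", "category": "identification", "correct": True},
    {"difficulty": "intermediate", "category": "reasoning", "correct": True},
    {"difficulty": "expert", "category": "error_detection", "correct": False},
]

print(overall_accuracy(results))    # 3 of 4 correct
print(weighted_accuracy(results))   # 4 of 7 weight points earned
print(category_breakdown(results))
```

In this made-up run the model misses only the expert question, so overall accuracy is 3/4 while weighted accuracy drops to 4/7. Weighting makes expert misses cost more than basic ones.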

Question Formats

  1. Multiple choice (identification)
  2. True/False
  3. Count/Measure
  4. Error detection
  5. Open-ended reasoning
  6. Comparison
  7. Recommendation
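One way to represent a question in code is an enum for the seven formats plus a small record type. This is a hypothetical sketch: the README does not publish a schema, so the `Question` fields, the string values of `QuestionFormat`, and the sample record are all assumptions. See benchmark/taxonomy/question_formats.md for the actual format definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QuestionFormat(Enum):
    """The seven formats listed above; the string values are assumed."""
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    COUNT_MEASURE = "count_measure"
    ERROR_DETECTION = "error_detection"
    OPEN_ENDED = "open_ended_reasoning"
    COMPARISON = "comparison"
    RECOMMENDATION = "recommendation"


@dataclass(frozen=True)
class Question:
    """Hypothetical question record, not the official TailorBench schema."""
    question_id: str
    format: QuestionFormat
    category: str      # identification, reasoning, or error_detection
    difficulty: str    # basic, intermediate, or expert
    prompt: str
    answer: str
    image: Optional[str] = None  # path under dataset/images/, if any

    @classmethod
    def from_dict(cls, raw):
        # An unknown format string raises ValueError here, which
        # surfaces malformed question files early.
        return cls(
            question_id=raw["id"],
            format=QuestionFormat(raw["format"]),
            category=raw["category"],
            difficulty=raw["difficulty"],
            prompt=raw["prompt"],
            answer=raw["answer"],
            image=raw.get("image"),
        )


# Illustrative record only; it is not taken from the dataset.
sample = Question.from_dict({
    "id": "example-001",
    "format": "true_false",
    "category": "identification",
    "difficulty": "intermediate",
    "prompt": "True or false: surgeon cuffs have working buttonholes.",
    "answer": "true",
})
print(sample.format.name, sample.difficulty)
```

Tagging each question with its format, category, and difficulty is what makes the per-category and weighted scores computable from a single result list.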

See benchmark/evaluation/metrics.md for full scoring methodology.

Current Status

  • Taxonomy defined (50+ construction elements)
  • Evaluation framework (weighted scoring, 7 question formats)
  • Gap analysis documented
  • Dataset structure ready (images + JSON questions)
  • Dataset v0.1 population (in progress)
  • Baseline testing (GPT-4V, Gemini, Claude; coming)
  • Public leaderboard

Why This Exists

Built by practitioners, not just researchers.

We come from production tailoring—real garments, real customers, real fit data. Fashion AI needs domain expertise that image datasets can't provide.

The gap between "looks like a suit" and "understands suit construction" is where current AI fails. This benchmark measures that gap.

Contributing

See CONTRIBUTING.md for how to help.

Priority areas:

  • Construction element additions (especially regional variations)
  • Question contributions with clear correct answers
  • Baseline testing on additional models
  • Translations (terminology varies by region)

Citation

If you use TailorBench in research:

@misc{tailorbench2025,
  title={TailorBench: A Benchmark for Garment Construction Understanding in AI},
  author={FashionX},
  year={2025},
  url={https://github.com/fashionx-ai/tailorbench}
}

License

MIT License — see LICENSE

Contact


Version 0.1 — December 2025
