Schema Deduplication
Demonstrates schema deduplication strategies when merging OpenAPI specifications using the joiner package.
What You'll Learn
- How to identify structurally identical schemas across documents
- Using
deduplicate-equivalentstrategy for same-named schema collisions - Using
semantic-deduplicationfor different-named equivalent schemas - Understanding when each approach applies
Prerequisites
- Go 1.24+
Quick Start
Expected Output
Schema Deduplication Strategies
================================
Scenario: Both APIs have error schemas with IDENTICAL structure
- users-api.yaml: UserError {code, message, details}
- products-api.yaml: ProductError {code, message, details}
These are structurally equivalent but have different names.
[1/3] Baseline: No deduplication
-----------------------------------------
Result: Success
Schemas in merged doc: [Product ProductError User UserError]
Note: Both UserError and ProductError exist in output.
This is wasteful since they're structurally identical!
[2/3] Strategy: deduplicate-equivalent
-----------------------------------------
This strategy handles SAME-named collisions.
When two schemas named 'Error' collide:
- If structurally equivalent -> keep one
- If different -> fail
Configuration:
SchemaStrategy: deduplicate
EquivalenceMode: deep
Use case: When teams independently define the same schema
with the same name - common with shared types like Error.
[3/3] Strategy: semantic-deduplication
-----------------------------------------
Result: Success
Schemas in merged doc: [Product ProductError User]
UserError was deduplicated to ProductError
(ProductError < UserError alphabetically)
Warnings:
- semantic deduplication: consolidated 1 duplicate schema(s)
Configuration:
SemanticDeduplication: true
The joiner identified that UserError = ProductError
and consolidated them. All $refs are automatically rewritten.
=========================================
Key Takeaway:
- deduplicate-equivalent: Merges SAME-named schemas if equivalent
- semantic-deduplication: Finds DIFFERENT-named equivalent schemas
and consolidates to canonical name
Files
| File | Purpose |
|---|---|
| main.go | Demonstrates the schema deduplication workflow |
| specs/users-api.yaml | User service spec with User and UserError schemas |
| specs/products-api.yaml | Product service spec with Product and ProductError schemas |
Key Concepts
Deduplication Strategies
| Strategy | When to Use |
|---|---|
deduplicate-equivalent |
Same-named schemas colliding (e.g., both have Error) |
semantic-deduplication |
Different-named but identical schemas (e.g., UserError vs ProductError) |
How deduplicate-equivalent Works
This is a collision strategy (SchemaStrategy) that triggers when two schemas have the same name:
- When
UserAPIandProductAPIboth defineError - The joiner compares their structure using
EquivalenceMode(shallow or deep) - If equivalent -> keeps one, discards the duplicate
- If different -> fails with collision error
How Semantic Deduplication Works
This is a post-merge optimization that finds schemas with different names but identical structure:
- After merging, scans all schemas
- Groups schemas by structural equivalence
- Selects canonical name (alphabetically first)
- Removes duplicates and rewrites all
$refpointers
Equivalence Modes
| Mode | Comparison Depth |
|---|---|
none |
Disabled (no equivalence checking) |
shallow |
Top-level properties only |
deep |
Full recursive comparison including nested schemas |
Use Cases
- Shared error schemas - Multiple microservices define identical error responses
- Common data types - Teams independently define equivalent Address, Money, or Timestamp schemas
- API consolidation - Merging specs that evolved separately but converged on structure
- Spec cleanup - Reducing schema bloat in merged specifications
Next Steps
- Multi-API Merge - Basic multi-spec merging workflow
- Breaking Change Detection - Compare merged versions
- Joiner Deep Dive - Complete joiner documentation
Generated for oastools