Corpus Analysis: 10 Real-World OpenAPI Specs
Generated 2026-02-15 using only the oastools MCP server tools (parse, walk_operations, walk_schemas, walk_parameters, walk_responses, walk_security, walk_headers, walk_refs) with the group_by aggregation feature from #321.
The corpus lives in testdata/corpus/ and contains specs from major API providers spanning Swagger 2.0 through OAS 3.1.
Parse Summary
| Spec | OAS | Format | Paths | Operations | Schemas | Tags |
|---|---|---|---|---|---|---|
| Petstore | 2.0 | JSON | 14 | 20 | 6 | 3 |
| Google Maps | 3.0.3 | JSON | 17 | 17 | 75 | 9 |
| NWS Weather | 3.0.3 | JSON | 60 | 60 | 103 | 0 |
| Asana | 3.0.0 | YAML | 158 | 217 | 244 | 42 |
| Discord | 3.1.0 | JSON | 137 | 227 | 492 | 0 |
| Plaid | 3.0.0 | YAML | 334 | 324 | 2,179 | 1 |
| DigitalOcean | 3.0.0 | YAML | 356 | 544 | 658 | 48 |
| Stripe | 3.0.0 | JSON | 415 | 588 | 1,321 | 0 |
| GitHub | 3.0.3 | JSON | 720 | 1,078 | 904 | 46 |
| MS Graph | 3.0.4 | YAML | 10,405 | 16,098 | 4,294 | 457 |
| TOTALS | | | 12,616 | 19,173 | 10,276 | |
The corpus spans nearly 3 orders of magnitude in operations, paths, and schemas (for example, 20 to 16,098 operations), covers all major OAS versions (2.0, 3.0.0, 3.0.3, 3.0.4, 3.1.0), and includes both JSON (6 specs) and YAML (4 specs).
Operations by Method
Using walk_operations with group_by=method:
| Spec | GET | POST | PUT | PATCH | DELETE |
|---|---|---|---|---|---|
| Petstore | 8 | 7 | 2 | - | 3 |
| Google Maps | 16 | 1 | - | - | - |
| NWS Weather | 60 | - | - | - | - |
| Asana | 101 | 75 | 21 | - | 20 |
| Discord | 97 | 42 | 20 | 31 | 37 |
| Plaid | 2 | 322 | - | - | - |
| DigitalOcean | 283 | 110 | 55 | 12 | 84 |
| Stripe | 263 | 293 | - | - | 32 |
| GitHub | 568 | 171 | 112 | 61 | 166 |
| MS Graph | 8,473 | 3,361 | 179 | 1,976 | 2,109 |
Patterns
- NWS is read-only: all 60 operations are GET. A pure data-retrieval API.
- Plaid is almost entirely POST (322 of 324 operations, 99.4%): the RPC-over-HTTP pattern where every endpoint is a command, not a resource operation.
- Stripe leans POST > GET (293 vs 263): payment domain is write-heavy. Zero PUT/PATCH -- all mutations use POST.
- MS Graph favors PATCH over PUT (1,976 vs 179): OData convention for partial updates.
- GitHub, Discord, DigitalOcean, and MS Graph use all 5 methods. GitHub's spread (568 GET, 171 POST, 112 PUT, 61 PATCH, 166 DELETE) reflects classic REST design.
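The per-method counts above come from walk_operations with group_by=method. As a rough illustration of what such a grouping involves, here is a minimal Python sketch that works on an already-parsed spec held as a plain dict. It is not the oastools implementation; the function name `count_operations_by_method` and the miniature `example_spec` are hypothetical.

```python
from collections import Counter

# Operation keys that may appear on a path item; other keys are ignored.
HTTP_METHODS = ("get", "post", "put", "patch", "delete", "head", "options", "trace")


def count_operations_by_method(spec: dict) -> Counter:
    """Count operations per HTTP method in an already-parsed OpenAPI document."""
    counts = Counter()
    for path_item in spec.get("paths", {}).values():
        for key in path_item:
            # Path items also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                counts[key.upper()] += 1
    return counts


# Hypothetical miniature spec, not taken from the corpus.
example_spec = {
    "paths": {
        "/pets": {"get": {}, "post": {}},
        "/pets/{id}": {"parameters": [], "get": {}, "delete": {}},
    }
}
method_counts = count_operations_by_method(example_spec)
post_share = method_counts["POST"] / sum(method_counts.values())

# The Plaid figure quoted above, recomputed from the table: 322 POST of 324 operations.
plaid_post_percent = round(322 / 324 * 100, 1)
```

A share such as `post_share` is what separates an RPC-style spec like Plaid from a read-only one like NWS.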
Component Schema Types
Using walk_schemas with group_by=type and component=true:
| Spec | object | string | array | integer | boolean | number | (none) | nullable unions |
|---|---|---|---|---|---|---|---|---|
| Petstore | 6 | 14 | 2 | 9 | 1 | - | 1 | - |
| Google Maps | 64 | 161 | 63 | 15 | 25 | 44 | 74 | - |
| NWS | 78 | 228 | 60 | 12 | 4 | 6 | 225 | - |
| Asana | 262 | 590 | 85 | 30 | 57 | 16 | 416 | - |
| Discord | 425 | 451 | 165 | 330 | 145 | 10 | 1,634 | 771 |
| Plaid | 1,674 | 2,978 | 567 | 317 | 211 | 384 | 3,624 | - |
| DigitalOcean | 786 | 1,511 | 391 | 423 | 161 | 46 | 1,169 | - |
| Stripe | 1,439 | 3,277 | 296 | 685 | 394 | 23 | 2,746 | - |
| GitHub | 2,508 | 20,123 | 661 | 2,810 | 3,157 | 64 | 2,636 | - |
| MS Graph | 6,660 | 7,747 | 2,835 | 4 | 1,394 | 1,018 | 10,007 | - |
Patterns
- Discord is the only OAS 3.1 spec and uniquely shows nullable union types (string, null: 312; integer, null: 158; etc. -- 771 total). This is the 3.1 way of expressing nullability via JSON Schema's type: [string, null] instead of 3.0's nullable: true.
- GitHub has a massive string bias (20,123 string schemas, 63% of all component schemas). Many enums and scalar properties are expanded as individual schemas.
- Typeless schemas ("") are pervasive: every OAS 3.0 spec has large counts of schemas without an explicit type. These are typically allOf/anyOf/oneOf compositions or $ref wrappers.
- MS Graph uses almost zero integers (only 4) but 1,018 number types. Their OData convention prefers number even for ID/count fields.
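To make the grouping labels above concrete, here is a small Python sketch of how a schema's type keyword could be turned into a key such as "string", "string, null", or "" (typeless), and how nullability differs between OAS 3.0 and 3.1. The helper names `type_key` and `is_nullable` and the sample schemas are hypothetical and are not part of oastools.

```python
from collections import Counter


def type_key(schema: dict) -> str:
    """Return a grouping label for a schema's `type` keyword."""
    declared = schema.get("type")
    if declared is None:
        # Typeless: often an allOf/anyOf/oneOf composition or a $ref wrapper.
        return ""
    if isinstance(declared, list):
        # OAS 3.1 / JSON Schema style, e.g. ["string", "null"].
        return ", ".join(declared)
    return declared


def is_nullable(schema: dict) -> bool:
    """Detect nullability in either the 3.1 (type list) or 3.0 (nullable flag) form."""
    declared = schema.get("type")
    if isinstance(declared, list):
        return "null" in declared
    return schema.get("nullable") is True


# Hypothetical sample schemas, not taken from the corpus.
sample_schemas = [
    {"type": "string"},
    {"type": ["string", "null"]},             # 3.1 nullable union
    {"type": "string", "nullable": True},     # 3.0 nullable flag
    {"allOf": [{"$ref": "#/components/schemas/Base"}]},  # typeless composition
]
type_counts = Counter(type_key(s) for s in sample_schemas)
nullable_count = sum(is_nullable(s) for s in sample_schemas)
```

Under this labelling, a 3.0 schema with nullable: true still lands in the plain "string" bucket, which is consistent with only the 3.1 spec showing a separate nullable-union column.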
Response Status Codes
Using walk_responses with group_by=status_code:
| Spec | 2xx | 3xx | 4xx | 5xx | default | other |
|---|---|---|---|---|---|---|
| Petstore | 9 | - | 20 | - | 4 | 3 |
| Google Maps | 17 | - | 3 | - | - | - |
| NWS | 59 | 3 | - | - | 60 | 7 |
| Asana | 217 | - | 886 | 218 | - | 34 |
| Discord | 237 | - | 454 | - | - | 2 |
| Plaid | 325 | - | 1 | - | 279 | - |
| DigitalOcean | 544 | - | 993 | 1,088 | 541 | 277 |
| Stripe | 588 | - | - | - | 588 | - |
| GitHub | 1,081 | 33 | 1,269 | 141 | - | 410 |
| MS Graph | 16,098 | - | 16,098 | 16,098 | - | 1,157 |
Error-handling styles
- Stripe: default catch-all -- every operation has exactly 200 + default. Simplest pattern.
- MS Graph: wildcard ranges -- uses 2XX, 4XX, 5XX on every operation. Most systematic but least specific.
- Discord: mixed wildcards -- 4XX wildcard alongside exact 429. Combines range and exact codes.
- GitHub: most granular -- 25 distinct status codes including rare ones like 405, 406, 413. Best client error-handling guidance.
- Asana: exhaustive error codes -- every operation specifies 400, 401, 403, 404, 500 individually.
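The styles above mix exact codes (429), wildcard ranges (4XX), and the default key, so any class-level tally has to normalize all three. The Python sketch below shows one way to do that. The function name `status_bucket` is hypothetical, and treating everything outside 2xx-5xx and default as "other" is an assumption about how that column is defined, not something the tool output confirms.

```python
from collections import Counter


def status_bucket(code) -> str:
    """Map a responses-object key to a class bucket: 2xx..5xx, default, or other."""
    # YAML parsers may yield integer keys, so normalize to an upper-case string.
    text = str(code).strip().upper()
    if text == "DEFAULT":
        return "default"
    is_exact = text[1:].isdigit()      # e.g. "429"
    is_wildcard = text[1:] == "XX"     # e.g. "4XX"
    if len(text) == 3 and text[0] in "2345" and (is_exact or is_wildcard):
        return text[0] + "xx"
    # Assumption: 1xx codes, extension keys, and anything malformed count as "other".
    return "other"


# Hypothetical responses keys for one Stripe-style and one Discord-style operation.
stripe_like = ["200", "default"]
discord_like = ["200", "4XX", "429"]
bucket_counts = Counter(status_bucket(c) for c in stripe_like + discord_like)
```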
Parameters by Location
Using walk_parameters with group_by=location:
| Spec | path | query | header | body/formData | (unresolved $ref) |
|---|---|---|---|---|---|
| Petstore | 9 | 4 | 1 | 11 | - |
| Google Maps | - | 77 | - | - | 114 |
| NWS | 44 | 73 | 2 | - | 92 |
| Asana | 39 | 299 | - | - | 414 |
| Discord | 170 | 91 | - | - | - |
| Plaid | 1 | - | 2 | - | 5 |
| DigitalOcean | 142 | 103 | 4 | - | 719 |
| Stripe | 436 | 951 | - | - | - |
| GitHub | 169 | 302 | - | - | 2,832 |
| MS Graph | 21,825 | 13,471 | 2,611 | - | 15,304 |
Patterns
- Empty location ("") = unresolved $ref: parameters that reference #/components/parameters/... show empty location until resolve_refs=true is used. GitHub has 2,832 of these.
- Petstore is the only spec with body and formData: these are Swagger 2.0 parameter locations, replaced by requestBody in OAS 3.0.
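- MS Graph has 2,611 header parameters: OData-related headers like ConsistencyLevel. ($top and $filter are OData query options, so they belong with the 13,471 query parameters rather than the headers.)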
- Stripe is query-heavy (951 query params): filtering and pagination options on list endpoints.
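The empty-location effect described above can be reproduced in a few lines. The following Python sketch counts parameters by their "in" value and only follows local $ref pointers when asked to. It is an illustration under stated assumptions, not the oastools code: the function names and the sample spec are hypothetical, the `resolve_refs` argument merely mirrors the option named above, and only "#/..." references are handled.

```python
from collections import Counter

METHODS = ("get", "post", "put", "patch", "delete", "head", "options", "trace")


def resolve_local_ref(spec: dict, ref: str) -> dict:
    """Follow a local JSON Pointer reference such as '#/components/parameters/x'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    node = spec
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def count_parameter_locations(spec: dict, resolve_refs: bool = False) -> Counter:
    """Count parameters by location; unresolved $ref parameters count under ''."""
    counts = Counter()
    for path_item in spec.get("paths", {}).values():
        operations = [v for k, v in path_item.items() if k in METHODS]
        param_lists = [path_item.get("parameters", [])]
        param_lists += [op.get("parameters", []) for op in operations]
        for params in param_lists:
            for param in params:
                if "$ref" in param and resolve_refs:
                    param = resolve_local_ref(spec, param["$ref"])
                counts[param.get("in", "")] += 1
    return counts


# Hypothetical miniature spec, not taken from the corpus.
example_spec = {
    "components": {"parameters": {"per-page": {"name": "per_page", "in": "query"}}},
    "paths": {
        "/repos/{owner}": {
            "get": {
                "parameters": [
                    {"name": "owner", "in": "path", "required": True},
                    {"$ref": "#/components/parameters/per-page"},
                ]
            }
        }
    },
}
unresolved = count_parameter_locations(example_spec)
resolved = count_parameter_locations(example_spec, resolve_refs=True)
```

Comparing `unresolved` with `resolved` shows the same shift the table hints at: counts move out of the empty bucket and into path, query, or header once references are followed.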
Security Schemes
Using walk_security:
| Spec | Scheme(s) | Type(s) |
|---|---|---|
| Petstore | api_key, petstore_auth | apiKey (header), OAuth2 |
| Google Maps | ApiKeyAuth | apiKey (query) |
| NWS | apiKeyAuth, userAgent | apiKey (header) x2 |
| Asana | oauth2, personalAccessToken | OAuth2, HTTP Bearer |
| Discord | BotToken, OAuth2 | apiKey (header), OAuth2 |
| Plaid | clientId, plaidVersion, secret | apiKey (header) x3 |
| DigitalOcean | bearer_auth | HTTP Bearer |
| Stripe | basicAuth, bearerAuth | HTTP Basic, HTTP Bearer |
| GitHub | (none defined) | - |
| MS Graph | (none defined) | - |
Patterns
- GitHub and MS Graph define zero security schemes despite being authenticated APIs. Auth is handled outside the spec.
- Google Maps puts the API key in the query string -- the only spec to do this.
- NWS uses User-Agent as a security scheme -- creative abuse tracking via a required header.
- Plaid requires 3 simultaneous header keys (clientId + secret + plaidVersion) -- multi-key auth.
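The Type(s) column above condenses each scheme into a short label. A minimal Python sketch of that kind of summary follows. It relies on the fact that Swagger 2.0 keeps schemes under securityDefinitions while OAS 3.x uses components.securitySchemes. The helper names, the label format, and the sample spec (including the parameter name "key") are hypothetical.

```python
def security_schemes(spec: dict) -> dict:
    """Return the declared security schemes for either Swagger 2.0 or OAS 3.x."""
    if "swagger" in spec:
        return spec.get("securityDefinitions", {})
    return spec.get("components", {}).get("securitySchemes", {})


def describe_scheme(scheme: dict) -> str:
    """Build a short label such as 'apiKey (query)' or 'HTTP Bearer'."""
    kind = scheme.get("type", "")
    if kind == "apiKey":
        return f"apiKey ({scheme.get('in', '?')})"
    if kind == "http":
        return f"HTTP {scheme.get('scheme', '?').capitalize()}"
    if kind == "oauth2":
        return "OAuth2"
    return kind or "(unknown)"


# Hypothetical spec fragment shaped like the Google Maps row above.
example_spec = {
    "openapi": "3.0.3",
    "components": {
        "securitySchemes": {
            "ApiKeyAuth": {"type": "apiKey", "in": "query", "name": "key"}
        }
    },
}
labels = {name: describe_scheme(s) for name, s in security_schemes(example_spec).items()}
# Flag schemes that carry the key in the query string.
query_key_schemes = [n for n, s in security_schemes(example_spec).items()
                     if s.get("type") == "apiKey" and s.get("in") == "query"]
```

A spec with no schemes at all, as with GitHub and MS Graph here, simply yields an empty dict from `security_schemes`.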
Response Headers
Using walk_headers with group_by=name:
| Spec | Total Headers | Top Header | Occurrences |
|---|---|---|---|
| Discord | 1,200 | X-RateLimit-Bucket/Limit/Remaining/Reset/Reset-After | 240 each |
| DigitalOcean | 1,069 | ratelimit-limit/remaining/reset | 354 each |
| GitHub | 244 | Link | 193 |
| NWS | 137 | X-Correlation-Id / X-Request-Id / X-Server-Id | 44 each |
| Stripe | 0 | - | - |
| Asana | 0 | - | - |
| Plaid | 0 | - | - |
Patterns
- Discord and DigitalOcean document rate-limiting headers systematically: Discord defines 5 rate-limit headers that each appear 240 times (1,200 total), and DigitalOcean's three ratelimit-* headers appear 354 times each.
- GitHub's Link header appears 193 times: the HATEOAS pagination mechanism (rel="next", rel="last").
- Stripe documents zero response headers despite having rate limits in practice.
- GitHub has a casing inconsistency: both Link and link, Location and location appear as separate header names. HTTP header names are case-insensitive, so these should be merged.
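The casing issue in the last bullet is straightforward to correct after the fact. Below is a minimal Python sketch that merges per-name counts case-insensitively. The function name `merge_header_counts` is hypothetical, and the numbers in `raw_counts` are made up for illustration rather than taken from the GitHub spec.

```python
from collections import Counter


def merge_header_counts(counts: dict) -> Counter:
    """Merge header counts whose names differ only by letter case."""
    merged = Counter()
    for name, occurrences in counts.items():
        # HTTP header field names are case-insensitive, so fold to lower case.
        merged[name.lower()] += occurrences
    return merged


# Hypothetical counts; the real per-casing split is not given in this report.
raw_counts = {"Link": 3, "link": 2, "Location": 1, "location": 4}
merged_counts = merge_header_counts(raw_counts)
```

Lower-casing is one reasonable canonical form. A report that prefers the conventional spelling (Link, Location) would need a separate display-name mapping.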
Top $ref Hotspots
Using walk_refs (top 10 per spec, ranked by reference count):
| Spec | #1 Most-Referenced | Count | Type |
|---|---|---|---|
| Stripe | schemas/error | 588 | schema |
| Discord | schemas/SnowflakeType | 554 | schema |
| DigitalOcean | responses/server_error | 544 | response |
| GitHub | responses/not_found | 487 | response |
| Plaid | schemas/APIClientID | 324 | schema |
| Asana | responses/BadRequest | 216 | response |
| NWS | responses/Error | 60 | response |
| Google Maps | schemas/LatLngLiteral | 13 | schema |
Patterns
- Error responses dominate: the most-referenced component in 4/8 specs is an error response, and Stripe's top target is an error schema. This validates extracting errors into reusable $ref components.
- Discord's SnowflakeType is referenced 554 times: their Snowflake ID system permeates every schema.
- GitHub's owner and repo path parameters are referenced 480 and 479 times -- a large share of the API is scoped to a repository.
- Plaid's APIClientID (324 refs) reflects their triple-key auth pattern embedded in every request schema.
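A hotspot ranking like the one above boils down to counting how often each $ref target string occurs anywhere in the document. The Python sketch below does that with a recursive walk over a parsed spec. It is a simplified stand-in for walk_refs, not its implementation; the function name `count_refs` and the sample document are hypothetical.

```python
from collections import Counter


def count_refs(node, counts=None) -> Counter:
    """Recursively count every $ref target string in a parsed document."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            counts[ref] += 1
        for value in node.values():
            count_refs(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_refs(item, counts)
    return counts


# Hypothetical miniature spec, not taken from the corpus.
example_spec = {
    "paths": {
        "/a": {"get": {"responses": {
            "200": {"content": {"application/json": {
                "schema": {"$ref": "#/components/schemas/Item"}}}},
            "404": {"$ref": "#/components/responses/not_found"},
        }}},
        "/b": {"get": {"responses": {
            "404": {"$ref": "#/components/responses/not_found"},
        }}},
    }
}
ref_counts = count_refs(example_spec)
top_target, top_count = ref_counts.most_common(1)[0]
```

Calling `ref_counts.most_common(10)` would give a per-spec top 10 of the kind listed in the tables that follow.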
GitHub -- Full Top 10
| Ref | Count | Type |
|---|---|---|
| responses/not_found | 487 | response |
| parameters/owner | 480 | parameter |
| parameters/repo | 479 | parameter |
| schemas/simple-user | 399 | schema |
| parameters/org | 330 | parameter |
| responses/forbidden | 318 | response |
| schemas/organization-simple-webhooks | 263 | schema |
| schemas/simple-installation | 252 | schema |
| parameters/per-page | 241 | parameter |
| schemas/enterprise-webhooks | 234 | schema |
Discord -- Full Top 10
| Ref | Count | Type |
|---|---|---|
| schemas/SnowflakeType | 554 | schema |
| headers/X-RateLimit-Bucket | 239 | header |
| headers/X-RateLimit-Limit | 239 | header |
| headers/X-RateLimit-Remaining | 239 | header |
| headers/X-RateLimit-Reset | 239 | header |
| headers/X-RateLimit-Reset-After | 239 | header |
| responses/ClientErrorResponse | 227 | response |
| responses/ClientRatelimitedResponse | 227 | response |
| schemas/UserResponse | 49 | schema |
| schemas/MessageComponentTypes | 39 | schema |
Tag Distribution
Using walk_operations with group_by=tag (top 15):
GitHub -- Top 15 Tags
| Tag | Operations |
|---|---|
| repos | 204 |
| actions | 184 |
| orgs | 104 |
| issues | 49 |
| codespaces | 48 |
| users | 47 |
| apps | 37 |
| activity | 32 |
| teams | 32 |
| packages | 27 |
| pulls | 27 |
| projects | 26 |
| dependabot | 22 |
| migrations | 22 |
| code-scanning | 21 |
DigitalOcean -- Top 15 Tags
| Tag | Operations |
|---|---|
| GradientAI Platform | 84 |
| Databases | 69 |
| Monitoring | 61 |
| Apps | 34 |
| Kubernetes | 28 |
| Container Registries | 19 |
| Droplets | 19 |
| Container Registry | 18 |
| Firewalls | 11 |
| Uptime | 11 |
| Load Balancers | 10 |
| VPCs | 10 |
| Block Storage | 9 |
| Functions | 9 |
| Partner Network Connect | 9 |
Asana -- Top 15 Tags
| Tag | Operations |
|---|---|
| Tasks | 27 |
| Projects | 19 |
| Goals | 12 |
| Portfolios | 12 |
| Custom fields | 8 |
| Tags | 8 |
| Users | 8 |
| Sections | 7 |
| Teams | 7 |
| Time tracking entries | 6 |
| Workspaces | 6 |
| Allocations | 5 |
| Budgets | 5 |
| Goal relationships | 5 |
| Memberships | 5 |
Corpus Fingerprint
| Dimension | Smallest | Largest | Ratio |
|---|---|---|---|
| Operations | Petstore (20) | MS Graph (16,098) | 805x |
| Schemas | Petstore (6) | MS Graph (4,294) | 716x |
| Paths | Petstore (14) | MS Graph (10,405) | 743x |
| $ref targets | Petstore (6) | Plaid (2,049) | 341x |
| Response headers | Stripe (0) | Discord (1,200) | -- |
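The ratios in this table can be rechecked with a few lines of arithmetic. The Python sketch below recomputes each largest-to-smallest ratio from the figures shown and converts it to orders of magnitude with a base-10 logarithm. The dictionary name `fingerprint` is hypothetical. The response-header row is left out because its smallest value is 0.

```python
import math

# (smallest, largest) pairs copied from the fingerprint table.
fingerprint = {
    "Operations": (20, 16098),
    "Schemas": (6, 4294),
    "Paths": (14, 10405),
    "$ref targets": (6, 2049),
}

# Largest divided by smallest for each dimension.
ratios = {name: large / small for name, (small, large) in fingerprint.items()}
# Orders of magnitude spanned: log10 of the ratio.
orders = {name: math.log10(ratio) for name, ratio in ratios.items()}

for name in fingerprint:
    print(f"{name}: {ratios[name]:.1f}x, {orders[name]:.2f} orders of magnitude")
```

Each dimension comes out between 2.5 and 3 orders of magnitude, so the spread is just short of three full orders.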
Coverage matrix
| Dimension | Values in Corpus |
|---|---|
| OAS versions | 2.0, 3.0.0, 3.0.3, 3.0.4, 3.1.0 |
| Formats | JSON (6), YAML (4) |
| API styles | REST (GitHub), RPC-over-HTTP (Plaid), OData (MS Graph), read-only or read-mostly (NWS, Google Maps) |
| Auth types | OAuth2, Bearer, Basic, API Key (header & query), multi-key, User-Agent, undeclared |
| Error patterns | default catch-all, wildcard ranges, exhaustive codes, mixed |