JSON vs YAML vs TOML — When to Use Which Format

A team once spent an afternoon debugging a Kubernetes manifest because someone wrote country: NO and YAML turned Norway's country code into the boolean false. That's a real category of bug — and it's the reason picking the right config format matters more than developers admit. JSON, YAML, and TOML look interchangeable on the surface, but their syntax models, tooling, and footguns are all different. Pick wrong and you'll be debugging whitespace at midnight.

What Each Format Optimizes For

flowchart LR
  subgraph JSON["JSON (2001)"]
    J1["Optimized for<br/>machines"]
    J2["No comments<br/>No trailing commas<br/>Strict, ~16-page spec"]
  end
  subgraph YAML["YAML (2001-2003)"]
    Y1["Optimized for<br/>human readability"]
    Y2["Indentation matters<br/>Comments + anchors<br/>Implicit type coercion"]
  end
  subgraph TOML["TOML (2013)"]
    T1["Optimized for<br/>config files"]
    T2["Sections + key=value<br/>Comments<br/>Flat, line-oriented"]
  end
  classDef j fill:#1f1f1f,stroke:#60a5fa,color:#e4e4e4;
  classDef y fill:#1f1f1f,stroke:#fb923c,color:#e4e4e4;
  classDef t fill:#1f1f1f,stroke:#4ade80,color:#e4e4e4;
  class JSON j
  class YAML y
  class TOML t

The three formats were designed for different audiences and you can read their priorities right off the spec.

JSON was extracted from JavaScript in 2001 as a wire format for APIs. Its priority is machine readability and unambiguous parsing: nearly every modern language has a mature JSON parser, often in the standard library, and the spec (RFC 8259) is about 16 pages. No comments, no trailing commas, minimal type ambiguity.

YAML (YAML Ain't Markup Language) emerged 2001-2003 with a different goal: human readability and editability. Indentation is meaningful, comments are allowed, and the same data structures as JSON can be expressed with less syntactic noise. YAML 1.2 is technically a superset of JSON, but the YAML world rarely uses that fact directly.

TOML (Tom's Obvious, Minimal Language), created by Tom Preston-Werner in 2013, was a reaction to YAML's complexity. The toml.io tagline is "config file format for humans" — but unlike YAML it's flat, line-oriented, and unambiguous. Cargo (Rust), pyproject.toml (Python), and Hugo all picked it for that reason.

Syntax Side-by-Side

The same data, expressed in all three formats. A config for an imagined web service:

{
  "service": {
    "name": "api-gateway",
    "port": 8080,
    "tls": true
  },
  "database": {
    "host": "localhost",
    "port": 5432,
    "max_connections": 50
  },
  "tags": ["production", "us-east", "v2"]
}

In YAML, the same structure is meaningfully shorter because indentation replaces braces and arrays use dashes:

service:
  name: api-gateway
  port: 8080
  tls: true

database:
  host: localhost
  port: 5432
  max_connections: 50

tags:
  - production
  - us-east
  - v2

In TOML, the same structure uses bracketed sections and key = value pairs — no nesting through indentation:

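# Top-level keys must come before the first table header;
# a key written after [database] would belong to database.
tags = ["production", "us-east", "v2"]

[service]
name = "api-gateway"
port = 8080
tls = true

[database]
host = "localhost"
port = 5432
max_connections = 50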

YAML wins on character count for shallow structures. JSON wins on parse simplicity. TOML wins on "where am I in this file" — every section header tells you which scope you're editing.

Flip between formats with the JSON to YAML converter, validate either with the JSON Formatter and YAML Validator.

The Norway Problem and Other YAML Footguns

YAML's design choice to infer types from unquoted values is the source of an entire genre of production bugs. The most famous is the Norway problem:

countries:
  - GB
  - IE
  - NO  # Norway? Or boolean false?
  - SE

In YAML 1.1, NO, Yes, On, and Off parse as booleans. The YAML 1.2 core schema recognizes only true and false, but many parsers still default to 1.1 behavior, including PyYAML and Ruby parsers. The fix is to quote: "NO". The deeper fix is to never trust unquoted strings for anything that might collide with a YAML keyword.

It gets worse. Version numbers like 1.10 parse as the float 1.1 unless quoted. A time like 12:30 may parse as the sexagesimal integer 750, depending on the parser. MAC addresses, hex strings, dates, and times all have type-coercion edge cases.
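A cheap defence is to flag unquoted scalars that a YAML 1.1 parser is likely to reinterpret. The sketch below is a rough lint helper, not a parser. The regular expressions are simplified approximations of the YAML 1.1 implicit-typing rules, and the function name coercion_risk is hypothetical. Real parsers differ in detail, so treat a hit as a prompt to add quotes.

```python
import re
from typing import Optional

# Simplified approximations of YAML 1.1 implicit typing (assumption:
# individual parsers implement slightly different subsets of these rules).
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
YAML11_SEXAGESIMAL = re.compile(r"^[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+$")
YAML11_FLOAT = re.compile(r"^[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?$")


def coercion_risk(scalar: str) -> Optional[str]:
    """Return the type an unquoted scalar may be coerced to, or None."""
    if YAML11_BOOL.match(scalar):
        return "bool"
    if YAML11_SEXAGESIMAL.match(scalar):
        return "sexagesimal int"
    if YAML11_FLOAT.match(scalar):
        return "float"
    return None


for value in ["GB", "NO", "1.10", "12:30", "us-east"]:
    print(value, "->", coercion_risk(value))
```

Integers are deliberately not flagged here, since a value like 8080 is usually meant to be a number.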

YAML's anchors and aliases feature is useful for DRY config but a known DoS vector. The billion laughs attack nests aliases inside aliases so that a tiny file expands to gigabytes once the structure is fully traversed or serialized. Separately, full-featured loaders can construct arbitrary language objects from tags. Most modern parsers ship a "safe" loader that disables object construction, and some also cap alias expansion, but you have to remember to use it. In Python: yaml.safe_load() not yaml.load().
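The arithmetic behind the billion laughs attack is plain exponentiation: if each level holds several references to the level below, the fully expanded size is fanout raised to the depth. The sketch below models aliases as shared Python list references, which is an analogy rather than a YAML parser; the helper names are made up for illustration.

```python
def expanded_leaves(depth: int, fanout: int) -> int:
    """Leaf count after fully expanding `depth` levels of `fanout` aliases."""
    return fanout ** depth


def build_shared(depth: int, fanout: int) -> list:
    """Build the nested structure using shared references, like *aliases."""
    level = ["lol"] * fanout          # level 1: plain strings
    for _ in range(depth - 1):
        level = [level] * fanout      # each entry points at the same list
    return level


def count_leaves(node) -> int:
    """Walk the structure the way a serializer would, visiting every leaf."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


small = build_shared(depth=4, fanout=9)
print(count_leaves(small))                   # 4 list objects, 6561 leaves
print(expanded_leaves(depth=9, fanout=9))    # 387420489 leaves from ~10 lines
```

The in-memory structure stays tiny because the references are shared. The cost appears when something walks or copies every path, which is why expansion limits matter as much as safe object construction.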

JSON's strictness is occasionally annoying (no trailing commas, no comments) but it never silently changes the meaning of your data.

Comments — TOML and YAML Have Them, JSON Doesn't

JSON's biggest weakness for human-edited files is the lack of comments. The original spec excluded them because comments tend to grow into hidden parser directives that break interoperability. For a wire format that's correct. For a config file it's painful.

YAML uses # for comments:

# Production database
database:
  host: db.prod.internal
  port: 5432
  # Connection pool tuned for peak load
  max_connections: 50

TOML uses the same #:

# Production database
[database]
host = "db.prod.internal"
port = 5432
# Connection pool tuned for peak load
max_connections = 50

Workarounds exist but each is messy. JSONC is used by VS Code's settings.json but isn't interoperable: generic JSON parsers reject it. JSON5 is a more aggressive superset (comments, trailing commas, unquoted keys) but requires a non-standard parser. Adding _comment keys pollutes the data. If a human will edit the file regularly, comments aren't optional, so pick YAML or TOML.

Schema and Validation Support

JSON wins decisively here. JSON Schema is a mature standard with broad language support and tooling built into most modern IDEs. You can validate, generate types, and get editor autocomplete from a single schema file. Try the JSON Schema Validator or convert structures with JSON to TypeScript.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "port": { "type": "integer", "minimum": 1, "maximum": 65535 },
    "tls": { "type": "boolean" }
  },
  "required": ["port"]
}

YAML can borrow JSON Schema since YAML is a semantic JSON superset — but tooling is less polished. You typically parse YAML to JSON, then validate. CUE and Dhall offer more YAML-native validation but neither has reached mainstream adoption.

TOML has no equivalent schema language. Ecosystems like Cargo enforce structure through code, but there's no general-purpose toml-schema. For machine-validated configs, JSON Schema is the only real option.

Tooling and Language Coverage

All three have parsers in every major language, but quality varies.

JSON is universal: nearly every language has a standard or de facto standard parser, and streaming parsers exist for huge files. Browsers ship JSON.parse() natively. See our JSON Lines explainer for streaming .jsonl files.

YAML has good Python and Ruby support (first-class in those ecosystems), good Go support (the go-yaml library, gopkg.in/yaml.v3, is solid), and adequate JavaScript support. The catch: 1.1 vs 1.2 parser inconsistencies mean a file may behave differently across languages. The Norway problem exists for exactly this reason.

TOML is best in Rust, excellent in Python (tomllib in 3.11+), and serviceable in Go and Node. JS and Java support is fine but rarely first-class.

For multi-language configs, JSON has the fewest surprises. For Kubernetes or Ansible, YAML is forced. For Rust or Python projects, TOML is excellent.

Performance Characteristics

JSON typically parses much faster than YAML. Configs are small and usually loaded once at startup, so the difference rarely matters there; for megabytes of data in hot pipelines, it dominates.

JSON is fastest by a wide margin: optimized native parsers process hundreds of megabytes per second on modern hardware, and SIMD parsers such as simdjson report multiple GB/sec.

YAML is significantly slower, often by one to two orders of magnitude depending on the implementation. Implicit typing, anchors, and multi-document support all cost cycles. PyYAML's pure-Python loader is famously slow; its libyaml-backed loaders are several times faster but still trail JSON.

TOML is in between — slower than JSON because of the type system, faster than YAML because the grammar is line-oriented.

For runtime config loaded once at startup, none of this matters. For pipelines re-parsing the same file repeatedly, cache the parsed result.

Decision Matrix

flowchart TD
  Start([What's the use case?])
  Wire{API request /<br/>response, machine<br/>generated?}
  Forced{Framework /<br/>tool dictates<br/>format?}
  Format[Use whatever<br/>it requires]
  Comments{File edited<br/>by humans, needs<br/>comments?}
  Nested{Deeply nested<br/>structure or<br/>anchors needed?}
  Schema{Need machine<br/>schema validation?}
  J[JSON +<br/>JSON Schema]
  Y[YAML]
  T[TOML]
  J2[JSON Lines /<br/>plain JSON]
  Start --> Wire
  Wire -- yes --> J2
  Wire -- no --> Forced
  Forced -- yes --> Format
  Forced -- no --> Comments
  Comments -- no --> Schema
  Comments -- yes --> Nested
  Nested -- yes --> Y
  Nested -- no --> T
  Schema -- yes --> J
  Schema -- no --> T
  classDef j fill:#1f1f1f,stroke:#60a5fa,color:#e4e4e4;
  classDef y fill:#1f1f1f,stroke:#fb923c,color:#e4e4e4;
  classDef t fill:#1f1f1f,stroke:#4ade80,color:#e4e4e4;
  class J,J2 j
  class Y y
  class T t

Use case	Best choice	Why
REST API request/response	JSON	Universal, fast, every client handles it
Kubernetes manifest, GitHub Actions, Ansible	YAML	Forced by the ecosystem
Cargo.toml, pyproject.toml, Hugo config	TOML	Forced by the ecosystem
App config edited by humans	TOML	Comments, sections, no whitespace traps
Cross-language schema-validated config	JSON + JSON Schema	Only mature schema tooling
Multi-document streaming (logs, events)	JSON Lines	Newline-delimited, easy to grep
Heavily nested data with comments	YAML	TOML nesting gets clunky beyond 2 levels
Config under 50 lines, flat structure	TOML	Most readable for humans
Config feeding a Rust/Python tool	TOML	Native parser, rich ecosystem

A practical heuristic: machine-generated → JSON. Human edits more than once a quarter → TOML or YAML. Framework dictates (Kubernetes = YAML, Cargo = TOML) → use that.
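For readers who prefer code to diagrams, the decision flow can be written as a small function. This is only a transcription of the flowchart's branches, with hypothetical parameter names, not a rule to apply mechanically.

```python
from typing import Optional


def choose_format(
    machine_generated: bool,
    tool_dictates: Optional[str] = None,
    needs_comments: bool = False,
    deeply_nested: bool = False,
    needs_schema: bool = False,
) -> str:
    """Follow the same branch order as the decision flowchart."""
    if machine_generated:
        return "JSON (or JSON Lines)"
    if tool_dictates:
        return tool_dictates              # e.g. "YAML" for Kubernetes
    if needs_comments:
        return "YAML" if deeply_nested else "TOML"
    return "JSON + JSON Schema" if needs_schema else "TOML"


print(choose_format(machine_generated=False, needs_comments=True))
```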

For deeper dives, see JSON Basics and Syntax, YAML Explained, and the TOML Config Format. To convert quickly, the JSON to YAML tool handles round-trips.

FAQ

Is YAML actually a superset of JSON?

YAML 1.2 is, semantically — every valid JSON document is also valid YAML 1.2. But many parsers default to 1.1 behavior, where type coercion rules differ enough to break edge cases. Treat them as separate formats with overlapping syntax.

Why does YAML interpret 'NO' as false?

YAML 1.1 defines y, Y, yes, no, NO, true, false, on, off and many capitalizations as boolean literals. The Norwegian country code is collateral damage. YAML 1.2 narrowed booleans to true and false only, but many widely used parsers still default to 1.1. Quote string values that might collide with keywords.

Can I add comments to JSON?

Not in standard JSON. JSONC (VS Code's settings.json) and JSON5 add comment support, but the resulting files aren't valid JSON and stdlib parsers reject them. If you need comments, pick TOML or YAML.

When should I prefer TOML over YAML?

When the config is mostly key-value pairs with shallow nesting and a human edits it. TOML's section headers make large files easier to navigate than indentation-only YAML. Pick YAML when you need deep nesting (more than 2-3 levels) or anchors for DRY config.

Is TOML's nesting syntax really that bad?

TOML supports dotted keys (server.database.host = "...") and table arrays ([[products]]). For 1-2 levels, dotted keys are clean. Beyond that, the syntax gets verbose — every nested table needs its own [a.b.c] header. YAML's indentation is genuinely better for deeply nested data.

What about JSON5 and HJSON?

Both are JSON supersets with comments, trailing commas, and unquoted keys. They have parser libraries but aren't standard JSON, so generic parsers reject them. Use them inside a single tool that bundles its own parser (VS Code, some build tools), but don't ship them as wire formats.

How do I safely parse YAML in Python?

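Use yaml.safe_load(), never yaml.load() with an unsafe loader (PyYAML 6 and later make the Loader argument mandatory for this reason). The unsafe loaders can construct arbitrary Python objects, which becomes a remote code execution vulnerability on untrusted input. The safe loader returns plain dicts, lists, and scalars. Same in Ruby: YAML.safe_load.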
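A short sketch of what that looks like in practice, assuming the third-party PyYAML package is installed (pip install pyyaml). The sample documents are made up. It shows that the safe loader still applies YAML 1.1 type coercion, so quoting remains necessary, and that it refuses Python-specific tags.

```python
import yaml  # PyYAML, third-party

doc = """
country: NO
quoted_country: "NO"
version: 1.10
"""

data = yaml.safe_load(doc)
print(data)  # plain dict; unquoted NO and 1.10 are coerced, "NO" is kept

# A tag that asks the loader to call a Python function.
malicious = "!!python/object/apply:os.system ['echo pwned']"
try:
    yaml.safe_load(malicious)
    rejected = False
except yaml.YAMLError:
    # The safe loader has no constructor for python/* tags and raises.
    rejected = True
print(rejected)
```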

Which format is fastest to parse?

JSON, by an order of magnitude. SIMD-accelerated parsers reach multiple GB/sec. YAML is slowest because of type inference, anchors, and multi-document support. TOML sits in the middle. For configs loaded once at startup, the difference is irrelevant; for re-parsing pipelines, JSON is the only sensible choice.