What YAML Is and Where You'll Find It
YAML (YAML Ain't Markup Language; yes, it's a recursive acronym) started as a data serialization format designed for human readability. The first draft appeared in 2001, and the 1.0 specification followed in 2004. It was initially pitched as a general-purpose alternative to XML, but it found its real home in configuration files.
You've almost certainly written YAML if you've used any of these:
- Docker Compose: docker-compose.yml defines services, volumes, and networks
- Kubernetes: manifests are almost always written in YAML (Deployments, Services, ConfigMaps)
- GitHub Actions: .github/workflows/*.yml defines CI/CD pipelines
- Ansible: playbooks and inventory files
- Ruby on Rails: database.yml, config/application.yml
- ESLint and many other Node.js tools: accept YAML config files
The pitch: config files are written by humans and should be readable by humans. YAML delivers on that in simple cases. It's when things get complex that the quirks start to bite.
The Indentation Rules
YAML uses indentation to represent structure. There are no braces or brackets in block style — the nesting is entirely determined by whitespace. This rule is absolute:
Tabs are forbidden. YAML only allows spaces for indentation. Mixing tabs and spaces causes a parse error. Most editors configured for YAML convert tabs to spaces automatically, but if you're copying from somewhere that uses tabs, expect cryptic errors.
The indentation level must be consistent within a block, but can change between blocks. Two spaces per level is the usual convention.
database:
  host: localhost
  port: 5432
  credentials:
    username: app_user
    password: secret
This is a mapping (YAML's term for key-value dictionary) nested three levels deep. Each level adds two spaces. The same structure in JSON would be:
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "credentials": {
      "username": "app_user",
      "password": "secret"
    }
  }
}
Sequences (arrays) use a hyphen-space prefix:
services:
  - web
  - worker
  - scheduler
And you can mix them:
servers:
  - host: web1.example.com
    port: 80
    tags:
      - primary
      - loadbalanced
  - host: web2.example.com
    port: 80
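For orientation, here is a sketch of what a YAML loader would typically hand back to a Python application for the servers example. It is written out by hand as a plain literal rather than produced by a real parser, so the correspondence (mappings become dicts, sequences become lists) is the assumption being illustrated.

```python
# Hand-written equivalent of the parsed `servers` document:
# YAML mappings become dicts, YAML sequences become lists.
config = {
    "servers": [
        {
            "host": "web1.example.com",
            "port": 80,
            "tags": ["primary", "loadbalanced"],
        },
        {"host": "web2.example.com", "port": 80},
    ]
}

# Each "- " item is one list element; the keys aligned under it
# belong to that element only.
first = config["servers"][0]
second = config["servers"][1]

print(first["tags"])      # tags were declared on the first server
print("tags" in second)   # the second server never declared tags
```

The point to notice is that tags lives inside the first list element only; nothing is inherited by the second.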
Data Types and the "Norway Problem"
YAML infers types automatically. Convenient, and also responsible for some legendarily painful bugs.
The basic types work as you'd expect:
name: Alice # string
age: 30 # integer
score: 9.5 # float
active: true # boolean
nothing: null # null (also ~)
The Norway problem (also called the "yes" problem) comes from YAML 1.1's over-eager boolean coercion. In YAML 1.1, yes, no, on, off, true, false (and their uppercase variants) are all booleans. This means:
country_codes:
  NO: Norway # YAML 1.1 parses the key NO as false!
subscribed: YES # YAML 1.1 parses the value YES as true
This caused real bugs in applications with country codes, feature flags, and config keys named things like on or off. YAML 1.2 fixed this: only true and false are booleans. But many widely used parsers, PyYAML among them, still implement 1.1 behavior.
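The difference between the two versions can be sketched in a few lines of Python. This is a simplified model, not a parser: resolve_plain_scalar is a hypothetical helper, and the word lists are an assumption based on the values named above plus their capitalized forms (the YAML 1.1 type definition also lists single-letter y and n, which some parsers skip).

```python
# Unquoted tokens treated as booleans under YAML 1.1-style rules.
YAML11_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

# The YAML 1.2 core schema only recognizes spellings of true and false.
YAML12_TRUE = {"true", "True", "TRUE"}
YAML12_FALSE = {"false", "False", "FALSE"}


def resolve_plain_scalar(token, version="1.1"):
    """Hypothetical helper: return True/False if an unquoted token would be
    read as a boolean under the given YAML version, else the token itself."""
    if version == "1.1":
        true_words, false_words = YAML11_TRUE, YAML11_FALSE
    else:
        true_words, false_words = YAML12_TRUE, YAML12_FALSE
    if token in true_words:
        return True
    if token in false_words:
        return False
    return token  # stays a string


for token in ["NO", "yes", "off", "true", "Norway"]:
    print(token, resolve_plain_scalar(token, "1.1"), resolve_plain_scalar(token, "1.2"))
```

Under the 1.1 rules NO, yes, and off all stop being strings; under the 1.2 rules only true changes type.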
The safe defense: quote anything that could be misinterpreted.
status: "on"
country: "NO"
feature_enabled: "true" # or just use actual true/false for booleans
Similarly, watch out for bare strings that look like other types:
version: 1.0 # float: 1.0
version: "1.0" # string: "1.0"; be explicit if you need the string
port: 8080 # integer
zip_code: 01234 # integer, not text: 668 under YAML 1.1 (read as octal), 1234 under YAML 1.2; quote it
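To see why these coercions lose information, here is a small Python sketch that mimics the numeric interpretations with built-in conversions instead of a YAML parser. The assumption being modeled is that YAML 1.1 reads a leading-zero integer as octal, while the YAML 1.2 core schema reads it as decimal.

```python
# The same unquoted token, interpreted two ways.
zip_token = "01234"
as_yaml11 = int(zip_token, 8)    # octal interpretation (YAML 1.1 style)
as_yaml12 = int(zip_token, 10)   # decimal interpretation (YAML 1.2 style)
print(as_yaml11, as_yaml12)

# Version strings lose information once they become floats:
# "1.10" and "1.1" collapse into the same number.
version_token = "1.10"
as_float = float(version_token)
print(as_float)
print(str(as_float) == version_token)  # the original text cannot be recovered
```

Either way the original five characters are gone, which is why quoting is the only reliable fix for identifiers that merely look numeric.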
Block vs Flow Style
YAML supports two styles: block (the indented multi-line form) and flow (the compact inline form). Flow style uses JSON-like syntax and is valid YAML.
# Block style
colors:
  - red
  - green
  - blue
# Flow style (valid YAML)
colors: [red, green, blue]
# Block mapping vs flow mapping
person:
  name: Alice
  age: 30
person: {name: Alice, age: 30}
Flow style is useful for short lists and mappings that would waste vertical space in block style. Most style guides prefer block style for readability in config files. Some emitters also switch to flow style for the innermost collections when dumping nested data, so you will run into it in generated files.
Multiline Strings
YAML has two multiline string operators, and getting them confused causes subtle bugs.
The literal block scalar (|) preserves newlines exactly:
script: |
  #!/bin/bash
  set -e
  npm install
  npm test
This produces the string "#!/bin/bash\nset -e\nnpm install\nnpm test\n". The trailing newline is included by default (| behavior). Use |- to strip it.
The folded block scalar (>) replaces newlines with spaces (except blank lines, which become newlines):
description: >
  This is a long description that
  wraps across multiple lines for
  readability in the source file.

  A blank line creates a paragraph break.
This produces "This is a long description that wraps across multiple lines for readability in the source file.\nA blank line creates a paragraph break.\n". Use >- to strip the final newline.
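The two behaviors can be modeled in a few lines of Python. This is a simplified sketch, not a parser: literal_block and folded_block are hypothetical helpers that take the already-dedented content lines, and they assume no leading or trailing blank lines and no extra-indented lines (which real folding treats specially).

```python
def literal_block(lines):
    """Model of `|`: keep every newline, end with one trailing newline."""
    return "\n".join(lines) + "\n"


def folded_block(lines):
    """Model of `>`: join adjacent lines with spaces; each blank line
    becomes a newline. Ends with one trailing newline."""
    paragraphs = []
    current = []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        else:
            paragraphs.append(" ".join(current))
            current = []
    paragraphs.append(" ".join(current))
    return "\n".join(paragraphs) + "\n"


script_lines = ["#!/bin/bash", "set -e", "npm install", "npm test"]
description_lines = [
    "This is a long description that",
    "wraps across multiple lines for",
    "readability in the source file.",
    "",
    "A blank line creates a paragraph break.",
]

script = literal_block(script_lines)
description = folded_block(description_lines)
print(repr(script))
print(repr(description))

# The `-` variants (|- and >-) correspond to dropping the final newline.
print(repr(description.rstrip("\n")))
```

Feeding the script lines through folded_block instead would glue the four shell commands into one line, which is the kind of subtle bug the wrong operator causes.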
In GitHub Actions, the | operator is ubiquitous for inline shell scripts:
- name: Run tests
  run: |
    npm ci
    npm test
    npm run lint
Anchors and Aliases
YAML lets you define a value once and reuse it elsewhere in the same file with anchors (&) and aliases (*). The merge key (<<:) extends this to merge mappings.
# Define an anchor
defaults: &defaults
  timeout: 30
  retries: 3
  environment: production
# Use the anchor in other keys
api_service:
  <<: *defaults
  port: 8080
worker_service:
  <<: *defaults
  port: 8081
  timeout: 120 # override just this field
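This is particularly useful in Docker Compose for sharing common service configuration. CI systems vary in how much of this they accept, so check your platform's documentation before relying on anchors or merge keys in workflow files such as GitHub Actions. The <<: key merges all keys from the referenced mapping; keys written directly in the mapping override the merged values, wherever they appear relative to the <<: line.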
Note that anchors are a YAML-level feature — they're resolved during parsing, so what you get in your application is the fully merged result. The anchor names don't appear in the parsed data structure.
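As a rough model of what the application ends up with, the merge can be mimicked with plain Python dicts. The merge function below is a hypothetical stand-in for what a parser does with <<: and is not any library's API; the values are the ones from the example.

```python
# Stand-in for the mapping behind the `defaults` anchor.
defaults = {"timeout": 30, "retries": 3, "environment": "production"}


def merge(base, explicit):
    """Mimic `<<: *anchor`: copy the anchored keys, then let keys written
    directly in the mapping take precedence."""
    merged = dict(base)       # copy, so the anchored mapping is untouched
    merged.update(explicit)   # explicit keys win over merged ones
    return merged


api_service = merge(defaults, {"port": 8080})
worker_service = merge(defaults, {"port": 8081, "timeout": 120})

print(api_service)
print(worker_service)
print(defaults)  # unchanged by either merge
```

Note that the result contains no trace of the anchor name: each service is an ordinary, fully expanded mapping.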
Common YAML Gotchas
Tabs causing parse errors. Most common in files copy-pasted from a source that uses tab indentation. The error message is usually something like found character that cannot start any token. Check your editor's "show whitespace" feature.
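If your editor cannot show whitespace, a few lines of Python will locate the offending lines. This is a minimal sketch; find_tab_indented_lines is a hypothetical helper name, and it only inspects leading whitespace, since tabs elsewhere on a line are not the indentation problem described here.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace = everything stripped off the left side.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


sample = "database:\n\thost: localhost\n  port: 5432\n"
print(find_tab_indented_lines(sample))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to the function.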
Unquoted special characters. A colon followed by a space (: ) is the key-value separator, so it must not appear unquoted inside a value. A URL like http://example.com/path is technically safe because :// has no space after the colon — but any value containing : (colon-space) anywhere will break parsing. The safest habit is to quote any value that contains a colon.
# Dangerous — colon-space inside the value breaks parsing
redirect: http://example.com/path?foo: bar
# Safe — always quote values containing colons
url: "http://example.com/path"
redirect: "http://example.com/path?foo: bar"
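When YAML is generated by a script, the same rule can be applied mechanically. The helper below is a deliberately small heuristic, not a full YAML grammar check: risky_as_plain_scalar is a hypothetical name, and it covers only the colon cases described above plus the related space-hash sequence, which starts a comment.

```python
def risky_as_plain_scalar(value):
    """Heuristic: flag values likely to be misread if written unquoted."""
    if ": " in value or value.endswith(":"):
        return True   # would be read as a key-value separator
    if " #" in value:
        return True   # the remainder would be read as a comment
    return False


for candidate in [
    "http://example.com/path",
    "http://example.com/path?foo: bar",
    "build #42",
]:
    print(candidate, risky_as_plain_scalar(candidate))
```

A real emitter makes this decision for you; the sketch is only meant to make the rule concrete for hand-rolled templates.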
Accidental type coercion. Already covered — version strings, zip codes, country codes, on/off values.
Indentation-based scope bugs. A line indented slightly too deep or too shallow can land under the wrong parent object, often without any parse error. Easy to introduce by accident, sometimes hard to spot visually.
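Assume YAML 1.1 rules around GitHub Actions. Whatever parser the runner itself uses, many tools that read workflow files (linters, scripts built on PyYAML) follow YAML 1.1, so the boolean coercion gotchas apply: such a parser loads the unquoted top-level on: key as the boolean true. Quote values like on, off, yes, and no wherever they are meant as strings.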
YAML vs JSON Equivalence
Any JSON document is valid YAML 1.2 (flow style covers all of JSON's syntax). But not all YAML is valid JSON — block style, comments, anchors, and the richer type system are all YAML-only.
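One practical consequence: a program with no YAML library available can still emit a file that a YAML 1.2 parser should accept, simply by writing JSON. A minimal sketch using only Python's standard library; the config contents are illustrative.

```python
import json

# Illustrative configuration data.
config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "credentials": {"username": "app_user", "password": "secret"},
    }
}

# JSON text is flow-style YAML, so this string could be saved as a .yml file.
text = json.dumps(config, indent=2)
print(text)

# Round-trip through the JSON parser to confirm nothing was lost.
restored = json.loads(text)
print(restored == config)
```

The output is less pleasant to edit by hand than block style, but it sidesteps the type-coercion issues because every string is quoted.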
The JSON to YAML tool converts between the formats in your browser. It's handy when you have a JSON config and need to migrate it to YAML, or when a Kubernetes manifest needs to round-trip to JSON for a tool that requires it. For validating the resulting JSON structure, JSON Formatter highlights syntax errors and lets you explore the tree.
For a broader look at data format tradeoffs, JSON Basics and Syntax covers the fundamentals of JSON, and XML vs JSON compares the two most common structured formats.
The official YAML specification is the authoritative reference, and the YAML 1.2 changelog documents the specific differences from 1.1 — worth reading if you're debugging type coercion issues.