How to Read and Write JSONL (JSON Lines) — The Streaming Format

If you've worked with log files, ML training datasets, or data pipelines, you've probably encountered JSONL without knowing its name. Each line is a valid JSON object, and the format solves a set of problems that regular JSON arrays handle poorly at scale.

What JSONL Actually Is

JSONL stands for JSON Lines. Each line in the file is a complete, self-contained JSON value — usually an object. Lines are separated by newlines (\n).

{"id": 1, "event": "click", "ts": "2024-01-15T10:00:00Z", "user": "alice"}
{"id": 2, "event": "view", "ts": "2024-01-15T10:00:03Z", "user": "bob"}
{"id": 3, "event": "purchase", "ts": "2024-01-15T10:01:12Z", "user": "alice", "amount": 49.99}

That's it. No wrapping array, no commas between objects, no outer brackets. Each line is independently parseable.

The format has a few aliases you'll see in the wild: NDJSON (Newline Delimited JSON), JSON Lines, and occasionally LDJSON (Line Delimited JSON). They all describe the same thing. The official JSON Lines site uses the .jsonl extension, and that's become the de facto standard file extension.

Why It Exists: The Problems with JSON Arrays

A regular JSON array looks like this:

[
  {"id": 1, "event": "click"},
  {"id": 2, "event": "view"},
  {"id": 3, "event": "purchase"}
]

Fine for small datasets. But at scale, it has real problems:

You can't stream it without parsing the whole thing. A JSON parser typically needs the complete document. To read record 500,000 from a 10GB JSON array, you have to load (or at least scan) all preceding bytes.

You can't append to it. Adding a new record means rewriting the file — at minimum, removing the closing ], adding a comma, adding the new object, and adding ] back. Atomic appends are impossible.

Tools like grep, wc, and sort can't work on individual records. Because the array's structure spans the whole file — records separated by commas, usually spread across multiple lines — line-oriented Unix tools have no unit that corresponds to one record.

JSONL eliminates all three. Each line is complete. You can read line by line, append with a simple file write, and grep '"event": "purchase"' events.jsonl just works.
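
To make that concrete, here's a quick session with standard Unix tools. The file name and records are illustrative, echoing the sample events from earlier:

```shell
# Create a sample JSONL file: three records, one per line
cat > events.jsonl <<'EOF'
{"id": 1, "event": "click", "user": "alice"}
{"id": 2, "event": "view", "user": "bob"}
{"id": 3, "event": "purchase", "user": "alice"}
EOF

# One line = one record, so counting records is just counting lines
wc -l < events.jsonl
# 3

# Filtering records by field value needs no JSON parser at all
grep '"event": "purchase"' events.jsonl
```

The grep line prints the full matching record, which can then be piped into a real JSON tool for anything fancier.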

Reading and Writing JSONL in Python

Python makes JSONL trivial:

import json

# Writing JSONL
records = [
    {"id": 1, "event": "click", "user": "alice"},
    {"id": 2, "event": "view", "user": "bob"},
]

with open("events.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading JSONL
with open("events.jsonl", "r") as f:
    for line in f:
        line = line.strip()
        if line:  # skip empty lines
            record = json.loads(line)
            print(record)

# Appending a new record
with open("events.jsonl", "a") as f:
    f.write(json.dumps({"id": 3, "event": "purchase"}) + "\n")

The if line guard is important — a trailing newline or blank separator lines in some JSONL files will otherwise cause json.loads("") to raise a json.JSONDecodeError.
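
If you want reading to be resilient rather than crash mid-file, one approach is a small generator that skips blank lines and reports the line number of any malformed record. This is a sketch — the function name and error policy are my own, not a standard API:

```python
import json

def iter_jsonl(path):
    """Yield one parsed record per non-empty line; report line numbers on bad JSON."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines and a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Re-raise with file and line context so bad records are easy to find
                raise ValueError(f"{path}:{lineno}: invalid JSON: {exc}") from exc
```

Because it's a generator, it keeps the constant-memory property: callers can loop over millions of records without ever holding the whole file.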

Reading and Writing JSONL in Node.js

const fs = require('fs');
const readline = require('readline');

// Reading JSONL (streaming, line by line)
async function readJsonl(filePath) {
  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({ input: fileStream });

  for await (const line of rl) {
    if (line.trim()) {
      const record = JSON.parse(line);
      console.log(record);
    }
  }
}

// Writing JSONL
function writeJsonl(filePath, records) {
  const lines = records.map(r => JSON.stringify(r)).join('\n') + '\n';
  fs.writeFileSync(filePath, lines, 'utf8');
}

// Appending a single record
function appendJsonl(filePath, record) {
  fs.appendFileSync(filePath, JSON.stringify(record) + '\n', 'utf8');
}

The readline interface reads the file as a stream — memory usage stays constant regardless of file size. Try to load a 10GB JSONL file with fs.readFileSync and you'll exhaust memory (and exceed Node's maximum string length) before parsing a single record. With readline, you process one line at a time.

Where JSONL Is Actually Used

Log files and event streams. Application logs that need structure beyond plain text are often JSONL. Each log entry is one line, parseable independently. Tools like Fluentd, Logstash, and Vector can process JSONL streams natively.

Machine learning training data. OpenAI's fine-tuning API accepts training data as JSONL. Hugging Face datasets are often distributed as JSONL. The format handles millions of examples with constant memory overhead, which matters when you're processing training sets that won't fit in RAM.
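
As a sketch of what that looks like in practice, here's one way to build a small chat-format fine-tuning file. The "messages"/"role"/"content" fields below follow the shape OpenAI documents for chat fine-tuning, but the exact schema is whatever your target API expects — verify against its current docs:

```python
import json

# Illustrative training examples; each example becomes one line of the file
examples = [
    {"messages": [
        {"role": "user", "content": "What is JSONL?"},
        {"role": "assistant", "content": "One JSON object per line."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why use it for training data?"},
        {"role": "assistant", "content": "It streams and appends cleanly."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        # ensure_ascii=False keeps non-English training text readable in the file
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Building the file incrementally like this means a dataset generator never needs more than one example in memory at a time.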

Data pipelines and ETL. When you're moving data between systems, JSONL is a natural intermediate format. It's easy to produce, easy to consume, and each record can have different fields without breaking the parser.
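
A typical pipeline step reads one JSONL file, keeps or reshapes some records, and writes another — all in constant memory. A minimal sketch (the function name, file names, and filter are illustrative):

```python
import json

def transform(in_path, out_path, predicate):
    """Stream records from in_path to out_path, keeping those predicate accepts."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines, as when reading any JSONL
            record = json.loads(line)
            if predicate(record):
                dst.write(json.dumps(record) + "\n")
```

For example, extracting purchases from an event log is one call: transform("events.jsonl", "purchases.jsonl", lambda r: r.get("event") == "purchase"). Chaining several such steps gives you a pipeline where every intermediate file is itself valid JSONL.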

Database exports. MongoDB's mongoexport produces JSONL — one document per line. (mongodump produces binary BSON files, not JSONL.) JSONL is also a natural output format for tools like jq in pipeline mode.

Comparing JSONL to a JSON Array

                          JSONL                              JSON Array
Streaming-friendly        Yes                                No
Appendable                Yes                                No
grep-friendly             Yes                                No
Full-file parse required  No                                 Yes
Valid JSON                Each line is                       The whole file is
Human readability         Good                               Good
Size                      Slightly smaller (no brackets)     Slightly larger

For data interchange between two APIs, a JSON array is often fine — the payload is small and you want the whole thing at once. For anything large, append-heavy, or streaming, JSONL wins cleanly.

Gotchas to Watch For

Trailing newlines. Most well-formed JSONL files end with a newline after the last record. Some don't. Always guard against empty lines when reading, as shown in the examples above.

No standard MIME type. JSONL doesn't have an officially registered MIME type. application/x-ndjson and application/jsonl are both used in practice. Pick one and document it.

Not the same as pretty-printed JSON. A multi-line pretty-printed JSON object is not JSONL — each complete record must be on a single line. If your records span multiple lines, they're not JSONL-parseable.
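
The difference is easy to see with Python's own serializer: the default json.dumps output is always a single line (newlines inside strings get escaped as \n), while indent= produces multi-line output that breaks JSONL:

```python
import json

record = {"id": 1, "event": "click"}

compact = json.dumps(record)           # one line -> safe to write as JSONL
pretty = json.dumps(record, indent=2)  # spans multiple lines -> NOT JSONL

assert "\n" not in compact
assert "\n" in pretty
```

So when writing JSONL, never pass indent to json.dumps; save pretty-printing for human inspection of a single record.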

Mixed types per line. JSONL doesn't require all lines to have the same schema. In practice most JSONL files are homogeneous, but there's nothing in the spec that requires it.
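
When schemas may vary line to line, defensive field access keeps the reader simple. A small sketch, reusing the event records from earlier in this article:

```python
import json

# Two valid JSONL lines with different fields: only purchases carry "amount"
lines = [
    '{"id": 1, "event": "click", "user": "alice"}',
    '{"id": 3, "event": "purchase", "user": "alice", "amount": 49.99}',
]

total = 0.0
for line in lines:
    record = json.loads(line)
    # .get() with a default tolerates fields that only some records carry
    total += record.get("amount", 0.0)

print(total)  # 49.99
```

The same pattern — parse, then .get() optional fields — handles heterogeneous log streams without any schema declaration.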

For inspecting the structure of JSONL records, paste a single line into our JSON Formatter to explore and validate it. If you need to convert between JSONL and tabular formats, JSON to CSV and CSV to JSON handle the conversion.

For the basics of JSON itself, see JSON Basics and Syntax. And if you're working with tabular data formats, CSV and TSV: The Universal Data Format covers the other side of the equation.

JSONL is a simple idea that solves real problems. If you're building anything that involves logs, datasets, or streaming data, it belongs in your toolkit.