URL Encoding Explained: Percent-Encoding and Why It Matters

URLs can only contain a limited set of characters. That's not a quirk or oversight — it's a deliberate constraint baked into the spec. When your data contains characters outside that set, percent-encoding is what bridges the gap. Understanding how it works saves you from mysterious broken links, malformed API calls, and the classic + vs %20 confusion.

Why URLs Have a Limited Character Set

A URL is transmitted as plain text across systems that were originally designed around ASCII. Characters like spaces, angle brackets, or non-Latin letters have no safe, unambiguous representation in a raw URL. The URI specification (RFC 3986) formalizes this by dividing characters into two groups:

Unreserved characters — always safe, never encoded: A-Z a-z 0-9 - _ . ~

Reserved characters — have special meaning in URL syntax: : / ? # [ ] @ ! $ & ' ( ) * + , ; =

Everything else — spaces, accented characters, emoji, control characters — must be encoded before it appears in a URL.
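The three groups above can be expressed as a small lookup. This is only a sketch: classifyChar is a hypothetical helper name, and it assumes you pass one character at a time.

```javascript
// Character classes from RFC 3986, as listed above.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// classifyChar is a hypothetical helper used only for illustration.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved';
  if (RESERVED.test(ch)) return 'reserved';
  return 'must-encode'; // spaces, accented letters, emoji, control characters, %
}

console.log(classifyChar('~')); // 'unreserved'
console.log(classifyChar('&')); // 'reserved'
console.log(classifyChar(' ')); // 'must-encode'
```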

How %XX Actually Works

Percent-encoding represents a byte as a percent sign followed by two hexadecimal digits. The byte value is derived from the character's UTF-8 encoding.

A space character is 0x20 in ASCII, so it becomes %20. The euro sign encodes to three UTF-8 bytes: 0xE2 0x82 0xAC, giving you %E2%82%AC. The formula is mechanical: take each byte of the UTF-8 sequence, write it as two hexadecimal digits, and prepend %.

Space     → %20
/         → %2F
?         → %3F
#         → %23
€         → %E2%82%AC
😀        → %F0%9F%98%80

You can verify any character by looking up its UTF-8 byte values and converting to hex. Or just use the URL Encoder to do it instantly.
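If you'd rather see the byte-level mechanics in code, here is a minimal sketch. percentEncodeChar is a hypothetical helper that encodes every byte it is given, including unreserved characters, purely to show the mechanics. It assumes a runtime with the global TextEncoder (modern browsers and Node.js).

```javascript
// percentEncodeChar is a hypothetical helper for illustration only.
function percentEncodeChar(ch) {
  const bytes = new TextEncoder().encode(ch); // UTF-8 bytes of the input
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeChar(' '));  // '%20'
console.log(percentEncodeChar('€'));  // '%E2%82%AC'
console.log(percentEncodeChar('😀')); // '%F0%9F%98%80'
```

In real code you would normally reach for encodeURIComponent, which applies the same byte-level rule but leaves unreserved characters alone.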

Path Encoding vs Query String Encoding

These two contexts have different rules, and mixing them up is a common source of bugs.

Path segments — everything between the slashes in the path — must encode any character that would be misread as a URL delimiter. A slash inside a path segment must become %2F, otherwise the parser will split it into two segments. Spaces become %20.

/files/my document.pdf       ← broken
/files/my%20document.pdf     ← correct
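One way to get path encoding right is to encode each segment on its own and only then join with slashes. A sketch, where buildPath is a hypothetical helper and the segment values are made up:

```javascript
// buildPath is a hypothetical helper: encode every segment, then join.
function buildPath(segments) {
  return '/' + segments.map((s) => encodeURIComponent(s)).join('/');
}

console.log(buildPath(['files', 'my document.pdf']));
// '/files/my%20document.pdf'

// A slash inside a segment is escaped, so it is not read as a separator.
console.log(buildPath(['files', 'reports/2024.pdf']));
// '/files/reports%2F2024.pdf'
```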

Query strings follow the same percent-encoding rules but have an additional layer: HTML form submissions historically encoded spaces as + instead of %20. This is defined in the application/x-www-form-urlencoded content type, not in RFC 3986.

?q=hello world    ← ambiguous
?q=hello%20world  ← RFC 3986 compliant
?q=hello+world    ← form-encoded (+ means space here)

The + as space convention only applies inside query strings, and only under application/x-www-form-urlencoded. In a URL path, a literal + is just a plus sign — nothing special.
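The two conventions are easy to compare side by side. In this sketch the parameter name q and the values are just examples:

```javascript
const value = 'hello world';

// Form encoding (application/x-www-form-urlencoded): space becomes +
const formEncoded = new URLSearchParams({ q: value }).toString();
console.log(formEncoded); // 'q=hello+world'

// Plain percent-encoding: space becomes %20
const percentEncoded = 'q=' + encodeURIComponent(value);
console.log(percentEncoded); // 'q=hello%20world'

// Under form encoding a literal plus must be escaped, or it would read as a space.
const literalPlus = new URLSearchParams({ q: '1+1' }).toString();
console.log(literalPlus); // 'q=1%2B1'
```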

The + vs %20 Ambiguity

This trips up a lot of developers. Here's the rule of thumb:

  • If you're building a URL for a browser address bar or an API endpoint path, use %20 for spaces.
  • If you're encoding HTML form data (Content-Type: application/x-www-form-urlencoded), use + for spaces — that's what URLSearchParams produces.
  • When in doubt, use %20. It's unambiguous everywhere.

The danger is receiving a + in a query string and passing it to a context that doesn't decode it as a space. Some server-side decoders only understand %20 — they'll leave the + as a literal plus character, and you'll spend an hour wondering why search queries are broken.
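JavaScript's own decodeURIComponent is one of those decoders that leaves + alone. A sketch of the difference, where decodeFormValue is a hypothetical helper and the sample string is made up:

```javascript
const raw = 'hello+world%21';

// decodeURIComponent knows nothing about the form-encoding convention.
console.log(decodeURIComponent(raw)); // 'hello+world!'

// decodeFormValue is a hypothetical helper: turn + into a space first,
// then decode, so an escaped %2B still comes out as a literal plus.
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, ' '));
}
console.log(decodeFormValue(raw));     // 'hello world!'
console.log(decodeFormValue('1%2B1')); // '1+1'

// URLSearchParams applies the form-decoding rules for you.
console.log(new URLSearchParams('q=hello+world%21').get('q')); // 'hello world!'
```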

encodeURI vs encodeURIComponent in JavaScript

JavaScript ships two built-in encoding functions, and they cover different use cases.

encodeURI() is designed for encoding a complete URL. It leaves reserved characters like /, ?, #, and & alone because they're assumed to be meaningful URL structure.

encodeURI('https://example.com/search?q=hello world&lang=en')
// → 'https://example.com/search?q=hello%20world&lang=en'

encodeURIComponent() is for encoding a single component: a query parameter value, a path segment, a fragment. It encodes reserved characters too (all of them except ! ' ( ) *), because inside a component those characters lose their structural meaning.

encodeURIComponent('hello world & goodbye')
// → 'hello%20world%20%26%20goodbye'

// Building a query string safely:
const userInput = 'cats & dogs'; // example value; in practice this comes from the user
const q = encodeURIComponent(userInput);
const url = `https://example.com/search?q=${q}`;

The mistake to avoid: using encodeURI on a value you're embedding in a query string. If the value contains & or =, encodeURI leaves them unencoded and the parser treats them as delimiters. Always use encodeURIComponent for values.
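Here is what that mistake looks like when the URL is parsed back. The value and the example.com URL are made up for the sketch:

```javascript
const value = 'Tom & Jerry=classic';

// Wrong: encodeURI leaves & and = alone, so the value leaks into the structure.
const bad = `https://example.com/search?q=${encodeURI(value)}`;
console.log(bad);
// 'https://example.com/search?q=Tom%20&%20Jerry=classic'
console.log(new URL(bad).searchParams.get('q')); // 'Tom ' (truncated at the &)

// Right: encodeURIComponent escapes & and =, so the value round-trips.
const good = `https://example.com/search?q=${encodeURIComponent(value)}`;
console.log(good);
// 'https://example.com/search?q=Tom%20%26%20Jerry%3Dclassic'
console.log(new URL(good).searchParams.get('q')); // 'Tom & Jerry=classic'
```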

Double-Encoding: A Common Pitfall

Double-encoding happens when you encode something that's already encoded. The % sign encodes to %25, so %20 becomes %2520 after a second pass — and now your URL is broken in a way that's genuinely confusing to debug.

Original:        hello world
Encoded once:    hello%20world
Encoded twice:   hello%2520world   ← broken

It usually happens when:

  • A framework encodes input that was already encoded by the developer.
  • URL-encoded data is stored and retrieved, then encoded again before use.
  • Middleware layers each add their own encoding without coordinating.

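The fix: encode at the last possible moment before sending, and only ever encode raw (unencoded) values. If you're unsure whether a value is already encoded, try decoding it with decodeURIComponent and compare the result with the input: if they differ, the value contained percent-escapes. Treat this as a heuristic, since decodeURIComponent throws a URIError on a stray % and a raw value can legitimately contain text like %20.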
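A sketch of the double-encoding effect and of the decode-and-compare check. looksEncoded is a hypothetical helper and only a heuristic: it cannot tell an encoded value from raw text that happens to contain something like %20.

```javascript
const raw = 'hello world';
const once = encodeURIComponent(raw);   // 'hello%20world'
const twice = encodeURIComponent(once); // 'hello%2520world'
console.log(once, twice);

// looksEncoded is a hypothetical heuristic, not a reliable detector.
function looksEncoded(value) {
  try {
    return decodeURIComponent(value) !== value;
  } catch {
    // Malformed escapes (for example a trailing %) throw a URIError;
    // treat such input as raw text.
    return false;
  }
}

console.log(looksEncoded(once)); // true
console.log(looksEncoded(raw));  // false
console.log(looksEncoded('100%')); // false
```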

When You'd Actually Need This

Beyond query parameters, percent-encoding shows up in several practical scenarios:

Webhooks and redirects — if you're building a redirect URL that includes a return path as a parameter, the path must be encoded: ?next=%2Fdashboard%2Fsettings.
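For instance, a login redirect that carries a return path might be built like this. The example.com host, the /login route, and the next parameter name are all assumptions for the sketch:

```javascript
const next = '/dashboard/settings?tab=billing';

// Encode the whole return path as a single query value.
const loginUrl = `https://example.com/login?next=${encodeURIComponent(next)}`;
console.log(loginUrl);
// 'https://example.com/login?next=%2Fdashboard%2Fsettings%3Ftab%3Dbilling'

// The receiving side gets the original path back intact.
console.log(new URL(loginUrl).searchParams.get('next'));
// '/dashboard/settings?tab=billing'
```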

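File downloads: the Content-Disposition header can carry filenames with spaces or non-ASCII characters in its extended filename* parameter (RFC 6266), which names a charset and then percent-encodes the UTF-8 bytes, as in filename*=UTF-8''my%20file.pdf.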
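A rough sketch of building such a header value, assuming the extended filename* form from RFC 6266 with UTF-8. contentDisposition is a hypothetical helper, and a production version would likely also send a plain ASCII filename fallback:

```javascript
// contentDisposition is a hypothetical helper for illustration.
function contentDisposition(filename) {
  const encoded = encodeURIComponent(filename)
    // encodeURIComponent leaves ' ( ) * alone; escape them as well so they
    // cannot be confused with the header's own syntax.
    .replace(/['()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
  return `attachment; filename*=UTF-8''${encoded}`;
}

console.log(contentDisposition('€ report (final).pdf'));
// "attachment; filename*=UTF-8''%E2%82%AC%20report%20%28final%29.pdf"
```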

API integrations — REST APIs that accept resource identifiers in the path (like /users/{id}) require the id to be encoded if it could contain slashes or other delimiters.

OAuth: OAuth 1.0 signatures require every character outside the unreserved set to be percent-encoded, in both keys and values. Implementations have historically disagreed even on ~, which must stay unencoded.
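For signature-style encoding, a common approach is to start from encodeURIComponent and then escape the few reserved characters it leaves alone. A sketch, where strictEncode is a hypothetical helper name:

```javascript
// strictEncode is a hypothetical helper: it percent-encodes everything
// except the unreserved set (A-Z a-z 0-9 - _ . ~).
function strictEncode(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's ~fine!*")); // "it's%20~fine!*"
console.log(strictEncode("it's ~fine!*"));       // 'it%27s%20~fine%21%2A'
```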

Decoding for Display

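When showing a URL back to a user, you generally want to decode it. decodeURIComponent handles this in JavaScript, though it throws a URIError on malformed input such as a stray %. Be careful not to decode URLs before passing them to fetch or XMLHttpRequest: decoding can turn an escaped %26 or %2F back into a live delimiter and change what the URL means.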

// Display a decoded path to the user
const display = decodeURIComponent(window.location.pathname);
document.querySelector('#current-path').textContent = display;
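Because decodeURIComponent throws on malformed escapes, display code may want a guard. A sketch, where safeDecode is a hypothetical helper that falls back to the undecoded text:

```javascript
// safeDecode is a hypothetical helper: decode for display, but fall back
// to the original string if it contains a malformed escape sequence.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return value;
    throw err; // anything else is unexpected
  }
}

console.log(safeDecode('/files/my%20document.pdf')); // '/files/my document.pdf'
console.log(safeDecode('/sale/100%'));               // '/sale/100%'
```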

If you're working with base64 strings inside URLs, note that base64 uses +, /, and = which all need encoding. URL-safe base64 (RFC 4648) replaces + with - and / with _ to avoid this. Read more in Base64 Encoding Explained.
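A sketch of the URL-safe variant, assuming a runtime with the global btoa and TextEncoder. toBase64Url is a hypothetical helper, and dropping the = padding is a choice made here, so the receiving side has to tolerate its absence:

```javascript
// toBase64Url is a hypothetical helper for illustration.
function toBase64Url(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join('');
  return btoa(binary)
    .replace(/\+/g, '-') // + becomes -
    .replace(/\//g, '_') // / becomes _
    .replace(/=+$/, ''); // drop padding (an assumption of this sketch)
}

console.log(btoa('>>>'), toBase64Url('>>>')); // 'Pj4+' 'Pj4-'
console.log(btoa('???'), toBase64Url('???')); // 'Pz8/' 'Pz8_'
console.log(btoa('a'), toBase64Url('a'));     // 'YQ==' 'YQ'
```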

The RFC Reality

RFC 3986 (URIs) and the WHATWG URL Standard (what browsers actually implement) differ subtly. The WHATWG spec is more permissive in some areas — it auto-encodes characters that RFC 3986 would reject. For most practical work, the difference doesn't matter. But if you're writing a URL parser or security-sensitive code that validates URLs, read both. The WHATWG URL Standard is the living reference for browser behavior.
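One way to see the WHATWG behavior is the URL class, which browsers and Node.js expose as a global. In this sketch the input URL is made up, and it contains raw spaces that RFC 3986 does not allow:

```javascript
// The WHATWG parser accepts the raw spaces and percent-encodes them.
const url = new URL('https://example.com/my files/report?q=hello world');

console.log(url.href);     // 'https://example.com/my%20files/report?q=hello%20world'
console.log(url.pathname); // '/my%20files/report'
console.log(url.search);   // '?q=hello%20world'
```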

Try It Yourself

The URL Encoder on UtilityKit handles both encoding and decoding, with separate modes for full URLs and individual components. If you need to encode binary data for a URL context, pair it with the Base64 Encoder — base64 + URL encoding is a common combination for passing structured data through query parameters.