What Is URL Encoding? Percent-Encoding Explained

URL encoding (percent-encoding) is the mechanism that replaces characters that cannot be used directly in a URL with % followed by two hexadecimal digits. The standard is defined in RFC 3986: the target characters are converted to a UTF-8 byte sequence, and each byte is written as %XX. For example, あ becomes %E3%81%82 and a space becomes %20. This article covers which characters are usable in a URL, why encoding is needed, how it works, the difference between encodeURI and encodeURIComponent, how forms handle it, and common pitfalls.

The bottom line first: use encodeURIComponent when building a component of a URL, such as a query value; use encodeURI when you only need to lightly tidy up an entire URL. Once you grasp that the unit of encoding is not the character but the UTF-8 byte, and that each byte becomes %XX, most of it falls into place.

1. Which characters are usable in a URL — reserved and unreserved (RFC 3986)

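The characters usable in a URL (more precisely, a URI) are restricted to a limited subset of ASCII. RFC 3986 divides them into two groups: unreserved characters (A-Z a-z 0-9 - . _ ~), which carry no special meaning, and reserved characters (: / ? # [ ] @ ! $ & ' ( ) * + , ; =), which can act as delimiters.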

Any other character (a space, non-ASCII characters such as Japanese, control characters, and so on) cannot be written directly in a URL. Percent-encoding is the means of carrying these safely.
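As a rough illustration of this classification, the sketch below sorts single characters into the two RFC 3986 groups and "everything else". The helper name classifyChar is hypothetical and used only for this example.

```javascript
// Character sets as listed in RFC 3986 (sections 2.2 and 2.3).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// classifyChar is a hypothetical helper, not a built-in function.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.test(ch)) return "reserved";
  return "must be percent-encoded";
}

for (const ch of ["a", "~", "&", "/", " ", "あ"]) {
  console.log(JSON.stringify(ch), "->", classifyChar(ch));
}
```

Note that this only labels characters; whether a reserved character has to be encoded still depends on whether it is acting as a delimiter or as data.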

2. Why encoding is needed — delimiters and multibyte characters

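There are two main reasons encoding becomes necessary. First, reserved characters such as ?, &, =, and / act as delimiters, so they would be misread if they appeared raw inside data. Second, multibyte (non-ASCII) characters such as Japanese fall outside the permitted ASCII set altogether.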

The key is to distinguish a symbol used as a delimiter from the same symbol used as data. An & that separates parameters stays raw, but an & that is part of a value must be encoded as %26.
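A small sketch of the delimiter-versus-data distinction follows. The parameter names title and lang and the sample value are made up for illustration; URLSearchParams is used here only to show how a parser would read each string.

```javascript
const value = "Tom & Jerry"; // the & here is data, not a separator

// Raw concatenation: the & inside the value is mistaken for a delimiter.
const raw = "title=" + value + "&lang=en";

// Encoding the value first keeps the & as data (%26).
const safe = "title=" + encodeURIComponent(value) + "&lang=en";

console.log(safe); // title=Tom%20%26%20Jerry&lang=en
console.log([...new URLSearchParams(raw).keys()]);  // three keys: the value was split
console.log(new URLSearchParams(safe).get("title")); // Tom & Jerry
```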

3. How percent-encoding works — % plus hex, per UTF-8 byte

The rule of percent-encoding is simple. Convert the character you want to encode into a UTF-8 byte sequence, and write each byte as % followed by two hexadecimal digits (uppercase recommended).

For example, a space is one byte, 0x20, so it is %20. The character あ is three bytes in UTF-8, 0xE3 0x81 0x82, so it becomes %E3%81%82, three %XX groups in a row.
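The byte-by-byte rule can be sketched by hand as follows. The function name percentEncodeAll is hypothetical, and unlike the built-in functions it deliberately encodes every byte, including unreserved characters, so that the UTF-8-to-%XX mapping is easy to see.

```javascript
// percentEncodeAll is a hypothetical helper for illustration only.
// It encodes EVERY byte, even ones that would normally be left as is.
function percentEncodeAll(text) {
  // Step 1: convert the string to its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);
  // Step 2: write each byte as % plus two uppercase hex digits.
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAll(" "));  // %20
console.log(percentEncodeAll("&"));  // %26
console.log(percentEncodeAll("あ")); // %E3%81%82
```

In real code, encodeURIComponent performs the same conversion while leaving unreserved characters untouched.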

| Character | UTF-8 bytes | Percent-encoded |
|---|---|---|
| space | 20 | %20 |
| & | 26 | %26 |
| / | 2F | %2F |
| = | 3D | %3D |
| あ | E3 81 82 | %E3%81%82 |
| é (as in café) | C3 A9 | %C3%A9 |

Unreserved characters (A-Z a-z 0-9 - . _ ~) mean the same thing whether or not they are encoded, so they are normally left as is.

%20 and + are not interchangeable ways of writing a space. In a path, a space is always %20. Treating + as a space is a historical convention limited to query strings and form submission (application/x-www-form-urlencoded); in a path, + is just a literal +. Confusing the two changes the decoded result.
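The difference is easy to observe in JavaScript. In this sketch, URLSearchParams stands in for the form convention (it serializes as application/x-www-form-urlencoded), and the key q is just a placeholder.

```javascript
// encodeURIComponent always writes a space as %20.
const viaComponent = encodeURIComponent("a b");

// URLSearchParams follows the form convention and writes a space as +.
const viaForm = new URLSearchParams({ q: "a b" }).toString();

console.log(viaComponent); // a%20b
console.log(viaForm);      // q=a+b

// Decoding must match the convention used for encoding.
const plusKept = decodeURIComponent("a+b");                  // + stays a literal +
const plusAsSpace = new URLSearchParams("q=a+b").get("q");   // + is read as a space

console.log(plusKept);    // a+b
console.log(plusAsSpace); // a b
```

A literal + that must survive the form convention has to be sent as %2B, which is what both encodeURIComponent and URLSearchParams produce for it.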

4. The difference between encodeURI and encodeURIComponent

JavaScript has two encoding functions, and the decisive difference is whether they leave delimiters intact.

| Input character | encodeURI | encodeURIComponent |
|---|---|---|
| / | left as / | %2F |
| ? | left as ? | %3F |
| & | left as & | %26 |
| = | left as = | %3D |
| space | %20 | %20 |
| あ | %E3%81%82 | %E3%81%82 |

The rule is clear. Use encodeURIComponent to encode a query value or a single component, and use encodeURI only to tidy up an already-assembled, whole URL. If you use encodeURI on a value, & and = remain and break the parameters.
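A side-by-side run makes the rule concrete. The URL below uses example.com purely as a placeholder.

```javascript
const piece = "a/b?c&d=e";

// encodeURI keeps delimiters, so it is unsuitable for a single value.
const loose = encodeURI(piece);

// encodeURIComponent converts the delimiters too.
const strict = encodeURIComponent(piece);

console.log(loose);  // a/b?c&d=e
console.log(strict); // a%2Fb%3Fc%26d%3De

// encodeURI is for an already-assembled URL: structure is preserved,
// while the space and the non-ASCII character are still encoded.
const whole = encodeURI("https://example.com/a b?x=1&y=あ");
console.log(whole); // https://example.com/a%20b?x=1&y=%E3%81%82
```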

5. How query strings and form submission handle it

A query string takes the form ?key1=value1&key2=value2, separating parameters with & and joining a key to its value with =. That is exactly why you must encode each key and value with encodeURIComponent before assembling them.
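One way to assemble a query string along these lines is sketched below. The function name buildQuery and the sample parameters are hypothetical; in practice the built-in URLSearchParams can do a similar job, although it writes a space as + rather than %20.

```javascript
// buildQuery is a hypothetical helper for illustration only.
// It encodes each key and value separately, then joins them
// with the raw delimiters = and &.
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) =>
      encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");
}

const query = buildQuery({ q: "a&b=c", name: "あ" });
console.log("?" + query); // ?q=a%26b%3Dc&name=%E3%81%82
```

Because the delimiters are added only after each part is encoded, an & or = inside a value can never be confused with a separator.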

In short, a space in a path is %20, while the form/query convention also allows +. Keeping this dual nature in mind avoids mismatches when decoding.

6. Common pitfalls

Finally, here are the points people most often stumble over in practice.


Frequently Asked Questions (FAQ)

What is URL encoding?

URL encoding (percent-encoding) is a scheme that replaces characters that cannot be used directly in a URL with a percent sign followed by two hexadecimal digits. It is defined in RFC 3986: the target characters are first converted to a UTF-8 byte sequence, and each byte is represented in the form %XX. For example, the character あ becomes %E3%81%82 and a space becomes %20. It is used to send data safely without confusing it with delimiters such as ?, &, and /.

What is the difference between encodeURI and encodeURIComponent?

encodeURI is meant to encode an entire URL, so it leaves delimiters such as :, /, ?, #, &, and = unencoded. encodeURIComponent, on the other hand, is meant to encode a component such as a query value or part of a path, and it converts those delimiters as well. As a rule, use encodeURIComponent when building the values of a query string. If you use encodeURI on a value, characters such as & and = remain and break the parameters.

Why does Japanese text turn into a string of %xx?

Because only a subset of ASCII can be used directly in a URL, non-ASCII characters such as Japanese cannot be placed as is. So each character is encoded into a UTF-8 byte sequence, and each byte is represented as %XX. Most Japanese characters are three bytes each, so a single character yields three %XX groups. For example, あ is three bytes (0xE3 0x81 0x82), so it becomes %E3%81%82.
