URL encoding (percent-encoding) is the mechanism that replaces characters that cannot be used directly in a URL with % followed by two hexadecimal digits. It is defined in RFC 3986: each character to be encoded is converted to a UTF-8 byte sequence, and each byte is written as %XX. For example, あ becomes %E3%81%82 and a space becomes %20. This article explains which characters are usable in a URL, why encoding is needed, how it works, the difference between encodeURI and encodeURIComponent, how forms handle it, and common pitfalls.
In short, use encodeURIComponent when building a component of a URL, such as a query value, and use encodeURI only when you need to lightly tidy up an entire URL. Once you grasp that the unit of encoding is the UTF-8 byte rather than the character, and that each byte becomes %XX, most of the rest falls into place.
1. Which characters are usable in a URL — reserved and unreserved (RFC 3986)
The characters usable in a URL (more precisely, a URI) are restricted to a limited set of ASCII. RFC 3986 broadly divides them into unreserved characters and reserved characters.
- Unreserved characters: the letters A-Z a-z, the digits 0-9, and the four symbols - . _ ~. These may be used as is and do not need to be encoded.
- Reserved characters: : / ? # [ ] @ (general delimiters) and ! $ & ' ( ) * + , ; = (sub-delimiters). Because these carry a special meaning as delimiters, you must encode them when you want to use them as data.
Any other character (a space, non-ASCII characters such as Japanese, control characters, and so on) cannot be written directly in a URL. Percent-encoding is the means of carrying these safely.
2. Why encoding is needed — delimiters and multibyte characters
There are two main reasons encoding becomes necessary.
- To avoid collisions with delimiters: a URL expresses its structure with symbols such as ? (start of the query), & (parameter separator), / (path separator), and # (fragment). If these appear raw inside a value, they are mistaken for part of the structure. For example, if a search term is tea&coffee, writing ?q=tea&coffee causes coffee to be interpreted as a separate parameter.
- To carry non-ASCII (multibyte) characters: a URL can only directly represent a subset of ASCII. Japanese, emoji, accented characters, and the like cannot be placed as is, so they must first be converted to a UTF-8 byte sequence and then expressed as %XX.
Whether to encode depends on the role a character plays: the same &, for instance, is left raw when it is a parameter separator but encoded as %26 when it is part of a value.
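The collision can be reproduced with URLSearchParams, the standard query-string parser available in browsers and Node.js. This is a minimal sketch; parseQuery is a hypothetical helper name used only for illustration.

```javascript
// parseQuery (hypothetical helper): parse a query string and return
// the key/value pairs the receiver actually sees.
function parseQuery(query) {
  const params = new URLSearchParams(query);
  return Object.fromEntries(params);
}

// Raw "&" inside the value: "coffee" becomes a separate, empty parameter.
const broken = parseQuery("q=tea&coffee");

// Encoded value: "&" travels as %26 and stays inside q.
const fixed = parseQuery("q=" + encodeURIComponent("tea&coffee"));

console.log(broken); // { q: 'tea', coffee: '' }
console.log(fixed);  // { q: 'tea&coffee' }
```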
3. How percent-encoding works — % plus hex, per UTF-8 byte
The rule of percent-encoding is simple. Convert the character you want to encode into a UTF-8 byte sequence, and write each byte as % followed by two hexadecimal digits (uppercase recommended).
For example, a space is one byte, 0x20, so it is %20. The character あ is three bytes in UTF-8, 0xE3 0x81 0x82, so it becomes %E3%81%82, three %XX groups in a row.
| Character | UTF-8 bytes (hex) | Percent-encoded |
|---|---|---|
| space | 20 | %20 |
| & | 26 | %26 |
| / | 2F | %2F |
| = | 3D | %3D |
| あ | E3 81 82 | %E3%81%82 |
| é (as in café) | C3 A9 | %C3%A9 |
Unreserved characters (A-Z a-z 0-9 - . _ ~) mean the same thing whether or not they are encoded, so they are normally left as is.
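The rule can be written out by hand to make the per-byte nature visible. The sketch below is an illustration, not a replacement for the built-in functions; percentEncode is a hypothetical name, and it assumes that every byte outside the unreserved set is encoded, which is stricter than encodeURIComponent (that function also leaves ! ' ( ) * as is).

```javascript
// percentEncode (hypothetical helper): RFC 3986-style percent-encoding
// of a single component. Assumption: only A-Z a-z 0-9 - . _ ~ are kept.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncode(text) {
  // 1. Convert the string to its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      // 2a. Unreserved ASCII characters are copied unchanged.
      out += ch;
    } else {
      // 2b. Every other byte becomes "%" plus two uppercase hex digits.
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("あ"));    // %E3%81%82
console.log(percentEncode("café"));  // caf%C3%A9
console.log(percentEncode("a b&c")); // a%20b%26c
```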
A space written as %20 and a space written as + are not interchangeable. In a path, a space must be encoded as %20. Treating + as a space is a historical convention limited to query strings and form submission (application/x-www-form-urlencoded); in a path, + is just a literal +. Mixing the two up produces the wrong decoded value.
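The URL parser built into browsers and Node.js shows this context dependence directly. In the minimal sketch below, example.com is only a placeholder: the same + is kept literally in the path, read as a space by searchParams, and never treated as a space by decodeURIComponent.

```javascript
// The same "+" means different things depending on where it appears.
const url = new URL("https://example.com/a+b?q=a+b");

console.log(url.pathname);              // /a+b (literal plus in the path)
console.log(url.searchParams.get("q")); // a b  (form convention: + is a space)

// decodeURIComponent only undoes %XX; it never turns "+" into a space.
console.log(decodeURIComponent("a+b"));   // a+b
console.log(decodeURIComponent("a%20b")); // a b
```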
4. The difference between encodeURI and encodeURIComponent
JavaScript has two encoding functions, and the decisive difference is whether they leave delimiters intact.
- encodeURI: meant to encode an entire URL. It leaves delimiters such as : / ? # @ & = + $ , ; intact (it does, however, encode [ and ] as %5B and %5D). It is for lightly tidying up an already-assembled URL.
- encodeURIComponent: meant to encode a component (part of a path, a query value, and so on). It converts those delimiters as well, leaving only letters, digits, and - _ . ! ~ * ' ( ) unencoded. Use it to embed a value safely.
| Input character | encodeURI | encodeURIComponent |
|---|---|---|
| / | left as / | %2F |
| ? | left as ? | %3F |
| & | left as & | %26 |
| = | left as = | %3D |
| space | %20 | %20 |
| あ | %E3%81%82 | %E3%81%82 |
The rule is clear. Use encodeURIComponent to encode a query value or a single component, and use encodeURI only to tidy up an already-assembled, whole URL. If you use encodeURI on a value, & and = remain and break the parameters.
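A short sketch comparing the two functions on the table's characters, then showing the failure mode when encodeURI is used on a value. The example.com search URL is a placeholder.

```javascript
// Side by side on the characters from the table.
console.log(encodeURI("/?&= あ"));          // /?&=%20%E3%81%82
console.log(encodeURIComponent("/?&= あ")); // %2F%3F%26%3D%20%E3%81%82

// encodeURI encodes square brackets even though they are RFC 3986 delimiters.
console.log(encodeURI("[::1]")); // %5B::1%5D

// The failure mode: encodeURI leaves "&" raw, so the value is split.
const base = "https://example.com/search?q=";
const bad = new URL(base + encodeURI("tea&coffee"));
const good = new URL(base + encodeURIComponent("tea&coffee"));

console.log(bad.searchParams.get("q"));  // tea
console.log(good.searchParams.get("q")); // tea&coffee
```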
5. How query strings and form submission handle it
A query string takes the form ?key1=value1&key2=value2, separating parameters with & and joining a key to its value with =. That is exactly why you must encode each key and value with encodeURIComponent before assembling them.
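A minimal sketch of "encode each key and value, then assemble". buildQuery is a hypothetical helper name, and it assumes the parameters arrive as a plain object whose values can be converted with String().

```javascript
// buildQuery (hypothetical helper): encode every key and value with
// encodeURIComponent, then join pairs with "=" and parameters with "&".
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) =>
      encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");
}

console.log("?" + buildQuery({ q: "tea & coffee", page: 2 }));
// ?q=tea%20%26%20coffee&page=2
```

Encoding at the moment of assembly, and nowhere else, also keeps each value from being encoded twice.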
- Example: for key q and value tea & coffee, encode the value to tea%20%26%20coffee, giving ?q=tea%20%26%20coffee.
- When an HTML form is submitted with the default encoding, the browser builds the request body (for method="post") or the query string (for method="get") in the application/x-www-form-urlencoded format. This is the same key=value&key=value format as a query string.
- In this format, a space is historically encoded as +. The receiving side restores + to a space and then decodes the %XX sequences.
In other words, understanding the dual nature — "in a path a space is %20, but in the form/query convention + is also possible" — lets you avoid mismatches when decoding.
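URLSearchParams serializes in the application/x-www-form-urlencoded format, so it shows the + convention directly. The decoding side is sketched with decodeFormValue, a hypothetical helper that follows the two steps described above; it assumes the input is a single form-encoded value, not a whole query string.

```javascript
// URLSearchParams writes a space as "+" rather than "%20".
const body = new URLSearchParams({ q: "tea & coffee" }).toString();
console.log(body); // q=tea+%26+coffee

// decodeFormValue (hypothetical helper): restore "+" to a space first,
// then decode the %XX sequences. The order matters: a literal plus
// arrives as %2B and must survive as "+".
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

console.log(decodeFormValue("tea+%26+coffee")); // tea & coffee
console.log(decodeFormValue("1%2B1"));          // 1+1
```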
6. Common pitfalls
Finally, here are the points people most often stumble over in practice.
- Double encoding: encoding an already-encoded string a second time. The % in %20 itself becomes %25, producing a broken string like %2520. The iron rule is to encode the raw value exactly once.
- Using encodeURI on a value: because delimiters remain, a value containing & or = breaks the parameter boundaries. Always use encodeURIComponent for values.
- Mixing up + and %20 for a space: you might wrongly decode + as a space in a path, or expect %20 in a form value and receive + instead. Handling differs by context.
- Encoding with a non-UTF-8 charset: older systems sometimes encode bytes in Shift_JIS or similar, which conflicts with decoding that assumes UTF-8 and garbles the text. As a rule, standardize on UTF-8.
- Over-encoding unreserved characters: letters, digits, and - . _ ~ never need encoding. Percent-encoding them anyway yields an equivalent URI under RFC 3986, but it makes the URL harder to read.
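The double-encoding trap and the unreserved-character point can both be checked with the built-in functions. A minimal sketch:

```javascript
const raw = "a b";

const once = encodeURIComponent(raw);   // a%20b
const twice = encodeURIComponent(once); // a%2520b (the "%" became %25)
console.log(once, twice);

// A single decode removes only one layer, so the receiver sees "a%20b",
// not the original "a b".
console.log(decodeURIComponent(twice)); // a%20b

// The unreserved symbols pass through unchanged.
console.log(encodeURIComponent("-._~")); // -._~
```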
Frequently Asked Questions (FAQ)
What is URL encoding?
URL encoding (percent-encoding) is a scheme that replaces characters that cannot be used directly in a URL with a percent sign followed by two hexadecimal digits. It is defined in RFC 3986: the target characters are first converted to a UTF-8 byte sequence, and each byte is represented in the form %XX. For example, the character あ becomes %E3%81%82 and a space becomes %20. It is used to send data safely without confusing it with delimiters such as ?, &, and /.
What is the difference between encodeURI and encodeURIComponent?
encodeURI is meant to encode an entire URL, so it leaves delimiters such as :, /, ?, #, &, and = unencoded. encodeURIComponent, on the other hand, is meant to encode a component such as a query value or part of a path, and it converts those delimiters as well. As a rule, use encodeURIComponent when building the values of a query string. If you use encodeURI on a value, characters such as & and = remain and break the parameters.
Why does Japanese text turn into a string of %xx?
Because only a subset of ASCII can be used directly in a URL, non-ASCII characters such as Japanese cannot be placed as is. So each character is encoded into a UTF-8 byte sequence, and each byte is represented as %XX. Most Japanese characters are three bytes each, so a single character yields three %XX groups. For example, あ is three bytes (0xE3 0x81 0x82), so it becomes %E3%81%82.