What Is URL Encoding? Percent-Encoding Explained

URL encoding (percent-encoding) is the mechanism that replaces characters that cannot appear directly in a URL with % followed by two hexadecimal digits. It is defined in RFC 3986. Each character to be encoded is first converted to a UTF-8 byte sequence, and each byte is then written as %XX. For example, あ becomes %E3%81%82 and a space becomes %20. This article covers which characters are usable in a URL, why encoding is needed, how it works, the difference between encodeURI and encodeURIComponent, how HTML forms handle it, and common pitfalls.

The bottom line first: use encodeURIComponent when building one component of a URL, such as a query value. Use encodeURI only when you need to escape an already-assembled URL as a whole. Most of the rest falls into place once you grasp two points: the unit of encoding is the UTF-8 byte rather than the character, and each byte becomes %XX.

1. Which characters are usable in a URL — reserved and unreserved (RFC 3986)

The characters usable in a URL (more precisely, a URI) are restricted to a limited set of ASCII. RFC 3986 broadly divides them into unreserved characters and reserved characters.

Unreserved characters: the letters A-Z a-z, the digits 0-9, and the four symbols - . _ ~. These may be used as is and do not need to be encoded.
Reserved characters: : / ? # [ ] @ (general delimiters) and ! $ & ' ( ) * + , ; = (sub-delimiters). These carry special meaning as delimiters. You must percent-encode them when they appear as data in a position where they would otherwise be read as a delimiter, for example & inside a query value.
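As a rough illustration of the two classes above, here is a minimal JavaScript sketch that sorts a single character into its RFC 3986 role. The names `classify`, `UNRESERVED`, `GEN_DELIMS` and `SUB_DELIMS` are hypothetical and not part of any library. The function only reports the character's class. Whether a reserved character actually needs encoding still depends on where it appears.

```javascript
// Character classes from RFC 3986, section 2 (hypothetical helper names).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const GEN_DELIMS = new Set([":", "/", "?", "#", "[", "]", "@"]);
const SUB_DELIMS = new Set(["!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "="]);

// Classify one character (one code point) by its RFC 3986 role.
function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (GEN_DELIMS.has(ch)) return "reserved (gen-delim)";
  if (SUB_DELIMS.has(ch)) return "reserved (sub-delim)";
  // Space, %, non-ASCII, control characters, and so on.
  return "must be percent-encoded";
}

for (const ch of ["a", "7", "~", "/", "&", " ", "あ"]) {
  console.log(JSON.stringify(ch), "->", classify(ch));
}
```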

Any other character (a space, non-ASCII characters such as Japanese, control characters, and so on) cannot be written directly in a URL. Percent-encoding is the means of carrying these safely.

2. Why encoding is needed — delimiters and multibyte characters

There are two main reasons encoding becomes necessary.

To avoid collisions with delimiters: a URL expresses its structure with symbols such as ? (start of the query), & (parameter separator), / (path separator), and # (fragment). If these appear raw inside a value, they are mistaken for part of the structure. For example, if a search term is tea&coffee, writing ?q=tea&coffee causes coffee to be interpreted as a separate parameter.
To carry non-ASCII (multibyte) characters: a URL can only directly represent a subset of ASCII. Japanese, emoji, accented characters, and the like cannot be placed as is, so they must first be converted to a byte sequence and then expressed as %XX.

The key is to distinguish a symbol you want to use as a delimiter from a symbol you want to use as data. Even for the same &, you leave it raw when it is a parameter separator, but encode it to %26 when it is part of a value.
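You can reproduce the tea&coffee collision directly with the standard URLSearchParams API, which is available in browsers and as a global in Node.js. The sketch parses the same query twice, first raw and then with the value encoded by encodeURIComponent.

```javascript
// Raw: the & inside the value is read as a parameter separator.
const raw = new URLSearchParams("q=tea&coffee");
console.log(raw.get("q"));      // "tea"
console.log(raw.has("coffee")); // true: "coffee" became its own key

// Encoded: the & travels as data (%26) and comes back intact.
const safe = new URLSearchParams("q=" + encodeURIComponent("tea&coffee"));
console.log(safe.get("q"));     // "tea&coffee"
```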

3. How percent-encoding works — % plus hex, per UTF-8 byte

The rule of percent-encoding is simple. Convert the character you want to encode into a UTF-8 byte sequence, and write each byte as % followed by two hexadecimal digits (uppercase recommended).

For example, a space is one byte, 0x20, so it is %20. The character あ is three bytes in UTF-8, 0xE3 0x81 0x82, so it becomes %E3%81%82, three %XX groups in a row.

Character	UTF-8 bytes (hex)	Percent-encoded
space	20	`%20`
`&`	26	`%26`
`/`	2F	`%2F`
`=`	3D	`%3D`
あ	E3 81 82	`%E3%81%82`
é (as in café)	C3 A9	`%C3%A9`

Unreserved characters (A-Z a-z 0-9 - . _ ~) mean the same thing whether or not they are encoded, so they are normally left as is.
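The byte-by-byte rule can be written out by hand. The following is a minimal sketch, not a replacement for the built-in functions, and the name `percentEncode` is hypothetical. It uses TextEncoder to get the UTF-8 bytes and leaves only the RFC 3986 unreserved characters as is. It is therefore slightly stricter than encodeURIComponent, which also leaves ! ' ( ) * unescaped.

```javascript
// Percent-encode per RFC 3986: take the UTF-8 bytes, keep unreserved bytes,
// and write every other byte as % plus two uppercase hex digits.
function percentEncode(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 byte sequence
  let out = "";
  for (const b of bytes) {
    const isUnreserved =
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a) || // a-z
      (b >= 0x30 && b <= 0x39) || // 0-9
      b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e; // - . _ ~
    out += isUnreserved
      ? String.fromCharCode(b)
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncode(" "));    // "%20"
console.log(percentEncode("あ"));   // "%E3%81%82"
console.log(percentEncode("café")); // "caf%C3%A9"
```

The outputs match the table above. Each byte of あ (E3 81 82) produces its own %XX group.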

A space may appear as %20 or as +, but the two are not interchangeable. In a path, a space must be written as %20. Reading + as a space is a historical convention limited to query strings and form submissions (application/x-www-form-urlencoded). In a path, + is simply a literal +. Mixing the two up produces the wrong decoded value.
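JavaScript's own decoders show the split between these two conventions. decodeURIComponent does plain percent-decoding, so + stays a +. URLSearchParams follows the form convention, so it reads + as a space.

```javascript
// Plain percent-decoding: "+" is an ordinary character.
console.log(decodeURIComponent("a+b"));   // "a+b"
console.log(decodeURIComponent("a%20b")); // "a b"

// Form/query convention (application/x-www-form-urlencoded): "+" means space.
console.log(new URLSearchParams("q=a+b").get("q"));   // "a b"
console.log(new URLSearchParams("q=a%2Bb").get("q")); // "a+b"
```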

4. The difference between encodeURI and encodeURIComponent

JavaScript has two encoding functions, and the decisive difference is whether they leave delimiters intact.

encodeURI: meant to encode an entire URL. It leaves delimiters such as : / ? # @ & = + $ , ; intact, although it does escape [ and ] (as %5B and %5D). Use it only on a URL that has already been assembled.
encodeURIComponent: meant to encode a component (part of a path, a query value, and so on). It converts those delimiters as well, leaving only A-Z a-z 0-9 and - _ . ! ~ * ' ( ) unescaped. Use it to embed a value safely.

Input character	`encodeURI`	`encodeURIComponent`
`/`	left as `/`	`%2F`
`?`	left as `?`	`%3F`
`&`	left as `&`	`%26`
`=`	left as `=`	`%3D`
space	`%20`	`%20`
あ	`%E3%81%82`	`%E3%81%82`

The rule is clear. Use encodeURIComponent to encode a query value or a single component, and use encodeURI only to tidy up an already-assembled, whole URL. If you use encodeURI on a value, & and = remain and break the parameters.
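The following sketch shows how encodeURI breaks a value. It builds the same search URL with each function and then reads the value back with the standard URL API. The host https://example.com/search is a placeholder. The last line demonstrates that encodeURI does escape [ and ], as the ECMAScript specification requires.

```javascript
const value = "tea & coffee=good";

// Wrong: encodeURI leaves & and = alone, so the value spills into new parameters.
const wrong = "https://example.com/search?q=" + encodeURI(value);
// Right: encodeURIComponent escapes & and = as well.
const right = "https://example.com/search?q=" + encodeURIComponent(value);

console.log(wrong); // https://example.com/search?q=tea%20&%20coffee=good
console.log(right); // https://example.com/search?q=tea%20%26%20coffee%3Dgood

console.log(new URL(wrong).searchParams.get("q")); // "tea "  (value truncated)
console.log(new URL(right).searchParams.get("q")); // "tea & coffee=good"

// encodeURI keeps : and / but escapes square brackets.
console.log(encodeURI("http://[::1]/")); // "http://%5B::1%5D/"
```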

5. How query strings and form submission handle it

A query string takes the form ?key1=value1&key2=value2, separating parameters with & and joining a key to its value with =. That is exactly why you must encode each key and value with encodeURIComponent before assembling them.

Example: for key q and value tea & coffee, encode the value to tea%20%26%20coffee, giving ?q=tea%20%26%20coffee.
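The assembly step can be sketched as a small helper. The name `buildQuery` is hypothetical and not from any library. The helper encodes every key and value exactly once with encodeURIComponent and only then joins them with = and &, so the delimiters it adds stay distinct from any & or = inside the data.

```javascript
// Hypothetical helper: build "?k1=v1&k2=v2" from a plain object.
function buildQuery(params) {
  const pairs = Object.entries(params).map(
    ([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value)
  );
  return pairs.length > 0 ? "?" + pairs.join("&") : "";
}

console.log(buildQuery({ q: "tea & coffee" }));      // "?q=tea%20%26%20coffee"
console.log(buildQuery({ q: "あ", "page no": "2" })); // "?q=%E3%81%82&page%20no=2"
```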
When an HTML form is submitted, the browser builds the body (or the URL) in the application/x-www-form-urlencoded format. This is the same key=value&key=value format as a query string.
In this format, a space is historically encoded as +. The receiving side restores + to a space and then decodes the %XX sequences.

In other words, keep both conventions in mind. In a path, a space is %20. Under the form/query convention, it may also appear as +. Knowing which convention applies avoids mismatches when decoding.
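URLSearchParams serializes in the application/x-www-form-urlencoded format, the same format an HTML form uses, so it shows the + convention directly. The hand-written decoder below is a hypothetical helper named `decodeFormValue`. It follows the order described above: + becomes a space first, and the %XX sequences are decoded afterwards. If you reversed that order, a literal plus sent as %2B would wrongly turn into a space.

```javascript
// Form encoding: spaces become "+", & becomes %26.
const form = new URLSearchParams({ q: "tea & coffee" });
console.log(form.toString()); // "q=tea+%26+coffee"

// Hypothetical helper: decode one form value ("+" -> space, then %XX).
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

console.log(decodeFormValue("tea+%26+coffee")); // "tea & coffee"
console.log(decodeFormValue("1%2B1"));          // "1+1"
```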

6. Common pitfalls

Finally, here are the points people most often stumble over in practice.

Double encoding: encoding an already-encoded string a second time. The % in %20 itself becomes %25, producing a broken string like %2520. The iron rule is to encode "the raw value exactly once."
Using encodeURI on a value: because delimiters remain, a value containing & or = breaks the parameter boundaries. Always use encodeURIComponent for values.
Mixing up + and %20 for a space: you might wrongly decode + as a space in a path, or expect %20 in a form value and get a mismatch. Note that handling differs by context.
Encoding with a non-UTF-8 charset: older systems sometimes percent-encode Shift_JIS (or other legacy charset) bytes instead of UTF-8 bytes. A receiver that decodes as UTF-8 then produces garbled text. As a rule, standardize on UTF-8.
Over-encoding even unreserved characters: - . _ ~ do not need encoding. Converting them anyway still works, but it makes the URL harder to read.
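The first pitfall, double encoding, is easy to reproduce. In the sketch below, encoding the result a second time turns every % into %25. A single decode then no longer gets back to the original value.

```javascript
const value = "tea & coffee";

const once = encodeURIComponent(value); // "tea%20%26%20coffee"
const twice = encodeURIComponent(once); // "tea%2520%2526%2520coffee"

// One decode recovers the value only from the singly encoded string.
console.log(decodeURIComponent(once));  // "tea & coffee"
console.log(decodeURIComponent(twice)); // "tea%20%26%20coffee" (still encoded)
```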


Frequently Asked Questions (FAQ)

What is URL encoding?

URL encoding (percent-encoding) is a scheme that replaces characters that cannot be used directly in a URL with a percent sign followed by two hexadecimal digits. It is defined in RFC 3986: the target characters are first converted to a UTF-8 byte sequence, and each byte is represented in the form %XX. For example, the character あ becomes %E3%81%82 and a space becomes %20. It is used to send data safely without confusing it with delimiters such as ?, &, and /.

What is the difference between encodeURI and encodeURIComponent?

encodeURI is meant to encode an entire URL, so it leaves delimiters such as :, /, ?, #, &, and = unencoded. encodeURIComponent, on the other hand, is meant to encode a component such as a query value or part of a path, and it converts those delimiters as well. As a rule, use encodeURIComponent when building the values of a query string. If you use encodeURI on a value, characters such as & and = remain and break the parameters.

Why does Japanese text turn into a string of %xx?

Because only a subset of ASCII can be used directly in a URL, non-ASCII characters such as Japanese cannot be placed as is. So each character is encoded into a UTF-8 byte sequence, and each byte is represented as %XX. Most Japanese characters are three bytes each, so a single character yields three %XX groups. For example, あ is three bytes (0xE3 0x81 0x82), so it becomes %E3%81%82.
