HTML escaping is the process of replacing characters that "have a special meaning in HTML" — such as & < > " ' — with entities (character references) that look the same when displayed. Neglect it and a string a user typed is interpreted as tags or scripts, leading to the serious vulnerability known as XSS (cross-site scripting). This article lays out, accurately, what to convert and why, how XSS works and its types, and how to defend correctly according to the output context.
1. What HTML escaping is — turning five special characters into entities
When the browser reads HTML, it treats certain characters as "symbols that express structure." For example, < is the signal "a tag begins here," and & is the signal "a character reference begins here." When you want to display those characters as themselves, you must replace them with entities. That is HTML escaping.
The representative characters targeted by HTML escaping are the following five. All code and symbols are shown in entity notation.
| Character | Name | Entity (named) | Numeric character reference |
|---|---|---|---|
| & | Ampersand | &amp; | &#38; |
| < | Less-than | &lt; | &#60; |
| > | Greater-than | &gt; | &#62; |
| " | Double quote | &quot; | &#34; |
| ' | Single quote | &#39; (or &apos;) | &#39; |
Order matters: convert & to &amp; first. If you convert < to &lt; before handling &, the & you just produced would be converted again (becoming something like &amp;lt;) and break.
For instance, suppose a user types <b>bold</b>. If you output it without escaping, the browser treats <b> as a tag and actually renders "bold" in bold. If you escape it and output &lt;b&gt;bold&lt;/b&gt;, the screen literally shows the text <b>bold</b>.
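The ordering rule can be checked with a short sketch. The Python below is illustrative only: the helper names escape_html and escape_html_wrong_order are invented for this example, and in real code the standard library's html.escape covers the same ground.

```python
import html


def escape_html(text: str) -> str:
    """Escape the five special characters, handling & first."""
    return (
        text.replace("&", "&amp;")  # must run before any other replacement
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )


def escape_html_wrong_order(text: str) -> str:
    """Deliberately buggy: & is handled last, so earlier output is escaped again."""
    return (
        text.replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("&", "&amp;")
    )


user_input = "<b>bold</b>"
print(escape_html(user_input))              # &lt;b&gt;bold&lt;/b&gt;
print(escape_html_wrong_order(user_input))  # &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;
print(html.escape(user_input))              # standard-library equivalent
```

The second line of output is the double-escaping failure: a browser would display the literal text &lt;b&gt; instead of <b>.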
2. Why it is necessary — the danger of user input being interpreted as HTML
Most web apps display values that users enter (names, comments, search keywords, and so on). The problem is when that value is embedded directly into the HTML. The browser cannot distinguish "this is data" from "this is an instruction (a tag)," and it interprets the embedded string as part of the HTML.
Suppose the following string is posted into a comment field.
<script>alert(document.cookie)</script>
If this is output to the page without escaping, the browser executes <script> as a real script tag. As a result, the attacker's code runs in the browser of every other user who opens that page. This is the basic structure of XSS.
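As a rough illustration of the difference, the sketch below builds the same list item twice, once by plain string interpolation and once through html.escape. The function names and the <li> wrapper are assumptions made for the example, not part of any particular framework.

```python
import html

COMMENT = "<script>alert(document.cookie)</script>"


def render_comment_unsafe(comment: str) -> str:
    # Vulnerable: the user's string becomes part of the markup itself.
    return f"<li>{comment}</li>"


def render_comment_safe(comment: str) -> str:
    # html.escape converts & < > (and, by default, both quote characters).
    return f"<li>{html.escape(comment)}</li>"


print(render_comment_unsafe(COMMENT))
print(render_comment_safe(COMMENT))
```

In the first result the browser sees a real script element. In the second it sees only text, because every < and > has become a character reference.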
3. How XSS (cross-site scripting) works and its types
XSS is a vulnerability in which an attacker's string is executed as script or HTML within a page. Once executed, it enables theft of cookies and access tokens, defacement of the page, forging of forms, and actions performed while impersonating the user. XSS is broadly divided into three types by how the string gets in.
| Type | Path of the attack string | Characteristics |
|---|---|---|
| Reflected | Sent via a URL parameter, etc., and reflected straight back in the response | Requires luring the victim into following a trap link. Common in search results and error displays |
| Stored | Saved in a database, etc., and later served to other users | Occurs on bulletin boards and comment fields. Affects all viewers, so the impact is large |
| DOM-based | Never goes to the server; JavaScript in the browser passes it to a dangerous operation | Caused by handling of innerHTML, location, etc. Cannot be prevented by server-side escaping alone |
Reflected and stored XSS are caused by a missed escape at the moment the server outputs the string. DOM-based XSS occurs when the HTML from the server is correct but JavaScript in the browser passes a user-derived value, unmodified, into something like innerHTML. In other words, "fix only the server side and you are safe" is not true.
4. Preventing it with escaping — defenses by output context
The essence of XSS defense is to "escape according to where you output the value." Even for the same user input, the dangerous characters and the escaping required differ depending on whether you place it in the HTML body, in an attribute value, or inside JavaScript.
HTML body (element content)
Convert at minimum & < > to &amp; &lt; &gt;. Escaping < keeps the input from opening a tag, and escaping & keeps it from starting a character reference.
Attribute values
Always quote attributes, and in addition to the three body characters, convert the enclosing quote (" or '). If you do not escape the quote, an attacker can break out of the attribute and inject a new tag, as in "><script>.... For example, if the value is something like x" onmouseover="alert(1), they can escape the attribute and inject an event handler.
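A minimal sketch of the attribute case, assuming a hypothetical render_input helper that places a user value in a double-quoted value attribute. html.escape with quote=True (its default) also converts " and ', so the breakout payload from the paragraph above stays inside the attribute.

```python
import html


def render_input(value: str) -> str:
    # The attribute is always quoted, and the enclosing quote is escaped,
    # so the value cannot terminate the attribute early.
    return f'<input type="text" value="{html.escape(value, quote=True)}">'


payload = 'x" onmouseover="alert(1)'
print(render_input(payload))
# <input type="text" value="x&quot; onmouseover=&quot;alert(1)">
```

The result contains exactly four literal double quotes, all of them written by the template, so no new attribute is created.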
JavaScript context
As a rule, avoid embedding user input directly inside <script>. If you must pass it, use a form that is safe as JavaScript (for example, encode it on the server with the equivalent of JSON.stringify, and additionally neutralize sequences such as </script>) rather than HTML escaping. Reusing HTML escaping as-is does not make it safe.
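One possible shape for this, sketched in Python with json.dumps standing in for JSON.stringify. The helper name to_script_literal is an assumption for the example. The extra replacements turn <, > and & into JavaScript \u escapes so that sequences such as </script> never appear literally in the page source.

```python
import json


def to_script_literal(value) -> str:
    """Serialize a value for embedding inside an inline <script> block."""
    encoded = json.dumps(value)  # produces a valid JavaScript literal
    # Neutralize characters the HTML parser could act on inside <script>.
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


name = "</script><script>alert(1)</script>"
print(f"<script>const userName = {to_script_literal(name)};</script>")
```

The JavaScript engine decodes the \u escapes back to the original characters, so the value the script receives is unchanged. Only its representation in the HTML source differs.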
URL context
When you place a user value into href or src, you need URL encoding plus scheme validation. A URL starting with javascript: executes script on click, so allow only permitted schemes such as http / https.
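The two steps can be sketched as follows, using Python's urllib.parse. The helpers safe_href and search_link, the placeholder "#", and the example.com address are all assumptions for illustration. This sketch accepts only absolute http and https URLs, so relative URLs are rejected as well.

```python
import html
from urllib.parse import quote, urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def safe_href(url: str) -> str:
    """Return the URL if its scheme is allowed, otherwise a harmless placeholder."""
    candidate = url.strip()
    scheme = urlsplit(candidate).scheme.lower()
    return candidate if scheme in ALLOWED_SCHEMES else "#"


def search_link(keyword: str) -> str:
    # Percent-encode the user value as a query component, then HTML-escape
    # the finished URL because it is being placed in an attribute.
    url = "https://example.com/search?q=" + quote(keyword, safe="")
    return f'<a href="{html.escape(url)}">results</a>'


print(safe_href("javascript:alert(1)"))
print(safe_href("https://example.com/"))
print(search_link("a&b <c>"))
```

Checking against an allowlist, rather than rejecting javascript: specifically, also covers other schemes such as data: without having to enumerate them.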
5. Types of entities — named and numeric character references
The "character reference" that represents an escaped value comes in two families. Both display as the same character.
- Named character references: use memorable names, as in &lt; &gt; &amp; &quot;. They are readable, but you can only use the names that are defined.
- Numeric character references: use the character's code point, in decimal (&#60;) or hexadecimal (&#x3C;). They can represent any character, including ones with no named reference.
For either form, the correct way to write it is with a trailing semicolon (;). Note that &apos; (the named reference for the single quote) is defined in HTML5 and XML but was not part of HTML 4, so for compatibility reasons the numeric &#39; is often preferred in practice. Rather than implementing escaping yourself, it is safer and more reliable to use the standard functions of your language or framework.
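The relationship between a character and its numeric references can be shown with a few lines of Python. The numeric_refs helper is hypothetical. html.unescape from the standard library is used only to confirm that both forms decode to the same character.

```python
import html


def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal character references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


for ch in "&<>\"'":
    dec, hexa = numeric_refs(ch)
    # Both forms decode back to the original character.
    assert html.unescape(dec) == html.unescape(hexa) == ch
    print(repr(ch), dec, hexa)
```

For <, for example, this yields &#60; and &#x3C;, matching the decimal and hexadecimal forms described above.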
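Filtering out <script> alone is not enough. Do not try to prevent XSS with your own blacklist (a scheme that rejects specific strings), because a payload can avoid the listed strings entirely, for example by using an event-handler attribute. The proper approach is to always pass output through escaping (or a sanitization library).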
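To see why string blacklists fail, consider the deliberately naive filter below. It is a hypothetical example, not taken from any real library, and the three payloads are common textbook bypasses. Escaping handles all of them uniformly.

```python
import html


def blacklist_filter(text: str) -> str:
    """Naive filter, for illustration only: strips one known-bad string."""
    return text.replace("<script>", "")


payloads = [
    "<scr<script>ipt>alert(1)</script>",  # removal reassembles the tag
    "<SCRIPT>alert(1)</SCRIPT>",          # case variation is not matched
    '<img src=x onerror="alert(1)">',     # no script tag involved at all
]

for payload in payloads:
    print("filtered:", blacklist_filter(payload))
    print("escaped: ", html.escape(payload))
```

Every filtered result still contains live markup, while every escaped result contains no < character at all.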
6. In practice — framework auto-escaping and CSP
Modern template engines and UI frameworks (React, Vue, various server-side templates, and so on) HTML-escape by default when you embed a variable. Many XSS flaws arise when this auto-escaping is explicitly bypassed through an "insert raw HTML" feature, such as React's dangerouslySetInnerHTML or Vue's v-html. The cardinal rule is never to use raw-HTML insertion (the equivalent of innerHTML) for untrusted values.
- Trust the auto-escaping: use the template's standard embedding syntax and avoid raw-HTML insertion. If you truly need it, pass the value through a dedicated sanitization library (which keeps only allowed tags and attributes).
- Choose output that matches the context: use dedicated escaping/encoding for attributes, URLs, and JavaScript respectively.
- Use CSP (Content-Security-Policy) as well: a defense-in-depth measure that uses an HTTP header to restrict where executable scripts may come from. Forbidding inline scripts can suppress execution even if an injection slips through. It is "insurance" that complements escaping, not a standalone defense.
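As one possible illustration of the CSP point, the sketch below assembles a header that allows scripts only from the page's own origin or with a per-response nonce. The build_csp_header helper and the exact set of directives are assumptions for the example. A real policy has to be tuned to the resources the site actually loads.

```python
import secrets


def build_csp_header(nonce: str) -> tuple[str, str]:
    """Return a (name, value) pair for a restrictive Content-Security-Policy."""
    policy = "; ".join(
        [
            "default-src 'self'",
            f"script-src 'self' 'nonce-{nonce}'",  # inline scripts need this nonce
            "object-src 'none'",
            "base-uri 'none'",
        ]
    )
    return "Content-Security-Policy", policy


nonce = secrets.token_urlsafe(16)  # fresh random value for every response
name, value = build_csp_header(nonce)
print(f"{name}: {value}")
print(f'<script nonce="{nonce}">/* trusted inline script */</script>')
```

Because the policy omits 'unsafe-inline', an injected inline script without the matching nonce is blocked by browsers that enforce CSP, even if an escaping bug let it into the page.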
Frequently Asked Questions (FAQ)
What is HTML escaping?
It is the process of replacing characters that have a special meaning in HTML with entities (character references) that look the same when displayed. Specifically, the five characters ampersand, less-than, greater-than, double quote, and single quote are converted into forms like &amp; &lt; &gt; &quot; &#39;. As a result, the string is not interpreted as a tag or attribute and is displayed purely as text.
What is XSS (cross-site scripting)?
It is a vulnerability in which a string supplied by an attacker is interpreted and executed as script or HTML within a web page. When unescaped user input is output to the page as-is, the attacker's script tag runs, enabling theft of cookies and tokens, defacement of the page, and actions performed while impersonating the user. It is broadly divided into three types: reflected, stored, and DOM-based.
Which characters should I convert?
In HTML body text, convert at minimum the three characters & (ampersand), < (less-than), and > (greater-than); inside attribute values, also convert " (double quote) and ' (single quote). Converting & first is the established rule. However, the escaping required for safety changes with the output context (HTML body, attribute, inside JavaScript, or inside a URL), so context-appropriate escaping is essential.