Regular Expressions for Beginners — Common Patterns and Syntax

A regular expression (regex) is a small language for describing patterns in text with a short expression. It is widely used wherever you handle text: searching, extracting, replacing, and validating input. This article organizes the basic building blocks — character classes, metacharacters, quantifiers, groups, anchors, and flags — and then covers simple patterns for email, phone, URL and date, along with pitfalls such as ReDoS, all based on JavaScript syntax.

One principle up front: with regex, being readable and correctly scoped matters more than being short. Rather than cramming a complex expression into one line, break it into parts that fit the task and verify it against real examples as you build. All code examples in this article use JavaScript syntax (/pattern/flags).

1. What a regular expression is

A regular expression represents a condition on text — such as "three digits in a row" or "something that looks like an email containing @" — as an expression built from symbols. In JavaScript you create one with a /pattern/flags literal, or with new RegExp("pattern", "flags").

For example, to look for the sequence "cat", you write /cat/. A part made of plain characters matches those characters themselves. The main uses are searching, extracting, replacing, and validating input.

From here, we look in turn at the meaning of the symbols (metacharacters) that make up a pattern. When you want to strip a symbol's special meaning and treat it as "just a character", place a \ (backslash) before it to escape it. For instance, to match a literal dot, write \. (a backslash followed by the dot).
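As a minimal sketch of the two creation styles and of escaping, the snippet below builds the same pattern as a literal and through the constructor, then compares an unescaped dot with an escaped one. The variable names are illustrative only.

```javascript
// Two ways to create the same pattern: a literal and the RegExp constructor.
const catLiteral = /cat/;
const catObject = new RegExp("cat");

console.log(catLiteral.test("concatenate")); // true: "cat" appears inside the word
console.log(catObject.test("dog"));          // false

// An unescaped . matches any single character; \. matches only a literal dot.
const anyChar = /a.c/;
const literalDot = /a\.c/;

console.log(anyChar.test("abc"));    // true
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

// Inside a string passed to new RegExp, the backslash itself must be doubled,
// because the string literal consumes one level of escaping first.
const literalDotFromString = new RegExp("a\\.c");
console.log(literalDotFromString.test("a.c")); // true
console.log(literalDotFromString.test("abc")); // false
```

The literal form is usually easier to read; the constructor is mainly useful when the pattern has to be assembled from a string at run time.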

2. Basic building blocks — character classes, metacharacters, quantifiers

The foundation of regex is the symbols that express "which characters" to match and "how many". Let us go through them in order.

Character classes [ ] and \d \w \s

A character class lists candidates inside [ ] and matches any one of them. Use - for a range and a leading ^ for negation.

Commonly used character classes have shorthand forms. Uppercasing them gives "the negation".

Notation | Meaning | Negation
\d | One digit (essentially 0-9) | \D (non-digit)
\w | One word character (a-z A-Z 0-9 _) | \W (non-word character)
\s | One whitespace character (space, tab, newline, etc.) | \S (non-whitespace)
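The following sketch tries out bracket classes and the shorthand forms on small sample strings. The sample strings and variable names are made up for illustration. The g flag (section 5) is used only to collect every match, and ^ and $ (section 4) only to test a whole one-character string.

```javascript
// A character class matches exactly one character out of the listed candidates.
const vowels = "regex".match(/[aeiou]/g);     // ["e", "e"]
const nonVowels = "regex".match(/[^aeiou]/g); // ["r", "g", "x"]

// Ranges inside the brackets: one lowercase hexadecimal digit.
const isHexDigit = (ch) => /^[0-9a-f]$/.test(ch);

// Shorthand classes and their uppercase negations.
const digits = "Room 42, Floor 7".match(/\d/g);       // ["4", "2", "7"]
const wordChars = "a_b c-d".match(/\w/g);             // ["a", "_", "b", "c", "d"]
const fields = "a b\tc".split(/\s/);                  // ["a", "b", "c"]
const digitsOnly = "tel: 03-1234".replace(/\D/g, ""); // "031234"

console.log(vowels, nonVowels);
console.log(isHexDigit("c"), isHexDigit("g")); // true false
console.log(digits, wordChars, fields, digitsOnly);
```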

Metacharacters . ^ $

Metacharacters are symbols with special meaning. The representative ones are . (any single character except a newline), ^ (the start of the input), and $ (the end of the input).

Quantifiers * + ? {n,m}

Quantifiers express how many times the preceding element repeats.

Notation | Meaning | Example
* | 0 or more times | ab*c matches ac, abc, abbc
+ | 1 or more times | ab+c matches abc, abbc … (not ac)
? | 0 or 1 time | colou?r matches color and colour
{n} | Exactly n times | \d{4} is four digits
{n,m} | At least n, at most m times | \d{2,4} is two to four digits
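Each row of the table can be checked with a few lines of code. In this sketch the helper matchesWhole is a hypothetical name, and the patterns are wrapped in ^ and $ (section 4) so that the whole sample string has to match.

```javascript
// Hypothetical helper: keep only the inputs that the pattern accepts.
const matchesWhole = (pattern, inputs) => inputs.filter((s) => pattern.test(s));

const abSamples = ["ac", "abc", "abbc"];

const star = matchesWhole(/^ab*c$/, abSamples); // b zero or more times
const plus = matchesWhole(/^ab+c$/, abSamples); // b one or more times
const optional = matchesWhole(/^colou?r$/, ["color", "colour", "colouur"]);
const exact = matchesWhole(/^\d{4}$/, ["202", "2026", "20260"]);
const range = matchesWhole(/^\d{2,4}$/, ["7", "42", "2026", "12345"]);

console.log(star);     // ["ac", "abc", "abbc"]
console.log(plus);     // ["abc", "abbc"]
console.log(optional); // ["color", "colour"]
console.log(exact);    // ["2026"]
console.log(range);    // ["42", "2026"]
```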

Greedy and lazy matching

Quantifiers are greedy by default and match as much as possible while still satisfying the pattern. Adding ? right after them makes them lazy, matching as little as possible.

When a match captures more than you expected, greedy matching is often the cause. Make the quantifier lazy by adding a ? (*?, +?), or use a character class that excludes the delimiter, such as [^>]+, and the match stays within the intended scope.
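The difference is easiest to see on a tag-like string. The sketch below runs the greedy form, the lazy form, and the delimiter-excluding class on the same made-up input.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ runs from the first < all the way to the last >.
const greedy = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first > it can reach.
const lazy = html.match(/<.+?>/)[0];

// Excluding the delimiter: [^>]+ can never run past a >.
const tags = html.match(/<[^>]+>/g);

console.log(greedy); // "<b>bold</b> and <i>italic</i>"
console.log(lazy);   // "<b>"
console.log(tags);   // ["<b>", "</b>", "<i>", "</i>"]
```

The negated class is often the more robust choice, because it states the boundary explicitly instead of relying on how the engine backtracks.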

3. Groups and capturing — ( ), backreferences, named

Parentheses ( ) bundle several elements into one unit. You can apply a quantifier to the whole group, and groups also capture (extract) the matched part.
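A short sketch of both roles of parentheses: repeating a unit, and capturing parts of the match. The key=value sample string is invented for the example.

```javascript
// Grouping: the + applies to the whole unit "ab", not just to "b".
const repeatedUnit = /^(ab)+$/;
console.log(repeatedUnit.test("ababab")); // true
console.log(repeatedUnit.test("abb"));    // false

// Capturing: each pair of parentheses stores the text it matched.
const pair = "width=640".match(/(\w+)=(\d+)/);
console.log(pair[0]); // "width=640" (the whole match)
console.log(pair[1]); // "width"     (first group)
console.log(pair[2]); // "640"       (second group)
```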

Backreferences

Captured content can be backreferenced within the same pattern as \1, \2 … (numbered by the order the parentheses appear). It is used to express a repetition of a string that appeared earlier.
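Two small illustrations of \1, assuming made-up inputs: finding a doubled letter, and finding a word that was typed twice in a row. The second pattern also uses \b (section 4) so that only whole words are compared.

```javascript
// (\w)\1 : one word character followed by the same character again.
const doubledLetter = "balloon".match(/(\w)\1/);
console.log(doubledLetter[0]); // "ll"

// \b(\w+) \1\b : a whole word, a space, then the same word again.
const doubledWord = "this is is a test".match(/\b(\w+) \1\b/);
console.log(doubledWord[0]); // "is is"
console.log(doubledWord[1]); // "is"
```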

Named captures

Giving a name instead of a number improves readability. Capture with (?<name>...) and retrieve it by name from groups in the match result. A backreference within the same pattern is \k<name>, and in the replacement string it is $<name>.

Example: (?<year>\d{4})-(?<month>\d{2}) lets you retrieve the year and month via groups.year and groups.month.
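The year and month pattern can be tried as follows. The sketch shows the three places a name can be used: groups on the match result, $<name> in a replacement string, and \k<name> inside the pattern. The sample strings and the sameQuote pattern are illustrative.

```javascript
const yearMonth = /(?<year>\d{4})-(?<month>\d{2})/;

// 1) Read captures by name from the match result.
const found = "Released 2026-04".match(yearMonth);
console.log(found.groups.year);  // "2026"
console.log(found.groups.month); // "04"

// 2) Refer to captures by name in a replacement string.
const swapped = "2026-04".replace(yearMonth, "$<month>/$<year>");
console.log(swapped); // "04/2026"

// 3) Backreference by name inside the pattern: the closing quote
//    must be the same character as the opening quote.
const sameQuote = /(?<quote>["']).*?\k<quote>/;
console.log(sameQuote.test(`"ok"`)); // true
console.log(sameQuote.test(`"ok'`)); // false
```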

4. Anchors and boundaries — ^ $ \b

Anchors match a "position" rather than a character itself. They are zero-width (they consume no characters).

For example, ^\d+$ expresses the condition "the whole thing is only digits" (one or more digits from start to end). Without anchors it merely means "contains a digit somewhere", which is insufficient for input validation.

When you want to search by whole words, \b is handy. \bcat\b matches the word cat but not part of category. Conversely, plain /cat/ also matches inside category — keep that in mind.
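Both points can be confirmed quickly. This sketch contrasts an unanchored pattern with an anchored one, and a plain search with a whole-word search; the inputs are made up.

```javascript
// Without anchors: "contains a digit somewhere".
const containsDigits = /\d+/;
// With anchors: "the whole input is digits".
const onlyDigits = /^\d+$/;

console.log(containsDigits.test("abc123")); // true
console.log(onlyDigits.test("abc123"));     // false
console.log(onlyDigits.test("123"));        // true
console.log(onlyDigits.test(""));           // false: + needs at least one digit

// Word boundaries: \b sits between a word character and a non-word character.
const anyCat = /cat/;
const wholeWordCat = /\bcat\b/;

console.log(anyCat.test("category"));          // true
console.log(wholeWordCat.test("category"));    // false
console.log(wholeWordCat.test("the cat sat")); // true
```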

5. Flags — g i m s u

Flags are options that change the behavior of the whole pattern; in a literal you append them after the closing / (e.g. /abc/gi). Here are the representative ones.

Flag | Name | Effect
g | global | Does not stop at the first match; targets all matches
i | ignoreCase | Does not distinguish upper and lower case
m | multiline | Applies ^ and $ to the start and end of each line
s | dotAll | Makes . match newlines as well
u | unicode | Handles Unicode correctly (code-point units for emoji etc., the \u{...} notation)

For example, to "find all cat ignoring case", use /cat/gi. Flags can be combined and their order does not matter.
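The effect of each flag in the table can be observed on a small input. The strings below are invented for the sketch, and the emoji is written as an escape so the code does not depend on the file encoding.

```javascript
const text = "Cat, cat and CAT";

// No flags: only the first case-sensitive match is reported.
const firstOnly = text.match(/cat/);
console.log(firstOnly[0], firstOnly.index); // "cat" 5

// g and i together: every match, regardless of case.
const allCats = text.match(/cat/gi);
console.log(allCats); // ["Cat", "cat", "CAT"]

// m: ^ and $ also match at the start and end of each line.
const twoLines = "one\ntwo";
console.log(twoLines.match(/^\w+$/g));  // null
const perLine = twoLines.match(/^\w+$/gm);
console.log(perLine);                   // ["one", "two"]

// s: the dot also matches a newline.
console.log(/a.b/.test("a\nb"));  // false
console.log(/a.b/s.test("a\nb")); // true

// u: the dot matches a whole code point instead of one UTF-16 code unit.
const emoji = "\u{1F600}";
console.log(/^.$/.test(emoji));  // false
console.log(/^.$/u.test(emoji)); // true
```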

6. Common patterns — email, phone, URL, date

The table below collects simple patterns commonly seen in practice, combining the elements above. None of them are strict spec-compliant validations; they are practical rules of thumb only. When you truly need accurate validation (especially for email), pair them with a dedicated library or an actual delivery check.

Target | Simple pattern (example) | Description
Email (simple) | ^[^\s@]+@[^\s@]+\.[^\s@]+$ | The minimal shape "non-whitespace/non-@ + @ + domain + . + TLD". Not strictly RFC-compliant
Phone (Japan, hyphen-separated) | ^0\d{1,4}-\d{1,4}-\d{4}$ | A simple form starting with 0, of digits and hyphens. Digit counts vary by region, so this is only a guide
URL (http/https) | ^https?:\/\/[^\s]+$ | http or https (s?) followed by a non-whitespace string. / is escaped as \/
Date (YYYY-MM-DD form) | ^\d{4}-\d{2}-\d{2}$ | Checks only the digit-count shape. Logical validity such as "month 13" must be checked separately

A pattern only looks at the "shape". For instance, the date pattern ^\d{4}-\d{2}-\d{2}$ also accepts 2026-13-99. Non-existent dates, or the real existence of an email, are semantic validity outside the scope of regex. Keep shape checking and value validation separate.
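One way to keep the two steps separate is sketched below for dates: the regex checks only the shape, and a second step checks whether the calendar date exists. The function name isRealDate and the use of the built-in Date object are assumptions made for this example, not something the table prescribes.

```javascript
// Step 1: shape only. Capture groups expose year, month, and day.
const DATE_SHAPE = /^(\d{4})-(\d{2})-(\d{2})$/;

// Step 2: value check. Build a UTC date and see whether it round-trips.
// Limitation of this sketch: years 0000 to 0099 are rejected, because
// Date.UTC maps two-digit years to 1900 through 1999.
function isRealDate(text) {
  const m = DATE_SHAPE.exec(text);
  if (m === null) return false; // wrong shape

  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);

  // Out-of-range values roll over (month 13 becomes January of the next
  // year), so a mismatch after construction means the date does not exist.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(DATE_SHAPE.test("2026-13-99")); // true: the shape is fine
console.log(isRealDate("2026-13-99"));      // false: no such date
console.log(isRealDate("2024-02-29"));      // true: leap day
console.log(isRealDate("2026-02-29"));      // false: 2026 is not a leap year
console.log(isRealDate("2026-4-1"));        // false: wrong shape
```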

7. Pitfalls — over-complication and ReDoS

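Regex is powerful, but an over-long pattern becomes unreadable and invites performance problems. The best-known one is ReDoS (regular expression denial of service): a pattern with nested or overlapping quantifiers can backtrack so heavily on certain inputs that matching effectively hangs.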
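As a hedged illustration of the ReDoS risk, the sketch below compares a pattern with a nested quantifier against a flat pattern that accepts exactly the same strings. Both patterns are textbook-style examples chosen for this sketch. Only short inputs are run here; in a backtracking engine, the nested form can take exponentially longer as a non-matching input grows.

```javascript
// Risky shape: a quantifier applied to a group that itself ends in a quantifier.
// On input like "aaaa...a!" the engine may try every way of splitting the run
// of a's between the inner and outer +, before concluding there is no match.
const nested = /^(a+)+$/;

// Same accepted strings, but with a single quantifier and nothing to split.
const flat = /^a+$/;

// On short inputs both patterns give identical answers.
const shortInputs = ["a", "aaaa", "aaaa!", ""];
const sameAnswers = shortInputs.every((s) => nested.test(s) === flat.test(s));

console.log(sameAnswers);        // true
console.log(flat.test("aaaa"));  // true
console.log(flat.test("aaaa!")); // false
```

When a pattern has to run on untrusted input, flattening nested quantifiers like this and capping the input length before matching are two simple precautions.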

In short, the basics of regex are "keep it small, keep it readable, and verify with real examples". Master the four building blocks (character classes, quantifiers, anchors, and flags) and you can cover most everyday patterns.


Frequently Asked Questions (FAQ)

What is a regular expression?

A regular expression (regex) is a small language for describing patterns in text. It expresses conditions such as "three digits in a row" or "something that looks like an email containing @" as a short expression, and is used for searching, extracting, replacing, and validation. It is built into many programming languages and editors; in JavaScript you work with it through /pattern/flags literals or the RegExp object.

What is the difference between \d and \w?

\d matches a single digit (essentially 0-9). \w matches a single "word character": the alphanumerics (a-z, A-Z, 0-9) plus the underscore _. In other words, \w includes \d and additionally covers letters and the underscore, making it a broader character class. Uppercasing either one negates it (\D is any non-digit, \W is any non-word character).

What is greedy matching?

Quantifiers (* + ? {n,m}) are "greedy" by default and try to match as much as possible while still satisfying the pattern. For example, <.+> grabs everything from the first < to the last >. Adding ? right after them, as in *? +? ?? {n,m}?, makes them "lazy" so they match as little as possible. <.+?> stops at the first >.
