Regex C# Cheat Sheet



Intro

Regular expressions can be made case-insensitive with the inline option (?i). In R, backreferences in the replacement string can be converted to lower or upper case with \L or \U; this requires perl = TRUE. In R, regular expressions can also be built conveniently with rex::rex (CC BY Ian Kopacka, ian.kopacka@ages.at).

Basic character classes and anchors:

  • [abc]: any of a, b, or c
  • [^abc]: not a, b, or c
  • [a-g]: a character between a and g
  • ^abc$: anchors; ^ matches the start and $ the end of the string
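A minimal C# sketch of the two ideas above, assuming .NET 6 or later with top-level statements. To my knowledge .NET replacement strings have no \U/\L case conversion, so the sketch swaps in a MatchEvaluator lambda that builds each replacement in code; the sample strings are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// (?i) turns on case-insensitive matching for the rest of the pattern.
bool found = Regex.IsMatch("Hello World", "(?i)hello");

// No \U or \L in .NET substitutions, so a MatchEvaluator (a lambda that
// receives each Match) upper-cases the first letter of every word instead.
string titled = Regex.Replace(
    "say hello to bob",
    @"\b(\w)(\w*)",                       // group 1: first letter, group 2: rest of the word
    m => m.Groups[1].Value.ToUpperInvariant() + m.Groups[2].Value);

Console.WriteLine(found);   // True
Console.WriteLine(titled);  // Say Hello To Bob
```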

The following characters are reserved: []().^$|?*+{} and the backslash \ itself. To match one of them literally in your input string, escape it with a backslash in the pattern (for example, \. or \$).

The static method [Regex]::Escape() (Regex.Escape in C#) escapes these characters for you.
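A short C# sketch of Regex.Escape, assuming .NET 6 or later with top-level statements; the input text is an arbitrary example. The escaped string can be used as a pattern that matches the original text literally, and Regex.Unescape reverses the escaping.

```csharp
using System;
using System.Text.RegularExpressions;

string literal = "1+1=2 (really?)";

// Regex.Escape backslash-escapes metacharacters such as + ( ) ? so the
// text is matched literally instead of being parsed as regex syntax.
string escaped = Regex.Escape(literal);
Console.WriteLine(escaped);

bool matches = Regex.IsMatch("Is 1+1=2 (really?)", escaped);
Console.WriteLine(matches);                             // True

// Regex.Unescape turns the escaped pattern back into the original text.
Console.WriteLine(Regex.Unescape(escaped) == literal);  // True
```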

Named Capture Groups

In PowerShell, the automatic variable $Matches is a [Hashtable], so it can be converted directly to a [PSCustomObject].

If you need the properties in a specific order, this won't work, because a [Hashtable] does not preserve key order. Use a class instead; its properties keep their declared order.
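The notes above use PowerShell's $Matches. In C# the same information lives on Match.Groups, indexed by group name. A minimal sketch, assuming .NET 6 or later with top-level statements; the date pattern and input are illustrative, and the named value tuple is just one way to get a fixed field order (similar in spirit to using a class).

```csharp
using System;
using System.Text.RegularExpressions;

// (?<name>...) defines a named capturing group.
var pattern = new Regex(@"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})");
Match m = pattern.Match("Released on 2021-03-15.");

if (m.Success)
{
    // Look groups up by name instead of by position.
    Console.WriteLine(m.Groups["year"].Value);   // 2021
    Console.WriteLine(m.Groups["month"].Value);  // 03
    Console.WriteLine(m.Groups["day"].Value);    // 15
}

// Copy the groups into a named value tuple, whose fields keep a fixed order.
var date = (Year: int.Parse(m.Groups["year"].Value),
            Month: int.Parse(m.Groups["month"].Value),
            Day: int.Parse(m.Groups["day"].Value));
Console.WriteLine(date);  // (2021, 3, 15)
```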

Substitutions

In a replacement string, a capturing group is referenced by putting the $ character before the group identifier.

There are two ways to reference capturing groups: by number and by name.

  • By number ($1, $2, and so on): capturing groups are numbered from left to right, starting at 1.

  • By name (${name}): named capturing groups can also be referenced by their name.

The $& expression represents all the text matched.

WARNING
In PowerShell, the $ character is also used for variable expansion, so use single-quoted (literal) strings for substitution patterns, or escape each $ with a backtick (`$) inside double-quoted strings.

Additionally, to insert a literal $ in the replacement, use $$ rather than the usual backslash escape. Inside double-quoted strings, each of those $ characters must still be escaped to avoid unintended variable expansion.
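A C# sketch of the substitution forms above (by number, by name, $&, and $$), assuming .NET 6 or later with top-level statements; the names and prices are made-up inputs. Unlike PowerShell, $ has no special meaning inside an ordinary C# string literal, so no extra escaping is needed there.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "John Smith";

// By number: unnamed groups are numbered left to right starting at 1.
string byNumber = Regex.Replace(input, @"(\w+)\s(\w+)", "$2, $1");

// By name: ${name} refers to a named group.
string byName = Regex.Replace(input, @"(?<first>\w+)\s(?<last>\w+)", "${last}, ${first}");

// $& inserts the entire match.
string wrapped = Regex.Replace(input, @"\w+\s\w+", "[$&]");

// $$ produces one literal dollar sign; here it is followed by $& (the number).
string price = Regex.Replace("Cost: 5", @"\d+", "$$$&");

Console.WriteLine(byNumber);  // Smith, John
Console.WriteLine(byName);    // Smith, John
Console.WriteLine(wrapped);   // [John Smith]
Console.WriteLine(price);     // Cost: $5
```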

Unicode Code Point ranges

The ranges of Unicode characters which are routinely used for Chinese and Japanese text are:

  • U+3040 - U+30FF: hiragana and katakana (Japanese only)
  • U+3400 - U+4DBF: CJK unified ideographs extension A (Chinese, Japanese, and Korean)
  • U+4E00 - U+9FFF: CJK unified ideographs (Chinese, Japanese, and Korean)
  • U+F900 - U+FAFF: CJK compatibility ideographs (Chinese, Japanese, and Korean)
  • U+FF66 - U+FF9F: half-width katakana (Japanese only)

As a .NET regular expression character class, this would be expressed as [\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\uFF66-\uFF9F].

This does not include every character which will appear in Chinese and Japanese text, but any significant piece of typical Chinese or Japanese text will be mostly made up of characters from these ranges.

Note that this regular expression will also match on Korean text that contains hanja. This is an unavoidable result of Han unification.
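A C# sketch that builds a character class from the five ranges listed above, assuming .NET 6 or later with top-level statements and a UTF-8 source file. The \uXXXX escapes are interpreted by the regex engine (the pattern is a verbatim string), and the sample strings are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Character class built from the five ranges listed above.
var cjk = new Regex(@"[\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\uFF66-\uFF9F]");

Console.WriteLine(cjk.IsMatch("こんにちは"));  // True  (hiragana)
Console.WriteLine(cjk.IsMatch("漢字"));        // True  (CJK unified ideographs)
Console.WriteLine(cjk.IsMatch("hello"));       // False

// Count how many characters of a mixed string fall inside the ranges.
int count = cjk.Matches("日本語 text").Count;
Console.WriteLine(count);                      // 3
```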

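Unicode-aware regular expressions let you match code-point ranges or, by name, scripts [1], blocks [2], or general categories [3]. (.NET supports blocks and general categories, but not scripts.)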

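Blocks are contiguous code-point ranges, so some of the ranges above map directly onto block names:

  • U+3400 - U+4DBF is \p{InCJK_Unified_Ideographs_Extension_A} (in .NET: \p{IsCJKUnifiedIdeographsExtensionA})
  • U+4E00 - U+9FFF is \p{InCJK_Unified_Ideographs} (in .NET: \p{IsCJKUnifiedIdeographs})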

Quoted from the references below: "Some languages are composed of multiple scripts. There is no Japanese Unicode script. Instead, Unicode offers the Hiragana, Katakana, Han, and Latin scripts that Japanese documents are usually composed of."
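In .NET the \p{...} syntax accepts block names (with an "Is" prefix and no spaces or underscores) and general categories, while script names such as \p{Han} are rejected at construction time. A sketch assuming .NET 6 or later with top-level statements; the sample characters are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// .NET block names use the "Is" prefix and drop spaces/underscores.
var extA  = new Regex(@"\p{IsCJKUnifiedIdeographsExtensionA}");  // U+3400 - U+4DBF
var basic = new Regex(@"\p{IsCJKUnifiedIdeographs}");            // U+4E00 - U+9FFF
var kana  = new Regex(@"[\p{IsHiragana}\p{IsKatakana}]");        // U+3040 - U+30FF

// General categories work too: Lo is "Letter, other" (includes kana and Han).
var otherLetter = new Regex(@"\p{Lo}");

Console.WriteLine(basic.IsMatch("漢"));        // True
Console.WriteLine(extA.IsMatch("\u3400"));     // True
Console.WriteLine(kana.IsMatch("カタカナ"));    // True
Console.WriteLine(otherLetter.IsMatch("か"));  // True

// Script names such as \p{Han} are not supported and make the constructor throw.
bool scriptSupported;
try { _ = new Regex(@"\p{Han}"); scriptSupported = true; }
catch (ArgumentException) { scriptSupported = false; }
Console.WriteLine(scriptSupported);            // False
```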

Here are some refs:

Regex Options

There are overloads of the static [Regex]::Match() method that accept the desired [RegexOptions] value as an argument.
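A C# sketch of the Regex.Match(input, pattern, options) overload, assuming .NET 6 or later with top-level statements; the two-line input is an arbitrary example. RegexOptions is a flags enum, so values combine with |, and the same behavior can be requested inline with (?i) and (?m).

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first line\nSECOND LINE";

// Combine options with |: case-insensitive, and ^ matches after each newline.
Match m = Regex.Match(text, "^second", RegexOptions.IgnoreCase | RegexOptions.Multiline);
Console.WriteLine(m.Success);     // True
Console.WriteLine(m.Value);       // SECOND

// The same options written inline in the pattern.
Match inline = Regex.Match(text, "(?im)^second");
Console.WriteLine(inline.Value);  // SECOND

// Without Multiline, ^ only matches at the start of the whole string.
Console.WriteLine(Regex.Match(text, "^second", RegexOptions.IgnoreCase).Success);  // False
```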

The available options (list them with [System.Text.RegularExpressions.RegexOptions] | Get-Member -Static -MemberType Property) are:

  • Compiled
  • CultureInvariant
  • ECMAScript
  • ExplicitCapture
  • IgnoreCase
  • IgnorePatternWhitespace
  • Multiline
  • None
  • RightToLeft
  • Singleline
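The list above gives only the option names. A C# sketch of three of them (Singleline, ExplicitCapture, IgnorePatternWhitespace), assuming .NET 6 or later with top-level statements; the inputs and patterns are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>bold\ntext</b>";

// By default '.' does not match '\n'; Singleline changes that.
bool plain  = Regex.IsMatch(html, "<b>.*</b>");
bool single = Regex.IsMatch(html, "<b>.*</b>", RegexOptions.Singleline);

// ExplicitCapture: plain (...) groups stop capturing; only named groups remain.
Match m = Regex.Match("2024-06", @"(\d{4})-(?<month>\d{2})", RegexOptions.ExplicitCapture);
int groupCount = m.Groups.Count;  // group 0 (whole match) plus "month"

// IgnorePatternWhitespace: whitespace in the pattern is ignored and # starts a comment.
bool verbose = Regex.IsMatch("abc123", @"
    [a-z]+   # letters
    \d+      # digits", RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(plain);       // False
Console.WriteLine(single);      // True
Console.WriteLine(groupCount);  // 2
Console.WriteLine(verbose);     // True
```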
