Note: I prepared the following notes for a presentation/workshop that I recently gave at work to the UI Engineer community.
Terms
- Regular Expression – a pattern describing a certain amount of text
- Matching is case sensitive by default.
- Literal character – a character we want to find
- Metacharacter – a character which does the finding
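A minimal JavaScript sketch of these terms (the sample strings are made up for illustration): literal characters match themselves, and case matters unless the `i` flag is added.

```javascript
// Literal characters match themselves, and case matters by default.
const literal = /cat/;
const matchesLower = literal.test("concatenate"); // true: "cat" appears inside the word
const matchesUpper = literal.test("Cat");         // false: capital C is a different character

// The i flag makes the pattern case insensitive.
const ignoreCase = /cat/i.test("Cat");            // true

console.log(matchesLower, matchesUpper, ignoreCase);
```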
Metacharacters
- All Metacharacters: . | () [] ^ $ ? * + {} \
- Descriptions
- .
- Official Name: period, dot, or wildcard
- Shorthand: period
- Matches: any character except newline
- Example: .
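To see the dot in action, here is a small sketch in JavaScript. The pattern `h.t` and the test strings are hypothetical examples, not from the workshop.

```javascript
// The dot stands in for any single character except a line break.
const dotPattern = /h.t/;

const hat = dotPattern.test("hat");        // true
const withSpace = dotPattern.test("h t");  // true: a space is still a character
const withBreak = dotPattern.test("h\nt"); // false: the dot does not match a newline

console.log(hat, withSpace, withBreak);
```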
- |
- Official Name: alternation
- Shorthand: Pipe
- Matches: either the left-hand or the right-hand pattern
- The left-hand side is tried first; if it matches, the right-hand side is skipped.
- Example:
- find e or m
- e|m
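A quick JavaScript sketch of the pipe, using made-up sample words. The second part illustrates the "left-hand side wins" behaviour described above.

```javascript
// Find every e or m in a word (the g flag finds all matches, not just the first).
const eOrM = "summer".match(/e|m/g); // ["m", "m", "e"]

// The left-hand alternative is tried first, so "gr" wins and "gray" is never tried.
const leftWins = "gray".match(/gr|gray/)[0]; // "gr"

console.log(eOrM, leftWins);
```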
- ()
- Official Name: capture group
- A slightly more official name might be “subexpression”.
- They can be nested to any depth.
- Matches: the enclosed pattern, treated as a single unit
- Example:
- Find ‘gray’ or ‘grey’
- RegEx: gr(a|e)y
- However, it also creates a variable, which some call a “backreference”.
- $1 through $9
- Example:
- Wrap all “grays/greys” in strong tag
- Find: (gr(a|e)y)
- Replace: <strong>$1</strong>
- Shorthand Name: none
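The find-and-replace example above can be sketched in JavaScript like this. The sample sentence is invented; `String.prototype.replace` understands `$1` in the replacement string.

```javascript
// Wrap every "gray" or "grey" in a strong tag.
// $1 refers to whatever the first (outermost) capture group matched.
const text = "The gray cat and the grey dog.";
const wrapped = text.replace(/(gr(a|e)y)/g, "<strong>$1</strong>");
// "The <strong>gray</strong> cat and the <strong>grey</strong> dog."

// Groups are numbered by their opening parenthesis, left to right.
const groups = /(gr(a|e)y)/.exec("grey");
const outer = groups[1]; // "grey"
const inner = groups[2]; // "e"

console.log(wrapped, outer, inner);
```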
- []
- Official name: Character Class
- Shorthand Name: Character Set
- Matches: any one of the characters listed between the brackets.
- Each character is matched individually (i.e. it is not matching a string of characters).
- The order of the characters does not matter.
- With the global flag set, every matching character in the text is found.
- Examples:
- Find: all lowercase letters
- RegEx: [a-z]
- Find: lowercase letters a through g and digits 1 through 4
- RegEx: [a-g1-4]
- Find: gray and grey
- RegEx: gr[ae]y
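Here is a small JavaScript sketch of the character class examples. The input strings are made up for illustration.

```javascript
// With the g flag, a character class finds every single character that qualifies.
const lowercase = "Hi 5!".match(/[a-z]/g);        // ["i"]
const lettersAndDigits = "a1h5g4".match(/[a-g1-4]/g); // ["a", "1", "g", "4"]

// One class in the middle of a word covers both spellings.
const grays = "gray grey gruy".match(/gr[ae]y/g); // ["gray", "grey"]

console.log(lowercase, lettersAndDigits, grays);
```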
- Shorthand Character Classes (sometimes simply called “Character Classes”, but that’s less accurate)
- \d any digit – shorthand for [0-9]
- \D anything but a digit
- \s any whitespace character (space, tab, line break)
- \S anything but whitespace
- \w any word character (letter, digit, or underscore)
- \W anything but a word character
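A short JavaScript sketch of the shorthand classes, run against one made-up sample string. Note that in JavaScript `\w` also accepts digits and the underscore.

```javascript
const sample = "A_1 b";

const digits = sample.match(/\d/g);     // ["1"]
const nonDigits = sample.match(/\D/g);  // ["A", "_", " ", "b"]
const spaces = sample.match(/\s/g);     // [" "]
const wordChars = sample.match(/\w/g);  // ["A", "_", "1", "b"]
const nonWord = sample.match(/\W/g);    // [" "]

console.log(digits, nonDigits, spaces, wordChars, nonWord);
```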
- Anchors
- ^
- Official Name:
- Left Anchor (if outside of square brackets)
- Caret (if inside square brackets)
- Shorthand Name: none
- Matches
- If it’s before a character, that character must be at the beginning of the string (or of each line when the multiline flag is set).
- Find: S at beginning of string
- RegEx: ^S
- If it’s inside square brackets, it means “not”.
- Example: [^m-z]
- $
- Official Name: Right Anchor
- Shorthand: none
- Matches: If after a letter, it means the letter previous must be at the end of a string.
- Example
- Find: ! at end of string
- RegEx: !$
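Both anchors, plus the "not" meaning of the caret inside square brackets, can be sketched in JavaScript as follows. The sample strings are hypothetical.

```javascript
// ^ pins the match to the start of the string.
const startsWithS = /^S/.test("Stop!");       // true
const sInMiddle = /^S/.test("Don't Stop");    // false: the S is not at the start

// $ pins the match to the end of the string.
const endsWithBang = /!$/.test("Stop!");      // true
const bangInMiddle = /!$/.test("Stop! now");  // false: the ! is not at the end

// Inside square brackets, ^ means "anything except these characters".
const notMtoZ = "maze".match(/[^m-z]/g);      // ["a", "e"]

console.log(startsWithS, sInMiddle, endsWithBang, bangInMiddle, notMtoZ);
```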
- Quantifiers – control the number of times the preceding character or group is matched
- ?
- Official name: question mark
- Shorthand Name: none
- Matches: 0 or 1 occurrence of the preceding character or group
- Unsure, but careful
- I don’t know what I want! I just know I don’t want a lot of it.
- Example
- Find metacharacter with or without the “s”
- RegEx: metacharacters?
- Find all variations of Jennifer
- Jenn(ifer)?(s)?
- Jenn(ifer)?(s|y)?
- We’ll do a further variation on this in the next section.
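The question mark examples above, tried out in JavaScript on made-up input:

```javascript
// The s is optional, so both the singular and the plural are found.
const singularOrPlural = "metacharacter metacharacters".match(/metacharacters?/g);
// ["metacharacter", "metacharacters"]

// A ? after a group makes the whole group optional.
const names = "Jenn Jenny Jennifer Jennifers".match(/Jenn(ifer)?(s|y)?/g);
// ["Jenn", "Jenny", "Jennifer", "Jennifers"]

console.log(singularOrPlural, names);
```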
- *
- Official Name: Asterisk or Star
- Matches: 0 or more repetitions of the preceding character
- Shorthand Name: The Star
- I’m the star, I’ll take everything, even if it’s not there.
- “Even if it’s not there” means the match still succeeds when the preceding character is absent.
- Example
- Find metacharacter followed by any number of “s” (including none)
- RegEx: metacharacters*
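- Find all variations of Jennifer and include the last name (whether or not it exists).
- Jenn(ifer)?(s|y)?\s*\w*
- Find dollar amounts with or without dollars (i.e. could be cents only).
- \$\d*\. (\d* allows zero dollar digits; the period is escaped so it matches only a literal decimal point)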
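A rough JavaScript sketch of the star. The sample names are invented, and the last-name pattern puts a star on both the space and the word characters, which is one assumed way to make the last name truly optional.

```javascript
// s* accepts no s, one s, or many.
const words = ["metacharacter", "metacharacters", "metacharactersss"];
const allMatch = words.every((w) => /^metacharacters*$/.test(w)); // true

// \s*\w* picks up a last name when there is one and matches nothing when there is not.
const namePattern = /Jenn(ifer)?(s|y)?\s*\w*/;
const withLastName = "Jenny Smith".match(namePattern)[0]; // "Jenny Smith"
const withoutLastName = "Jenn".match(namePattern)[0];     // "Jenn"

console.log(allMatch, withLastName, withoutLastName);
```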
- +
- Official Name: Plus sign
- Matches: 1 or more repetitions of the preceding character
- Shorthand Name: More friends!
- He always wants more friends, and won’t take no for an answer.
- Example
- Find: metacharacter with one or more s’s
- RegEx: metacharacters+
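For comparison with the star, a small JavaScript sketch of the plus sign on made-up input:

```javascript
const plusPattern = /metacharacters+/;

// Unlike *, the + needs at least one s to be present.
const noS = plusPattern.test("metacharacter");      // false
const manyS = plusPattern.test("metacharactersss"); // true

// The + is greedy: it takes as many s characters as it can.
const greedy = "metacharactersss".match(plusPattern)[0]; // "metacharactersss"

console.log(noS, manyS, greedy);
```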
- {}
- Official Name: Repetition Operator
- Shorthand Name: quantifier, or curly brackets
- Matches: a set number, or range, of repetitions of the preceding character
- Example
- Find: metacharacters with exactly 3 “s”
- RegEx: metacharacters{3}
- Find: metacharacters with 2 or 3 “s”
- RegEx: metacharacters{2,3}
- Find: metacharacters with 4 or more “s”
- RegEx: metacharacters{4,}
- Find: a standard telephone number
- RegEx: [0-9]{3}-[0-9]{4} (finds 123-4567)
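The curly bracket forms can be checked in JavaScript like this. The anchors are added here only so each test covers the whole word, and the sample sentence is made up.

```javascript
// {3} means exactly three, {2,3} means two or three, {4,} means four or more.
const exactlyThree = /^metacharacters{3}$/.test("metacharactersss"); // true
const tooFew = /^metacharacters{3}$/.test("metacharacterss");        // false
const twoOrThree = /^metacharacters{2,3}$/.test("metacharacterss");  // true
const fourOrMore = /^metacharacters{4,}$/.test("metacharactersss");  // false: only three s

// Three digits, a hyphen, then four digits.
const phone = "Call 123-4567 today".match(/[0-9]{3}-[0-9]{4}/)[0];   // "123-4567"

console.log(exactlyThree, tooFew, twoOrThree, fourOrMore, phone);
```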
- \
- Official Name: Backslash
- Matches: forces the following metacharacter to be treated as a literal character.
- Example
- Find: dollar amounts
- RegEx: \$\d*\.\d{2} (the period is escaped too, so it matches only a literal decimal point)
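A last JavaScript sketch showing why escaping matters for the dollar amount example. The price list is invented for illustration.

```javascript
// \$ is a literal dollar sign and \. is a literal period.
const prices = "Coffee $3.50, gum $.75, tip 2.00".match(/\$\d*\.\d{2}/g);
// ["$3.50", "$.75"]

// With an unescaped dot, any character is accepted where the decimal point should be.
const looseDot = /\$\d*.\d{2}/.test("$3x50");   // true
const strictDot = /\$\d*\.\d{2}/.test("$3x50"); // false

console.log(prices, looseDot, strictDot);
```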
Resources
- Great introductory YouTube video on RegEx: https://www.youtube.com/watch?v=DRR9fOXkfRE
- Recommended RegEx Tester: https://regex101.com/#javascript
- Learn More about RegEx: http://www.regular-expressions.info/