
To make it easier to read, we’ve broken down Annie Cushing’s favourite regex characters into categories based on how often she uses them (and you will too!).

Most Used Characters

Pipe: |

The pipe is the regex equivalent of “or”. If you needed to find out how many conversions you have had from Google, Bing or Yahoo, you would create a segment that matches the source against google|bing|yahoo.
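If you want to see the pipe at work outside GA, here is a minimal Python sketch. The source names and conversion counts are made up for illustration, and Python's re module is only standing in for GA's own matching.

```python
import re

# Hypothetical (source, conversions) rows; the numbers are invented.
rows = [
    ("google", 120),
    ("bing", 30),
    ("yahoo", 15),
    ("facebook", 40),
]

# The pipe means "or": match google, bing or yahoo.
search_engines = re.compile(r"google|bing|yahoo")

# Keep only the rows whose source matches one of the alternatives.
matched = [(source, count) for source, count in rows if search_engines.search(source)]
total = sum(count for _, count in matched)

print(matched)
print(total)
```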
 

Dot: .

The dot is your wildcard because it can match any single character. The . can stand in for any number, letter, special character or space. Alone, it’s nothing too special, but when paired with an asterisk it’s pretty darn cool.
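Here is a small Python sketch of the wildcard idea. The pattern b.g and the test strings are just assumptions picked for the example, and Python's re module is used in place of GA.

```python
import re

# "b.g" means: a b, then any single character, then a g.
pattern = re.compile(r"b.g")

candidates = ["bag", "big", "b g", "b9g", "bg"]

# True where the pattern is found, False where it is not.
results = {text: bool(pattern.search(text)) for text in candidates}
print(results)
```

Note that "bg" fails because the dot insists on exactly one character sitting between the b and the g.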
 

Asterisk: *

Basically, it tells GA to match zero or more of the character placed before it. It looks at the preceding character (usually the dot) and tells GA that this character may be missing altogether or may be repeated any number of times.
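As a rough sketch in Python (not GA itself), here is the asterisk on its own and paired with the dot. The strings and the /blog/ paths are hypothetical.

```python
import re

# "lo*l": an l, then zero or more o's, then an l.
star = re.compile(r"lo*l")
star_results = {t: bool(star.fullmatch(t)) for t in ["ll", "lol", "looool", "lal"]}

# "/blog/.*": the literal text /blog/ followed by zero or more of any character.
dot_star = re.compile(r"/blog/.*")
path_results = {
    p: bool(dot_star.fullmatch(p))
    for p in ["/blog/", "/blog/regex-tips", "/about/"]
}

print(star_results)
print(path_results)
```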
Of course, advanced segments were created so that you don’t have to use regex characters to make segments. Between the “and/or” functions and the “contains” and “starts with” options in the condition field, you can regularly get away without using the dot and asterisk. But there are still ways in which the dot paired with the asterisk can be used in more advanced analytics.
This is particularly relevant if you’re using subdomains. By default GA only shows the URI, which means you can’t easily tell which pages belong to which subdomain. You can get around this problem by creating a filter in Google Analytics that combines the Hostname and the Request URI to replace the standard URI with the full URL. In that filter, the .* means the Hostname and the URI can contain any characters.
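To picture what such a filter does, here is a Python sketch. It assumes each field is captured whole with (.*) and that the two captures are simply joined together; the hostnames and paths are hypothetical, and full_url is an illustrative helper, not anything GA provides.

```python
import re

def full_url(hostname: str, request_uri: str) -> str:
    """Join hostname and URI the way the filter described above would."""
    # (.*) captures the whole field, whatever characters it contains.
    host_match = re.fullmatch(r"(.*)", hostname)
    uri_match = re.fullmatch(r"(.*)", request_uri)
    # Output: first capture followed by second capture.
    return host_match.group(1) + uri_match.group(1)

print(full_url("blog.example.com", "/regex-tips/"))
print(full_url("shop.example.com", "/regex-tips/"))
```

With the hostname in front, two pages that share the same URI on different subdomains no longer collapse into one row.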
 

Backslash: \

The backslash tells Google Analytics to treat the next character as a normal character, NOT a regex character. So if you wrote index\.aspx\?query=travel\+hotels you would be letting GA know that you want it to read the ., ? and + signs as regular characters, not as regex.
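To see why the escaping matters, this Python sketch compares the escaped pattern from the text with an unescaped version of it. Python's re module is standing in for GA here, and the second test string is invented.

```python
import re

url = "index.aspx?query=travel+hotels"

# Escaped: ., ? and + are treated as plain characters.
escaped = re.compile(r"index\.aspx\?query=travel\+hotels")

# Unescaped: . is a wildcard, ? makes the x optional, + repeats the l.
unescaped = re.compile(r"index.aspx?query=travel+hotels")

escaped_hit = bool(escaped.search(url))
unescaped_hit = bool(unescaped.search(url))

# The unescaped pattern matches this odd string instead of the real URL.
odd_hit = bool(unescaped.search("index-aspxquery=travelhotels"))

print(escaped_hit, unescaped_hit, odd_hit)
```

Without the backslashes the pattern misses the very URL it was written for, because the ? and + are read as instructions rather than as characters in the address.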
 

Occasional Use

Caret: ^

This is commonly used in segments and goals and tells GA that your selection needs to start with what you place after the caret. For example, if you wanted to view all the landing pages in one directory of your website, say a directory called blog, you could match the landing page against ^/blog/.
This character is only used occasionally because you can choose the “starts with” function from the “condition” menu when you make a new segment. However, GA doesn’t always provide that choice.
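A quick Python sketch of the caret, assuming a directory called /blog/ and a handful of made-up landing pages:

```python
import re

# Hypothetical landing pages; /blog/ stands in for your own directory.
landing_pages = [
    "/blog/regex-tips",
    "/blog/",
    "/about/blog/",
    "/products/blog/archive",
]

# "^" pins the match to the start of the string.
starts_with_blog = re.compile(r"^/blog/")

in_directory = [page for page in landing_pages if starts_with_blog.search(page)]
print(in_directory)
```

The last two pages contain /blog/ but don't start with it, so the caret keeps them out.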
 

Dollar Symbol: $

The dollar character tells GA that the string has ended. For example, travel insurance$ matches best travel insurance but does NOT match travel insurance costs. You can also place a dollar sign at the end of a URL to ensure that the same URL with a query string attached isn’t counted. You could also use the $ at the close of a directory to match the category page only and exclude the subpages.
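Here is a short Python sketch of both uses. The keyword example comes from the text; the /hotels/ URLs are hypothetical.

```python
import re

# "$" anchors the match to the end of the string.
ends_with = re.compile(r"travel insurance$")
keywords = ["best travel insurance", "travel insurance costs"]
keyword_hits = [k for k in keywords if ends_with.search(k)]

# "^/hotels/$" matches the category page only: no query strings, no subpages.
category_only = re.compile(r"^/hotels/$")
urls = ["/hotels/", "/hotels/?sort=price", "/hotels/sydney/"]
url_hits = [u for u in urls if category_only.search(u)]

print(keyword_hits)
print(url_hits)
```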
 

Question Mark: ?

Technically speaking, it means zero or one of the character before it, but Annie and I prefer to think of it as making the character before it optional.
To give you an example, I needed to see the keywords that include travelling, but as part of my audience is American I wanted to include the word traveling as well. The pattern travell?ing makes the second l optional, so it returns keywords that match both travelling and traveling.
This is also useful for any words that are commonly misspelled.
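The travelling example can be sketched in Python like this. The keyword list is made up, and Python's re module is used in place of GA.

```python
import re

# "l?" makes the second l optional, so both spellings match.
both_spellings = re.compile(r"travell?ing")

keywords = [
    "travelling tips",
    "traveling alone",
    "travel insurance",
    "travellling",  # typo with three l's
]

results = {k: bool(both_spellings.search(k)) for k in keywords}
print(results)
```

The question mark allows zero or one l, so the three-l typo slips through the net. Something like travell*ing would be needed to catch it.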
 

Parentheses or brackets: ( )

Used to create sets, these will most likely be used when creating rewrite filters. Let’s use Annie’s example here, when she had to create a bucket for all the URLs that were generated when someone searched for a property on a client’s site. Here are the regex characters she used to create a net wide enough to get all the needed pages:
(^/index\.html\?pclass.*)|(/index\.html\?action=search.*)|(/index\.php\?cur_page=.*)|(/index\.html\?searchtext.*)|(realty/index\.html\?pclass.*)
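To get a feel for how the groups and pipes work together, here is a Python sketch using a trimmed-down version of the pattern with three of its groups. The URLs are hypothetical, and Python's re module is standing in for GA.

```python
import re

# Three of the groups from the pattern above, joined with pipes.
search_pages = re.compile(
    r"(^/index\.html\?pclass.*)"
    r"|(/index\.php\?cur_page=.*)"
    r"|(/index\.html\?searchtext.*)"
)

urls = [
    "/index.html?pclass=2",
    "/index.php?cur_page=3",
    "/index.html?searchtext=beach",
    "/index.html",
    "/contact.html",
]

# A URL lands in the bucket if any one of the groups matches it.
bucket = [u for u in urls if search_pages.search(u)]
print(bucket)
```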
 

Seldom Used Regex Characters

Square brackets: [ ]

This tells GA to match any one of the characters that sit inside the square brackets. Therefore, che[ae]p matches cheap and cheep. You can also place a dash inside to let GA know you want to pick from a range: [1-7] matches any single digit from 1 to 7. This method can be used, for example, when filtering out IP addresses.
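Here is a Python sketch of both ideas. The IP range is an assumption for illustration only (192.0.2.x is a block reserved for documentation), not the addresses from Annie's example.

```python
import re

# [ae] matches exactly one character: either a or e.
cheap = re.compile(r"^che[ae]p$")
word_results = {w: bool(cheap.search(w)) for w in ["cheap", "cheep", "chewp"]}

# Hypothetical office range 192.0.2.1 to 192.0.2.7.
office_ips = re.compile(r"^192\.0\.2\.[1-7]$")
ip_results = {
    ip: bool(office_ips.search(ip))
    for ip in ["192.0.2.3", "192.0.2.8", "192.0.2.17"]
}

print(word_results)
print(ip_results)
```

The $ at the end matters here: without it, 192.0.2.17 would sneak in because it starts with 192.0.2.1.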
 

Plus Sign: +

This sign means “one or more of the character before it”. It’s similar to the asterisk, but it needs at least one of those characters to be there for a match.
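The difference from the asterisk is easiest to see side by side. This Python sketch uses hypothetical /page/ URLs.

```python
import re

# "*" allows zero digits after /page/, "+" demands at least one.
star = re.compile(r"^/page/[0-9]*$")
plus = re.compile(r"^/page/[0-9]+$")

paths = ["/page/", "/page/7", "/page/42"]

star_hits = [p for p in paths if star.search(p)]
plus_hits = [p for p in paths if plus.search(p)]

print(star_hits)
print(plus_hits)
```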
 

Curly Braces: { }

This is another one that’s rarely used but can come in handy with really complicated URL rewrites. Curly braces say how many times the character before them needs to be repeated: a{3} matches exactly three a’s, and a{2,4} matches between two and four.
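A minimal Python sketch of curly braces, assuming a made-up /archive/ URL scheme with a four-digit year:

```python
import re

# {4} means exactly four of the preceding item (here, a digit).
year_page = re.compile(r"^/archive/[0-9]{4}/$")
year_results = {
    u: bool(year_page.search(u))
    for u in ["/archive/2016/", "/archive/16/", "/archive/20160/"]
}

# {1,3} means at least one and at most three.
short_number = re.compile(r"^[0-9]{1,3}$")
number_results = {n: bool(short_number.search(n)) for n in ["7", "192", "1920"]}

print(year_results)
print(number_results)
```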
 

Testing your Regex Characters Filtering

One of the great things about Google Analytics is that each report has a filter, and that filter accepts regex. Annie Cushing tried out many regex testers before realising that this is actually the perfect way to test regex specific to Google Analytics.
Open the report with the items you’ve written the regex for: your keyword report or traffic sources report for example. Then just paste your regex into the filter and if all your pages are there, you’re good to go! It has saved both Annie and everyone at In Marketing We Trust heaps of time.
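If you also want a quick sanity check before you open GA, something like the Python sketch below can help. It's only a rough complement to the report filter: check_pattern is a hypothetical helper, the keywords are made up, and Python's regex engine may behave differently from GA's in edge cases.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return the items the pattern misses and the items it wrongly catches."""
    compiled = re.compile(pattern)
    missed = [text for text in should_match if not compiled.search(text)]
    unwanted = [text for text in should_not_match if compiled.search(text)]
    return missed, unwanted

missed, unwanted = check_pattern(
    r"travell?ing",
    should_match=["travelling tips", "traveling alone"],
    should_not_match=["travel insurance"],
)

# Two empty lists mean the pattern behaves as expected on these samples.
print(missed, unwanted)
```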
Note: Back in 2011, Annie Cushing wrote an awesome post for Blue Glass titled Regular Expressions – Don’t Use Google Analytics without them. We’ve taken our learnings from that post and updated them for analytics use in 2016. We regularly learn a lot from Ms. Cushing at In Marketing We Trust (you can find out more about her here), but this post was particularly relevant and we think you’ll feel the same. From the post we’ve narrowed down our top 11 regular expressions for use in GA. Don’t use Google Analytics without them!
