Regular Expressions
Regular expressions are a form of pattern matching used in text processing for a variety of purposes. Users of Unix and Unix-compatible systems in particular will know them from utilities such as grep and sed, or from the Perl programming language.
How can you use regular expressions in FollowUpExpert?
One example is setting up your autoresponder to reply only to messages with subjects along the lines of "Order #1245 for ACME", where the order number varies. To match this, you can define a regular expression that matches any subject that starts with "Order #" followed by a number, a single space and "for ACME".
Another use is extracting text from incoming emails. Suppose a message contains the following line:
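The subject rule above can be sketched as follows. This is only an illustration using Python's re module as a stand-in; the pattern and the helper name is_order_subject are assumptions, and the syntax you enter in FollowUpExpert may differ in detail.

```python
import re

# Hypothetical pattern: subject starts with "Order #", then one or more digits,
# then a single space and "for ACME".
ORDER_SUBJECT = re.compile(r"^Order #[0-9]+ for ACME")

def is_order_subject(subject):
    """Return True if the subject starts with 'Order #<digits> for ACME'."""
    return ORDER_SUBJECT.search(subject) is not None

print(is_order_subject("Order #1245 for ACME"))      # True
print(is_order_subject("Order #77 for ACME"))        # True
print(is_order_subject("Re: Order #1245 for ACME"))  # False: does not start with "Order #"
```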
Email: joe@somedomain.com
You may often want to reply to the email address contained in the "Email:" field rather than to the address the message was sent from. To do this, you extract the address from the message text by matching "Email:" followed by any number of space characters, followed by the email address itself. Regular expressions make this fairly easy.
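As a rough sketch of this extraction, again using Python's re module as a stand-in: the address pattern below is deliberately loose, and the names EMAIL_FIELD and extract_reply_address are made up for this illustration.

```python
import re

# "Email:" followed by any number of spaces, then a loosely defined address
# (anything without whitespace or "@", an "@", and more of the same).
EMAIL_FIELD = re.compile(r"Email:[ ]*([^\s@]+@[^\s@]+)")

def extract_reply_address(body):
    """Return the address found in the 'Email:' field, or None if there is none."""
    match = EMAIL_FIELD.search(body)
    return match.group(1) if match else None

body = "Name: Joe\nEmail:   joe@somedomain.com\nMessage: hello"
print(extract_reply_address(body))  # joe@somedomain.com
```

The parentheses mark the part of the match that is extracted; the text "Email:" and the spaces are matched but not returned.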
Regular expressions consist of two types of characters: literals (for example: a, b, 1, 2) and operators (for example: |, *).
Literal characters match their equivalents in the text being processed (for example, a in a regular expression matches a in the text). All characters are treated as literals except for ., |, *, ?, +, (, ), {, }, [, ], ^, $ and \. These characters are treated as operators and used for special purposes. To have an operator work as a literal (for instance, to match ? in the text being processed), prefix it with \ (for example, to match a question mark use the regular expression \?).
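The effect of escaping can be seen in a short sketch. Python's re module is used here only as a stand-in, and the sample word "Ready" is made up for the illustration.

```python
import re

escaped = re.compile(r"Ready\?")   # \? matches a literal question mark
unescaped = re.compile(r"Ready?")  # ? is an operator: the preceding "y" is optional

print(escaped.search("Ready?") is not None)   # True
print(escaped.search("Read") is not None)     # False: no literal "?" in the text
print(unescaped.search("Read") is not None)   # True: "y" may be left out
```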
For example (in all following examples, regular expressions are marked with bold):
Acme GmbH will match "Acme GmbH" anywhere within the text being processed.
The dot operator . acts as a wildcard character, i.e. it matches any single character.
For example:
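As a small sketch, using Python's re module as a stand-in, the pattern Ac.e (chosen for this illustration) matches "Ac", then any one character, then "e". Note that in Python the dot does not match a line break by default, which may or may not hold for FollowUpExpert.

```python
import re

dot_pattern = re.compile(r"Ac.e")

# Check each sample word against the pattern.
results = {word: dot_pattern.search(word) is not None
           for word in ["Acme", "Acne", "Ac-e", "Ace"]}
print(results)  # {'Acme': True, 'Acne': True, 'Ac-e': True, 'Ace': False}
```

"Ace" does not match because the dot must consume exactly one character between "Ac" and "e".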
The operators *, + and ? are placed immediately after a character or expression to control how many times it is repeated. The asterisk * matches the expression any number of times, including zero. The plus operator + matches it at least once. The question mark operator ? makes it optional (it is matched once or not at all).
For example:
Ac*me will match "Ame", "Acme", "Accme" and so on.
Ac+me will match "Acme", "Accme", "Acccme" and so on but not "Ame".
Ac?me will match "Ame" or "Acme".
Enclose one or more characters in parentheses, ( and ), to have the whole subexpression repeated.
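The three patterns above can be checked with a short sketch. Python's re module is used as a stand-in, and the helper name whole_matches is an assumption made for this illustration.

```python
import re

WORDS = ["Ame", "Acme", "Accme"]

def whole_matches(pattern):
    """Return the sample words that the pattern matches in full."""
    return [word for word in WORDS if re.fullmatch(pattern, word)]

print(whole_matches(r"Ac*me"))  # ['Ame', 'Acme', 'Accme']
print(whole_matches(r"Ac+me"))  # ['Acme', 'Accme']
print(whole_matches(r"Ac?me"))  # ['Ame', 'Acme']
```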
For example:
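As a sketch of a repeated subexpression, using Python's re module as a stand-in, the pattern A(cm)*e (chosen for this illustration) repeats the two-character group "cm" as a unit.

```python
import re

group_pattern = re.compile(r"A(cm)*e")

# The group "cm" may occur zero or more times between "A" and "e".
samples = ["Ae", "Acme", "Acmcme", "Acmme"]
matched = [word for word in samples if group_pattern.fullmatch(word)]
print(matched)  # ['Ae', 'Acme', 'Acmcme']
```

"Acmme" is rejected because the * applies to the whole group "cm", not to the single character "m".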
To specify the minimum and maximum number of repeats explicitly, use the bounds operators { and }. For instance, {2} matches the preceding character or expression exactly twice, {3,5} three to five times, and {3,} at least three times with no upper limit.
For example:
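A short sketch of the bounds operators, using Python's re module as a stand-in. The helper name repeat_counts is an assumption for this illustration; it reports how many "c" characters each pattern accepts between "A" and "me".

```python
import re

def repeat_counts(pattern, max_count=6):
    """Return the values n (1..max_count) for which 'A' + 'c'*n + 'me' matches in full."""
    return [n for n in range(1, max_count + 1)
            if re.fullmatch(pattern, "A" + "c" * n + "me")]

print(repeat_counts(r"Ac{2}me"))    # [2]
print(repeat_counts(r"Ac{3,5}me"))  # [3, 4, 5]
print(repeat_counts(r"Ac{3,}me"))   # [3, 4, 5, 6]
```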
To form an alternative, i.e. to match either one subexpression or the other, use the | operator.
For example:
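As a sketch using Python's re module as a stand-in, the pattern Acme (GmbH|Inc) (chosen for this illustration) accepts either company suffix. The parentheses limit the alternative to the suffix; without them, Acme GmbH|Inc would mean "Acme GmbH" or just "Inc".

```python
import re

either = re.compile(r"Acme (GmbH|Inc)")

print(either.search("Acme GmbH") is not None)  # True
print(either.search("Acme Inc") is not None)   # True
print(either.search("Acme Ltd") is not None)   # False
```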
To match a single character that is a member of a given set, use the square bracket operators [ and ].
For example:
A[cC$]me will match "Acme", "ACme" and "A$me".
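The set example above can be verified with a short sketch, using Python's re module as a stand-in. Inside the brackets the $ character is taken literally.

```python
import re

set_pattern = re.compile(r"A[cC$]me")

samples = ["Acme", "ACme", "A$me", "Axme"]
matched_set = [word for word in samples if set_pattern.fullmatch(word)]
print(matched_set)  # ['Acme', 'ACme', 'A$me']
```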
Sets can also contain character classes, written as [:classname:] within a set. For instance, [[:space:]] is a set containing all whitespace characters.
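Python's re module does not accept the [:classname:] notation, so the sketch below only approximates it with Python's own equivalents. The mapping table and the helper name set_of are assumptions made for this illustration, and "digit" is a standard POSIX class name that is assumed, not confirmed, to be available in FollowUpExpert.

```python
import re

# Rough Python stand-ins for two POSIX class names (approximations only).
POSIX_TO_PYTHON = {"space": r"\s", "digit": "0-9"}

def set_of(*class_names):
    """Build a Python set such as [\\s] from POSIX-style class names."""
    return "[" + "".join(POSIX_TO_PYTHON[name] for name in class_names) + "]"

# Roughly equivalent to: Order[[:space:]]+[[:digit:]]+
order_pattern = re.compile("Order" + set_of("space") + "+" + set_of("digit") + "+")

print(order_pattern.fullmatch("Order \t 1245") is not None)  # True
print(order_pattern.fullmatch("Order1245") is not None)      # False: no whitespace
```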
Available character classes:
You can use ^ and $ to match the start and the end of a line respectively, for example:
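A minimal sketch of both anchors, using Python's re module as a stand-in. The sample strings are single lines; in Python, ^ and $ refer to the start and end of the whole text unless the re.MULTILINE flag is given, which may differ from FollowUpExpert's behaviour.

```python
import re

starts_with_acme = re.compile(r"^Acme")
ends_with_gmbh = re.compile(r"GmbH$")

print(starts_with_acme.search("Acme GmbH") is not None)      # True
print(starts_with_acme.search("The Acme GmbH") is not None)  # False
print(ends_with_gmbh.search("Acme GmbH") is not None)        # True
print(ends_with_gmbh.search("Acme GmbH Berlin") is not None) # False
```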
Note: The program matches at the earliest possible position in the text; if more than one match is possible at that position, the longest one is chosen. Where multiple matches start at the same location and have the same length, the match with the longest first sub-expression is chosen. If that is the same for two or more matches, the second sub-expression is taken into account, and so on.
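Python's re module does not follow this rule: for alternatives it takes the first one that succeeds, not the longest. The sketch below therefore only emulates the first part of the rule (earliest position, then longest match) for a list of simple alternatives; the function name leftmost_longest is an assumption, and sub-expression tie-breaking is not covered.

```python
import re

def leftmost_longest(alternatives, text):
    """Return the match that starts earliest; among those, the longest one."""
    best = None
    for pattern in alternatives:
        match = re.search(pattern, text)
        if match is None:
            continue
        # Smaller start wins; for equal starts, greater length wins.
        key = (match.start(), -(match.end() - match.start()))
        if best is None or key < best[0]:
            best = (key, match.group(0))
    return None if best is None else best[1]

print(leftmost_longest(["Ac", "Acme"], "Acme GmbH"))  # Acme
print(re.search(r"Ac|Acme", "Acme GmbH").group(0))    # Ac (Python picks the first alternative)
```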