Perl regular expressions

DESCRIPTION

Regular expressions, extended patterns, backtracking, version 8 regular expressions, warning on \1 vs $1, repeated patterns matching zero-length substring, combining pieces together, creating custom re engines.

perlre - Perl regular expressions

This page describes the syntax of regular expressions in Perl.

If you haven't used regular expressions before, a quick-start introduction is available in perlrequick, and a longer tutorial introduction is available in perlretut.

For reference on how regular expressions are used in matching operations, plus various examples of the same, see the discussions of m//, s///, qr// and ?? in perlop/``Regexp Quote-Like Operators''.

Matching operations can have various modifiers. Modifiers that relate to the interpretation of the regular expression inside are listed below. Modifiers that alter the way a regular expression is used by Perl are detailed in perlop/``Regexp Quote-Like Operators'' and perlop/``Gory details of parsing quoted constructs''.

If use locale is in effect, the case map is taken from the current locale. See perllocale .

The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the ``.'' match any character whatsoever, while still allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.

These are usually written as ``the /x modifier'', even though the delimiter in question might not really be a slash. Any of these modifiers may also be embedded within the regular expression itself using the (?...) construct. See below.

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x ), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early. See the C-comment deletion code in perlop .
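As a minimal sketch of the /x rules just described, the pattern below spreads a hypothetical date format over several commented lines. The variable names and the sample string are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

# Hypothetical date pattern, laid out with /x for readability.
my $date_re = qr/
    (\d{4})     # year
    -           # literal hyphen
    (\d{2})     # month
    -
    (\d{2})     # day
    [ ]         # a real space, protected by a character class
    \#          # a literal hash sign, escaped so it does not start a comment
/x;

my @parts = '2004-02-29 #' =~ $date_re;
print "@parts\n";
```

Note that the comments avoid the pattern delimiter (a slash here), as the paragraph above warns.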

The patterns used in Perl pattern matching derive from those supplied in the Version 8 regex routines. (The routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.) See Version 8 Regular Expressions for details.

In particular the following metacharacters have their standard egrep -ish meanings:

By default, the ``^'' character is guaranteed to match only the beginning of the string, the ``$'' character only the end (or before the newline at the end), and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ``^'' or ``$''. You may, however, wish to treat a string as a multi-line buffer, such that the ``^'' will match after any newline within the string, and ``$'' will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $* , but this practice is now deprecated.)

To simplify multi-line substitutions, the ``.'' character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The /s modifier also overrides the setting of $* , in case you have some (badly behaved) older code that sets it in another module.
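The effect of /m and /s can be checked with a short sketch such as the following. The sample text and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";

# Without /m, "^" and "$" do not match at the embedded newlines.
my @plain = $text =~ /^(\w+)$/g;

# With /m, "^" and "$" match at every line boundary.
my @multi = $text =~ /^(\w+)$/mg;

# Without /s, "." stops at a newline; with /s it crosses newlines.
my ($dot)   = $text =~ /(t.*e)/;
my ($dot_s) = $text =~ /(t.*e)/s;

print scalar(@plain), " plain matches, multi: @multi\n";
print "dot: $dot\n";
```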

The following standard quantifiers are recognized:

(If a curly bracket occurs in any other context, it is treated as a regular character. In particular, the lower bound is not optional.) The ``*'' modifier is equivalent to {0,} , the ``+'' modifier to {1,} , and the ``?'' modifier to {0,1} . n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
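One way to see the limit is to compile a pattern whose repeat count is deliberately too large and print the resulting error. This is only a sketch: the count of 100000 is an assumption chosen to exceed the limit on common builds, and the exact wording of the message depends on the perl in use.

```perl
use strict;
use warnings;

# Assumed to be larger than the quantifier limit of the running perl.
my $count = 100_000;

# The pattern is compiled at run time, so the failure can be trapped.
my $ok = eval { my $re = qr/a{$count}/; 1 };

print $ok ? "compiled without error\n" : "error: $@";
```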

By default, a quantified subpattern is ``greedy'', that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a ``?''. Note that the meanings don't change, just the ``greediness'':
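A small sketch of the difference between greedy and minimal quantifiers, using an assumed sample string:

```perl
use strict;
use warnings;

my $s = 'aaaa';

my ($greedy) = $s =~ /(a+)/;        # as many as possible
my ($lazy)   = $s =~ /(a+?)/;       # as few as possible
my ($range)  = $s =~ /(a{2,3}?)/;   # minimal, but at least two

print "$greedy $lazy $range\n";
```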

Because patterns are processed as double quoted strings, the following also work:

If use locale is in effect, the case map used by \l , \L , \u and \U is taken from the current locale. See perllocale . For documentation of \N{name} , see charnames .

You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the corresponding variable, while escaping will cause the literal string \$ to be matched. You'll need to write something like m/\Quser\E\@\Qhost/ .
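The workaround at the end of the paragraph above can be tried as follows. The address is a made-up example; the dots in it show that \Q really does disable metacharacters.

```perl
use strict;
use warnings;

# Hypothetical address containing regex metacharacters.
my $addr = 'u.ser@ho.st';

# Quote each literal part with \Q...\E and escape the @ separately.
my $hit  = $addr         =~ m/\Qu.ser\E\@\Qho.st\E/ ? 1 : 0;
my $miss = 'uXser@hoYst' =~ m/\Qu.ser\E\@\Qho.st\E/ ? 1 : 0;

print "hit=$hit miss=$miss\n";
```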

In addition, Perl defines the following:

A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or _, not a whole word. Use \w+ to match a string of Perl-identifier characters (which isn't the same as matching an English word). If use locale is in effect, the list of alphabetic characters generated by \w is taken from the current locale. See perllocale. You may use \w, \W, \s, \S, \d, and \D within character classes, but if you try to use them as endpoints of a range, that's not a range: the ``-'' is understood literally. If Unicode is in effect, \s also matches ``\x{85}'', ``\x{2028}'', and ``\x{2029}''. See perlunicode for more details about \pP, \PP, and \X, and perluniintro about Unicode in general. You can define your own \p and \P properties; see perlunicode.

The POSIX character class syntax

is also available. The available classes and their backslash equivalents (if available) are as follows:

For example use [:upper:] to match all the uppercase characters. Note that the [] are part of the [::] construct, not part of the whole character class. For example:

matches zero, one, any alphabetic character, and the percentage sign.
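A character class matching that description can be sketched like this; the list of candidate strings is an assumption for illustration.

```perl
use strict;
use warnings;

# [:alpha:] sits inside an ordinary bracketed class next to 0, 1 and %.
my @hits = grep { /^[01[:alpha:]%]$/ } ('0', '1', 'x', '%', '2', '-');

print "@hits\n";
```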

The following equivalences to Unicode \p{} constructs and equivalent backslash character classes (if available), will hold:

For example [:lower:] and \p{IsLower} are equivalent.

If the utf8 pragma is not used but the locale pragma is, the classes correlate with the usual isalpha(3) interface (except for `word' and `blank').

The classes whose names may not be obvious are:

You can negate the [::] character classes by prefixing the class name with a '^'. This is a Perl extension. For example:
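A brief sketch of the negated form, with an assumed sample string:

```perl
use strict;
use warnings;

my $s = 'a1b22c';

# [:^digit:] matches any character that is not a digit.
my @non_digits = $s =~ /([[:^digit:]]+)/g;

print "@non_digits\n";
```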

Perl respects the POSIX standard in that POSIX character classes are only supported within a character class. The POSIX character classes [.cc.] and [=cc=] are recognized but not supported and trying to use them will cause an error.

Perl defines the following zero-width assertions:

A word boundary ( \b ) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W . (Within character classes \b represents backspace rather than a word boundary, just as it normally does in any double-quoted string.) The \A and \Z are just like ``^'' and ``$'', except that they won't match multiple times when the /m modifier is used, while ``^'' and ``$'' will match at every internal line boundary. To match the actual end of the string and not ignore an optional trailing newline, use \z .
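The following sketch contrasts \b, \Z and \z on an assumed sample line that ends in a newline.

```perl
use strict;
use warnings;

my $line = "cat concat cat\n";

# \b keeps "cat" from matching inside "concat".
my $words = () = $line =~ /\bcat\b/g;

# \Z tolerates the trailing newline, \z does not.
my $cap_z   = $line =~ /cat\Z/ ? 1 : 0;
my $small_z = $line =~ /cat\z/ ? 1 : 0;

print "words=$words Z=$cap_z z=$small_z\n";
```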

The \G assertion can be used to chain global matches (using m//g ), as described in perlop/``Regexp Quote-Like Operators'' . It is also useful when writing lex -like scanners, when you have several patterns that you want to match against consequent substrings of your string, see the previous reference. The actual location where \G will match can also be influenced by using pos() as an lvalue: see perlfunc/pos . Currently \G is only fully supported when anchored to the start of the pattern; while it is permitted to use it elsewhere, as in /(?<=\G..)./g , some such uses ( /.\G/g , for example) currently cause problems, and it is recommended that you avoid such usage for now.
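A lex-like scanner of the kind mentioned above might be sketched as follows. The token names and the tiny input language are assumptions; \G is kept at the start of each pattern, as the text recommends.

```perl
use strict;
use warnings;

my $src = 'x = 42 + y';
my @tokens;

for ($src) {
    while (1) {
        # /gc keeps the current position when a pattern fails to match.
        if    (/\G\s+/gc)            { next }
        elsif (/\G(\d+)/gc)          { push @tokens, "NUM($1)" }
        elsif (/\G([A-Za-z_]\w*)/gc) { push @tokens, "ID($1)" }
        elsif (/\G([=+])/gc)         { push @tokens, "OP($1)" }
        else                         { last }
    }
}

print "@tokens\n";
```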

The bracketing construct ( ... ) creates capture buffers. To refer to the digit'th buffer use \<digit> within the match. Outside the match use ``$'' instead of ``\''. (The \<digit> notation works in certain circumstances outside the match. See the warning below about \1 vs $1 for details.) Referring back to another part of the match is called a backreference .

There is no limit to the number of captured substrings that you may use. However Perl also uses \10, \11, etc. as aliases for \010, \011, etc. (Recall that 0 means octal, so \011 is the character at number 9 in your coded character set; which would be the 10th character, a horizontal tab under ASCII.) Perl resolves this ambiguity by interpreting \10 as a backreference only if at least 10 left parentheses have opened before it. Likewise \11 is a backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences.
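As a short illustration of a backreference inside the match, this sketch looks for a doubled word in an assumed sample sentence:

```perl
use strict;
use warnings;

my $text = 'this is is a test';

# \1 must match the same text that the first group captured.
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

print "doubled word: $dup\n";
```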

Several special variables also refer back to portions of the previous match. $+ returns whatever the last bracket match matched. $& returns the entire matched string. (At one point $0 did also, but now it returns the name of the program.) $` returns everything before the matched string. $' returns everything after the matched string. And $^N contains whatever was matched by the most-recently closed group (submatch). $^N can be used in extended patterns (see below), for example to assign a submatch to a variable.

The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ( $+ , $& , $` , $' , and $^N ) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See perlsyn/``Compound Statements'' .)

NOTE: failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.

WARNING : Once Perl sees that you need one of $& , $` , or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.) But if you never use $& , $` or $' , then patterns without capturing parentheses will not be penalized. So avoid $& , $' , and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.

Backslashed metacharacters in Perl are alphanumeric, such as \b , \w , \n . Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric. So anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter. This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a string that you want to use for a pattern. Simply quote all non-``word'' characters:

(If use locale is set, then this depends on the current locale.) Today it is more common to use the quotemeta() function or the \Q metaquoting escape sequence to disable all metacharacters' special meanings like this:
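Both the older idiom and the newer ones can be compared in a sketch like the following. The sample string is an assumption.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';

# Older idiom: backslash every non-word character by hand.
(my $manual = $literal) =~ s/(\W)/\\$1/g;

# Newer idioms: quotemeta(), or \Q...\E inside the pattern.
my $quoted  = quotemeta $literal;
my $exact   = 'a.b*c' =~ /^\Q$literal\E$/ ? 1 : 0;
my $similar = 'axbbc' =~ /^\Q$literal\E$/ ? 1 : 0;

print "$manual $quoted exact=$exact similar=$similar\n";
```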

Beware that if you put literal backslashes (those not inside interpolated variables) between \Q and \E , double-quotish backslash interpolation may lead to confusing results. If you need to use literal backslashes within \Q...\E , consult perlop/``Gory details of parsing quoted constructs'' .

Perl also defines a consistent extension syntax for features not found in standard tools like awk and lex . The syntax is a pair of parentheses with a question mark as the first thing within the parentheses. The character after the question mark indicates the extension.

The stability of these extensions varies widely. Some have been part of the core language for many years. Others are experimental and may change without warning or be completely removed. Check the documentation on an individual feature to verify its current status.

A question mark was chosen for this and for the minimal-matching construct because 1) question marks are rare in older regular expressions, and 2) whenever you see one, you should stop and ``question'' exactly what is going on. That's psychology...

These modifiers are restored at the end of the enclosing group. For example,

will match a repeated (including the case!) word blah in any case, assuming the /x modifier, and no /i modifier outside this group.
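A sketch consistent with that description, assuming a pattern in which (?i) is confined to the capturing group, so that the backreference outside the group stays case-sensitive:

```perl
use strict;
use warnings;

# (?i) applies only until the end of the enclosing group.
my $re = qr/( (?i) blah ) \s+ \1/x;

my $same_case  = 'Blah Blah' =~ $re ? 1 : 0;
my $mixed_case = 'blah BLAH' =~ $re ? 1 : 0;

print "same=$same_case mixed=$mixed_case\n";
```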

but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.
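The difference in the fields returned by split can be seen in a sketch like this one, where the sample string and separators are assumptions:

```perl
use strict;
use warnings;

my $s = 'x a y b z';

# Capturing parentheses make split return the separators as extra fields.
my @with    = split /\s*\b(a|b|c)\b\s*/, $s;

# Clustering with (?:...) groups without capturing.
my @without = split /\s*\b(?:a|b|c)\b\s*/, $s;

print "with: @with\nwithout: @without\n";
```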

Any letters between ? and : act as flags modifiers as with (?imsx-imsx) . For example,

is equivalent to the more verbose
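As an illustration of such an equivalence, the sketch below compares a short and a long spelling of the same hypothetical pattern under an outer /i modifier. The pattern and the sample strings are assumptions.

```perl
use strict;
use warnings;

my $text = "more\nthan a MILLION";

# Inside the group: /s on, /i off. Outside: /i from the operator applies.
my $short = $text =~ /(?s-i:more.*than).*million/i     ? 1 : 0;
my $long  = $text =~ /(?:(?s-i)more.*than).*million/i  ? 1 : 0;

# "MORE" fails because /i is switched off inside the group.
my $upper = "MORE than a million" =~ /(?s-i:more.*than).*million/i ? 1 : 0;

print "short=$short long=$long upper=$upper\n";
```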

If you are looking for a ``bar'' that isn't preceded by a ``foo'', /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be ``foo''--and it's not, it's a ``bar'', so ``foobar'' will match. You would have to do something like /(?!foo)...bar/ for that. We say ``like'' because there's the case of your ``bar'' not having three characters before it. You could cover that this way: /(?:(?!foo)...|^.{0,2})bar/ . Sometimes it's still easier just to say:
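The patterns discussed above can be compared directly. The last line of the sketch is one plausible reading of the simpler approach, assuming a plain match followed by a test of the text before it.

```perl
use strict;
use warnings;

# The naive look-ahead matches "foobar" after all.
my $naive = 'foobar' =~ /(?!foo)bar/ ? 1 : 0;

# The longer pattern rejects "foobar" but accepts a short prefix.
my $fixed = 'foobar' =~ /(?:(?!foo)...|^.{0,2})bar/ ? 1 : 0;
my $short = 'xbar'   =~ /(?:(?!foo)...|^.{0,2})bar/ ? 1 : 0;

# Simpler alternative: match "bar", then inspect what preceded it.
my $simple = ('foobar' =~ /bar/ && $` !~ /foo$/) ? 1 : 0;

print "naive=$naive fixed=$fixed short=$short simple=$simple\n";
```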

For look-behind see below.

This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated. Currently, the rules to determine where the code ends are somewhat convoluted.

This feature can be used together with the special variable $^N to capture the results of submatches in variables without having to keep track of the number of nested parentheses. For example:
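A sketch of this use of $^N follows. The sentence and the variable names are assumptions; package variables are used so the code blocks behave the same on older perls.

```perl
use strict;
use warnings;

our ($color, $animal);
my $text = "The brown fox jumps over the lazy dog";

# Each code block copies the most recently closed group into a variable.
if ($text =~ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i) {
    print "color = $color, animal = $animal\n";
}
```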

Inside the (?{...}) block, $_ refers to the string the regular expression is matching against. You can also use pos() to find the current matching position within this string.

The code is properly scoped in the following sense: If the assertion is backtracked (compare Backtracking), all changes introduced after localization are undone, so that

will set $res = 4 . Note that after the match, $cnt returns to the globally introduced value, because the scopes that restrict local operators are unwound.
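A sketch of such a pattern, assuming a string of eight a characters and package variables named $cnt and $res as in the text:

```perl
use strict;
use warnings;

our ($cnt, $res);
$_ = 'a' x 8;

my $matched = m<
    (?{ $cnt = 0 })                   # initialize the counter
    (?:
        a
        (?{ local $cnt = $cnt + 1 })  # backtracking-safe increment
    )*
    aaaa
    (?{ $res = $cnt })                # copy to a non-localized variable
>x;

print "res = $res\n";
```

The starred group first takes all eight characters, then gives back four so that the trailing aaaa can match. The localized increments for those four are undone.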

This assertion may be used as a (?(condition)yes-pattern|no-pattern) switch. If not used in this way, the result of evaluation of code is put into the special variable $^R . This happens immediately, so $^R can be used from other (?{ code }) assertions inside the same regular expression.

The assignment to $^R above is properly localized, so the old value of $^R is restored if the assertion is backtracked; compare Backtracking .

For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous use re 'eval' pragma has been used (see re ), or the variables contain results of qr// operator (see perlop/``qr/STRING/imosx'' ).

This restriction is because of the wide-spread and remarkably convenient custom of using run-time determined strings as patterns. For example:

Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern. If you turn on the use re 'eval' , though, it is no longer secure, so you should only do so if you are also using taint checking. Better yet, use the carefully constrained evaluation within a Safe compartment. See perlsec for details about both these mechanisms.

This is a ``postponed'' regular subexpression. The code is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.

The code is not interpolated. As before, the rules to determine where the code ends are currently somewhat convoluted.

The following pattern matches a parenthesized group:
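A sketch of a recursive pattern of this kind, assuming a package variable $re that the postponed subexpression refers back to:

```perl
use strict;
use warnings;

our $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # non-parentheses, without backtracking
      |
        (??{ $re })     # a nested group with matching parentheses
    )*
    \)
}x;

my ($group)  = 'x(a(b)c)y' =~ /($re)/;
my $balanced = '(a(b' =~ /\A$re\z/ ? 1 : 0;

print "group: $group, unbalanced input matched: $balanced\n";
```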

An ``independent'' subexpression, one which matches the substring that a standalone pattern would match if anchored at the given position, and it matches nothing other than this substring . This construct is useful for optimizations of what would otherwise be ``eternal'' matches, because it will not backtrack (see Backtracking ). It may also be useful in places where the ``grab all you can, and do not give anything back'' semantic is desirable.

For example: ^(?>a*)ab will never match, since (?>a*) (anchored at the beginning of string, as above) will match all characters a at the beginning of string, leaving no a for ab to match. In contrast, a*ab will match the same as a+b , since the match of the subgroup a* is influenced by the following group ab (see Backtracking ). In particular, a* inside a*ab will match fewer characters than a standalone a* , since this makes the tail match.

An effect similar to (?>pattern) may be achieved by writing (?=(pattern))\1 . This matches the same substring as a standalone a+ , and the following \1 eats the matched string; it therefore makes a zero-length assertion into an analogue of (?>...) . (The difference between these two constructs is that the second one uses a capturing group, thus shifting ordinals of backreferences in the rest of a regular expression.)
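The behaviour described in the last two paragraphs can be verified with a short sketch on an assumed sample string:

```perl
use strict;
use warnings;

my $s = 'aaab';

my $atomic    = $s =~ /^(?>a*)ab/     ? 1 : 0;   # never gives an "a" back
my $plain     = $s =~ /^a*ab/         ? 1 : 0;   # backtracks by one "a"
my $lookahead = $s =~ /^(?=(a*))\1ab/ ? 1 : 0;   # analogue of (?>a*)

print "atomic=$atomic plain=$plain lookahead=$lookahead\n";
```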

Consider this pattern:

That will efficiently match a nonempty group with matching parentheses two levels deep or less. However, if there is no such group, it will take virtually forever on a long string. That's because there are so many different ways to split a long string into several substrings. This is what (.+)+ is doing, and (.+)+ is similar to a subpattern of the above pattern. Consider how the pattern above detects no-match on ((()aaaaaaaaaaaaaaaaaa in several seconds, but that each extra letter doubles this time. This exponential performance will make it appear that your program has hung. However, a tiny change to this pattern

which uses (?>...) matches exactly when the one above does (verifying this yourself would be a productive exercise), but finishes in a quarter of the time when used on a similar string with 1000000 a's. Be aware, however, that this pattern currently triggers a warning message under the use warnings pragma or -w switch saying it "matches null string many times in regex".

On simple groups, such as the pattern (?> [^()]+ ), a comparable effect may be achieved by negative look-ahead, as in [^()]+ (?! [^()] ). This was only 4 times slower on a string with 1000000 a's.
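A sketch of a pattern of this shape, assuming the independent subexpression wraps the run of non-parentheses. The failing input is the one quoted earlier in this discussion; with the independent group it is rejected quickly.

```perl
use strict;
use warnings;

my $re = qr{
    \(
        (
            (?> [^()]+ )    # independent run of non-parentheses
          |
            \( [^()]* \)    # one nested level
        )+
    \)
}x;

my ($group) = 'x(a(b)c)y' =~ /($re)/;
my $bad     = ('((()' . 'a' x 18) =~ $re ? 1 : 0;

print "group: $group, bad input matched: $bad\n";
```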

The ``grab all you can, and do not give anything back'' semantic is desirable in many situations where on the first sight a simple ()* looks like the correct solution. Suppose we parse text with comments being delimited by # followed by some optional (horizontal) whitespace. Contrary to its appearance, #[ \t]* is not the correct subexpression to match the comment delimiter, because it may ``give up'' some whitespace if the remainder of the pattern can be made to match that way. The correct answer is either one of these:

For example, to grab non-empty comments into $1, one should use either one of these:

Which one you pick depends on which of these expressions better reflects the above specification of comments.
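The difference from the naive delimiter shows up on an empty comment followed by whitespace. In this sketch the sample lines are assumptions; the naive pattern gives back a space so that the capture can succeed, while the other two refuse.

```perl
use strict;
use warnings;

my $full  = "code();  #   a remark";
my $empty = "#   ";

# Two ways to grab a non-empty comment without giving whitespace back.
my ($c1) = $full =~ / (?> \# [ \t]* ) ( .+ ) /x;
my ($c2) = $full =~ / \# [ \t]* ( [^ \t] .* ) /x;

# On an empty comment the naive form captures a lone space.
my ($naive)  = $empty =~ / \# [ \t]* ( .+ ) /x;
my ($strict) = $empty =~ / (?> \# [ \t]* ) ( .+ ) /x;

print "c1=<$c1> c2=<$c2> naive=<$naive>\n";
```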

Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or look-ahead/look-behind/evaluate zero-width assertion.

For example:

matches a chunk of non-parentheses, possibly included in parentheses themselves.
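A conditional pattern fitting that description could be sketched as follows, with anchors added here (an assumption) so that whole strings are tested:

```perl
use strict;
use warnings;

# If group 1 matched an opening parenthesis, require a closing one.
my $re = qr{ \A ( \( )? [^()]+ (?(1) \) ) \z }x;

my @ok = grep { $_ =~ $re } ('(abc)', 'abc', '(abc', 'abc)');

print "@ok\n";
```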

NOTE: This section presents an abstract approximation of regular expression behavior. For a more rigorous (and complicated) view of the rules involved in selecting a match among possible alternatives, see Combining pieces together .

A fundamental feature of regular expression matching involves the notion called backtracking , which is currently used (when needed) by all regular expression quantifiers, namely * , *? , + , +? , {n,m} , and {n,m}? . Backtracking is often optimized internally, but the general principle outlined here is valid.

For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part--that's why it's called backtracking.

Here is an example of backtracking: Let's say you want to find the word following ``foo'' in the string ``Food is on the foo table.'':

When the match runs, the first part of the regular expression ( \b(foo) ) finds a possible match right at the beginning of the string, and loads up $1 with ``Foo''. However, as soon as the matching engine sees that there's no whitespace following the ``Foo'' that it had saved in $1, it realizes its mistake and starts over again one character after where it had the tentative match. This time it goes all the way until the next occurrence of ``foo''. The complete regular expression matches this time, and you get the expected output of ``table follows foo.''
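The walk-through above corresponds to code along these lines, using the pattern fragment and string quoted in the text; the variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "Food is on the foo table.";
my $out  = 'no match';

# The first tentative match on "Foo" fails at \s+, so the engine moves on.
if ($text =~ /\b(foo)\s+(\w+)/i) {
    $out = "$2 follows $1.";
}

print "$out\n";
```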

Sometimes minimal matching can help a lot. Imagine you'd like to match everything between ``foo'' and ``bar''. Initially, you write something like this:

Which perhaps unexpectedly yields:

That's because .* was greedy, so you get everything between the first ``foo'' and the last ``bar''. Here it's more effective to use minimal matching to make sure you get the text between a ``foo'' and the first ``bar'' thereafter.
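A sketch of the greedy and minimal versions side by side, on an assumed sample sentence containing one ``foo'' and two ``bar''s:

```perl
use strict;
use warnings;

my $text = "The food is under the bar in the barn.";

my ($greedy)  = $text =~ /foo(.*)bar/;    # up to the last "bar"
my ($minimal) = $text =~ /foo(.*?)bar/;   # up to the first "bar"

print "greedy:  <$greedy>\n";
print "minimal: <$minimal>\n";
```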

Here's another example: let's say you'd like to match a number at the end of a string, and you also want to keep the preceding part of the match. So you write this:

That won't work at all, because .* was greedy and gobbled up the whole string. As \d* can match on an empty string the complete regular expression matched successfully.

Here are some variants, most of which don't work:

That will print out:
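A set of variants of this kind can be run in a loop. The string and the particular patterns in this sketch are assumptions chosen to show how greediness and anchoring interact.

```perl
use strict;
use warnings;

my $str  = "I have 2 numbers: 53147";
my @pats = (
    qr/(.*)(\d*)/,    qr/(.*)(\d+)/,
    qr/(.*?)(\d*)/,   qr/(.*?)(\d+)/,
    qr/(.*)(\d+)$/,   qr/(.*?)(\d+)$/,
    qr/(.*)\b(\d+)$/, qr/(.*\D)(\d+)$/,
);

my @results;
for my $pat (@pats) {
    # Record both captures, or FAIL when the pattern does not match.
    push @results, $str =~ $pat ? "<$1> <$2>" : "FAIL";
}

print "$_\n" for @results;
```

Only the variants that anchor the number at the end of the string and stop the first group from swallowing digits capture the full 53147.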

As you see, this can be a bit tricky. It's important to realize that a regular expression is merely a set of assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve.

When using look-ahead assertions and negations, this can all get even trickier. Imagine you'd like to find a sequence of non-digits not followed by ``123''. You might try to write that as

But that isn't going to match; at least, not the way you're hoping. It claims that there is no 123 in the string. Here's a clearer picture of why that pattern matches, contrary to popular expectations:

This prints

You might have expected test 3 to fail because it seems to be a more general-purpose version of test 1. The important difference between them is that test 3 contains a quantifier ( \D* ) and so can use backtracking, whereas test 1 will not. What's happening is that you've asked ``Is it true that at the start of $x, following 0 or more non-digits, you have something that's not 123?'' If the pattern matcher had let \D* expand to ``ABC'', this would have caused the whole pattern to fail.

The search engine will initially match \D* with ``ABC''. Then it will try to match (?!123) with ``123'', which fails. But because a quantifier ( \D* ) has been used in the regular expression, the search engine can backtrack and retry the match differently in the hope of matching the complete regular expression.

The pattern really, really wants to succeed, so it uses the standard pattern back-off-and-retry and lets \D* expand to just ``AB'' this time. Now there's indeed something following ``AB'' that is not ``123''. It's ``C123'', which suffices.
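The four tests discussed above can be sketched as follows, assuming $x holds ``ABC123'' and a second string $y holds ``ABC445''. Test 3 succeeds with only ``AB'' captured, as the explanation predicts.

```perl
use strict;
use warnings;

my $x = 'ABC123';
my $y = 'ABC445';
my @got;

push @got, "1: got $1" if $x =~ /^(ABC)(?!123)/;
push @got, "2: got $1" if $y =~ /^(ABC)(?!123)/;
push @got, "3: got $1" if $x =~ /^(\D*)(?!123)/;   # backtracks to "AB"
push @got, "4: got $1" if $y =~ /^(\D*)(?!123)/;

print "$_\n" for @got;
```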

We can deal with this by using both an assertion and a negation. We'll say that the first part in $1 must be followed both by a digit and by something that's not ``123''. Remember that the look-aheads are zero-width expressions--they only look, but don't consume any of the string in their match. So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds:
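A sketch of the two rewritten cases, again assuming $x is ``ABC123'' and $y is ``ABC445'':

```perl
use strict;
use warnings;

my $x = 'ABC123';
my $y = 'ABC445';
my @got;

# The captured part must be followed by a digit AND by something other than 123.
push @got, "5: got $1" if $x =~ /^(\D*)(?=\d)(?!123)/;
push @got, "6: got $1" if $y =~ /^(\D*)(?=\d)(?!123)/;

print "$_\n" for @got;
```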

In other words, the two zero-width assertions next to each other work as though they're ANDed together, just as you'd use any built-in assertions: /^$/ matches only if you're at the beginning of the line AND the end of the line simultaneously. The deeper underlying truth is that juxtaposition in regular expressions always means AND, except when you write an explicit OR using the vertical bar. /ab/ means match ``a'' AND (then) match ``b'', although the attempted matches are made at different positions because ``a'' is not a zero-width assertion, but a one-width assertion.

WARNING: particularly complicated regular expressions can take exponential time to solve because of the immense number of possible ways they can use backtracking to try for a match. For example, without internal optimizations done by the regular expression engine, this will take a painfully long time to run:

And if you used * 's in the internal groups instead of limiting them to 0 through 5 matches, then it would take forever--or until you ran out of stack space. Moreover, these internal optimizations are not always applicable. For example, if you put {0,5} instead of * on the external group, no current optimization is applicable, and the match takes a long time to finish.

A powerful tool for optimizing such beasts is what is known as an ``independent group'', which does not backtrack (see (?>pattern) ). Note also that zero-length look-ahead/look-behind assertions will not backtrack to make the tail match, since they are in ``logical'' context: only whether they match is considered relevant. For an example where side-effects of look-ahead might have influenced the following match, see (?>pattern) .

In case you're not familiar with the ``regular'' Version 8 regex routines, here are the pattern-matching rules not described above.

Any single character matches itself, unless it is a metacharacter with a special meaning described here or above. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a ``\'' (e.g., ``\.'' matches a ``.'', not any character; ``\\'' matches a ``\''). A series of characters matches that series of characters in the target string, so the pattern blurfl would match ``blurfl'' in the target string.

You can specify a character class, by enclosing a list of characters in [] , which will match any one character from the list. If the first character after the ``['' is ``^'', the class matches any character not in the list. Within a list, the ``-'' character specifies a range, so that a-z represents all characters between ``a'' and ``z'', inclusive. If you want either ``-'' or ``]'' itself to be a member of a class, put it at the start of the list (possibly after a ``^''), or escape it with a backslash. ``-'' is also taken literally when it is at the end of the list, just before the closing ``]''. (The following all specify the same class of three characters: [-az] , [az-] , and [a\-z] . All are different from [a-z] , which specifies a class containing twenty-six characters, even on EBCDIC based coded character sets.) Also, if you try to use the character classes \w , \W , \s , \S , \d , or \D as endpoints of a range, that's not a range, the ``-'' is understood literally.

Note also that the whole range idea is rather unportable between character sets--and even within character sets they may cause results you probably didn't expect. A sound principle is to use only ranges that begin from and end at either alphabets of equal case ([a-e], [A-E]), or digits ([0-9]). Anything else is unsafe. If in doubt, spell out the character sets in full.

Characters may be specified using a metacharacter syntax much like that used in C: ``\n'' matches a newline, ``\t'' a tab, ``\r'' a carriage return, ``\f'' a form feed, etc. More generally, \nnn, where nnn is a string of octal digits, matches the character whose coded character set value is nnn. Similarly, \xnn, where nn are hexadecimal digits, matches the character whose numeric value is nn. The expression \cx matches the character control-x. Finally, the ``.'' metacharacter matches any character except ``\n'' (unless you use /s ).

You can specify a series of alternatives for a pattern using ``|'' to separate them, so that fee|fie|foe will match any of ``fee'', ``fie'', or ``foe'' in the target string (as would f(e|i|o)e ). The first alternative includes everything from the last pattern delimiter (``('', ``['', or the beginning of the pattern) up to the first ``|'', and the last alternative contains everything from the last ``|'' to the next pattern delimiter. That's why it's common practice to include alternatives in parentheses: to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against ``barefoot'', only the ``foo'' part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)
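The effect of alternative order on captured text can be seen in a two-line sketch:

```perl
use strict;
use warnings;

# The first alternative that lets the whole pattern match wins.
my ($first)  = 'barefoot' =~ /(foo|foot)/;
my ($second) = 'barefoot' =~ /(foot|foo)/;

print "$first $second\n";
```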

Also remember that ``|'' is interpreted as a literal within square brackets, so if you write [fee|fie|foe] you're really only matching [feio|] .

Within a pattern, you may designate subpatterns for later reference by enclosing them in parentheses, and you may refer back to the n th subpattern later in the pattern using the metacharacter \ n . Subpatterns are numbered based on the left to right order of their opening parenthesis. A backreference matches whatever actually matched the subpattern in the string being examined, not the rules for that subpattern. Therefore, (0|0x)\d*\s\1\d* will match ``0x1234 0x4321'', but not ``0x1234 01234'', because subpattern 1 matched ``0x'', even though the rule 0|0x could potentially match the leading 0 in the second number.
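The two strings from the paragraph above can be tested against the pattern like this:

```perl
use strict;
use warnings;

my $re = qr/(0|0x)\d*\s\1\d*/;

# \1 must repeat the text "0x", not merely satisfy the rule 0|0x.
my $same  = '0x1234 0x4321' =~ $re ? 1 : 0;
my $mixed = '0x1234 01234'  =~ $re ? 1 : 0;

print "same=$same mixed=$mixed\n";
```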

Some people get too used to writing things like:

This is grandfathered for the RHS of a substitute to avoid shocking the sed addicts, but it's a dirty habit to get into. That's because in PerlThink, the righthand side of an s/// is a double-quoted string. \1 in the usual double-quoted string means a control-A. The customary Unix meaning of \1 is kludged in for s/// . However, if you get into the habit of doing that, you get yourself into trouble if you then add an /e modifier.

Or if you try to do

You can't disambiguate that by saying \{1}000 , whereas you can fix it with ${1}000 . The operation of interpolation should not be confused with the operation of matching a backreference. Certainly they mean two different things on the left side of the s/// .
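A sketch of the recommended forms on the right-hand side of a substitution, using an assumed sample string:

```perl
use strict;
use warnings;

my $n = 'x42';

# ${1}000 is the first capture followed by three literal zeros.
(my $scaled = $n) =~ s/(\d+)/${1}000/;

# With /e the replacement is Perl code, where only $1 makes sense.
(my $bumped = $n) =~ s/(\d+)/ $1 + 1 /e;

print "$scaled $bumped\n";
```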

WARNING : Difficult material (and prose) ahead. This section needs a rewrite.

Regular expressions provide a terse and powerful programming language. As with most other power tools, power comes together with the ability to wreak havoc.

A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:

The o? can match at the beginning of 'foo' , and since the position in the string is not moved by the match, o? would match again and again because of the * modifier. Another common way to create a similar cycle is with the looping modifier //g :

or the loop implied by split().

However, long experience has shown that many programming tasks may be significantly simplified by using repeated subexpressions that may match zero-length substrings. Here's a simple example:

Thus Perl allows such constructs, by forcefully breaking the infinite loop . The rules for this are different for lower-level loops given by the greedy modifiers *+{} , and for higher-level ones like the /g modifier or split() operator.

The lower-level loops are interrupted (that is, the loop is broken) when Perl detects that a repeated expression matched a zero-length substring. Thus

is made equivalent to

The higher-level loops preserve an additional state between iterations: whether the last match was zero-length. To break the loop, the match that follows a zero-length match is prohibited from having a length of zero. This prohibition interacts with backtracking (see Backtracking ), and so the second best match is chosen if the best match is of zero length.

results in <><b><><a><><r><> . At each position of the string the best match given by non-greedy ?? is the zero-length match, and the second best match is what is matched by \w . Thus zero-length matches alternate with one-character-long matches.
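The alternation of zero-length and one-character matches can be reproduced with a sketch along these lines. The substitution is an assumption about what the lost example looked like, chosen so that it yields the result quoted above; /r assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Under /g, zero-length and one-character matches alternate, because a
# zero-length match may not be followed by another one at the same position.
my $marked = 'bar' =~ s/\w??/<$&>/gr;

# An empty group matches once at every position, including the end of the
# string, and then the loop stops instead of spinning forever.
my @empties = 'bar' =~ /()/g;

print "$marked\n";
print scalar(@empties), " empty matches\n";
```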

Similarly, for repeated m/()/g the second-best match is the match at the position one notch further in the string.

The additional state of being matched with zero-length is associated with the matched string, and is reset by each assignment to pos(). Zero-length matches at the end of the previous match are ignored during split .

Each of the elementary pieces of regular expressions which were described before (such as ab or \Z ) could match at most one substring at the given position of the input string. However, in a typical regular expression these elementary pieces are combined into more complicated patterns using combining operators ST , S|T , S* etc (in these examples S and T are regular subexpressions).

Such combinations can include alternatives, leading to a problem of choice: if we match a regular expression a|ab against "abc" , will it match substring "a" or "ab" ? One way to describe which substring is actually matched is the concept of backtracking (see Backtracking ). However, this description is too low-level and makes you think in terms of a particular implementation.

Another description starts with notions of ``better''/``worse''. All the substrings which may be matched by the given regular expression can be sorted from the ``best'' match to the ``worst'' match, and it is the ``best'' match which is chosen. This substitutes the question of ``what is chosen?'' by the question of ``which matches are better, and which are worse?''.

Again, for elementary pieces there is no such question, since at most one match at a given position is possible. This section describes the notion of better/worse for combining operators. In the description below S and T are regular subexpressions.

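For ST, consider two possible matches, AB and A'B' , where A and A' are substrings that can be matched by S , and B and B' are substrings that can be matched by T . If A is a better match for S than A' , then AB is a better match than A'B' .

If A and A' coincide, AB is a better match than AB' if B is a better match for T than B' .

For S|T , when S can match, it is a better match than when only T can match. The ordering of two matches for S is the same as for S alone, and similarly for two matches for T .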

The above recipes describe the ordering of matches at a given position . One more rule is needed to understand how a match is determined for the whole regular expression: a match at an earlier position is always better than a match at a later position.
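Two of these ordering rules are easy to observe directly. The sketch below is only an illustration; the second target string is made up.

```perl
use strict;
use warnings;

# S|T at one position: a match for the first alternative is "better",
# so a|ab picks "a" even though "ab" would also match.
my ($first_alt) = "abc" =~ /(a|ab)/;

# Across positions: an earlier match beats a later one, so "ab" (starting
# at offset 1) is chosen over "b" (starting at offset 2).
my ($earliest) = "xab" =~ /(b|ab)/;

print "a|ab on 'abc' matched '$first_alt'\n";
print "b|ab on 'xab' matched '$earliest'\n";
```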

Overloaded constants (see overload ) provide a simple way to extend the functionality of the RE engine.

Suppose that we want to enable a new RE escape-sequence \Y| which matches at boundary between white-space characters and non-whitespace characters. Note that (?=\S)(?<!\S)|(?!\S)(?<=\S) matches exactly at these positions, so we want to have each \Y| in the place of the more complicated version. We can create a module customre to do this:

Now use customre enables the new escape in constant regular expressions, i.e., those without any runtime variable interpolations. As documented in overload , this conversion will work only over literal parts of regular expressions. For \Y|$re\Y| the variable part of this regular expression needs to be converted explicitly (but only if the special meaning of \Y| should be enabled inside $re):
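The customre module itself is not reproduced here, but the lookaround expression that \Y| is meant to stand for can be exercised on its own. The following sketch, with an invented sample string, collects the positions at which it matches.

```perl
use strict;
use warnings;

# The expansion quoted in the text: a boundary between whitespace
# and non-whitespace (the ends of the string count as whitespace).
my $boundary = qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/;

my $text = "ab cd";
my @positions;
while ($text =~ /$boundary/g) {
    push @positions, pos($text);   # zero-length match, so pos() is its offset
}

print "boundaries at: @positions\n";
```

For "ab cd" the boundaries fall before "a", after "b", before "c" and after "d".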

This document varies from difficult to understand to completely and utterly opaque. The wandering prose riddled with jargon is hard to fathom in several places.

This document needs a rewrite that separates the tutorial content from the reference content.

perlrequick .

perlretut .

perlop/``Regexp Quote-Like Operators'' .

perlop/``Gory details of parsing quoted constructs'' .

perlfunc/pos .

perllocale .

perlebcdic .

Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly and Associates.


perlrequick - Perl regular expressions quick start

# DESCRIPTION

This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl.

# The Guide

This page assumes you already know things, like what a "pattern" is, and the basic syntax of using them. If you don't, see perlretut .

# Simple word matching

The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word:

In this statement, World is a regex and the // enclosing /World/ tells Perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match. In our case, World matches the second word in "Hello World" , so the expression is true. This idea has several variations.

Expressions like this are useful in conditionals:

The sense of the match can be reversed by using !~ operator:

The literal string in the regex can be replaced by a variable:

If you're matching against $_ , the $_ =~ part can be omitted:

Finally, the // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front:

Regexes must match a part of the string exactly in order for the statement to be true:

Perl will always match at the earliest possible point in the string:
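The variations listed in this section can be sketched together as follows. The strings and the %result hash are illustrative choices, not part of the original page.

```perl
use strict;
use warnings;

my $greeting = "Hello World";
my $word     = "World";

my %result = (
    literal    => ($greeting =~ /World/  ? 1 : 0),   # plain word
    negated    => ($greeting !~ /World/  ? 1 : 0),   # sense reversed
    variable   => ($greeting =~ /$word/  ? 1 : 0),   # regex from a variable
    braces     => ($greeting =~ m{World} ? 1 : 0),   # other delimiters
    wrong_case => ($greeting =~ /world/  ? 1 : 0),   # case must match exactly
    with_space => ($greeting =~ /o W/    ? 1 : 0),   # spaces are literal
);

# With $_ as the target, the "$_ =~" part can be left out.
for ($greeting) {
    $result{default_var} = /World/ ? 1 : 0;
}

# The leftmost possible match is chosen: "hat" is found inside "That".
$result{earliest} = "That hat is red" =~ /hat/ ? $-[0] : -1;

print "$_ => $result{$_}\n" for sort keys %result;
```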

Not all characters can be used 'as is' in a match. Some characters, called metacharacters , are considered special, and reserved for use in regex notation. The metacharacters are

A metacharacter can be matched literally by putting a backslash before it:

In the last regex, the forward slash '/' is also backslashed, because it is used to delimit the regex.

Most of the metacharacters aren't always special, and other characters (such as the ones delimiting the pattern) become special under various circumstances. This can be confusing and lead to unexpected results. use re 'strict' can notify you of potential pitfalls.

Non-printable ASCII characters are represented by escape sequences . Common examples are \t for a tab, \n for a newline, and \r for a carriage return. Arbitrary bytes are represented by octal escape sequences, e.g., \033 , or hexadecimal escape sequences, e.g., \x1B :

Regexes are treated mostly as double-quoted strings, so variable substitution works:

With all of the regexes above, if the regex matched anywhere in the string, it was considered a match. To specify where it should match, we would use the anchor metacharacters ^ and $ . The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string, or before a newline at the end of the string. Some examples:
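A minimal sketch of the two anchors, using an invented sample word:

```perl
use strict;
use warnings;

my %anchored = (
    unanchored     => ("housekeeper"   =~ /keeper/   ? 1 : 0),
    at_start       => ("housekeeper"   =~ /^keeper/  ? 1 : 0),  # not at the start
    at_end         => ("housekeeper"   =~ /keeper$/  ? 1 : 0),
    before_newline => ("housekeeper\n" =~ /keeper$/  ? 1 : 0),  # $ allows a final newline
    whole_string   => ("keeper"        =~ /^keeper$/ ? 1 : 0),
    whole_fails    => ("housekeeper"   =~ /^keeper$/ ? 1 : 0),
);

print "$_ => $anchored{$_}\n" for sort keys %anchored;
```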

# Using character classes

A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regex. There are a number of different types of character classes, but usually when people use this term, they are referring to the type described in this section, which are technically called "Bracketed character classes", because they are denoted by brackets [...] , with the set of characters to be possibly matched inside. But we'll drop the "bracketed" below to correspond with common usage. Here are some examples of (bracketed) character classes:

In the last statement, even though 'c' is the first character in the class, the earliest point at which the regex can match is 'a' .

The last example shows a match with an 'i' modifier , which makes the match case-insensitive.

Character classes also have ordinary and special characters, but the sets of ordinary and special characters inside a character class are different than those outside a character class. The special characters for a character class are -]\^$ and are matched using an escape:

The special character '-' acts as a range operator within character classes, so that the unwieldy [0123456789] and [abc...xyz] become the svelte [0-9] and [a-z] :

If '-' is the first or last character in a character class, it is treated as an ordinary character.

The special character ^ in the first position of a character class denotes a negated character class , which matches any character but those in the brackets. Both [...] and [^...] must match a character, or the match fails.
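The character-class rules described so far can be tried out with a sketch like this one. All sample strings are made up for the illustration.

```perl
use strict;
use warnings;

my %class = (
    set          => ("rat"   =~ /[bcr]at/    ? 1 : 0),  # one of b, c, r
    range        => ("item7" =~ /item[0-9]/  ? 1 : 0),  # '-' as range operator
    negated      => ("item7" =~ /item[^0-9]/ ? 1 : 0),  # 7 is a digit
    needs_a_char => ("item"  =~ /item[^0-9]/ ? 1 : 0),  # nothing left to match
    literal_dash => ("-"     =~ /^[ab-]$/    ? 1 : 0),  # '-' last is ordinary
    ignore_case  => ("YES"   =~ /yes/i       ? 1 : 0),  # the 'i' modifier
);

# The order of characters inside the class does not matter;
# the earliest position in the string wins.
my ($earliest) = "abc" =~ /([cab])/;

print "$_ => $class{$_}\n" for sort keys %class;
print "earliest => $earliest\n";
```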

Perl has several abbreviations for common character classes. (These definitions are those that Perl uses in ASCII-safe mode with the /a modifier. Otherwise they could match many more non-ASCII Unicode characters as well. See "Backslash sequences" in perlrecharclass for details.)

\d is a digit and represents [0-9]

\s is a whitespace character and represents [\ \t\r\n\f]

\w is a word character (alphanumeric or _) and represents [0-9a-zA-Z_]

\D is a negated \d; it represents any character but a digit

\S is a negated \s; it represents any non-whitespace character

\W is a negated \w; it represents any non-word character

The period '.' matches any character but "\n"

The \d\s\w\D\S\W abbreviations can be used both inside and outside of character classes. Here are some in use:

The word anchor \b matches a boundary between a word character and a non-word character \w\W or \W\w :

In the last example, the end of the string is considered a word boundary.

For natural language processing (so that, for example, apostrophes are included in words), use instead \b{wb}
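The abbreviations and the \b anchor might be exercised as in the sketch below. The helper offset_of and the sample strings are assumptions made for this illustration; the helper simply reports where a match starts.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first match, or -1 if there is none.
sub offset_of {
    my ($re, $str) = @_;
    return $str =~ $re ? $-[0] : -1;
}

my $text = "Housecat catenates house and cat";

my $word_start = offset_of(qr/\bcat/,   $text);  # "cat" in "catenates"
my $word_end   = offset_of(qr/cat\b/,   $text);  # "cat" in "Housecat"
my $whole_word = offset_of(qr/\bcat\b/, $text);  # the final "cat"

my $time_like   = "12:30:45" =~ /^\d\d:\d\d:\d\d$/ ? 1 : 0;
my ($word)      = "end." =~ /(\w+)/;             # '.' is not a word character
my $dot_newline = "a\nb" =~ /a.b/ ? 1 : 0;       # '.' does not match "\n"

print "$word_start $word_end $whole_word\n";
print "$time_like $word $dot_newline\n";
```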

# Matching this or that

We can match different character strings with the alternation metacharacter '|' . To match dog or cat , we form the regex dog|cat . As before, Perl will try to match the regex at the earliest possible point in the string. At each character position, Perl will first try to match the first alternative, dog . If dog doesn't match, Perl will then try the next alternative, cat . If cat doesn't match either, then the match fails and Perl moves to the next position in the string. Some examples:

Even though dog is the first alternative in the second regex, cat is able to match earlier in the string.

At a given character position, the first alternative that allows the regex match to succeed will be the one that matches. Here, all the alternatives match at the first string position, so the first matches.
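A short sketch of both rules, earliest position first and then leftmost alternative. The strings are chosen to fit the description above and may differ from the page's lost examples.

```perl
use strict;
use warnings;

my $text = "cats and dogs";

# "cat" matches at offset 0, so it wins regardless of alternative order.
my ($first_listed)  = $text =~ /(cat|dog|bird)/;
my ($second_listed) = $text =~ /(dog|cat|bird)/;

# When every alternative matches at the same position,
# the first one listed is the one that matches.
my ($shortest_first) = "cats" =~ /(c|ca|cat|cats)/;
my ($longest_first)  = "cats" =~ /(cats|cat|ca|c)/;

print "$first_listed $second_listed $shortest_first $longest_first\n";
```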

# Grouping things and hierarchical matching

The grouping metacharacters () allow a part of a regex to be treated as a single unit. Parts of a regex are grouped by enclosing them in parentheses. The regex house(cat|keeper) means match house followed by either cat or keeper . Some more examples are

# Extracting matches

The grouping metacharacters () also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1 , $2 , etc. They can be used just as ordinary variables:

In list context, a match /regex/ with groupings will return the list of matched values ($1,$2,...) . So we could rewrite it as

If the groupings in a regex are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc. For example, here is a complex regex and the matching variables indicated below it:

Associated with the matching variables $1 , $2 , ... are the backreferences \g1 , \g2 , ... Backreferences are matching variables that can be used inside a regex:

$1 , $2 , ... should only be used outside of a regex, and \g1 , \g2 , ... only inside a regex.
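Extraction and backreferences can be sketched as follows. The time string, the nested pattern and the doubled-word sentence are illustrative inputs, not taken from the page.

```perl
use strict;
use warnings;

# List context returns the captured groups directly.
my ($hours, $minutes, $seconds) = "12:05:30" =~ /(\d\d):(\d\d):(\d\d)/;

# Groups are numbered by their opening parenthesis, outermost first.
my @nested = "ab cd" =~ /((\w+) (\w+))/;

# \g1 inside the regex repeats whatever group 1 matched.
my ($repeated) = "Paris in the the spring" =~ /\b(\w+)\s\g1\b/;

print "$hours h, $minutes m, $seconds s\n";
print join(" | ", @nested), "\n";
print "doubled word: $repeated\n";
```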

# Matching repetitions

The quantifier metacharacters ? , * , + , and {} allow us to determine the number of repeats of a portion of a regex we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:

a? = match 'a' 1 or 0 times

a* = match 'a' 0 or more times, i.e., any number of times

a+ = match 'a' 1 or more times, i.e., at least once

a{n,m} = match at least n times, but not more than m times.

a{n,} = match at least n times

a{,n} = match n times or fewer (Added in v5.34)

a{n} = match exactly n times

Here are some examples:

These quantifiers will try to match as much of the string as possible, while still allowing the regex to match. So we have

The first quantifier .* grabs as much of the string as possible while still having the regex match. The second quantifier .* has no string left to it, so it matches 0 times.
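The greedy behaviour described above can be observed with a sketch such as this. The sentence and the year checks are assumptions chosen to fit the description.

```perl
use strict;
use warnings;

# The first .* takes as much as it can while still letting "at" match,
# which leaves nothing for the second .* to consume.
my @parts = 'the cat in the hat' =~ /^(.*)(at)(.*)$/;

# Counted quantifiers: 2 to 4 digits versus exactly 2 or exactly 4.
my $two_to_four = "199" =~ /^\d{2,4}$/          ? 1 : 0;
my $two_or_four = "199" =~ /^(?:\d{4}|\d{2})$/  ? 1 : 0;

my ($optional) = "color" =~ /(colou?r)/;   # the "u" may be absent
my ($run)      = "aXXXb" =~ /(X+)/;        # one or more X

print "1='$parts[0]' 2='$parts[1]' 3='$parts[2]'\n";
print "$two_to_four $two_or_four $optional $run\n";
```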

# More matching

There are a few more things you might want to know about matching operators. The global modifier /g allows the matching operator to match within a string as many times as possible. In scalar context, successive matches against a string will have /g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function. For example,

A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the /c , as in /regex/gc .

In list context, /g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regex. So
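Both uses of /g can be sketched with one small string. The input and variable names are illustrative.

```perl
use strict;
use warnings;

my $x = "cat dog house";

# Scalar context: each successful match advances pos($x).
my @ends;
while ($x =~ /(\w+)/g) {
    push @ends, pos($x);
}

# List context: all matches of the group are returned at once.
my @words = $x =~ /(\w+)/g;

# With several groups, the list holds every group of every match.
my @pairs = "a=1 b=2" =~ /(\w)=(\d)/g;

print "word ends: @ends\n";
print "words: @words\n";
print "pairs: @pairs\n";
```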

# Search and replace

Search and replace is performed using s/regex/replacement/modifiers . The replacement is a Perl double-quoted string that replaces in the string whatever is matched with the regex . The operator =~ is also used here to associate a string with s/// . If matching against $_ , the $_ =~ can be dropped. If there is a match, s/// returns the number of substitutions made; otherwise it returns false. Here are a few examples:

With the s/// operator, the matched variables $1 , $2 , etc. are immediately available for use in the replacement expression. With the global modifier, s///g will search and replace all occurrences of the regex in the string:

The non-destructive modifier s///r causes the result of the substitution to be returned instead of modifying $_ (or whatever variable the substitute was bound to with =~ ):

The evaluation modifier s///e wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring. Some examples:

The last example shows that s/// can use other delimiters, such as s!!! and s{}{} , and even s{}// . If single quotes are used s''' , then the regex and replacement are treated as single-quoted strings.
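The modifiers discussed in this section might be combined as in the sketch below. The sample strings are assumptions, and /r requires Perl 5.14 or later.

```perl
use strict;
use warnings;

my $x = "I batted 4 for 4";

my $once = $x =~ s/4/four/r;    # first occurrence only
my $all  = $x =~ s/4/four/gr;   # every occurrence

# Without /r the target is modified and the count is returned.
my $count = (my $copy = $x) =~ s/4/four/g;

# /e evaluates the replacement as Perl code; other delimiters work too.
my $rate     = "A 39% hit rate" =~ s!(\d+)%!$1/100!er;
my $reversed = "the cat in the hat" =~ s/(\w+)/reverse $1/ger;

print "$once\n$all\n$count\n$rate\n$reversed\n";
```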

# The split operator

split /regex/, string splits string into a list of substrings and returns that list. The regex determines the character sequence that string is split with respect to. For example, to split a string into words, use

To extract a comma-delimited list of numbers, use

If the empty regex // is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:

Since the first character of $x matched the regex, split prepended an empty initial element to the list.
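The split cases described here can be sketched together. The input strings are illustrative; the last one is chosen so that the string starts with the separator.

```perl
use strict;
use warnings;

my @words  = split /\s+/,  "Calvin and Hobbes";
my @consts = split /,\s*/, "1.618,2.718,   3.142";
my @chars  = split //,     "abc";

# A grouping in the regex puts the separators into the result as well,
# and a match at the very start yields an empty first element.
my @parts = split m!(/)!, "/usr/bin";

print join("|", @words),  "\n";
print join("|", @consts), "\n";
print join("|", @chars),  "\n";
print join("|", @parts),  "\n";
```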

# use re 'strict'

New in v5.22, this applies stricter rules than otherwise when compiling regular expression patterns. It can find things that, while legal, may not be what you intended.

See 'strict' in re .

This is just a quick start guide. For a more in-depth tutorial on regexes, see perlretut and for the reference page, see perlre .

# AUTHOR AND COPYRIGHT

Copyright (c) 2000 Mark Kvale All rights reserved.

This document may be distributed under the same terms as Perl itself.

# Acknowledgments

The author would like to thank Mark-Jason Dominus, Tom Christiansen, Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful comments.


PERL Regular Expressions

A regular expression is a string of characters that defines the pattern or patterns you are matching. The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions.

The basic method for applying a regular expression is to use the pattern binding operators =~ and !~. The first operator is a test and assignment operator.

There are three regular expression operators within Perl: the match operator m//, the substitution operator s///, and the transliteration operator tr///.

The forward slashes in each case act as delimiters for the regular expression (regex) that you are specifying. If you prefer a different delimiter, you can use it in place of the forward slash.

The match operator, m//, is used to match a string or statement to a regular expression. For example, to match the character sequence "foo" against the scalar $bar, you might use a statement like this:

The m// operator works in the same fashion as the q// operator series: you can use any combination of naturally matching characters to act as delimiters for the expression. For example, m{}, m(), and m<> are all valid.

You can omit the m from m// if the delimiters are forward slashes, but for all other delimiters you must use the m prefix.

Note that the entire match expression, that is, the expression on the left of =~ or !~ together with the match operator, returns true (in a scalar context) if the expression matches. Therefore the statement:

Will set $true to 1 if $foo matches the regex, or to a false value (the empty string) if the match fails.

In a list context, the match returns the contents of any grouped expressions. For example, when extracting the hours, minutes, and seconds from a time string, we can use:

Match Operator Modifiers

The match operator supports its own set of modifiers. The /g modifier allows for global matching. The /i modifier will make the match case insensitive. Here is the complete list of modifiers

  • i Makes the match case insensitive
  • m Specifies that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary
  • o Evaluates the expression only once
  • s Allows use of . to match a newline character
  • x Allows you to use white space in the expression for clarity
  • g Globally finds all matches
  • cg Allows the search to continue even after a global match fails

Matching Only Once

There is also a simpler version of the match operator - the ?PATTERN? operator. This is basically identical to the m// operator except that it only matches once within the string you are searching between each call to reset.

For example, you can use this to get the first and last elements within a list:

The Substitution Operator

The substitution operator, s///, is really just an extension of the match operator that allows you to replace the text matched with some new text. The basic form of the operator is:

The PATTERN is the regular expression for the text that we are looking for. The REPLACEMENT is a specification for the text or regular expression that we want to use to replace the found text with.

For example, we can replace all occurrences of "dog" with "cat" using

Another example:

Substitution Operator Modifiers

Here is the list of all modifiers used with substitution operator

  • i Makes the match case insensitive
  • m Specifies that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary
  • o Evaluates the expression only once
  • s Allows use of . to match a newline character
  • x Allows you to use white space in the expression for clarity
  • g Replaces all occurrences of the found expression with the replacement text
  • e Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text

Translation

Translation is similar, but not identical, to the principles of substitution, but unlike substitution, translation (or transliteration) does not use regular expressions for its search on replacement values. The translation operators are:

The translation replaces all occurrences of the characters in SEARCHLIST with the corresponding characters in REPLACEMENTLIST. For example, using the "The cat sat on the mat." string we have been using in this chapter:

Translation Operator Modifiers

Following is the list of operators related to translation

The /d modifier deletes the characters matching SEARCHLIST that do not have a corresponding entry in REPLACEMENTLIST. For example:

The last modifier, /s, removes the duplicate sequences of characters that were replaced, so:
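A hedged sketch of transliteration with and without the /d and /s modifiers. The sentence is the one mentioned in the text; the other inputs and the variable names are illustrative, and /r assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $sentence = "The cat sat on the mat.";

# Plain transliteration: each character in the first list is replaced
# by the character at the same position in the second list.
my $upper = $sentence =~ tr/a-z/A-Z/r;

# /d deletes characters that have no counterpart in the replacement list.
my $no_vowels = $sentence =~ tr/aeiou//dr;

# /s squeezes runs of the same translated character into one.
my $squeezed = "food" =~ tr/a-z/a-z/sr;

# With an empty replacement list and no /d, tr just counts matches.
my $count_a = ($sentence =~ tr/a//);

print "$upper\n$no_vowels\n$squeezed\n$count_a\n";
```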

More complex regular expressions

You don't just have to match on fixed strings. In fact, you can match on just about anything you could dream of by using more complex regular expressions. Here's a quick cheat sheet:

  • . a single character
  • \s a whitespace character (space, tab, newline)
  • \S non-whitespace character
  • \d a digit (0-9)
  • \D a non-digit
  • \w a word character (a-z, A-Z, 0-9, _)
  • \W a non-word character
  • [aeiou] matches a single character in the given set
  • [^aeiou] matches a single character outside the given set
  • (foo|bar|baz) matches any of the alternatives specified

Quantifiers can be used to specify how many of the previous thing you want to match on, where "thing" means either a literal character, one of the metacharacters listed above, or a group of characters or metacharacters in parentheses.

  • * zero or more of the previous thing
  • + one or more of the previous thing
  • ? zero or one of the previous thing
  • {3} matches exactly 3 of the previous thing
  • {3,6} matches between 3 and 6 of the previous thing
  • {3,} matches 3 or more of the previous thing

The ^ metacharacter matches the beginning of the string and the $ metasymbol matches the end of the string.

Here are some brief examples

Let's have a look at another example

Matching Boundaries

The \b matches at any word boundary, as defined by the difference between the \w class and the \W class. Because \w includes the characters for a word, and \W the opposite, this normally means the termination of a word. The \B assertion matches any position that is not a word boundary. For example:

Selecting Alternatives

The | character is just like the standard or bitwise OR within Perl. It specifies alternate matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use this:

You can group individual elements of an expression together in order to support complex matches. Searching for two people's names could be achieved with two separate tests, like this:

Grouping Matching

From a regular-expression point of view, there is no difference between except, perhaps, that the former is slightly clearer.

However, the benefit of grouping is that it allows us to extract a sequence from a regular expression. Groupings are returned as a list in the order in which they appear in the original. For example, in the following fragment we have pulled out the hours, minutes, and seconds from a string.

As well as this direct method, matched groups are also available within the special $x variables, where x is the number of the group within the regular expression. We could therefore rewrite the preceding example as follows:

When groups are used in substitution expressions, the $x syntax can be used in the replacement text. Thus, we could reformat a date string using this:
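The three uses of groups described in this section can be sketched as follows. The time and date strings are example inputs, and /r assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $time = "12:05:30";

# Direct method: groups are returned as a list.
my ($hours, $minutes, $seconds) = $time =~ m/(\d+):(\d+):(\d+)/;

# Same data through the numbered variables after a successful match.
my $via_vars = $time =~ m/(\d+):(\d+):(\d+)/ ? "$1h $2m $3s" : '';

# In a substitution, the groups can be reordered in the replacement text.
my $date     = "03/26/1999";
my $reformed = $date =~ s{(\d+)/(\d+)/(\d+)}{$3/$1/$2}r;

print "$hours $minutes $seconds\n$via_vars\n$reformed\n";
```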

Using the \G Assertion

The \G assertion allows you to continue searching from the point where the last match occurred.

For example, in the following code we have used \G so that we can search to the correct position and then extract some information, without having to create a more complex, single regular expression:

The \G assertion is actually just the metasymbol equivalent of the pos function, so between regular expression calls you can continue to use pos, and even modify the value of pos (and therefore \G) by using pos as an lvalue subroutine:
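One possible way to use \G and pos together is sketched below. The input string and the exact patterns are assumptions made for the illustration; the /c modifier is added so that a failed match does not reset the position.

```perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";
my ($time, $date);

# A scalar /g match leaves pos($string) just after ": ".
if ($string =~ /:\s+/g) {
    # \G anchors the next match at pos(); /c keeps pos() if a match fails.
    $time = $1 if $string =~ /\G(\d+:\d+:\d+)/gc;
    $date = $1 if $string =~ m{\G\s+on\s+(\d+/\d+/\d+)}gc;
}

# pos() is an lvalue, so the starting point for \G can be set by hand.
pos($string) = 4;
my $from_four = $string =~ /\G(\w+)/gc ? $1 : undef;

print "time: $time\ndate: $date\nword at offset 4: $from_four\n";
```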

Regular Expression Variables

Regular expression variables include $+ , which contains whatever the last grouping match matched; $& , which contains the entire matched string; $` , which contains everything before the matched string; and $' , which contains everything after the matched string.

The following code demonstrates the result:
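A minimal sketch of these variables, using an invented sentence and two groups so that $+ has something to report:

```perl
use strict;
use warnings;

my $string = "The food is in the salad bar";
my ($before, $matched, $after, $last_group);

if ($string =~ m/(f)(oo)/) {
    # Copy the special variables right away; the next match overwrites them.
    ($before, $matched, $after, $last_group) = ($`, $&, $', $+);
}

print "Before:     '$before'\n";
print "Matched:    '$matched'\n";
print "After:      '$after'\n";
print "Last group: '$last_group'\n";
```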

  




Regular Expressions

Regular expressions (regexp) are what makes Perl an ideal language for "practical extraction and reporting" as its acronym implies.

A regular expression is a string of characters that defines a text pattern or patterns. A regexp can be used in a number of ways:

  • Searching for a string that matches a specified pattern and optionally replacing the pattern found with some other strings.
  • Counting the number of occurrences of a pattern in a string.
  • Splitting a formatted string (e.g. a date like 01/06/2014 ) into components (e.g. into day, month and year).
  • Validating fields from a submitted HTML form by verifying if the data conforms to a particular format.

Matching a string pattern

Matching a string pattern is done by the m// operator and the =~ binding operator. The expression $string =~ m/$regexp/ returns true if the scalar $string matches the pattern defined by the value of the scalar $regexp .

The match operator supports its own set of optional modifiers, written after the m// operator. The modifiers are letters which indicate variations on the regexp processing. For example:

$string =~ m/$regexp/i

will make the match case insensitive.

You can use any combination of naturally matching characters to act as delimiters for the expression. For example, m{} , m() , m|| are all valid.

Metacharacters

Metacharacters serve specific purposes in a regular expression. If any of these metacharacters are to be embedded in the regular expression literally, you should quote each one by prefixing it with a backslash (\), similar to the idea of escaping in a double-quoted string.

  • \ Quote next character
  • . Match any character except newline
  • ^ Match beginning of line
  • $ Match end of line
  • | separate between several possible patterns
  • [] Character class
  • () Grouping and save subpattern (backtracking)

For example:

  • m/google.com/ matches google.com and also googlexcom
  • m/google\.com/ matches google.com but not googlexcom
  • m/^google/ matches "google me" but not "please google me"
  • m/google$/ matches "let's google" but not "let's google now"
  • m/^google$/ matches only "google"
  • m/google|bing/ matches any string containing google or bing
  • m/bob[ar6]/ matches any string containing boba or bobr or bob6
  • m/bob[0-4]/ matches any string containing bob0 or bob1 or bob2 or bob3 or bob4
  • m/bob[b-e]/ matches any string containing bobb or bobc or bobd or bobe
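A few of the examples in the list above can be checked with a sketch like this one. The %check hash is just a convenient container for the illustration.

```perl
use strict;
use warnings;

my %check = (
    dot_any      => ("googlexcom"       =~ m/google.com/  ? 1 : 0),  # '.' matches 'x'
    dot_escaped  => ("googlexcom"       =~ m/google\.com/ ? 1 : 0),  # literal dot only
    starts_with  => ("please google me" =~ m/^google/     ? 1 : 0),
    ends_with    => ("let's google"     =~ m/google$/     ? 1 : 0),
    alternation  => ("try bing"         =~ m/google|bing/ ? 1 : 0),
    in_class     => ("bob6"             =~ m/bob[ar6]/    ? 1 : 0),
    out_of_range => ("bobf"             =~ m/bob[b-e]/    ? 1 : 0),
);

print "$_ => $check{$_}\n" for sort keys %check;
```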

Replacing a matched string

Replacing a matched string with some other string is done by the substitute operator s/// . The basic form of the operator is s/REGEXP/REPLACEMENT/MODIFIER; . The REGEXP is the regular expression for the string that we are looking for. The REPLACEMENT is a specification for the text or regular expression that we want to use to replace the found text with. The MODIFIER is the optional substitute operator modifier letter.

Here is the list of some modifiers used with substitution operator:

  • i Makes the match case insensitive
  • o Evaluates the expression only once
  • g Replaces all occurrences of the found expression with the replacement text
  • e Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text

Backtracking

Parenthesised patterns have a useful property. When pattern matching is successful, the matching substrings corresponding to the parenthesised parts are saved, which allows you to use them in further operations. The matched value of the first parenthesised pattern is referred to as $1 , the second as $2 , and so on. For example:

More complex regular expressions

More complex regular expressions allow matching to more than just fixed strings. Here's a list of patterns:

  • . Matches any single character except newline. Using the s modifier allows it to match newline as well.
  • [...] Matches any single character within the brackets.
  • [^...] Matches any single character not within brackets
  • * Matches 0 or more occurrences of preceding expression.
  • + Matches 1 or more occurrence of preceding expression.
  • ? Matches 0 or 1 occurrence of preceding expression.
  • {n} Matches exactly n number of occurrences of preceding expression.
  • {n,} Matches n or more occurrences of preceding expression.
  • {n, m} Matches at least n and at most m occurrences of preceding expression.
  • a|b Matches either a or b.
  • \w Matches word characters.
  • \W Matches nonword characters.
  • \s Matches whitespace. Roughly equivalent to [ \t\n\r\f] (note the leading space).
  • \S Matches nonwhitespace.
  • \d Matches digits. Equivalent to [0-9] .
  • \D Matches nondigits.
  • \A Matches beginning of string.
  • \Z Matches end of string. If a newline exists, it matches just before newline.
  • \z Matches end of string.
  • \G Matches point where last match finished.
  • \b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
  • \B Matches nonword boundaries.
  • \n, \t , etc. Matches newlines, carriage returns, tabs, etc.
  • \1...\9 Matches nth grouped subexpression.
  • \10 Matches the tenth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
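The difference between the string anchors \A, \Z and \z in the list above is easy to miss, so here is a small sketch. The sample strings are made up for the illustration.

```perl
use strict;
use warnings;

my $line = "foo\n";

my %anchor = (
    big_Z       => ($line =~ /foo\Z/   ? 1 : 0),  # allowed before a final newline
    small_z     => ($line =~ /foo\z/   ? 1 : 0),  # must be the very end
    small_z_nl  => ($line =~ /foo\n\z/ ? 1 : 0),
    caret_multi => ("a\nfoo" =~ /^foo/m   ? 1 : 0),  # ^ matches after a newline under /m
    big_A_multi => ("a\nfoo" =~ /\Afoo/m  ? 1 : 0),  # \A is always the string start
);

print "$_ => $anchor{$_}\n" for sort keys %anchor;
```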

You are given a scalar value $my_text . Assign the value of a regular expression to scalar $match_my_text to be used to match the string "express".


Perl 5 Regex Cheat sheet


Character Classes

Quantifiers

"Quantifier-modifier" aka. Minimal Matching

Grouping and capturing

Extended

  • (?#text) Embedded comment.
  • (?adlupimsx-imsx) One or more embedded pattern-match modifiers, to be turned on or off.
  • (?:pattern) Non-capturing group.
  • (?|pattern) Branch reset.
  • (?=pattern) A zero-width positive look-ahead assertion.
  • (?!pattern) A zero-width negative look-ahead assertion.
  • (?<=pattern) A zero-width positive look-behind assertion.
  • (?<!pattern) A zero-width negative look-behind assertion.
  • (?'name'pattern) or (?<name>pattern) A named capture group.
  • \k<name> or \k'name' Named backreference.
  • (?{ code }) Zero-width assertion with code execution.
  • (??{ code }) A "postponed" regular subexpression with code execution.

Other regex related articles

  • Parsing dates using regular expressions
  • Check several regexes on many strings
  • Matching numbers using Perl regex
  • Understanding Regular Expressions found in Getopt::Std
  • Email validation using Regular Expression in Perl

Official documentation

Gabor Szabo

Published on 2015-08-19

Author: Gabor Szabo


Saturday 12 March 2016

  • Perl Regular expression - Perl RegEx with examples

   A regular expression or RegEx is a string of characters that defines the pattern or patterns you are looking for. The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions, such as  sed ,  grep , and  awk . 


THE MATCH OPERATOR


THE SUBSTITUTION OPERATOR

TRANSLATION



Regular expressions

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp , and with the match() , matchAll() , replace() , replaceAll() , search() , and split() methods of String . This chapter describes JavaScript regular expressions. It provides a brief overview of each syntax element. For a detailed explanation of each one's semantics, read the regular expressions reference.

Creating a regular expression

You construct a regular expression in one of two ways:

  • Using a regular expression literal, which consists of a pattern enclosed between slashes, for example: const re = /ab+c/; A regular expression literal is compiled when the script is loaded. If the regular expression remains constant, using a literal can improve performance.
  • Or calling the constructor function of the RegExp object, for example: const re = new RegExp("ab+c"); Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.

Writing a regular expression pattern

A regular expression pattern is composed of simple characters, such as /abc/ , or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/ . The last example includes parentheses, which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using groups .

Using simple patterns

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when the exact sequence "abc" occurs (all characters together and in that order). Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." . In both cases the match is with the substring "abc" . There is no match in the string "Grab crab" because while it contains the substring "ab c" , it does not contain the exact substring "abc" .
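As a quick check of the three strings discussed above, the following sketch runs test() with the /abc/ pattern against each of them. The variable names are illustrative only.

```javascript
// Direct-match pattern from the paragraph above.
const abcPattern = /abc/;

// The three example strings mentioned in the text.
const abcResults = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
].map((text) => abcPattern.test(text));

console.log(abcResults); // [ true, true, false ]
```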

Using special characters

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, you can include special characters in the pattern. For example, to match a single "a" followed by zero or more "b" s followed by "c" , you'd use the pattern /ab*c/ : the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc" , this pattern will match the substring "abbbbc" .
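The /ab*c/ example can be tried directly. This minimal sketch uses String.prototype.match() without the g flag, so the result also carries the position of the match; the variable name is illustrative.

```javascript
// "a", then zero or more "b"s, then "c".
const starMatch = "cbbabbbbcdebc".match(/ab*c/);

console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3
```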

The following pages provide lists of the different special characters that fit into each category, along with descriptions and examples.

  • Assertions: Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).
  • Character classes: Distinguish different types of characters. For example, distinguishing between letters and digits.
  • Groups and backreferences: Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
  • Quantifiers: Indicate numbers of characters or expressions to match.

If you want to look at all the special characters that can be used in regular expressions in a single table, see the following:

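Special characters in regular expressions, grouped by corresponding article:

  • Character classes: [xyz], [^xyz], ., \d, \D, \w, \W, \s, \S, \t, \r, \n, \v, \f, [\b], \0, \cX, \xhh, \uhhhh, \u{hhhh}, x|y
  • Assertions: ^, $, \b, \B, x(?=y), x(?!y), (?<=y)x, (?<!y)x
  • Groups and backreferences: (x), (?<Name>x), (?:x), \n, \k<Name>
  • Quantifiers: x*, x+, x?, x{n}, x{n,}, x{n,m}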

Note: A larger cheat sheet is also available (only aggregating parts of those individual articles).

If you need to use any of the special characters literally (actually searching for a "*" , for instance), you must escape it by putting a backslash in front of it. For instance, to search for "a" followed by "*" followed by "b" , you'd use /a\*b/ — the backslash "escapes" the "*" , making it literal instead of special.

Similarly, if you're writing a regular expression literal and need to match a slash ("/"), you need to escape that (otherwise, it terminates the pattern). For instance, to search for the string "/example/" followed by one or more alphabetic characters, you'd use /\/example\/[a-z]+/i —the backslashes before each slash make them literal.

To match a literal backslash, you need to escape the backslash. For instance, to match the string "C:\" where "C" can be any letter, you'd use /[A-Z]:\\/ — the first backslash escapes the one after it, so the expression searches for a single literal backslash.

If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for "a" followed by a literal "*" followed by "b".
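The escaping rules above can be checked with a short sketch. The patterns are the ones quoted in the text; the variable names and the test strings are assumptions made for illustration.

```javascript
// A literal and a constructor call that describe the same pattern.
const literalStar = /a\*b/;
const fromString = new RegExp("a\\*b"); // backslash doubled inside the string literal

console.log(literalStar.source === fromString.source); // true
console.log(literalStar.test("a*b")); // true
console.log(literalStar.test("aab")); // false

// Escaped slashes inside a regular expression literal.
const slashPattern = /\/example\/[a-z]+/i;
console.log(slashPattern.test("/example/Docs")); // true

// An escaped backslash matches one literal backslash.
const drivePattern = /[A-Z]:\\/;
console.log(drivePattern.test("C:\\")); // true
```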

If escape strings are not already part of your pattern you can add them using String.prototype.replace() :
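One common way to do this is sketched below. The helper name escapeRegExp and the sample input are assumptions for illustration; the character class lists the characters that are special in a pattern, and "$&" in the replacement stands for the whole matched substring.

```javascript
// Hypothetical helper: put a backslash in front of every special character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches user-supplied text literally.
const userInput = "1+1=2 (approx.)";
const safePattern = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp("a*b")); // a\*b
console.log(safePattern.test("so 1+1=2 (approx.) holds")); // true
```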

The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches. It is explained in detail below in Advanced Searching With Flags .

Why isn't this built into JavaScript? There is a proposal to add such a function to RegExp.

Using parentheses

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use. See Groups and backreferences for more details.
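As a small illustration, the sketch below reuses the /Chapter (\d+)\.\d*/ pattern mentioned earlier in this guide. The input string and the variable name are assumptions.

```javascript
// The parenthesized (\d+) is remembered as element 1 of the result array.
const chapterMatch = /Chapter (\d+)\.\d*/.exec("Open Chapter 4.3, paragraph 6");

console.log(chapterMatch[0]); // "Chapter 4.3" (the whole match)
console.log(chapterMatch[1]); // "4" (the remembered substring)
```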

Using regular expressions in JavaScript

Regular expressions are used with the RegExp methods test() and exec() and with the String methods match() , matchAll() , replace() , replaceAll() , search() , and split() .

Method: Description

  • exec(): Executes a search for a match in a string. It returns an array of information or null on a mismatch.
  • test(): Tests for a match in a string. It returns true or false.
  • match(): Returns an array containing all of the matches, including capturing groups, or null if no match is found.
  • matchAll(): Returns an iterator containing all of the matches, including capturing groups.
  • search(): Tests for a match in a string. It returns the index of the match, or -1 if the search fails.
  • replace(): Executes a search for a match in a string, and replaces the matched substring with a replacement substring.
  • replaceAll(): Executes a search for all matches in a string, and replaces the matched substrings with a replacement substring.
  • split(): Uses a regular expression or a fixed string to break a string into an array of substrings.

When you want to know whether a pattern is found in a string, use the test() or search() methods; for more information (but slower execution) use the exec() or match() methods. If you use exec() or match() and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp . If the match fails, the exec() method returns null (which coerces to false ).

In the following example, the script uses the exec() method to find a match in a string.
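A minimal sketch of such a script follows. The pattern /d(b+)d/g is the one discussed later in this section; the variable name myRe and the input string "cdbbdbsbz" are assumptions chosen for illustration.

```javascript
// Regular expression with one capturing group and the global flag.
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // "dbbd" (the matched text)
console.log(myArray[1]);     // "bb"   (the remembered group)
console.log(myArray.index);  // 1
console.log(myRe.lastIndex); // 5
```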

If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:

(See Using the global search flag with exec() for further info about the different behaviors.)

If you want to construct the regular expression from a string, yet another alternative is this script:
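A sketch of the constructor-based form, under the same assumptions as before (the input string "cdbbdbsbz" and the variable names are illustrative):

```javascript
// Pattern and flags are passed as strings to the RegExp constructor.
const myReFromString = new RegExp("d(b+)d", "g");
const myArrayFromString = myReFromString.exec("cdbbdbsbz");

console.log(myArrayFromString[0]); // "dbbd"
console.log(myArrayFromString[1]); // "bb"
```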

With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.

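Results of regular expression execution.
Object, property or index: Description

  • The result array (myArray): The matched string and all remembered substrings.
  • The index property of the result array: The 0-based index of the match in the input string.
  • The input property of the result array: The original string.
  • Element [0] of the result array: The last matched characters.
  • The lastIndex property of the regular expression: The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in Advanced searching with flags.)
  • The source property of the regular expression: The text of the pattern. Updated at the time that the regular expression is created, not executed.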

As shown in the second form of this example, you can use a regular expression created with an object initializer without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:

However, if you have this script:

The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a regular expression created with an object initializer, you should first assign it to a variable.
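The difference can be seen in a short sketch that contrasts the two situations. The input string "cdbbdbsbz" and the variable names are assumptions for illustration.

```javascript
// Case 1: the regular expression is stored, so its state can be read back.
const storedRe = /d(b+)d/g;
storedRe.exec("cdbbdbsbz");
console.log(storedRe.lastIndex); // 5

// Case 2: each literal below creates a new, separate object.
/d(b+)d/g.exec("cdbbdbsbz");
const freshLastIndex = /d(b+)d/g.lastIndex;
console.log(freshLastIndex); // 0
```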

Advanced searching with flags

Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.

Flag: Description (corresponding property)

  • d: Generate indices for substring matches. (hasIndices)
  • g: Global search. (global)
  • i: Case-insensitive search. (ignoreCase)
  • m: Allows ^ and $ to match next to newline characters. (multiline)
  • s: Allows . to match newline characters. (dotAll)
  • u: "Unicode"; treat a pattern as a sequence of Unicode code points. (unicode)
  • v: An upgrade to the u mode with more Unicode features. (unicodeSets)
  • y: Perform a "sticky" search that matches starting at the current position in the target string. (sticky)

To include a flag with the regular expression, use this syntax:
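As a sketch, flags follow the closing slash of a literal, or are passed as the second argument to the RegExp constructor. The pattern and variable names here are illustrative.

```javascript
// Flags after the closing slash of a literal.
const literalWithFlags = /ab+c/gi;

// Flags as the second constructor argument.
const constructedWithFlags = new RegExp("ab+c", "gi");

console.log(literalWithFlags.flags);          // "gi"
console.log(constructedWithFlags.global);     // true
console.log(constructedWithFlags.ignoreCase); // true
```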

Note that the flags are an integral part of a regular expression. They cannot be added or removed later.

For example, re = /\w+\s/g creates a regular expression that looks for one or more characters followed by a space, and it looks for this combination throughout the string.
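A short sketch of that expression in use; the sample string is an assumption. With the g flag, match() returns every occurrence rather than only the first.

```javascript
const re = /\w+\s/g;
const str = "fee fi fo fum";

// "fum" is not followed by whitespace, so it is not part of the result.
const myWords = str.match(re);
console.log(myWords); // [ "fee ", "fi ", "fo " ]
```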

You could replace a line such as const re = /\w+\s/g; with const re = new RegExp("\\w+\\s", "g"); and get the same result.

The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
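The effect of the m flag can be seen in a small sketch; the two-line input string and the variable names are assumptions.

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ only matches at the very start of the string.
const withoutM = /^second/.test(twoLines);
// With m, ^ also matches right after the newline.
const withM = /^second/m.test(twoLines);

console.log(withoutM); // false
console.log(withM);    // true

// Combined with g, the first word of every line is found.
const lineStarts = twoLines.match(/^\w+/gm);
console.log(lineStarts); // [ "first", "second" ]
```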

Using the global search flag with exec()

RegExp.prototype.exec() method with the g flag returns each match and its position iteratively.

In contrast, String.prototype.match() method returns all matches at once, but without their position.
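The two behaviors can be compared side by side. In this sketch the sample string and the variable names are assumptions; the loop relies on exec() advancing lastIndex after each match and returning null when no further match exists.

```javascript
const str = "fee fi fo fum";
const re = /\w+\s/g;

// exec() with g: one match per call, each with its position.
const found = [];
let match;
while ((match = re.exec(str)) !== null) {
  found.push({ text: match[0], index: match.index });
}
console.log(found);
// [ { text: "fee ", index: 0 }, { text: "fi ", index: 4 }, { text: "fo ", index: 7 } ]

// match() with g: all matches at once, without positions.
const allAtOnce = str.match(re);
console.log(allAtOnce); // [ "fee ", "fi ", "fo " ]
```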

Using unicode regular expressions

The u flag is used to create "unicode" regular expressions; that is, regular expressions which support matching against unicode text. An important feature that's enabled in unicode mode is Unicode property escapes . For example, the following regular expression might be used to match against an arbitrary unicode "word":
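One possible pattern of this kind is sketched below; it is an illustration, not necessarily the expression the original guide showed. \p{L} is the Unicode property escape for "any letter", and it is only recognized when the u (or v) flag is set.

```javascript
// One or more letters from any script; g collects every word.
const unicodeWord = /\p{L}+/gu;

// "naïve café 123", written with escapes to keep the source ASCII-only.
const unicodeWords = "na\u00EFve caf\u00E9 123".match(unicodeWord);

console.log(unicodeWords); // [ "naïve", "café" ]
```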

Unicode regular expressions have different execution behavior as well. RegExp.prototype.unicode contains more explanation about this.

Note: Several examples are also available in:

  • The reference pages for exec() , test() , match() , matchAll() , search() , replace() , split()
  • The guide articles: character classes , assertions , groups and backreferences , quantifiers

Using special characters to verify input

In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.

The regular expression looks for:

  • the beginning of the line of data: ^
  • followed by three numeric characters \d{3} OR | a left parenthesis \( , followed by three digits \d{3} , followed by a close parenthesis \) , in a non-capturing group (?:)
  • followed by one dash, forward slash, or decimal point in a capturing group ()
  • followed by three digits \d{3}
  • followed by the match remembered in the (first) captured group \1
  • followed by four digits \d{4}
  • followed by the end of the line of data: $
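Put together, the steps above correspond to a pattern like the one in this sketch. The function name isValidPhone and the sample inputs are assumptions; the button and message handling of the original page are left out.

```javascript
// ^ start, area code with or without parentheses (non-capturing group),
// one separator (captured), three digits, the same separator again (\1),
// four digits, $ end.
const phonePattern = /^(?:\d{3}|\(\d{3}\))([-\/.])\d{3}\1\d{4}$/;

// Hypothetical helper wrapping the check.
function isValidPhone(input) {
  return phonePattern.test(input);
}

console.log(isValidPhone("555-555-5555"));   // true
console.log(isValidPhone("(555).555.5555")); // true
console.log(isValidPhone("555-555.5555"));   // false (separators differ)
```

Because the separator is captured and then required again through \1, mixed separators are rejected even though each one is allowed on its own.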

An online tool to learn, build, & test Regular Expressions.

An online regex builder/debugger

Online interactive tutorials, a cheat sheet, and a playground.

An online visual regex tester.


How to use Regex in a While If statement? Perl

I'm new to programming and I've run into an issue. We have to use Perl to write a script that opens a file, then loops through each line using a Regex - then print out the results. The opening of the file and the loop I have, but I can't figure out how to implement the Regex. It outputs 0 matched results, when the assignment outline suggests the number to be 338. If I don't use the Regex, it outputs 2987, which is the total number of lines - which is correct. So there's something incorrect with the Regex I just can't figure out. Any help would be greatly appreciated!

Here's what I have thus far:


  • if ($filename=~ /(sshd)/) should be if (/sshd/) –  GMB Commented Jan 23, 2020 at 13:48
  • Wow, thanks! So you don't have to use a regex command in perl? Every resource I used to try and find an answer used "=~" following the Regex, that's not always necessary? –  Austin Commented Jan 23, 2020 at 13:58
  • 1 That's not what he meant. There are two operators involved in a regex match. /foo/ is an abbreviation for m/foo/ , the m// operator which does a regex match (as opposed to s/// which does a substitution, or tr/// which doesn't have to do with regex at all). The second operator that works in tandem with that is =~ , which binds the operation to a string it will operate on. If there is no string bound to it, it will operate on $_ , which is also what while(<>) assigns to if you don't assign it to something else (but you should). –  Grinnz Commented Jan 23, 2020 at 15:43

2 Answers

Consider this piece of code of yours:

You are indeed looping through the file lines, but you keep checking if the file name matches your regex. This is clearly not what you intend.

Parentheses around the regex seem superfluous (they are meant to capture, while you are only matching).

Since expression while (<fh>) assigns the content of the line to special variable $_ (which is the default argument for regexp matching), this can be shortened as:

  • Also perl -lne '$i++ if /sshd/ }{ print $i' C:\Users\sample.log.txt Or use END{} rather than eskimo kiss. –  stevesliva Commented Jan 23, 2020 at 14:03
  • And that's equivalent to grep -c sshd C:\Users\sample.log.txt ... but I assume this perl code will grow. –  stevesliva Commented Jan 23, 2020 at 14:05
  • 1 @stevesliva: indeed! But OP does not seem to be looking for a one-liner here, so I provided an answer as a perl script. –  GMB Commented Jan 23, 2020 at 14:07

OP code has some errors which I've corrected


COMMENTS

  1. perl

    I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it? I want to essentially combine lines 2 and 3 in a single line of co...

  2. perlre

    This page describes the syntax of regular expressions in Perl. If you haven't used regular expressions before, a tutorial introduction is available in perlretut. ... The additional state of being matched with zero-length is associated with the matched string, and is reset by each assignment to pos().

  3. Perl

    The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions, such as sed, grep, and awk. The basic method for applying a regular expression is to use the pattern binding operators =~ and !~. The first operator is a test and assignment operator.

  4. perlre

    This page describes the syntax of regular expressions in Perl. For a description of how to use regular expressions in matching operations, plus various examples of the same, see ... assertions inside the same regular expression. The above assignment to $^R is properly localized, thus the old value of $^R is restored if the assertion is ...

  5. Perl regular expressions

    Modifiers that alter the way a regular expression is used by Perl are detailed in perlop/``Regexp Quote-Like Operators'' and perlop/``Gory details of parsing quoted constructs''. i ... assertions inside the same regular expression. The assignment to $^R above is properly localized, so the old value of $^R is restored if the assertion is ...

  6. perlrequick

    Simple word matching. The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word: "Hello World" =~ /World/; # matches. In this statement, World is a regex and the // enclosing /World/ tells Perl to search a string for a match.

  7. PERL Regular Expressions

    The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions, such as sed, ... The first operator is a test and assignment operator. There are three regular expression operators within Perl. Match Regular Expression - m// Substitute Regular Expression - s///

  8. Regular Expressions

    A regular expression is a string of characters that defines a text pattern or patterns. A regexp can be used in a number of ways: Searching for a string that matches a specified pattern and optionally replacing the pattern found with some other strings. Counting the number of occurrences of a pattern in a string.

  9. Regex

    Solution: hexa, octal, binary. Exercise: Roman numbers. 10. Regular Expressions - part 3. m/ for matching regexes. Case insensitive regexes using /i. multiple lines in regexes using /m. Single line regexes using /s. /x modifier for verbose regexes.

  10. PDF Regular Expressions in Perl

    • In Perl, we can use regular expressions to match (parts of) strings • This is done with the =~ operator • This operator evaluates to true if the expression matches the string and false otherwise • Note that the text between the / and / is processed as a double-quoted string

  11. Perl

    A Regular Expression (Regex or RE) in Perl is a special string describing a sequence or search pattern in a given string. An Assertion in a Regular Expression states that a match is possible in some way. Perl's regex engine evaluates the given string from left to right, searching for a match of the sequence, and when it finds the match seq

  12. Perl, Assign to variable from regex match

    Perl regular expression variables and matched pattern substitution. 3. Assigning the result of Perl regex operation to a second variable. 2. Regex to variable assignment fails in Perl. 38. Use variable as RegEx pattern. 2. Perl - Regex and matching variables. 0. Perl special variables for regex matches. 1.

  13. PDF Regular Expressions

    If a match is found for pattern1 within a referenced string (default $_), the relevant substring is replaced by the contents of pattern2, and the expression returns true. Modifiers: e, g, i, m, o, s, x. Transliteration - tr/// or y///. Syntax: tr/pattern1/pattern2/ y/pattern1/pattern2/. If any characters in pattern1 match those within a ...
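
    The return values described above can be checked with a small sketch. The strings are made up for the example; the snippet's pattern1 and pattern2 correspond to the two halves of each operator.

```perl
use strict;
use warnings;

my $s = "hello world";   # hypothetical sample string

# s/// returns true when a replacement was made.
my $ok = ($s =~ s/world/Perl/) ? 1 : 0;

# tr/// maps characters one for one; here it upper-cases a copy.
(my $upper = $s) =~ tr/a-z/A-Z/;

# tr/// returns the number of characters matched; with an empty
# replacement list the string itself is left unchanged.
my $vowels = ($s =~ tr/aeiou//);

print "ok=$ok s=$s upper=$upper vowels=$vowels\n";
```

    Note that tr/// works on lists of characters, not on regular expressions, so regex metacharacters have no special meaning inside it.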

  14. Perl 5 Regex Cheat sheet

    When learning regexes, or when you need to use a feature you have not used yet or don't use often, it can be quite useful to have a place for quick look-up. I hope this Regex Cheat-sheet will provide such aid for you. Introduction to regexes in Perl. a: Just an 'a' character. .: Any character except new-line.

  15. perlrequick

    The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word: "Hello World" =~ /World/; # matches. In this statement, World is a regex and the // enclosing /World/ tells perl to search a string for a match. The operator =~ associates the string with the ...

  16. Perl

    Regular Expression (Regex or Regexp or RE) in Perl is a special text string for describing a search pattern within a given text. Regex in Perl is linked to the host language and is not the same as in PHP, Python, etc. Sometimes it is termed "Perl 5 Compatible Regular Expressions". To use the Regex, binding operators like '=~' (Regex Operator) and '!~' (Negated Regex Operator) are ...

  17. Perl Regular expression

    In this post, Perl regex is illustrated with examples. The basic method for applying a regular expression is to use the pattern binding operators =~ and !~. The first operator is a test and assignment operator. The forward slashes in each case act as delimiters for the regular expression (regex) that you are specifying.

  18. perlre

    This page describes the syntax of regular expressions in Perl. For a description of how to use regular expressions in matching operations, plus various examples of the same, ... assertions inside the same regular expression. The assignment to $^R above is properly localized, so the old value of $^R is restored if the assertion is backtracked; ...

  19. Find "assignment" (=) operator in a string with regular expression

    (note: I'm using perl but can turn around formatting from many regex patterns if you have another favorite) I'm looking for the assignment operator in strings (code). It doesn't have to be the world's most robust, but needs to be better than "go find the first =".

  20. Regular expressions

    Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String. This chapter describes JavaScript regular expressions. It provides a brief overview of each ...

  21. regex

    I should explain as background to this question that I don't know any Perl, and have a violent allergy to regular expressions (we all have our weaknesses). I'm trying to figure out why a Perl program won't accept the data I'm feeding it. I don't need to understand this program in any depth - I'm just doing a timing comparison.

  22. How to use Regex in a While If statement? Perl

    We have to use Perl to write a script that opens a file, then loops through each line using a Regex - then print out the results. The opening of the file and the loop I have, but I can't figure out how to implement the Regex. It outputs 0 matched results, when the assignment outline suggests the number to be 338.
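
    The general shape of such a script can be sketched as follows. The actual file and pattern from the question are not shown, so this example reads from an in-memory string and uses a placeholder pattern (one or more digits); both are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over sample data.
my $data = "apple 1\nbanana\ncherry 22\n";
open my $fh, '<', \$data or die "cannot open: $!";

my $matched = 0;
while (my $line = <$fh>) {
    chomp $line;
    # Placeholder pattern: lines that contain a run of digits.
    if ($line =~ /(\d+)/) {
        $matched++;
        print "$line -> $1\n";
    }
}
close $fh;

print "matched: $matched\n";
```

    A count of 0 in a loop like this often means the match is bound to the wrong variable, for example testing $_ while the line was read into a named variable.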