C# Regular Expressions

Posted in Programming   LAST UPDATED: JANUARY 5, 2020

    In this article, we will discuss regular expressions in the C# programming language. The amount of data held in log files, documents, plain text files, database files, etc. is growing day by day, and finding useful information in that data is a tedious task. Searching text with hand-written if-else and nested if-else statements quickly becomes complex and error-prone. A regular expression is a pattern-matching technique for problems such as finding the occurrences of a repeated word or sentence in a file, checking whether an email address is in a valid format, checking a date format, etc.


    Introduction to C# Regular Expressions:

    A regular expression is also referred to as a regex or RE. It is useful for extracting information from text (a plain text file, normal input text, log files, etc.) by applying suitable patterns. A pattern may consist of letters, digits, special characters, operators, etc. In C#, a pattern is usually written as a verbatim string literal (prefixed with @) so that backslashes do not have to be escaped twice; the @ prefix is a convenience, not a requirement. A regular expression works character by character: the engine compares the pattern against the input one character at a time. For example, consider checking the format of an email address. An email address contains letters, digits, and a few special characters, so a sample character set looks like [a-zA-Z0-9@._-]. This set accepts lowercase and uppercase letters, digits, and the listed special characters. If we give Chinese characters to this pattern, it won't match them, because the set doesn't contain any Chinese characters (they are non-ASCII characters). Note: ASCII stands for American Standard Code for Information Interchange, a standard 7-bit character set that covers English letters, digits, and common punctuation.
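
    The character set described above can be tried out with a short program. This is only a minimal sketch: the class name EmailCharsDemo and the helper UsesOnlyAllowedChars are hypothetical names chosen for illustration, and the check only tests which characters appear in the input. It is not a full email address validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCharsDemo
{
    // Hypothetical helper: true when every character of s belongs to the
    // sample set [a-zA-Z0-9@._-]. The anchors ^ and $ make the set apply
    // to the whole input instead of to a single character somewhere in it.
    public static bool UsesOnlyAllowedChars(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9@._-]+$");
    }

    public static void Main(string[] args)
    {
        // Only ASCII letters, digits and the listed symbols: prints True
        Console.WriteLine(UsesOnlyAllowedChars("john.doe@example.com"));
        // Contains Chinese characters, which are not in the set: prints False
        Console.WriteLine(UsesOnlyAllowedChars("用户@example.com"));
    }
}
```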

    C# supports regular expressions through the classes in the System.Text.RegularExpressions namespace. This namespace provides the functionality to perform matches and extract information from text. The following tables list some common characters and quantifiers with their descriptions.


    Characters Description
    . matches any character except newline
    ^ beginning of string
    $ end of string
    * match 0 or more times
    + match 1 or more times
    ? match 0 or 1 time
    | alternative
    ( ) grouping
    [ ] set of characters
    { } repetition modifier
    \ escapes a special character, or introduces a character class such as \d

    List of other characters

    Characters Description
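    \b matches a word boundary, i.e. the position between a word character (\w) and a non-word character (\W)
    \d matches a decimal digit; for ASCII text this is [0-9]
    \s matches a whitespace character such as a space, tab, newline or carriage return [ \t\n\r]
    \w matches a word character (alphanumeric or _); for ASCII text this is [0-9a-zA-Z_]
    \D matches any non-digit character [^0-9]
    \S matches any non-whitespace character [^ \t\n\r]
    \S+ matches one or more consecutive non-whitespace characters
    \W matches any non-word character [^0-9a-zA-Z_]

    Note: by default, .NET applies \d, \w and \s to Unicode text, so they also match non-ASCII digits, letters and whitespace. Pass RegexOptions.ECMAScript to restrict them to ASCII ranges.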
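
    To make the table concrete, here is a small sketch that applies a few of these classes to one sample string. The class name CharClassDemo, the helper FirstMatch, and the sample text are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    public static void Main(string[] args)
    {
        string s = "Order 42 shipped";

        Console.WriteLine(FirstMatch(s, @"\d+"));             // 42
        Console.WriteLine(FirstMatch(s, @"\w+"));             // Order
        Console.WriteLine("[" + FirstMatch(s, @"\D+") + "]"); // [Order ]
        Console.WriteLine(FirstMatch(s, @"\bship\w*"));       // shipped

        // Replace every whitespace character with an underscore.
        Console.WriteLine(Regex.Replace(s, @"\s", "_"));      // Order_42_shipped
    }
}
```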

    List of quantifiers

    Quantifiers Description
    a* The occurrence of zero or more a’s
    a+ The occurrence of one or more a’s
    a? The occurrence of zero or one a
    a{m} The occurrence of exactly m a’s
    a{m,} The occurrence of at least m a’s
    a{m,n} The occurrence of at least m but at most n a’s
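
    The quantifiers can be checked one by one with a sketch like the following. The helper FullMatch is a hypothetical name; it wraps the pattern in ^(?: ... )$ so that the quantifier has to account for the entire input, which makes the counts easy to verify.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: true when the whole input is matched by pattern.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(FullMatch("", "a*"));          // True:  zero a's is allowed
        Console.WriteLine(FullMatch("", "a+"));          // False: at least one a is required
        Console.WriteLine(FullMatch("a", "a?"));         // True:  zero or one a
        Console.WriteLine(FullMatch("aa", "a?"));        // False: two a's is too many
        Console.WriteLine(FullMatch("aaa", "a{3}"));     // True:  exactly three a's
        Console.WriteLine(FullMatch("aaaaa", "a{3,}"));  // True:  at least three a's
        Console.WriteLine(FullMatch("aaaa", "a{2,3}"));  // False: more than three a's
    }
}
```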

    Let's see the demonstration of a regular expression with predefined classes,

    Filename: Program.cs

    using System;
    using System.Text.RegularExpressions;
    
    namespace Studytonight
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                Regex r = new Regex(@"\d+");
                Match m = r.Match("100 extracting the digits 99");
                if(m.Success)
                {
                    Console.WriteLine("Found digits "+m.Value);
                }
            }
        }
    }

    Output:

    Found digits 100

    In the above example, we created an object of the Regex class with a pattern that matches one or more digits. The string contains two numbers, 100 and 99, but the Match method returns only the first match. When the m.Success property is true, m.Value holds the text that was matched. Hence, in our case, the output is 100, as it is found first in the given string. In the next example, we will see how to get the next match from a string.


    Let's take another example with NextMatch method,

    Filename: Program.cs

    using System;
    using System.Text.RegularExpressions;
    
    namespace Studytonight
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                string s = "100 extracting the digits 99";
                Match m = Regex.Match(s, @"\d+");
                if(m.Success)
                {
                    Console.WriteLine("Found digits "+m.Value);
                    Console.WriteLine("Found next digits "+(m.NextMatch()).Value);
                }
            }
        }
    }

    Output:

    Found digits 100
    Found next digits 99

    In the above example, we defined a string that contains two numbers, 100 and 99, and passed two parameters to the static Regex.Match method: the input string and the regex pattern that matches digits. The first line prints the first match found in the string, which is 100, and the NextMatch method returns the next match, which is 99. If there is no further match, NextMatch returns a Match object whose Success property is false and whose Value is an empty string, so the second line would print only its label.
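
    When the number of matches is not known in advance, the usual approach is either to keep calling NextMatch until Success becomes false, or to ask for every match at once with Regex.Matches. The sketch below shows both; the class name AllMatchesDemo and the two helper methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllMatchesDemo
{
    // Walks the matches one by one with NextMatch until none is left.
    public static List<string> WithNextMatch(string input, string pattern)
    {
        var values = new List<string>();
        Match m = Regex.Match(input, pattern);
        while (m.Success)
        {
            values.Add(m.Value);
            m = m.NextMatch();
        }
        return values;
    }

    // Collects the same matches through the MatchCollection from Regex.Matches.
    public static List<string> WithMatches(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main(string[] args)
    {
        string s = "100 extracting the digits 99";
        Console.WriteLine(string.Join(", ", WithNextMatch(s, @"\d+"))); // 100, 99
        Console.WriteLine(string.Join(", ", WithMatches(s, @"\d+")));   // 100, 99
        Console.WriteLine(WithMatches("no digits here", @"\d+").Count); // 0
    }
}
```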


    Let's take an example with the IsMatch method,

    Filename: Program.cs

    using System;
    using System.Text.RegularExpressions;
    
    namespace Studytonight
    {
        public class Program
        {
            static bool IsValidString(string s)
            {
                return Regex.IsMatch(s, @"[a-zA-Z]");
            }
            public static void Main(string[] args)
            {
                Console.WriteLine(IsValidString("Studytonight"));
                Console.WriteLine(IsValidString("49-99"));
            }
        }
    }

    Output:

    True
    False

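    In the above example, we defined an IsValidString method that accepts a string parameter and passes it to the IsMatch method. The output is a boolean value: true if the regular expression finds a match anywhere in the input; otherwise, false. The pattern [a-zA-Z] has no anchors, so it matches as soon as the string contains at least one letter. The first string contains letters, so the output is True. The second string contains only digits and a hyphen, so no letter is found and the output is False. Note that a mixed string such as abc123 would also return True with this pattern; to accept only strings made up entirely of letters, anchor the pattern as ^[a-zA-Z]+$.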
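
    The pattern [a-zA-Z] has no anchors, so IsMatch returns true as soon as a single letter is found anywhere in the input. The following sketch compares that behaviour with an anchored pattern that requires every character to be a letter. The names LettersDemo, ContainsLetter and IsOnlyLetters are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LettersDemo
{
    // Same pattern as IsValidString above: true if any letter is present.
    public static bool ContainsLetter(string s)
    {
        return Regex.IsMatch(s, @"[a-zA-Z]");
    }

    // Anchored variant: true only if the whole string consists of letters.
    public static bool IsOnlyLetters(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z]+$");
    }

    public static void Main(string[] args)
    {
        foreach (string s in new[] { "Studytonight", "abc123", "49-99" })
        {
            Console.WriteLine(s + ": " + ContainsLetter(s) + " " + IsOnlyLetters(s));
        }
        // Studytonight: True True
        // abc123: True False
        // 49-99: False False
    }
}
```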


    Let's take the last example on regex characters,

    Filename: Program.cs

    using System;
    using System.Text.RegularExpressions;
    
    namespace Studytonight
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                string s = "Studytonight";
                if(Regex.IsMatch(s,"^study",RegexOptions.IgnoreCase))
                {
                    Console.WriteLine("String start matched");
                }
                else if(Regex.IsMatch(s,"night$"))
                {
                    Console.WriteLine("String end matched");
                }
                else
                {
                    Console.WriteLine("String not matched");
                }
            }
        }
    }

    Output:

    String start matched

    In the above example, we match the start and the end of a string with the help of the regex anchor characters. The ^ character matches at the start of the string, whereas $ matches at the end. Notice that the string s, whose value is Studytonight, starts with an uppercase S. In the first if statement, we match the start of the string while ignoring case, so it does not matter whether the letters are uppercase or lowercase. Hence the first if block is executed.

    Important note: If you remove the RegexOptions.IgnoreCase option, the else if block is executed instead, because the string starts with an uppercase S while the pattern ^study expects lowercase letters. Also, with IgnoreCase in place, the conditions of both the if block and the else if block are true, but an if/else if chain executes only the first block whose condition is satisfied.
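
    If both checks should be reported rather than only the first one that succeeds, the two conditions can be evaluated independently. This is a minimal sketch under that assumption; the class name AnchorDemo and the method Describe are hypothetical, and unlike the example above, the options argument is applied to both patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Evaluates the start and end checks separately and reports both results.
    public static string Describe(string s, RegexOptions options)
    {
        bool start = Regex.IsMatch(s, "^study", options);
        bool end = Regex.IsMatch(s, "night$", options);

        if (start && end) return "String start and end matched";
        if (start) return "String start matched";
        if (end) return "String end matched";
        return "String not matched";
    }

    public static void Main(string[] args)
    {
        // Case is ignored, so both anchors match.
        Console.WriteLine(Describe("Studytonight", RegexOptions.IgnoreCase));
        // prints: String start and end matched

        // Case-sensitive: ^study fails on the uppercase S, night$ still matches.
        Console.WriteLine(Describe("Studytonight", RegexOptions.None));
        // prints: String end matched
    }
}
```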




    Conclusion:

    We hope this article helped you to understand how we can use regular expressions in C# language. If you have any queries, then please let us know in the comment section. We are happy to solve your doubts.

    About the author:
    Subject Matter Expert of C# Programming at Studytonight.