changing expander regex #11210

Merged
merged 3 commits into from
Jan 3, 2025

Conversation

SimaTian
Member

@SimaTian SimaTian commented Jan 2, 2025

Switches the expander regex to the source-generated version instead of the compiled one where applicable, following this .NET article.
This nets a small but visible performance gain with only a minor code update.
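
For readers unfamiliar with the two approaches, the swap looks roughly like the sketch below. The class name `MetadataRegexSketch` and the simplified pattern are made up for illustration and are not the code in Expander.cs. `[GeneratedRegex]` needs .NET 7 or later and a `partial` containing type.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration; the name and the pattern are not taken from Expander.cs.
public static partial class MetadataRegexSketch
{
    // Simplified stand-in for a metadata reference such as %(Name).
    private const string Pattern = @"%\(\s*(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)";

    // Before: the regex is compiled to IL at run time, when the Regex object is constructed.
    public static readonly Regex Compiled =
        new Regex(Pattern, RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    // After: the matching code is emitted by the regex source generator at build time.
    [GeneratedRegex(Pattern, RegexOptions.ExplicitCapture)]
    public static partial Regex Generated();
}
```

Both members expose the same `Regex` API, so call sites only change from reading a field to calling a method.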

@SimaTian SimaTian requested a review from JanKrivanek January 2, 2025 15:44
@JanKrivanek
Member

Contributes to #7598

Member

@JanKrivanek JanKrivanek left a comment


Thank you!

If you have any measurements from runs in your environment, feel free to attach them here and brag some more.

Review comment on src/Build/Evaluation/Expander.cs (outdated, resolved)
@YuliiaKovalova
Member

It looks good!
I have found other places with Regex usage in MSBuild:
https://github.com/search?q=repo%3Adotnet%2Fmsbuild%20%20Lazy%3CRegex%3E&type=code

Do you plan to adapt them too to the new practice?

@SimaTian
Member Author

SimaTian commented Jan 3, 2025

> It looks good! I have found other places with Regex usage in MSBuild: https://github.com/search?q=repo%3Adotnet%2Fmsbuild%20%20Lazy%3CRegex%3E&type=code
>
> Do you plan to adapt them too to the new practice?

Thanks.
I will take a look. The new pattern is easily applicable only when the regex pattern is known at compile time, e.g. when it is a constant or comes from a pool of options small enough to be reasonably enumerated by hand, so it might not be as straightforward elsewhere.
I started with this one because it is used quite a bit within the context of the expander.
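
To illustrate the "limited pool of options" case, here is a minimal sketch. The names and patterns are hypothetical and do not correspond to any existing MSBuild regex. Each known option gets its own generated regex, and anything outside the pool falls back to a regex built at run time.

```csharp
using System.Text.RegularExpressions;

// Hypothetical example: the names and patterns below are illustrative only.
public static partial class SeparatorRegexSketch
{
    [GeneratedRegex(@"\s*;\s*")]
    private static partial Regex SemicolonSeparated();

    [GeneratedRegex(@"\s*,\s*")]
    private static partial Regex CommaSeparated();

    // The separator is only known at run time, but the set of expected values is small,
    // so each one maps to a regex that was generated at build time.
    public static Regex ForSeparator(char separator) => separator switch
    {
        ';' => SemicolonSeparated(),
        ',' => CommaSeparated(),
        // Anything outside the enumerated pool falls back to a regex built at run time.
        _ => new Regex(@"\s*" + Regex.Escape(separator.ToString()) + @"\s*"),
    };
}
```

Once the pattern depends on arbitrary user input, only the fallback branch applies and the source generator offers no benefit.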

I noticed this one because it was visible in the profiler. That being said, unifying the rest is a good idea; I will see what I can do.
before:
[image: regex-before]

after:
[image: regex-after]

So the impact in the context of MSBuild is in the range of ~0.1%, but still visible.
The speedup is much more visible when doing targeted testing:

using System.Text.RegularExpressions;

static partial class Test
{
    internal const string itemTypeOrMetadataNameSpecification = @"[A-Za-z_][A-Za-z_0-9\-]*";

    // the portion of an item transform that is the function that we wish to execute on the item
    internal const string itemFunctionNameSpecification = @"[A-Za-z]*";

    private const string ItemMetadataSpecification = @"%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>" + itemTypeOrMetadataNameSpecification + @")\s*\.\s*)? (?<NAME>" + itemTypeOrMetadataNameSpecification + @") \s*\)";
    
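    // Same options as ItemMetadataPattern below (minus Compiled), so both regexes interpret the pattern identically.
    [GeneratedRegex(ItemMetadataSpecification, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)]
    private static partial Regex GeneratedPattern();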

    public static Regex ItemMetadataPattern = new Regex(ItemMetadataSpecification,
                   RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.Compiled );

    public static string benchmark_regex(string expression)
    {
        return Test.ItemMetadataPattern.Replace(expression, new MatchEvaluator(matchEvaluator.ExpandSingleMetadata));
        //return Test.GeneratedPattern().Replace(expression, new MatchEvaluator(matchEvaluator.ExpandSingleMetadata));
    }
}

old, compiled regex:
[image: regex-compiled]

new, generated regex:
[image: regex-generated]

Now, my test strings for matching are not all that representative, so this targeted illustration isn't perfect:

string[] tests = { "some random string", "%(test.test)", "%(test.test.test)%(test.test)%(test.test)" , "asdfa.sdfasdfa", "%dasfsd.2dsfa", "%(dasfsd2.dsfa)" };
var match = 0;
for (int i = 0; i != 10000000; i++)
{
    foreach (var test in tests)
    {
        //Console.WriteLine(Test.benchmark_regex(test));
        if (Test.benchmark_regex(test).StartsWith("y", StringComparison.Ordinal))
        {
            match++;
        }
    }
}
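
A loop like this can be timed with `System.Diagnostics.Stopwatch`. The helper below is a hypothetical sketch (the name `RegexTiming.Measure` is not from this PR) and is only good for a rough comparison; a dedicated benchmarking tool would be the safer choice for anything more precise.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; not part of the PR.
public static class RegexTiming
{
    // Runs `transform` over every input `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<string, string> transform, string[] inputs, int iterations)
    {
        // Warm-up pass so one-time costs (JIT, regex construction) are not attributed to the loop.
        foreach (var input in inputs)
        {
            transform(input);
        }

        long totalLength = 0;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (var input in inputs)
            {
                // Use the result so the call cannot be treated as dead code.
                totalLength += transform(input).Length;
            }
        }
        stopwatch.Stop();
        GC.KeepAlive(totalLength);
        return stopwatch.Elapsed;
    }
}
```

With the snippets above, it could be called as `RegexTiming.Measure(Test.benchmark_regex, tests, 10000000)`, once for each variant of `benchmark_regex`.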

In particular, Expander already does some fail-fast pre-scanning, so some of the edge cases might behave slightly differently there.
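
As an illustration of what such a pre-scan can look like, here is a minimal sketch. It assumes the check is a plain ordinal search for `%(`; the class, the simplified pattern, and the `lookup` delegate are hypothetical, and this is not the code in Expander.cs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of a fail-fast pre-scan; this is not the code in Expander.cs.
public static partial class PreScanSketch
{
    [GeneratedRegex(@"%\(\s*(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)", RegexOptions.ExplicitCapture)]
    private static partial Regex MetadataReference();

    public static string Expand(string expression, Func<string, string> lookup)
    {
        // Cheap ordinal scan first: without a "%(" there is nothing for the regex to match.
        if (expression.IndexOf("%(", StringComparison.Ordinal) < 0)
        {
            return expression;
        }

        return MetadataReference().Replace(expression, match => lookup(match.Groups["NAME"].Value));
    }
}
```

With a check like this in front, inputs such as "some random string" never reach the regex at all, which is one reason the targeted numbers may not carry over directly.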

I'm pondering how much more we could gain by tweaking the Regex.Replace itself, but that is in the realm of speculation.

@SimaTian SimaTian merged commit 89b8461 into main Jan 3, 2025
10 checks passed
@SimaTian SimaTian deleted the regex-codegen-addition branch January 3, 2025 15:23