LINQ – don’t be afraid of Aggregate

Aggregate is one of the most fun and powerful methods in LINQ. Sadly, it’s also one of the most underused and “scary” ones. I hope that after reading this post you will understand Aggregate a bit better, know when to use it, and won’t be afraid to do so.

So let’s have a look at MSDN. You will find three Aggregate overloads there, and if you have no experience with this method, not a single one of them looks simple. Let’s look at the first one:

public static TSource Aggregate<TSource>(
	this IEnumerable<TSource> source,
	Func<TSource, TSource, TSource> func
)

It looks anything but simple, and it’s the least complex one. As you can see, it takes an IEnumerable of TSource and a delegate that accepts two TSource parameters and returns a TSource. The method itself also returns a TSource. So what exactly is going on under the hood?

As the method name suggests, Aggregate aggregates the objects stored in a collection and returns the result of that aggregation, which is a single object. We already have methods that do the same thing in a more fixed and strict manner, for example Sum or Average, which are commonly known and used. The role of Aggregate is the same: to produce a single result from an input collection. Let’s “recreate” the Sum method, and for now let’s use a simple foreach loop to do that.

var collection = new int[] { 2, 4, 8, 16, 32 };

var sum = 0;
foreach (var x in collection)
    sum += x;

Console.WriteLine(sum); //62

This is a pretty simple example, so let’s do the same with Aggregate.

var collection = new int[] { 2, 4, 8, 16, 32 };

var result = collection.Aggregate((sum, x) => sum + x); //((((2 + 4) + 8) + 16) + 32)

Console.WriteLine(result); //62

The example above does exactly the same as the previous foreach loop. Take a closer look at the variable names in those examples. There is a reason they are the same: their roles match. In both examples x is a collection element and sum is the variable that accumulates the value across all iterations (it’s often called the accumulator).

During every iteration the Func delegate is invoked. What role does it play, and what do its parameters represent? It’s rather simple: the first one is the accumulator and the second one is the current element of the collection. What happens under the hood? Just take a look at the math expression in the comment next to the Aggregate call. The real question is where the accumulator comes from on the first run; after all, it wasn’t initialized anywhere. So let’s modify our Aggregate a bit and print some values:

var result = collection.Aggregate((sum, x) => {
    Console.WriteLine($"Sum: {sum}");
    Console.WriteLine($"X: {x}");
    return sum + x;
    });

And our console will show something like this:

Sum: 2
X: 4
Sum: 6
X: 8
Sum: 14
X: 16
Sum: 30
X: 32
62

This shows that the accumulator’s value in the first iteration is in fact the first element of the collection, and the delegate is never invoked with that element as x. This behavior is often undesirable, and because of that we may want to use the second overload, which in my opinion shows the true power of the Aggregate method. It is probably my favourite, and the one I use most often.
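
Before moving on, the behavior of the first overload can be sketched as a hand-written loop. This is only a simplified illustration and not the actual LINQ source; AggregateSketch and Fold are made-up names. It also shows a consequence of having no seed: with an empty collection there is no first element to start from, so the call throws an InvalidOperationException, just like the real seedless Aggregate does.

```csharp
using System;
using System.Collections.Generic;

var collection = new int[] { 2, 4, 8, 16, 32 };

// Same result as collection.Aggregate((sum, x) => sum + x).
Console.WriteLine(AggregateSketch.Fold(collection, (sum, x) => sum + x)); // 62

try
{
    AggregateSketch.Fold(new int[0], (sum, x) => sum + x);
}
catch (InvalidOperationException)
{
    // With no elements there is nothing to initialize the accumulator with.
    Console.WriteLine("Empty sequence: no first element to start from");
}

static class AggregateSketch
{
    // Hypothetical helper that mimics the seedless overload; not the real implementation.
    public static T Fold<T>(IEnumerable<T> source, Func<T, T, T> func)
    {
        using var enumerator = source.GetEnumerator();

        // The first element becomes the initial accumulator...
        if (!enumerator.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements");

        var accumulator = enumerator.Current;

        // ...so the delegate only runs for the remaining elements.
        while (enumerator.MoveNext())
            accumulator = func(accumulator, enumerator.Current);

        return accumulator;
    }
}
```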

public static TAccumulate Aggregate<TSource, TAccumulate>(
	this IEnumerable<TSource> source,
	TAccumulate seed,
	Func<TAccumulate, TSource, TAccumulate> func
)

In this overload we have a new parameter called seed, and the generics are slightly different (the return type, and the parameter and return types of func). The new parameter is nothing more than the initial value of the accumulator introduced earlier, so now we can do something like this:

var collection = new int[] { 2, 4, 8, 16, 32 };

var sum = 0;
var result = collection.Aggregate(sum, (accumulator, x) => accumulator + x);

Console.WriteLine(result); //62

Why would we want to do that? In the example above it gives us absolutely nothing. So let’s change our example just a bit. Remember when I mentioned the changed generics? Let’s use that and create an accumulator of a different type than the collection elements.

var collection = new string[] { "Aggregate", " ", "is", " ", "fun", "!" };

var result = collection.Aggregate(new StringBuilder(), (sbAccumulator, x) => sbAccumulator.Append(x));

Console.WriteLine(result);
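
Two more things about the seeded overload are worth seeing in code. The sketch below (the word list is made up for illustration) uses a Dictionary as the accumulator while the elements are strings, and then shows that with a seed the delegate runs for every element, so an empty collection simply returns the seed instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new string[] { "Aggregate", "is", "fun", "is", "fun" };

// The seed is a Dictionary, while the elements are strings.
var counts = words.Aggregate(
    new Dictionary<string, int>(),
    (accumulator, word) =>
    {
        // Increment the counter for this word, starting from 1 on first sight.
        accumulator[word] = accumulator.TryGetValue(word, out var n) ? n + 1 : 1;
        return accumulator;
    });

Console.WriteLine(counts["fun"]); // 2

// With a seed, an empty collection is not a problem: the seed is returned as-is.
var empty = new int[0];
var emptySum = empty.Aggregate(0, (accumulator, x) => accumulator + x);
Console.WriteLine(emptySum); // 0
```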

And before we move on to the third and final overload of this great method, let’s think a bit. What exactly can we gain by using Aggregate over simple loops? We can assemble a single object from several other objects based on whichever parameters matter most, we can select the “best” object from a collection, or we can do more complex aggregations. And the most fun reason? I’m saving it for the end of this post as a teaser for the next one.

So let’s move on to the next example and the last, most complex overload.
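
To make two of those uses concrete, here is a small sketch with made-up data. The first call picks the “best” element (the longest word) with the seedless overload; the second tracks both the minimum and the maximum in a single pass by using a tuple as the accumulator.

```csharp
using System;
using System.Linq;

var words = new string[] { "LINQ", "Aggregate", "is", "fun" };

// "Best" object: keep whichever word is longer at each step.
var longest = words.Aggregate((best, word) => word.Length > best.Length ? word : best);
Console.WriteLine(longest); // Aggregate

var numbers = new int[] { 8, 2, 32, 4, 16 };

// More complex aggregation: track minimum and maximum in one pass.
var range = numbers.Aggregate(
    (Min: int.MaxValue, Max: int.MinValue),
    (accumulator, x) => (Math.Min(accumulator.Min, x), Math.Max(accumulator.Max, x)));

Console.WriteLine($"{range.Min}..{range.Max}"); // 2..32
```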

public static TResult Aggregate<TSource, TAccumulate, TResult>(
	this IEnumerable<TSource> source,
	TAccumulate seed,
	Func<TAccumulate, TSource, TAccumulate> func,
	Func<TAccumulate, TResult> resultSelector
)

Just have a look at that... my head hurts just from looking at it. We have a new parameter and a new generic type. Our generic types are now the collection element type, the accumulator type and the return type, and the new parameter is nothing more than a delegate that projects our accumulator to the result type. You can even call a constructor that accepts the accumulator type as a parameter.

As the last example, imagine a situation where you have a Checkout method in a Cart class that goes through the cart’s products, sums their prices and stores the total in the Value member of a CheckoutHelper. Our CheckoutHelper also calculates a discount that is some percentage of the entire value, excluding the value of any alcohol products bought. Aggregate is a perfect solution for problems like that.
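
A minimal sketch of the result selector in action, reusing the numbers from the earlier examples: the accumulator is a (Sum, Count) tuple, and the final delegate turns it into an average. The tuple shape is just one possible choice for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

var collection = new int[] { 2, 4, 8, 16, 32 };

var average = collection.Aggregate(
    seed: (Sum: 0, Count: 0),
    // Accumulate the running sum and the number of elements seen so far.
    func: (accumulator, x) => (accumulator.Sum + x, accumulator.Count + 1),
    // Project the accumulator (a tuple) to the final result (a double).
    resultSelector: accumulator => (double)accumulator.Sum / accumulator.Count);

Console.WriteLine(average.ToString(CultureInfo.InvariantCulture)); // 12.4
```

Here TSource is int, TAccumulate is the tuple and TResult is double, so all three generic types are different.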

public Order Checkout()
{
    return Products.Aggregate
        (
            seed: new CheckoutHelper(this, Customer),
            func: (accumulator, product) =>
             {
                 accumulator.Value += product.Price;

                 if (product.Category == Category.Alcohol)
                     accumulator.NonDiscountableValue += product.Price;

                 return accumulator;
             },
            resultSelector: accumulator => new Order(accumulator)
        );
}
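
The Checkout method above relies on types that are not shown in this post. A self-contained version could look roughly like the sketch below. Everything outside the Aggregate call is an assumption made for illustration: the shape of Product, CheckoutHelper and Order, the parameterless CheckoutHelper constructor, and the 10% discount rate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var cart = new Cart();
cart.Products.Add(new Product("Bread", 4m, Category.Food));
cart.Products.Add(new Product("Cheese", 6m, Category.Food));
cart.Products.Add(new Product("Wine", 10m, Category.Alcohol));

var order = cart.Checkout();
Console.WriteLine($"Value: {order.Value}, discount: {order.Discount}");

enum Category { Food, Alcohol }

record Product(string Name, decimal Price, Category Category);

// Hypothetical mutable accumulator; the 10% rate is an assumption of this sketch.
class CheckoutHelper
{
    public decimal Value { get; set; }
    public decimal NonDiscountableValue { get; set; }
    public decimal Discount => (Value - NonDiscountableValue) * 10 / 100;
}

// Hypothetical result type built from the finished accumulator.
class Order
{
    public decimal Value { get; }
    public decimal Discount { get; }

    public Order(CheckoutHelper helper)
    {
        Value = helper.Value;
        Discount = helper.Discount;
    }
}

class Cart
{
    public List<Product> Products { get; } = new List<Product>();

    public Order Checkout()
    {
        return Products.Aggregate(
            seed: new CheckoutHelper(),
            func: (accumulator, product) =>
            {
                accumulator.Value += product.Price;

                // Alcohol counts towards the total but not towards the discount.
                if (product.Category == Category.Alcohol)
                    accumulator.NonDiscountableValue += product.Price;

                return accumulator;
            },
            resultSelector: accumulator => new Order(accumulator));
    }
}
```

With the three products above, the total value is 20 and only the 10 spent on food is discountable, so the discount comes to 1.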

Of course this example is rather simple, but the fact is that Aggregate calls can grow and become rather big and unreadable with all those lambdas, especially if someone likes to name lambda parameters x, y and z. Personally, I like to wrap Aggregate in other methods (preferably LINQ extensions) with names that mean something, like Checkout, MaxBy or CountIf. It is much easier to read that way, and nobody will be forced to wonder what all those lines of code are supposed to do. It’s also much more reusable and can be made generic, which is a nice way to prepare extensions like the simple MaxBy below (TBH the MoreLINQ library has an excellent implementation of a method like that, and it is worth taking a look at).

public static TSource MaxBy<TSource, TCompareBy> (this IEnumerable<TSource> source, Func<TSource, TCompareBy> compareBy, IComparer<TCompareBy> comparer = null)
{
    comparer = comparer ?? Comparer<TCompareBy>.Default;
    return source.Aggregate((bestElement, x) => comparer.Compare(compareBy(bestElement), compareBy(x)) > 0 ? bestElement : x );
}
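
In the same spirit, the CountIf mentioned above could be sketched like this. The class name AggregateExtensions is made up, and LINQ’s built-in Count already accepts a predicate, so this is purely an illustration of giving an Aggregate call a meaningful name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var collection = new int[] { 2, 4, 8, 16, 32 };

// Reads much better at the call site than the raw Aggregate call.
var bigOnes = collection.CountIf(x => x > 5);
Console.WriteLine(bigOnes); // 3

static class AggregateExtensions
{
    // Hypothetical extension: counts the elements that satisfy the predicate.
    public static int CountIf<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        return source.Aggregate(0, (count, x) => predicate(x) ? count + 1 : count);
    }
}
```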

Remember when I wrote that I’d tell you one more reason for using Aggregate? That reason is yet another method that works in perfect synergy with Aggregate, and I’ll write an entire post dedicated to this synergy soon. I’m talking about the AsParallel LINQ method, which allows us to run parallel aggregations with just AsParallel and a bit of forethought while writing our Aggregate calls. If you don’t want to miss that post, please follow me on Twitter or Facebook.